Entering edit mode
On 30-07-2014 10:13, Alex Gutteridge wrote:
> On 29-07-2014 10:00, Alex Gutteridge wrote:
>> Hi,
>>
>> I have a time course RNASeq experiment and I'd like to detect DEU
>> between early and late stages. I am trying to use 'Age' as a
>> continuous variable in my design, but I'm getting an error which I
>> suspect is related to this e.g:
>>
>>> summary(sample.data.ipsc[,1:6])
>> Sample CellType Subtype Donor Age
>> Length:28 iPSC:28 iPSC :28 4555 :28 Min. :
2.000
>> Class :character HBR : 0 D4721 : 0 1st Qu.:
4.000
>> Mode :character L2 : 0 D4749 : 0 Median :
8.000
>> L3 : 0 D6002 : 0 Mean :
7.821
>> L4 : 0 D6005 : 0 3rd
Qu.:10.500
>> L5 : 0 D6008 : 0 Max.
:14.000
>> (Other): 0 (Other): 0
>> Passage
>> 42:3
>> 48:7
>> 49:5
>> 52:6
>> 56:7
>>
>>> dxd = DEXSeqDataSetFromHTSeq(countFiles,
>> sampleData=sample.data.ipsc,
>> design= ~ sample + exon + Age:exon,
>> flattenedfile=flattenedFile)
>>
>>> BPPARAM = MulticoreParam(workers=24)
>>
>>> dxd = estimateSizeFactors(dxd)
>>> dxd = estimateDispersions(dxd, BPPARAM=BPPARAM)
>>> dxd = testForDEU(dxd, BPPARAM=BPPARAM)
>>> dxd = estimateExonFoldChanges(dxd, fitExpToVar="Age",
>>> BPPARAM=BPPARAM)
>> Error: 8346 errors; first error:
>> Error in FUN(1:3[[1L]], ...): Non-factor in model frame
>>
>> For more information, use bplasterror(). To resume calculation,
>> re-call
>> the function and set the argument 'BPRESUME' to TRUE or wrap the
>> previous call in bpresume().
>>
>> First traceback:
>> 31: estimateExonFoldChanges(dxd, fitExpToVar = "Age", BPPARAM =
>> BPPARAM)
>> 30: bplapply(testablegenes, geteffects, BPPARAM = BPPARAM)
>> 29: bplapply(testablegenes, geteffects, BPPARAM = BPPARAM)
>> 28: mclapply(X = X, FUN = FUN, ..., mc.set.seed =
BPPARAM$setSeed,
>> mc.silent = !BPPARAM$verbose, mc.cores =
bpworkers(BPPARAM),
>> mc.cleanup = if (BPPARAM$cleanup) BPPARAM$cleanupSignal
else
>> FALSE)
>> 27: lapply(seq_len(cores), inner.do)
>> 26: lapply(seq_len(cores), inner.do)
>> 25: FUN(1:24[[1L]], ...)
>> 24: sendMaster(try(lapply(X = S, FUN = FUN, ...), silent = TRUE))
>> 23: parallel:::sendMaster(...)
>> 22: try(lapply(X = S, FUN = FUN, ...), silent = TRUE)
>> 21: tryCatch(expr, error = function(e)
>>
>> Can DEXSeq accept a model with a continuous variable? Does it make
>> sense to do so? (I do the same thing with DESeq2 to detected DE and
it
>> works fine). Is this error due to that? Note that all the other
steps
>> seem to run fine and I can get results (though I don't have many
>> significant hits - not sure if this is related or not). If not what
is
>> best practice? Just split the data set into 'early' and 'late'
samples
>> and run that as a factor?
>>
>> Alex Gutteridge
>
> To sort of answer my own question: cutting the age vector into two
> bins (early <8 weeks and late >8 weeks) and using that factor seems
to
> work and give many significant hits:
>
>> sample.data.ipsc$AgeBin = cut(sample.data.ipsc$Age,2)
>
>> dxd = DEXSeqDataSetFromHTSeq(countFiles,
> sampleData=sample.data.ipsc,
> design= ~ sample + exon +
AgeBin:exon,
> flattenedfile=flattenedFile)
>
>> BPPARAM = MulticoreParam(workers=12)
>
>> dxd = estimateSizeFactors(dxd)
>> dxd = estimateDispersions(dxd, BPPARAM=BPPARAM)
>> dxd = testForDEU(dxd, BPPARAM=BPPARAM)
>
> Which makes me think that DEXSeq wasn't doing what I hoped with the
> previous analysis (quite apart from the error). Is there any way to
> rejig the design to cope with a continuous variable as opposed to a
> factor? I *think* the underlying biological question makes sense:
Are
> there exons showing DEU that correlates with time?
>
> Alex Gutteridge
I'm just going to bump this question one last time as I'd love to know
if there's a better solution to what I have now. In short: I have a
reasonable detailed time course RNASeq experiment (8 timepoints; 2-14
weeks; 3-4 replicates at all time points except one) and would like to
detect time dependent DEU using DEXSeq. Slicing the timecourse into
two
bins (at 8 weeks) and doing a straight early vs late comparison works
and detects plenty of interesting stuff. Trying to model age as a
continuous variable gives an error when determining fold changes and
doesn't report any significant DEU.
See first plot here for an example (exons 27/28 show strong DEU
comparing <8 weeks to >8 week samples) shown with a DEXSeq plot:
http://dx.doi.org/10.6084/m9.figshare.1124172
But the 8 week split is entirely arbitrary, really time (Age) is a
continuous variable and so I'd like to model it as such (maybe this is
just OCD on my behalf, but I'm sure I must also be loosing some power
by
crudely splitting the timecourse like this; also DESeq2 lets me model
the genewise DE with a continuous variable).
The second plot from the figshare link shows the RPKMs of those two
exons (27/28). On top of a background of increased gene-wise
expression
there is clearly a time dependent effect on the relative usage of
these
two exons, which it must be possible to model better than a crude
split
at 8w (black dashed line). Is there any trick to do this with DEXSeq?
Alex Gutteridge