Question

DESeq2 time series: error while measuring gene dispersion.

0

Entering edit mode

anand m t ▴ 40

@anand-m-t-4859

Last seen 8.3 years ago

Singapore

Hi all,

I have been trying DESeq2 to analyse my RNAseq data. Data is from sequencing a cell line treated with a drug at different time point intervals. My sampleTable looks like this:

DataFrame with 5 rows and 2 columns
        treatment     time
         <factor> <factor>
cl_0h  untreated        0
cl_2h    treated        2
cl_4h    treated        4
cl_6h    treated        6
cl_12h   treated       12

I want to find expression change in genes with respect to base line time (0 hour). My design and subsequent commands are as follows, but I get an error at estimateDispersions step:

design(dds) <- formula(~ time + treatment + treatment:time)

dds <- estimateSizeFactors(dds)

dds <- estimateDispersions(dds)
gene-wise dispersion estimates
Error in optim(betaRow, objectiveFn, method = "L-BFGS-B", lower = -30,  :
  L-BFGS-B needs finite values of 'fn'

Is it because lack of replicates? I tried changing fitType argument but I get the same error. And is my design correct for the above sampleTable ?

Thanks.

deseq2 timecourse • 2.2k views

ADD COMMENT • link updated 10.2 years ago by Michael Love 43k • written 10.2 years ago by anand m t ▴ 40

score 2 · Accepted Answer · 2015-01-31

2

Entering edit mode

Michael Love 43k

@mikelove

Last seen 3 days ago

United States

Yes, it is because of a lack of replicates. If you had provided this design at object construction, you would get a more helpful message, and in the current version of DESeq2 (1.6) you would also get a more useful message from estimateDispersions(). If possible, we recommend users updating to the most current release of R and Bioconductor.

There are a few options with an experimental design like this:

add more untreated samples ( treated samples if possible), and use a design of ~time. Then you can contrast each non-zero time point against time = 0 using the 'contrast' argument of ?results.

2) or you can add degrees of freedom by assuming a linear relationship between time and log counts. I'd prefer adding biological replicates in a case like this, because without biological replicates, it's hard to know if the parametrization is correct. The linear assumption has few parameters, so you're less likely to have problems overfitting a curve to 5 points. This would look like:

dds$time = c(0,2,4,6,12)

design(dds) = ~ time

What you will miss with this design is genes where the expression rises and falls. Another option is a quadratic relationship between time and log counts:

dds = DESeq(dds, test="LRT", full = ~ 1 + time + I(time^2), reduced = ~ 1)

Then you would capture the rising and falling case (or falling and rising).

ADD COMMENT • link 10.2 years ago Michael Love 43k

0

Entering edit mode

Hi Mike,

Thank you for a detailed description. I have updated my R and bioc. As you mentioned newer version (1.6.3) does give detailed error desciption. However I have some question. Since we don not have biological replicates (blame expensive technology!) first option is not possible. When you say "add degrees of freedom by assuming a linear relationship between time and log counts" does this mean I add pseudo replicates based on counts from linear relationships? But, the quadratic relationship worked out fine. This also captured overall changes as you mentioned which is nice. Thanks again.

ADD REPLY • link 10.2 years ago anand m t ▴ 40

1

Entering edit mode

"does this mean I add pseudo replicates based on counts from linear relationships?"

no, I just meant: use a design that assumes a simple functional relationship between log counts and time. If quadratic works for your purposes, then I'd stick with that.

The residual degrees of freedom is (# of samples) - (# of parameters). When you try to fit time as a factor, you have 5 samples and 5 parameters to fit in the model (one for each level of the factor). This leaves no residual degrees of freedom. However, if you assume the functional relationship is linear or quadratic, you have 2 or 3 parameters to fit, respectively. Either of these gives you some degrees of freedom.

ADD REPLY • link 10.2 years ago Michael Love 43k