DESeq2 - couple of clarifications

Hey Mike,

A couple of questions on DESeq2, but first of all, some code to make my questions reproducible:

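library(DESeq2); library(airway); library(magrittr)  # magrittr provides %>%
data(airway)
dds_airway <- DESeq2::DESeqDataSetFromMatrix(assay(airway),
                                             colData = colData(airway),
                                             design = ~ cell + dex)  # design line was cut off in the post; ~ cell + dex is assumed
dds_airway <- DESeq(dds_airway)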


  • alpha & independentFiltering. Could it be a small bug that, when I set independentFiltering to FALSE, alpha is not stored in the DESeqResults object? Please compare the output of these commands:
results(dds_airway, contrast = c("dex", "trt", "untrt"), alpha = 0.05, independentFiltering = TRUE) %>% summary
results(dds_airway, contrast = c("dex", "trt", "untrt"), alpha = 0.05, independentFiltering = FALSE) %>% summary
results(dds_airway, contrast = c("dex", "trt", "untrt"), alpha = 0.05, independentFiltering = FALSE) %>% summary(alpha = 0.05)
  • For an app I am developing, I am trying to handle "automatically" the cases where the covariate is a two-level factor, a continuous variable, or a factor with more than two levels. A quick check that I am doing it right, according to the documentation:
     two-level factor -> contrast = the 3-element character vector
     numeric -> name = the name of the numeric covariate
     more than 2 levels -> rerun DESeq with test = "LRT" and use the full and reduced models to define the comparison

    Moreover, are you by chance aware of a dataset with a (possibly meaningful) continuous covariate? As a toy case I am using airway with the read length, and I am (correctly) getting very few hits. If not, do you know a robust way of simulating such a dataset?
  • I have seen you recommending the salmon path for generating the counts, especially after the DTE/DGE/DTU paper by you and Charlotte. I found it a little harder to explain to our cooperation partners because of the extra modeling step already at the counting level, and this is keeping me with the "old and safe" featureCounts-based approach. Do you have a suggestion on how best to sell the advantages of the new method, apart from linking to your paper?


Thank you in advance!



Regarding 'alpha' in results() and summary(), when you have independentFiltering=TRUE, then the alpha is used by the function to optimize the independent filtering, and then it's used again as a relevant threshold by summary() when alpha is not explicitly provided to summary(). If you have independentFiltering=FALSE, then alpha is ignored by results() and not passed to summary(). I've clarified this just now in the help page.

The second question sounds right, although for a factor with more than two levels, sometimes users want to do 2-3 pairwise comparisons (B vs A, C vs A, sometimes C vs B), and sometimes they want an LRT.
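To make the three cases concrete, here is a minimal sketch in base R of how a wrapper might decide what to offer the user for a given covariate. The helper name suggest_results_args and the toy table are hypothetical, not part of DESeq2; the sketch only builds the arguments (a `name` for a numeric covariate, or `contrast` vectors of the form c(variable, numerator, denominator) for a factor) and leaves the actual call to results() to the user. It assumes the first factor level is the reference.

```r
# Hypothetical helper (not part of DESeq2): map one colData column to the
# arguments one might pass on to DESeq2::results().
suggest_results_args <- function(coldata, covariate) {
  x <- coldata[[covariate]]
  if (is.numeric(x)) {
    # Numeric covariate: the coefficient carries the column name.
    return(list(kind = "numeric", name = covariate))
  }
  lv <- levels(droplevels(as.factor(x)))
  stopifnot(length(lv) >= 2)
  # All pairwise contrasts, written as c(variable, numerator, denominator),
  # with the earlier level in the denominator.
  contrasts <- utils::combn(lv, 2, simplify = FALSE,
                            FUN = function(p) c(covariate, p[2], p[1]))
  # With more than two levels, flag that an LRT is also an option.
  list(kind = "factor", contrasts = contrasts, offer_LRT = length(lv) > 2)
}

# Toy sample table (made-up values) covering the three cases.
toy <- data.frame(
  dex = factor(c("untrt", "trt", "untrt", "trt"), levels = c("untrt", "trt")),
  group = factor(c("A", "B", "C", "A")),
  avgLength = c(126, 126, 87, 98)
)
two_level   <- suggest_results_args(toy, "dex")
three_level <- suggest_results_args(toy, "group")
numeric_cov <- suggest_results_args(toy, "avgLength")
```

For the three-level factor the helper lists the pairwise contrasts B vs A, C vs A and C vs B and sets offer_LRT, so the app can prompt for either route rather than choosing one on the user's behalf.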

When other developers have worked on wrappers for DESeq2 (for example, ReportingTools), they've encountered a number of headaches by trying to call results() internally to their software, because it takes a lot of effort to provide all the functionality that results() provides. This is why I've often recommended that, if possible, developers let users interface with DESeq2::results() directly, and then operate on the DESeqResults table instead. But it's up to you. 

I don't have a publicly available, processed dataset in mind with a numeric covariate, but I'm sure many exist. The trick is that you first need to do some exploration to make sure that a linear relationship between the covariate and log counts makes sense, i.e. to rule out the possibility of saturation, or convex or concave patterns.
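On the simulation side of the question, one possible toy approach (a sketch, not a method recommended in this thread) is to draw negative binomial counts whose log2 mean is linear in a made-up covariate, and then run the kind of exploration described above: fit log counts against the covariate and test whether a quadratic term improves the fit. All sizes, the effect of 0.5 log2 units per unit of x, and the dispersion of 0.1 are assumptions chosen for illustration. The DESeq2 step at the end only runs if the package is installed.

```r
set.seed(42)
n_genes <- 200
n_samples <- 12
n_responsive <- 20

# Hypothetical continuous covariate, one value per sample.
x <- seq(0, 5, length.out = n_samples)

# True log2 change per unit of x: 0.5 for the first 20 genes, 0 otherwise.
beta <- c(rep(0.5, n_responsive), rep(0, n_genes - n_responsive))
base_mean <- 2^runif(n_genes, min = 6, max = 10)

# Expected counts (genes x samples): log2(mu) is linear in x by construction.
mu <- base_mean * 2^outer(beta, x)
counts <- matrix(rnbinom(length(mu), mu = mu, size = 10),  # size = 1 / dispersion
                 nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)),
                                 paste0("sample", seq_len(n_samples))))

# Exploration: per-gene slope of log2 counts against x.
log_counts <- log2(counts + 1)
slopes <- apply(log_counts, 1, function(y) coef(lm(y ~ x))[["x"]])

# Per-gene p-value for an added quadratic term; many small values would
# hint at curvature or saturation, i.e. a linear term may be inadequate.
curvature_p <- apply(log_counts, 1, function(y) {
  anova(lm(y ~ x), lm(y ~ x + I(x^2)))[2, "Pr(>F)"]
})

# Optional: test the numeric covariate with DESeq2, if it is available.
if (requireNamespace("DESeq2", quietly = TRUE)) {
  col_data <- data.frame(x = x, row.names = colnames(counts))
  dds <- DESeq2::DESeqDataSetFromMatrix(counts, colData = col_data, design = ~ x)
  dds <- DESeq2::DESeq(dds)
  res <- DESeq2::results(dds, name = "x")
}
```

Because the truth is known here, the slopes for the first 20 genes can be compared against 0.5. On real data the same slope and curvature checks would be run before trusting a linear term in the design.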

Re: selling the new methods, it's good to keep in mind that the estimated counts are highly correlated with the unique counts. The bonuses are: much faster and more efficient generation of these matrices, the possibility of recovering multi-mapping reads through probabilistic assignment, and avoidance of potential issues with DTU, which could throw off inference from gene-level unique counts.


Thank you for the clarification!

As for the LRT vs pairwise comparisons, you are right. I wanted at least to prompt users that they can perform the LRT when more than 2 levels are available.

I also had my personal small portion of pain with using ReportingTools, so I know what you mean - it is still quite a great tool, kudos to the developers for it!

Thanks for the tip on the dataset, I will look deeper - and in the meantime I hope some other user has already been looking for the same thing.

Finally, good points for selling the new method. I also found a recent presentation by Charlotte at CSAMA, so I have gathered enough info to become a good prophet for the novel approach.

