I have a four-level, single-factor dataset to analyse with edgeR. I used:
library(edgeR)
tmm <- calcNormFactors(data.dge)
y <- estimateDisp(tmm)
et <- exactTest(y, pair = c("group1", "group2"))  # placeholder group names; one call per comparison
to extract DE genes in the desired comparisons. However, is there an equivalent to glmTreat() (as used in multiple-factor experiments) that I can use here to get DE genes relative to a fold-change of 2 or similar?
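(For illustration, a minimal sketch of the glmQLFit/glmTreat route that the replies below point to, applied to a single four-level factor. The count matrix counts, the group labels and the B vs A comparison are placeholders, not taken from the original post:)
library(edgeR)
group <- factor(c("A","A","B","B","C","C","D","D"))   # placeholder sample layout
y <- DGEList(counts = counts, group = group)          # 'counts' is your count matrix
y <- calcNormFactors(y)
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)
y <- estimateDisp(y, design)
fit <- glmQLFit(y, design)
con <- makeContrasts(B - A, levels = design)          # one of the desired comparisons
tr <- glmTreat(fit, contrast = con, lfc = log2(2))    # test relative to a fold-change of 2
topTags(tr)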
Thank you for the material. I have read the edgeR user's guide and (forgive me if I got it wrong) I understood that:
for single-factor designs the exactTest should be used, while for multiple-factor designs glmFit/glmLRT, and more precisely glmQLFit/glmQLFTest, should be preferred. For multiple-factor designs I agree with you on how glmTreat should be used. But for single-factor experiments glmTreat cannot be applied, as the following error comes up:
So this is what confuses me.
There is no real reason to use the exact test any more. The GLM machinery is far more flexible as well as being more accurate, see A: EDGE-R exact test vs QL F-test. As for your error - well, the error message says it all. The input object should be produced by glmFit or glmQLFit, not exactTest.
Thank you, that was really helpful! Would you then recommend a cutoff/threshold on dispersion or counts for deciding when to use glmFit instead of glmQLFit for your DE analysis?
You should do filtering on abundance, see Section 2.6 of the user's guide. I don't know what you mean by applying a threshold on the dispersion; any filtering on the dispersion is a Bad Idea for empirical Bayes shrinkage. If you're worried about outlier dispersions, set robust=TRUE in glmQLFit.
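(A sketch of what that looks like in code, reusing the placeholder y and design objects from the sketch above; the abundance filtering would normally be done before estimating dispersions:)
keep <- filterByExpr(y, design)            # abundance filtering (user's guide, Section 2.6)
y <- y[keep, , keep.lib.sizes = FALSE]
y <- calcNormFactors(y)
y <- estimateDisp(y, design)
fit <- glmQLFit(y, design, robust = TRUE)  # robust=TRUE guards against outlier dispersions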
."In summary, while both of the methods will work for your data set, the QL F-test is probably the better choice. There are some situations where the QL F-test doesn't work well - for example, if you don't have replicates, you'd have to supply a fixed dispersion, which defeats the whole point of modelling estimation uncertainty. Another situation is where the dispersions are very large and the counts are very small, whereby some of the approximations in the QL framework seem to fail. In such cases, I usually switch to the LRT rather than using the exact test, for the reasons of experimental flexibility that I mentioned above."
Yes, I remember writing that. What is your point? Putting things in bold doesn't provide any extra information.
Yes, that was from the link you posted above. You said that the glmQLFit approximations fail with small counts and large dispersions. Can you then elaborate on when you would switch to glmFit based on this?
For single-cell data. This is not relevant for bulk unless your data is very bad (low depth, high variance).
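(For completeness, a sketch of that LRT fallback, again with the placeholder y, design and contrast from above; this is only for the small-count, large-dispersion situation described in the quote:)
fit <- glmFit(y, design)
con <- makeContrasts(B - A, levels = design)
lrt <- glmLRT(fit, contrast = con)
topTags(lrt)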
Ok all clear then. Thanks