Hi Everyone,
I'm doing some DGE analysis with 4 groups! I have tried to use the glmQLFTest as suggested in the edgeR users manual, but there are barely any differently expressed genes with FDR < 0.99. On the other hand for the same comparisons, glmLRT gives me a few hundred genes.
Is glmQLFTest not valid for multiple contrasts?
If I move forward with my analysis using the glmLRT function, will my results be valid?
Thank you all!
I'll second the vote in favor of the QL method over the LRT method. In every dataset I've seen where I did a null comparison between replicates in which no genes should be differentially expressed, glmQLFTest never gave me more than a few significant genes, while glmLRT sometimes gave me many genes. So I trust glmQLFTest much more to control the false positive rate.
Thanks Steve,
I was wondering if glmQLFTest only works for one comparison like a case v control. I have a case1 v case2, case1 v case 3, case 1 v control, ...
Here is the code:
That's quite an extreme difference. One possible cause is that your genes have highly variable dispersions. If edgeR sees this in the data, it will (correctly) reduce the estimate of the prior d.f. in both
estimateDisp
andglmQLFit
, thus reducing the amount of shrinkage that can be performed. In general, I would consider a prior.d.f below 5 to be small, while anything above 50 is pretty high, i.e., strong shrinkage. For small prior d.f., the power ofglmQLFTest
will (appropriately) decrease, as inferences are less reliable from weakly-shrunk dispersions that are more unstable. However,glmLRT
is blind to this effect, potentially resulting in spuriously low p-values due to imprecisely estimated dispersions with small values.It's worth having a look at the output of
plotBCV
andplotQLDisp
as diagnostics. The former will reveal if there's some strong trend that hasn't been properly modelled and is subsequently inflating the variability of the dispersions - more aggressive filtering should take care of this, in cases where the strong trend is generated by technical noise at low counts. The QL dispersion plots will allow you to see if there's a mass of outlier genes, which could also inflate the variability of the dispersions (especially if it overwhelms the default winsorizing thresholds). For example, Ig chains are often highly variable if you're working with a data set containing different B-cell clones, so it's worth getting rid of them to avoid large outliers. Some types of batch effects can also increase the variability of the dispersions if they're left unmodelled, as the gene-specific batch effect gets incorporated into the dispersion estimate - check your MDS plot for any patterns in your replicates.As for your actual question, there are no restrictions on the types of contrasts you can give to
glmQLFTest
; it'll handle any number of case vs. case/control comparisons. Further, on an unrelated note,robust
doesn't do anything inglmFit
.As Aaron says, this sounds extreme. If you want to to email it to me, I'd be interested to see test your data to see why the LRT and QL pipelines give such different FDR values for your data.
Gordon