I'm analyzing an RNAseq with the following samples/groups:
A = 14 samples, B = 7, C = 7, D = 10 and E = 10.
Each sample is independent from another and represents a biological sample from one patient with a different form of a disease X.
I've analysed this experiment with DESEq2 with the "standard" workflow provided in the vignette.
Following those instructions, based in the DESeq() function, I've found a few to none DEG from pairwise contrasts, such: A vs B, C vs A etc, with an adjusted P-value <= 0.05 (FDR for multiple testing correction).
As suggested by Klaus (http://www-huber.embl.de/users/klaus/Teaching/DESeq2-Analysis.pdf), I checked the p-values distribution with an histogram and it deviates from an expected distribution (uniform dist. with a peak at 0, under the null). Then, I employed a correction using the package fdrtool using the following code:
res.de.fixed <- fdrtoolres.de$stat, statistic = "normal")
Atfer that, I've gotten an expected p-value distribution and replaced my old adjusted p-values with the new qvalues obtained with fdrtool. Now, as expected, I have a lot of DEG with alpha = 0.01, and they seem to make sense given previous experiments with microarray.
My question is: Is this a legit approach? Can I correct the p-values obtained from Negative Binomial GLM fitting and Wald statistics employed by the DESeq() function with fdrtool? Of course, when the p-values exhibit this strange distribution. Also, what's the difference between the q-values (fdr) and the local fdr (lfdr) from fdrtool?
The histograms:
Top, before correction with fdrtool. Bottom, after fdrtool code above.
Edit.2. Added PCA from DESeq2::plotPCA()
From this PCA, I can see some underlying hidden effect, maybe a batch or lane effect? I'll need to get these data from the source.
Thanks.
Thyago
dear Thyago,
A few other things to check: can you supply a PCA plot of the samples, and additionally, can you confirm that you did not use an LFC threshold when testing with results().
Dear Michael,
Added. I confirm not using any lfc cutoff from results() function. I used the default, 0, lfc.
Thank you.