DESEq2 p-values weird distribution under the H0 and fdrtool
1
0
Entering edit mode
tcalvo ▴ 100
@tcalvo-12466
Last seen 17 months ago
Brazil

I'm analyzing an RNAseq with the following samples/groups: 

A = 14 samples, B = 7, C = 7, D = 10 and E = 10.

Each sample is independent from another and represents a biological sample from one patient with a different form of a disease X. 

I've analysed this experiment with DESEq2 with the "standard" workflow provided in the vignette. 

Following those instructions, based in the DESeq() function, I've found a few to none DEG from pairwise contrasts, such: A vs B, C vs A etc, with an adjusted P-value <= 0.05 (FDR for multiple testing correction).

As suggested by Klaus (http://www-huber.embl.de/users/klaus/Teaching/DESeq2-Analysis.pdf), I checked the p-values distribution with an histogram and it deviates from an expected distribution (uniform dist. with a peak at 0, under the null). Then, I employed a correction using the package fdrtool using the following code:

res.de.fixed <- fdrtoolres.de$stat, statistic = "normal")

Atfer that, I've gotten an expected p-value distribution and replaced my old adjusted p-values with the new qvalues obtained with fdrtool. Now, as expected, I have a lot of DEG with alpha = 0.01, and they seem to make sense given previous experiments with microarray. 

My question is: Is this a legit approach? Can I correct the p-values obtained from Negative Binomial GLM fitting and Wald statistics employed by the DESeq() function with fdrtool? Of course, when the p-values exhibit this strange distribution. Also, what's the difference between the q-values (fdr) and the local fdr (lfdr) from fdrtool?

The histograms:

Top, before correction with fdrtool. Bottom, after fdrtool code above. 

P-values before

After FDR correction

 

 

 

 

 

 

 

Edit.2. Added PCA from DESeq2::plotPCA()

PCA

From this PCA, I can see some underlying hidden effect, maybe a batch or lane effect? I'll need to get these data from the source.

Thanks.

Thyago

rnaseq deseq2 fdrtool differential gene expression • 3.5k views
ADD COMMENT
0
Entering edit mode

dear Thyago,

A few other things to check: can you supply a PCA plot of the samples, and additionally, can you confirm that you did not use an LFC threshold when testing with results().

ADD REPLY
0
Entering edit mode

Dear Michael,

Added. I confirm not using any lfc cutoff from results() function. I used the default, 0, lfc. 

Thank you.

ADD REPLY
2
Entering edit mode
Bernd Klaus ▴ 610
@bernd-klaus-6281
Last seen 6.2 years ago
Germany

Hi Thyago,

 

The PCA plot looks pretty much like a batch effect being a stronger "separator" than the experimental groups.

You should tackle this directly. After application of fdrtool, you move to a rather bathtub shaped p-value histogram which often indicates p-values that are too liberal (you have ~ 10k pvals in the smallest bin, setting the y-limit to 3000 or so should enable to you to see the bathtub more clearly).

I bet if you test all samples with PC1 < 0 against all with PC1 > 0 you will have a nicer p-value histogram. Can you check this?

Try to run DESeq with at least one surrogate variable, the RNA-Seq gene workflow tells you how to do this:

http://bioconductor.org/help/workflows/rnaseqGene/#removing-hidden-batch-effects

What do you get?

Alternatively, you could simply try to include PC1 as a covariate in your DESeq model.

ADD COMMENT

Login before adding your answer.

Traffic: 573 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6