Question

DESEq2 p-values weird distribution under the H0 and fdrtool

0

Entering edit mode

tcalvo ▴ 100

@tcalvo-12466

Last seen 20 months ago

Brazil

I'm analyzing an RNAseq with the following samples/groups:

A = 14 samples, B = 7, C = 7, D = 10 and E = 10.

Each sample is independent from another and represents a biological sample from one patient with a different form of a disease X.

I've analysed this experiment with DESEq2 with the "standard" workflow provided in the vignette.

Following those instructions, based in the DESeq() function, I've found a few to none DEG from pairwise contrasts, such: A vs B, C vs A etc, with an adjusted P-value <= 0.05 (FDR for multiple testing correction).

As suggested by Klaus (http://www-huber.embl.de/users/klaus/Teaching/DESeq2-Analysis.pdf), I checked the p-values distribution with an histogram and it deviates from an expected distribution (uniform dist. with a peak at 0, under the null). Then, I employed a correction using the package fdrtool using the following code:

res.de.fixed <- fdrtoolres.de$stat, statistic = "normal")

Atfer that, I've gotten an expected p-value distribution and replaced my old adjusted p-values with the new qvalues obtained with fdrtool. Now, as expected, I have a lot of DEG with alpha = 0.01, and they seem to make sense given previous experiments with microarray.

My question is: Is this a legit approach? Can I correct the p-values obtained from Negative Binomial GLM fitting and Wald statistics employed by the DESeq() function with fdrtool? Of course, when the p-values exhibit this strange distribution. Also, what's the difference between the q-values (fdr) and the local fdr (lfdr) from fdrtool?

The histograms:

Top, before correction with fdrtool. Bottom, after fdrtool code above.

P-values before

After FDR correction

Edit.2. Added PCA from DESeq2::plotPCA()

PCA

From this PCA, I can see some underlying hidden effect, maybe a batch or lane effect? I'll need to get these data from the source.

Thanks.

Thyago

rnaseq deseq2 fdrtool differential gene expression • 3.6k views

ADD COMMENT • link 7.5 years ago tcalvo ▴ 100

0

Entering edit mode

dear Thyago,

A few other things to check: can you supply a PCA plot of the samples, and additionally, can you confirm that you did not use an LFC threshold when testing with results().

ADD REPLY • link 7.5 years ago Michael Love 43k

0

Entering edit mode

Dear Michael,

Added. I confirm not using any lfc cutoff from results() function. I used the default, 0, lfc.

Thank you.

ADD REPLY • link 7.5 years ago tcalvo ▴ 100

score 2 · Answer 1 · 2017-08-24

Hi Thyago,

The PCA plot looks pretty much like a batch effect being a stronger "separator" than the experimental groups.

You should tackle this directly. After application of fdrtool, you move to a rather bathtub shaped p-value histogram which often indicates p-values that are too liberal (you have ~ 10k pvals in the smallest bin, setting the y-limit to 3000 or so should enable to you to see the bathtub more clearly).

I bet if you test all samples with PC1 < 0 against all with PC1 > 0 you will have a nicer p-value histogram. Can you check this?

Try to run DESeq with at least one surrogate variable, the RNA-Seq gene workflow tells you how to do this:

http://bioconductor.org/help/workflows/rnaseqGene/#removing-hidden-batch-effects

What do you get?

Alternatively, you could simply try to include PC1 as a covariate in your DESeq model.