Question

Inspection and Correction of p-values in DESeq2 pipeline by Bernd

1

Entering edit mode

deena ▴ 20

@deena-7415

Last seen 7.6 years ago

Germany

Hi Michael,

I following a tutorial from Huber(http://www-huber.embl.de/users/klaus/Teaching/DESeq2Predoc2014.html) for detecting Diff.expressed genes using DESeq2. In the section, 10.2 Inspection and correction of p–values it suggest that if the distribution of raw pvalues follows either hill-shaped histogram or U-shaped hisogram, then the Wald statistic was used re-compute the padj values.

I am working on RNAseq data where I compare the Untreated to knock-down of particular gene. Almost in all my comparisons, I have the hill-shaped the histogram in all my comparison so should I use fdr tool as prescribed in the tutorial to correct my padj values?

When I implement this fdr package my results were drastically changed. Before Fdr package implementation it was 61, after FDR package implementation it came to 260. I am confused now to use which method. Kindly guide me

deseq2 FDR fdrtool • 3.7k views

ADD COMMENT • link updated 8.1 years ago by Bernd Klaus ▴ 610 • written 8.1 years ago by deena ▴ 20

score 0 · Answer 1 · 2017-03-04

0

Entering edit mode

Michael Love 43k

@mikelove

Last seen 9 hours ago

United States

This is somewhat of a subjective decision, particularly based on plots. I typically don't have the need to use fdrtool to re-calculate the adjusted p-values. It would help if you could add plots, and also all your code. Note that you should note use fdrtool in combination with lfcThreshold > 0.

ADD COMMENT • link 8.1 years ago Michael Love 43k

0

Entering edit mode

HI Michael, Like you said I have attached the the histogram of pvalues and the R code data = read.delim("raw_counts.txt", row.names=1) rnaseqMatrix = as.matrix(round(data)) conditions = data.frame(conditions=factor(c(rep("Control", 3), rep("gene_x_kd", 3), rep("gene_y_kd", 3), rep("genexy_kd", 3)))) rownames(conditions) = colnames(rnaseqMatrix) ddsFullCountTable <- DESeqDataSetFromMatrix(countData = rnaseqMatrix,colData = conditions,design = ~ conditions) dds = DESeq(ddsFullCountTable) res_gene_x_vs_Control = results(dds,contrast = c("conditions","gene_x_kd","Control")) res_gene_y_vs_Control = results(dds,contrast= c("conditions","gene_y_kd","Control")) res_gene_xy_vs_Control = results(dds,contrast = c("conditions","genexy_kd","Control")) hist(res_gene_x_vs_Control$pvalue, col = "lavender",main="Gene_X_KD_vs_Control", xlab = "p-values") hist(res_gene_y_vs_Control$pvalue, col = "lavender",main="Gene_Y_KD_vs_Control", xlab = "p-values") hist(res_gene_xy_vs_Control$pvalue,main="GeneXY_KD_vs_Control", col = "lavender", xlab = "p-values") Let me know if you need any more details. Kindly guide me. Thanks in advance On Sat, Mar 4, 2017 at 8:37 PM, Michael Love [bioc] < noreply@bioconductor.org> wrote: > Activity on a post you are following on support.bioconductor.org > > User Michael Love <https: support.bioconductor.org="" u="" 5822=""/> wrote Answer: > Inspection and Correction of p-values in DESeq2 pipeline by Bernd > <https: support.bioconductor.org="" p="" 93333="" #93360="">: > > This is somewhat of a subjective decision, particularly based on plots. I > typically don't have the need to use fdrtool to re-calculate the adjusted > p-values. It would help if you could add plots, and also all your code. > Note that you should note use fdrtool in combination with lfcThreshold > 0. > > ------------------------------ > > Post tags: deseq2, FDR, fdrtool > > You may reply via email or visit https://support.bioconductor. > org/p/93333/#93360 >

ADD REPLY • link 8.1 years ago deena ▴ 20

0

Entering edit mode

As Mike said, this is a quite subjective decision. If the p-value is hill shaped, the p-values are on avarage "higher than expected" leading to a potentially to a lower power.

However, Wolfgang pointed me to the fact that these histogram shapes can result from batch effects in your data that overlap experimental groups. See the following simulated example, where samples 6-15 are (6-10 are in group 1, 11-15 are in group two) are affected by the same batch effect.

library(tidyverse) library(genefilter)

n <- 10000 m <- 20 x <- matrix(rnorm(n*m), nrow = n, ncol = m) fac <- factor(c(rep(0, 10), rep(1, 10))) rt1 <- rowttests(x, fac)

qplot(rt1$p.value, fill = I("tan3"))

x[, 6:15] <- x[, 6:15]+5 rt2 <- rowttests(x, fac)

qplot(rt2$p.value, fill = I("coral3"))

Therefore I suggest that you take a look at the PCA plot to try to see whether you have any clustering by batches and then try to see whether, when you run the DE analysis between batches, you get "nice" p-value histograms.

Best wishes,

Bernd

ADD REPLY • link 8.1 years ago Bernd Klaus ▴ 610

0

Entering edit mode

HI Bernd,

I tried to find batch effect on samples but it didnt seems to be like presence of batch effect. The historgrams of specific comparisons of a particular time point follows hill-shaped structure even after the fdr tool correction. The z-score computed for that specific results are z-Score(sd: 0.644; eta0= 0.9). Now what does that imply, is there any problem with the samples? The number significant(padj <0.05) of differential expressed genes for this specific comparison was around 50. Will few number of diff. genes also influence hill shaped histogram? Kindly guide me as this is the first time I am facing this type of problem

ADD REPLY • link 8.1 years ago deena ▴ 20

score 0 · Answer 2 · 2017-03-10

If you are sure that there are no batch effects, try to run a stronger prefiltering of your data, e.g.

 rowSums(counts(dds)) > 20

or something like that since you had spikes near 1 in the p-value histograms you sent to my by mail. The fdrtool results cannot directly indicate whether there is a problem with the samples, but it might be confused by the spikes near 1. The number of differentially expressed genes will influence the et0 that FDR tool estimates. An eta0 of 90% means that 90% of your genes are probably not differentially expressed.