Inspection and Correction of p-values in DESeq2 pipeline by Bernd
2
1
Entering edit mode
deena ▴ 20
@deena-7415
Last seen 7.2 years ago
Germany

Hi Michael,

I following a tutorial from Huber(http://www-huber.embl.de/users/klaus/Teaching/DESeq2Predoc2014.html) for detecting Diff.expressed genes using DESeq2. In the section, 10.2 Inspection and correction of p–values  it suggest that if the distribution of raw pvalues follows either hill-shaped histogram or U-shaped hisogram, then the Wald statistic was used re-compute the padj values.

 

I am working on RNAseq data where I compare the Untreated to knock-down of particular gene. Almost in all my comparisons, I have the hill-shaped the histogram in all my comparison so should I use fdr tool as prescribed in the tutorial to correct my padj values?

When I implement this fdr package my results were drastically changed. Before Fdr package implementation it was 61, after FDR package implementation it came to 260. I am confused now to use which method. Kindly guide me

deseq2 FDR fdrtool • 3.5k views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 35 minutes ago
United States

This is somewhat of a subjective decision, particularly based on plots. I typically don't have the need to use fdrtool to re-calculate the adjusted p-values. It would help if you could add plots, and also all your code. Note that you should note use fdrtool in combination with lfcThreshold > 0.

ADD COMMENT
0
Entering edit mode
HI Michael, Like you said I have attached the the histogram of pvalues and the R code data = read.delim("raw_counts.txt", row.names=1) rnaseqMatrix = as.matrix(round(data)) conditions = data.frame(conditions=factor(c(rep("Control", 3), rep("gene_x_kd", 3), rep("gene_y_kd", 3), rep("genexy_kd", 3)))) rownames(conditions) = colnames(rnaseqMatrix) ddsFullCountTable <- DESeqDataSetFromMatrix(countData = rnaseqMatrix,colData = conditions,design = ~ conditions) dds = DESeq(ddsFullCountTable) res_gene_x_vs_Control = results(dds,contrast = c("conditions","gene_x_kd","Control")) res_gene_y_vs_Control = results(dds,contrast= c("conditions","gene_y_kd","Control")) res_gene_xy_vs_Control = results(dds,contrast = c("conditions","genexy_kd","Control")) hist(res_gene_x_vs_Control$pvalue, col = "lavender",main="Gene_X_KD_vs_Control", xlab = "p-values") hist(res_gene_y_vs_Control$pvalue, col = "lavender",main="Gene_Y_KD_vs_Control", xlab = "p-values") hist(res_gene_xy_vs_Control$pvalue,main="GeneXY_KD_vs_Control", col = "lavender", xlab = "p-values") Let me know if you need any more details. Kindly guide me. Thanks in advance On Sat, Mar 4, 2017 at 8:37 PM, Michael Love [bioc] < noreply@bioconductor.org> wrote: > Activity on a post you are following on support.bioconductor.org > > User Michael Love <https: support.bioconductor.org="" u="" 5822=""/> wrote Answer: > Inspection and Correction of p-values in DESeq2 pipeline by Bernd > <https: support.bioconductor.org="" p="" 93333="" #93360="">: > > This is somewhat of a subjective decision, particularly based on plots. I > typically don't have the need to use fdrtool to re-calculate the adjusted > p-values. It would help if you could add plots, and also all your code. > Note that you should note use fdrtool in combination with lfcThreshold > 0. > > ------------------------------ > > Post tags: deseq2, FDR, fdrtool > > You may reply via email or visit https://support.bioconductor. > org/p/93333/#93360 >
ADD REPLY
0
Entering edit mode

As Mike said, this is a quite subjective decision. If the p-value is hill shaped, the p-values are on avarage "higher than expected" leading to a potentially to a lower power.

However, Wolfgang pointed me to the fact that these histogram shapes can result from batch effects in your data that overlap experimental groups. See the following simulated example, where samples 6-15 are (6-10 are in group 1, 11-15 are in group two) are affected by the same batch effect.

library(tidyverse)
library(genefilter)

n <- 10000
m <- 20
x <- matrix(rnorm(n*m), nrow = n, ncol = m)
fac <- factor(c(rep(0, 10), rep(1, 10)))
rt1 <- rowttests(x, fac)

qplot(rt1$p.value, fill = I("tan3"))

x[, 6:15] <- x[, 6:15]+5
rt2 <- rowttests(x, fac)

qplot(rt2$p.value, fill = I("coral3"))

Therefore I suggest that you take a look at the PCA plot to try to see whether  you have any clustering by batches and then try to see whether, when you run the DE analysis between batches, you get  "nice" p-value histograms.

Best wishes,

Bernd

 

ADD REPLY
0
Entering edit mode

HI Bernd,

I tried to find batch effect on samples but it didnt seems to be like presence of batch effect. The historgrams of specific comparisons of a particular time point follows hill-shaped structure even after the fdr tool correction. The z-score computed for that specific results are z-Score(sd: 0.644; eta0= 0.9). Now what does that imply, is there any problem with the samples? The number significant(padj <0.05) of differential expressed genes for this specific comparison was around 50. Will few number of diff. genes also influence hill shaped histogram? Kindly guide me as this is the first time I am facing this type of problem

ADD REPLY
0
Entering edit mode
Bernd Klaus ▴ 610
@bernd-klaus-6281
Last seen 6.1 years ago
Germany

If you are sure that there are no batch effects, try to run a stronger prefiltering of your data, e.g.

 rowSums(counts(dds)) > 20 

or something like that since you had spikes near 1 in the p-value histograms you sent to my by mail. The fdrtool results cannot directly indicate whether there is a problem with the samples, but it might be confused by the spikes near 1. The number of differentially expressed genes will influence the et0 that FDR tool estimates. An eta0 of 90% means that 90% of your genes are probably not differentially expressed.

 

 

ADD COMMENT

Login before adding your answer.

Traffic: 885 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6