FDR adjustment in Time-Course DATA DESEQ2
1
2
Entering edit mode
@jakobpetereit-13576
Last seen 6.7 years ago
Australia,Perth, ARC PEB

Hi,

I am analyzing time course RNA seq data.

My design looks like this:

ID Genotype SA.treatment
1 jc66 0h
2 jc66 0h
3 jc66 0h
4 dsr 0h
5 dsr 0h
6 dsr 0h
7 comp 0h
8 comp 0h
9 comp 0h
10 jc66 0.5h
11 jc66 0.5h
12 jc66 0.5h
13 dsr 0.5h
14 dsr 0.5h
15 dsr 0.5h
16 comp 0.5h
17 comp 0.5h
18 comp 0.5h
20 jc66 3h
21 jc66 3h
22 dsr 3h
23 dsr 3h
24 dsr 3h
25 comp 3h
26 comp 3h
27 comp 3h
28 jc66 6h
29 jc66 6h
30 jc66 6h
31 dsr 6h
32 dsr 6h
33 dsr 6h
34 comp 6h
35 comp 6h
36 comp 6h

To sum up: 3 Genotypes, 4 time points.

I mapped the bam files with star aligner, and extracted raw counts with genomicalignments

I run deseq2 on it with mixed success. My design formula looks like this:

ddstc <- DESeqDataSet(se,~Genotype + SA.treatment + Genotype:SA.treatment)
ddstc <- DESeq(ddstc_universal, test="LRT", reduced = ~Genotype + SA.treatment)

If I look at the pvalue histograms of the individual comparisons, I see, that things are not going all right, here an example pvalue histogramm:

(I switched the filters off, cause there is a heave expression induction upon treatment, and the filter filter all of these)

res <- results(ddstc, name='Genotype_dsr_vs_jc66', test='Wald', independentFiltering = F, cooksCutoff = F) 

hist(res$pvalue, col = "lavender", main = n, xlab = "p-values")

 

I found that this could be cause by the Wald test and I can run a false discovery rate correction, like explained here:

http://www-huber.embl.de/users/klaus/Teaching/DESeq2Predoc2014.html#inspection-and-correction-of-pvalues

and it indeed improves my pvalue histogram and i can add a corrected adjusted pvalue (not perfect, but already a lot better):

res <- res[ !is.na(res$padj), ]
res <- res[ !is.na(res$pvalue), ]
res <- res[, -which(names(res) == "padj")]
FDR.res <- fdrtool(res$stat, statistic= "normal", plot = T)
FDR.res$param[1, "sd"]
res[,"padj"]  <- p.adjust(FDR.res$pval, method = "BH")

hist(FDR.res$pval, col = "royalblue4",main = paste0(n,' - corrected'), xlab = "CORRECTED p-values")

This is all fine and good, but can I apply the same technique for the whole time course experiment?

I took the time course workflow from the RNAseq workflow:

https://www.bioconductor.org/help/workflows/rnaseqGene/#time-course-experiments

resTC <- results(ddsTC)
​betas <- coef(ddstc) %>% na.omit()
colnames(betas)
topGenes <- head(order(resTC$padj),40)
mat <- betas[topGenes, -c(1)]
thr <- 3
mat[mat < -thr] <- -thr
mat[mat > thr] <- thr
pheatmap(mat, breaks=seq(from=-thr, to=thr, length=101),
         cluster_col=FALSE)

If I understand it correctly, it takes the results of the last resultsName, and identifies the most significant genes, by ordering them with their adjusted pvalue (which is the same in all results, as it's globally Hochberg corrected). Then extracting the coefficients for these genes and plotting it into a heatmap.

But can I run the FDR tool on this result and will it correct the pvalues for all result contrasts? My assumption is that the FDR will correct the pvalue only for the last called results (resTC) and adjust the pvalue based on that and will not take any other contrast into account. Which then in turn would mean, that I have different adjusted pvalues in different contrasts, which then in turn means, that the topgenes I want to extract, are not the real topgenes.

So, can I do a FDR correction on the whole experiment ? Does it make sense to do it?

And I have a second question about the interacting terms:

If I understand correctly the results of our example here:

results(ddsTC , name='Genotypedsr.SA.treatment6h') 

this shows the results for my Genotype dsr at 6h compared to the Genotype dsr at 0h, minus the effect on the control at the same time points ?

Is there a way to call the results without reducing it by the control levels ?

at least that's my interpretation of:

# the condition effect for genotype II
# this is, by definition, the main effect *plus* the interaction term
# (the extra condition effect in genotype II compared to genotype I).
results(dds, list( c("condition_B_vs_A","genotypeII.conditionB") ))

 

Cheer

Jakob

 

deseq2 fdrtool genomicalignments • 2.5k views
ADD COMMENT
0
Entering edit mode

hi Jakob,

Can you confirm that in the original pvalue histogram the spike at p=1 is not simply genes with very low counts. 

Can you try using the filter from the beginning, so before running DESeq():

keep <- rowSums(counts(dds, normalized=TRUE) >= 10) >= 3

You don't show the pvalue distribution for the original LRT. Can you please show this? E.g. hist(results(dds)$pvalue, ...)

ADD REPLY
0
Entering edit mode

Hi Michael,

the histogram of the original LRT looks like this:

hist(results(ddstc)$pvalue, col = "lavender", main = 'LRT - original', xlab = "p-values")

 

part of the spike at p=1 will be from very low count genes.  

Do you mean that i should run the filter before using results() ? (I assume sow, cause I cannot get normalized counts before running DESeq())

I tried your filter suggestion with some adjustments:

keep <- rowSums(counts(ddstc, normalized=TRUE) >= 10) >= 3
keep <- which(keep==T) %>% as.data.frame() %>% rownames_to_column() %>% select(rowname)

res <- results(ddstc,independentFiltering = F, cooksCutoff = F, tidy = T) %>% filter(row %in% keep$rowname)
hist(res$pvalue, col = "lavender", main = 'LRT - original - manual filter', xlab = "p-values")

It reduces the spike at p=1, cause it will remove all genes, which would have an adjusted pvalue of NA from filtering. 

The problematic part is, that a result of it is, that the FDRtool has nothing to correct anymore

res <- res[ !is.na(res$padj), ]
res <- res[ !is.na(res$pvalue), ]
res <- res[, -which(names(res) == "padj")]
FDR.res <- fdrtool(res$stat, statistic= "normal", plot = T)

And that it again removes induced genes, genes you can normally not find, but in treatment they show up.

I wrote a custom filter that is not removing high variance genes, but still filters for rowsums:

##high pvalue genes
high_pvalue_genes <- results(ddstc,independentFiltering = F, cooksCutoff = F, tidy = T) %>% filter(pvalue >0.9) %>% select(row)
## normalized counts
norm_high_pvalue_counts <- counts(ddstc, normalized=T) %>%
  as.data.frame() %>%
  rownames_to_column() %>%
  filter(rowname %in% high_pvalue_genes$row) %>%
  column_to_rownames() %>% 
  rowSums() %>% as.data.frame() %>%
  rownames_to_column() %>% filter(`.` < 100)
`%notin%` = function(x,y) !(x %in% y)
res <- results(ddstc,independentFiltering = F, cooksCutoff = F, tidy = T) %>% filter(row %notin% norm_high_pvalue_counts$.)
hist(res$pvalue, col = "lavender", main = 'LRT - original - my filter', xlab = "p-values")

and FDR works again

res <- res[ !is.na(res$padj), ]
res <- res[ !is.na(res$pvalue), ]
res <- res[, -which(names(res) == "padj")]
FDR.res <- fdrtool(res$stat, statistic= "normal", plot = T)
hist(FDR.res$pval, col = "royalblue4",main = 'LRT - original -  my filter - corrected', xlab = "CORRECTED p-values")

But I don't know if that's a reasonable approach or not.

And back to my original question, is it statistically right, to run FDR on the LRT ?

And question two: can I extract fold changes without having them reduced by the main effect ?

 

Cheers

Jakob

 

 

 

ADD REPLY
0
Entering edit mode
@mikelove
Last seen 1 day ago
United States

hi,

So my answer to this would be, to not use fdrtool without filtering, because this downstream method requires certain properties of the distribution of p-values.

The very small count genes do not produce smooth, uniform p-values (this was noted in the DESeq2 paper, Supp Fig 26).

You should filter first, if you want to achieve a smooth p-value distribution to then operate on. You can do this like so:

dds <- estimateSizeFactors(dds)
keep <- ...
dds[keep,]
dds <- DESeq(dds, ...)
res <- results(dds)

 

ADD COMMENT
0
Entering edit mode

I corrected my answer just now.

ADD REPLY
0
Entering edit mode

Hi,

cheers for the help with the filter. Sadly the FDR doesn't work anymore and my significant pvalues are low in number.

ddstc <- DESeqDataSet(se_overrep_plus_universal, ~Genotype + SA.treatment + Genotype:SA.treatment)
ddstc <- estimateSizeFactors(ddstc)
keep <- rowSums(counts(ddstc, normalized=TRUE) >= 10) >= 3
ddstc <- ddstc[keep,]
ddstc <- DESeq(ddstc,  test="LRT", reduced = ~ ~Genotype + SA.treatment)
res <- results(ddstc,independentFiltering = F, cooksCutoff = F, tidy = T) 
hist(res$pvalue, col = "lavender", main = 'LRT - Michael filter', xlab = "p-values")

res <- res[ !is.na(res$padj), ]
res <- res[ !is.na(res$pvalue), ]
res <- res[, -which(names(res) == "padj")]
FDR.res <- fdrtool(res$stat, statistic= "normal", plot = T)
hist(FDR.res$pval, col = "royalblue4",main = 'LRT - Michael filter - corrected', xlab = "CORRECTED p-values")

 

But I guess I have to live with that.

 

I am also still struggling a little bit with the resultsNames of the dds, can you confirm, that I interpret them correctly ?:

(I reduces the output for visibility)

> resultsNames(ddstc)[c(3,4,8)]

[1] "Genotype_dsr_vs_jc66"     - overall differences between Genotype comp and jc66  (the reference genotype)    
[2] "SA.treatment_0.5h_vs_0h" -  differences between timepoint 0.5h and 0h in the reference Genotype (jc66)     
[3] "Genotypedsr.SA.treatment0.5h" -differences between timepoint 0.5h and 0h in Genotype dsr, minus the response of reference Genotype at the same timepoint

Is there a way to get the output without the reduction of the differences by the reference response? (it would be pretty useful for comparisons excluding the reference genotype)  Or do I have to change my reference genotype and run it all again ?

Cheers

J

 

 

ADD REPLY
0
Entering edit mode

This code above skips the important step of applying the filter ‘keep’

dds <- dds[keep,]

See my code suggestion above.

ADD REPLY
0
Entering edit mode

You are totally right, I corrected the code and pvalue histograms, sadly the result is pretty underachieving.

Do you have any advice for my second question about resultnames and reference genotype ?

 

Cheers

J

ADD REPLY
0
Entering edit mode

Try ~genotype + genotype:treatment

ADD REPLY
0
Entering edit mode

Cheers, that works !!!

Thanks for the advice

ADD REPLY

Login before adding your answer.

Traffic: 643 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6