Question

DESeq2 lfcThreshold question: different numbers when filtering in results() vs. filtering afterwards?

0

Entering edit mode

psutton • 0

@psutton-20110

Last seen 6.0 years ago

I have a question about lfcThreshold in DESeq2. In the sample code below, I use the airway demo data, have a variable lfcThreshold = log2(1.5).

The variable res1 contains results() without an lfcThreshold (i.e. default 0). Then, res2 is the result of filtering res1 for abs(res1$log2FoldChange) > lfcThreshold. Finally, res3 contains results() with the lfcThreshold explicitly set.

Naively, I expect res2 and res3 to be the same, but they are not (see output below). Can someone help me understand why they are not the same?

library("airway")
data("airway")
se <- airway

library("DESeq2")
dds <- DESeqDataSet(se, design = ~ cell + dex)
keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]

dds <- DESeq(dds)
alpha <- 0.1
lfcThreshold <- log2(1.5)
contrast_dex <- c("dex", "trt", "untrt")

res1 <- results(dds, contrast = contrast_dex, alpha = alpha)
summary(res1)

res2 <- res1[ abs(res1$log2FoldChange) > lfcThreshold, ]
summary(res2)
min(abs(res2$log2FoldChange))

res3 <- results(dds, contrast = contrast_dex, alpha = alpha, lfcThreshold = lfcThreshold)
summary(res3)

which gives the summary outputs:

> res1 <- results(dds, contrast = contrast_dex, alpha = alpha)
> summary(res1)

out of 22369 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 2610, 12%
LFC < 0 (down)     : 2224, 9.9%
outliers [1]       : 0, 0%
low counts [2]     : 4337, 19%
(mean count < 5)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

> 
> res2 <- res1[ abs(res1$log2FoldChange) > lfcThreshold, ]
> summary(res2)

out of 5990 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 1121, 19%
LFC < 0 (down)     : 1139, 19%
outliers [1]       : 0, 0%
low counts [2]     : 2394, 40%
(mean count < 5)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

> min(abs(res2$log2FoldChange))
[1] 0.5849997

> 
> res3 <- results(dds, contrast = contrast_dex, alpha = alpha, lfcThreshold = lfcThreshold)
> summary(res3)

out of 22369 with nonzero total read count
adjusted p-value < 0.1
LFC > 0.58 (up)    : 401, 1.8%
LFC < -0.58 (down) : 283, 1.3%
outliers [1]       : 0, 0%
low counts [2]     : 3470, 16%
(mean count < 3)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

deseq2 • 929 views

ADD COMMENT • link updated 5.9 years ago by Michael Love 43k • written 5.9 years ago by psutton • 0

score 0 · Answer 1 · 2019-03-27

0

Entering edit mode

Michael Love 43k

@mikelove

Last seen 15 hours ago

United States

Take a look at the DESeq2 paper or vignette on what LFC threshold tests are doing.

ADD COMMENT • link 5.9 years ago Michael Love 43k

0

Entering edit mode

Thanks Michael, I think this section from the DESeq2 paper answers my question:

"A common procedure is to disregard genes whose estimated LFC βir is below some threshold, | βir | ≤ θ. However, this approach loses the benefit of an easily interpretable FDR, as the reported P value and adjusted P value still correspond to the test of zero LFC. It is therefore desirable to include the threshold in the statistical testing procedure directly, i.e., not to filter post hoc on a reported fold-change estimate, but rather to evaluate statistically directly whether there is sufficient evidence that the LFC is above the chosen threshold."

ADD REPLY • link 5.9 years ago psutton • 0