I have a question about lfcThreshold in DESeq2. In the sample code below, I use the airway demo data, have a variable lfcThreshold = log2(1.5).
The variable res1
contains results() without an lfcThreshold (i.e. default 0).
Then, res2
is the result of filtering res1 for abs(res1$log2FoldChange) > lfcThreshold.
Finally, res3
contains results() with the lfcThreshold
explicitly set.
Naively, I expect res2
and res3
to be the same, but they are not (see output below). Can someone help me understand why they are not the same?
library("airway")
data("airway")
se <- airway
library("DESeq2")
dds <- DESeqDataSet(se, design = ~ cell + dex)
keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]
dds <- DESeq(dds)
alpha <- 0.1
lfcThreshold <- log2(1.5)
contrast_dex <- c("dex", "trt", "untrt")
res1 <- results(dds, contrast = contrast_dex, alpha = alpha)
summary(res1)
res2 <- res1[ abs(res1$log2FoldChange) > lfcThreshold, ]
summary(res2)
min(abs(res2$log2FoldChange))
res3 <- results(dds, contrast = contrast_dex, alpha = alpha, lfcThreshold = lfcThreshold)
summary(res3)
which gives the summary outputs:
> res1 <- results(dds, contrast = contrast_dex, alpha = alpha)
> summary(res1)
out of 22369 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up) : 2610, 12%
LFC < 0 (down) : 2224, 9.9%
outliers [1] : 0, 0%
low counts [2] : 4337, 19%
(mean count < 5)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
>
> res2 <- res1[ abs(res1$log2FoldChange) > lfcThreshold, ]
> summary(res2)
out of 5990 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up) : 1121, 19%
LFC < 0 (down) : 1139, 19%
outliers [1] : 0, 0%
low counts [2] : 2394, 40%
(mean count < 5)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
> min(abs(res2$log2FoldChange))
[1] 0.5849997
>
> res3 <- results(dds, contrast = contrast_dex, alpha = alpha, lfcThreshold = lfcThreshold)
> summary(res3)
out of 22369 with nonzero total read count
adjusted p-value < 0.1
LFC > 0.58 (up) : 401, 1.8%
LFC < -0.58 (down) : 283, 1.3%
outliers [1] : 0, 0%
low counts [2] : 3470, 16%
(mean count < 3)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
Thanks Michael, I think this section from the DESeq2 paper answers my question:
"A common procedure is to disregard genes whose estimated LFC βir is below some threshold, | βir | ≤ θ. However, this approach loses the benefit of an easily interpretable FDR, as the reported P value and adjusted P value still correspond to the test of zero LFC. It is therefore desirable to include the threshold in the statistical testing procedure directly, i.e., not to filter post hoc on a reported fold-change estimate, but rather to evaluate statistically directly whether there is sufficient evidence that the LFC is above the chosen threshold."