Differential expression due to extreme outliers in DESeq2
1
0
Entering edit mode
paul.alto ▴ 50
@paulalto-11559
Last seen 5.1 years ago

A couple of weeks or months ago, a question was posted regarding DESeq2's behavior when dealing with samples with one extreme outlier. In that case, DESeq2 identified that gene as DE, although there was only 1 out of N samples with extreme counts.

Michael Love suggested a way to force DESeq2 to look at all replicates.

I've been searching for the post for a while using both Google and the forum search but I can't find it.

Can somebody post the link, or maybe repeat the explanation here?

Thank you.
 

deseq2 outliers • 18k views
ADD COMMENT
0
Entering edit mode

Thanks for your reply, Michael. Here's an example of the case I have in mind. These are TPMs:

wt1    53.18
wt2    2.81
wt3    3.58
wt4    3.03
wt5    2.31
wt6    2.69
ko1    1.79
ko2    2.98
ko3    3.54
ko4    3.35
ko5    3.22
ko6    4.07
ko7    2.1

wt1 has an extremely high TPM compared to the other samples. DESeq2's adjusted P-value is 0.014 and log2 fold change = -3.3335.

ADD REPLY
0
Entering edit mode

So you are saying that the default outlier filtering isn’t working and that you want it to be more aggressive here at filtering the high value/count? What is the Cook’s distance for this gene? (See vignette for how to access).

ADD REPLY
0
Entering edit mode

When I use cooksCutoff this gene has an NA adjusted P-value. I may be misremembering the post I'm looking for, but I thought there was a way to still include these cases but make DESeq2 somehow consider all replicates individually.

In cases like this, the gene is simply excluded from analysis or can it be considered non-DE?

ADD REPLY
0
Entering edit mode

What is the full result row for this gene, using defaults?

ADD REPLY
0
Entering edit mode

baseMeanA           116.779010545804
baseMeanB           11.5170403274273
baseMean              60.099488120524
log2FoldChange    -3.33358245938423
lfcSE                      0.955682659416467
stat                        -3.48816882522457
pvalue                   0.000486340890441655
padj                       0.0138045444267906

ADD REPLY
0
Entering edit mode

I thought you said it had NA for padj? What code do you use to obtain an NA here?

ADD REPLY
0
Entering edit mode

Sorry, I replied too fast. I do get NA for both pvalue and padj using defaults (cooksCutoff = TRUE and independentFiltering = TRUE).

ADD REPLY
0
Entering edit mode

Try minReplicatesForReplace=6

ADD REPLY
0
Entering edit mode

Thanks for the suggestion, Mike. Now that gene has pvalue = 0.31 and padj = 0.58. Another gene that had the same problem also has higher padj.

ADD REPLY
0
Entering edit mode
@mikelove
Last seen 1 day ago
United States

I don't follow the problem you're trying to recollect. Do you want to turn off outlier replacement? See here:

https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#how-can-i-get-unfiltered-deseq2-results

ADD COMMENT

Login before adding your answer.

Traffic: 751 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6