Question

DE analysis outlier removal


knholm • 0

@knholm-18825


We would like to remove counts that are more than 2 SD away from the mean within each group, as directed by our statistician.

After running DESeq2 differential expression analysis on our raw counts and obtaining normalized count values, we found that some genes contained extreme outliers among their normalized counts.

First, is this an appropriate step for differential expression analysis, or does it violate any rules of DEG analysis?

My data has two treatment groups, each with 7-12 subjects.

If I remove the outliers (normalized counts more than 2 SD from the group mean), I am not certain how to perform differential expression analysis on the remaining normalized counts.

I read the documentation on DESeq(dds, minReplicatesForReplace = Inf), but I am unclear whether that would remove the outlier filter built into DESeq, or whether there are other parameters I can set to customize the outlier threshold.

If I can't customize the outlier threshold in DESeq when using my raw count values as input, is there a way to run analysis on normalized counts (after outlier removal)?

deseq2 outlier normalization removal • 1.0k views

updated 5.1 years ago by Michael Love 43k • written 5.1 years ago by knholm • 0

score 1 · Answer 1 · 2020-03-18


Michael Love 43k

@mikelove


I wouldn't remove outliers based on SD.

DESeq2 has a formal outlier procedure, which was tested during development and described in the 2014 publication. I would recommend that instead if you are worried about the effect of outliers.

Note that setting minReplicatesForReplace = Inf turns off outlier replacement, but DESeq2 will still filter genes that contain outliers (their p-values are set to NA).
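A minimal sketch of that behavior, assuming simulated data from makeExampleDESeqDataSet() rather than the poster's counts (the gene and sample numbers are arbitrary choices for illustration):

```r
library(DESeq2)

# Simulated data for illustration only: 1000 genes, 14 samples
# (two conditions of 7 samples each).
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 14)

# Outlier replacement switched off. Genes flagged by Cook's distance
# are not refit, but results() still sets their p-values to NA.
dds_norepl <- DESeq(dds, minReplicatesForReplace = Inf)
res_norepl <- results(dds_norepl)

# Genes with a nonzero mean count but an NA p-value were filtered
# by the Cook's distance check.
n_filtered <- sum(res_norepl$baseMean > 0 & is.na(res_norepl$pvalue))
n_filtered
```

The analysis still starts from the raw counts; nothing is removed by hand before calling DESeq().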

The outlier threshold can be adjusted; see the cooksCutoff argument in ?results.
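As a rough sketch of how cooksCutoff might be used (simulated data again, and the cutoff value of 1 is a hypothetical number chosen only to show the syntax, not a recommendation):

```r
library(DESeq2)

# Simulated data for illustration only.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 14)

# Replacement is turned off here so that the Cook's filter in results()
# is the only outlier handling in play.
dds <- DESeq(dds, minReplicatesForReplace = Inf)

res_default <- results(dds)                       # default Cook's cutoff
res_strict  <- results(dds, cooksCutoff = 1)      # hypothetical lower cutoff
res_off     <- results(dds, cooksCutoff = FALSE)  # Cook's filtering disabled

# Number of genes with NA p-values under each setting.
na_counts <- c(default = sum(is.na(res_default$pvalue)),
               strict  = sum(is.na(res_strict$pvalue)),
               off     = sum(is.na(res_off$pvalue)))
na_counts

# The per-gene, per-sample Cook's distances are stored on the object.
cooks <- assays(dds)[["cooks"]]
dim(cooks)
```

Inspecting the Cook's distances directly can help when deciding whether a different cutoff is justified for a given dataset.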

5.1 years ago by Michael Love 43k


Great, thank you. That confirms what I thought regarding simple statistics-based removal of outliers versus identification of outliers using DESeq2.

I'll keep looking into cooksCutoff. I don't quite understand how to modify it yet, but I will continue to explore it and see how it can be adjusted.

I'll post any code I find to be successful as well.

5.1 years ago by knholm • 0


See the 2014 DESeq2 paper for details on the Cook's statistic.
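For orientation, ?results documents the default cutoff as the 0.99 quantile of the F(p, m - p) distribution, where p is the number of coefficients and m the number of samples. A small base R sketch, assuming a two-group design (p = 2) and hypothetical total sample sizes in the range mentioned in the question:

```r
# Default Cook's distance cutoff as documented in ?results:
# the 0.99 quantile of F(p, m - p).
default_cooks_cutoff <- function(m, p = 2) {
  qf(0.99, df1 = p, df2 = m - p)
}

# Hypothetical total sample sizes (7 to 12 per group, two groups).
m_values <- c(14, 18, 24)
cutoffs <- sapply(m_values, default_cooks_cutoff)
setNames(round(cutoffs, 2), paste0("m=", m_values))
```

The cutoff gets smaller as the number of samples grows, so larger experiments flag outliers at lower Cook's distances.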

5.1 years ago by Michael Love 43k