We would like to remove counts that are more than 2 SD away from the mean within each group, as directed by our statistician.
After running DESeq2 differential expression analysis on our raw counts and obtaining normalized count values, we found that some genes contained extreme outliers among their normalized counts.
First, is this an appropriate task for differential expression analysis, or does it violate any rules within DEG analysis?
My data has two treatment groups, each with 7-12 subjects.
If I remove the outliers (|normalized count - group mean| > 2 SD), I am not certain how to perform differential expression analysis on the normalized counts.
I read the documentation on the
DESeq(dds, minReplicatesForReplace = Inf)
function, but am unclear whether that would remove the outlier filter built into DESeq, or whether there are other parameters I can set to customize the outlier threshold.
If I can't customize the outlier threshold in DESeq when using my raw count values as input, is there a way to run analysis on normalized counts (after outlier removal)?
Great, thank you. That confirms what I thought regarding the simple statistics-based removal of outliers vs. identification of outliers using DESeq2.
I keep looking into cooksCutoff - I'm not quite sure I understand how to modify it yet but will continue to explore it and see how parameters can be adjusted.
I'll post any code I find to be successful as well.
See the 2014 DESeq2 paper for details on the Cook's statistic.
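For what it's worth, here is a minimal sketch of where the two outlier controls sit in DESeq2. It runs on simulated data from makeExampleDESeqDataSet rather than real counts. The object names (dds, res_default, res_nofilter, res_strict) are placeholders, and the 0.95 quantile is an arbitrary illustration, not a recommendation. As documented, minReplicatesForReplace = Inf only turns off the replacement of flagged counts; the Cook's-distance filtering of p-values is controlled separately by the cooksCutoff argument of results().

```r
library(DESeq2)

set.seed(1)
# Simulated stand-in for real data: 200 genes, 16 samples in two groups (design ~ condition)
dds <- makeExampleDESeqDataSet(n = 200, m = 16)

# Inf means flagged counts are never replaced; Cook's distances are still computed
dds <- DESeq(dds, minReplicatesForReplace = Inf)

# Default behaviour: p-values are set to NA for genes with a Cook's distance above the cutoff
res_default <- results(dds)

# Switch the Cook's-distance filter off entirely
res_nofilter <- results(dds, cooksCutoff = FALSE)

# The documented default cutoff is the 0.99 quantile of F(p, m - p)
p <- 2                # coefficients fitted here: intercept + condition
m <- ncol(dds)        # number of samples
default_cutoff <- qf(0.99, p, m - p)

# A lower quantile gives a lower cutoff, so more genes are flagged (illustrative value only)
res_strict <- results(dds, cooksCutoff = qf(0.95, p, m - p))

# Gene-by-sample matrix of Cook's distances, for inspecting individual counts
cooks <- assays(dds)[["cooks"]]
```

Comparing sum(is.na(res_default$pvalue)) with sum(is.na(res_nofilter$pvalue)) shows how many genes the Cook's filter is flagging, and the cooks matrix shows which samples are responsible.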