Calculate Cook's cutoff per comparison in DESeq2
nikostr • 0
@user-24161

We have the results of an RNA-seq experiment with 4 time points and 3 biological replicates per time point, and we are doing pairwise comparisons of the time points. We note that we get the same number of outliers for every comparison. This, together with the fact that there is a single matrix of Cook's distances, leads us to understand that the Cook's distance is calculated only once for each sample and gene, and that a gene is considered an outlier in all comparisons if the Cook's distance is too large for any one sample. This means that a gene may be flagged as an outlier in a specific comparison even if none of the outlying samples is part of that comparison. It also means that a gene may be called differentially expressed, with a passing padj, when the result is driven entirely by a single extreme data point, as long as similar values are found in other time points.

We assume that the way to bypass this would be to split the input data into a separate object for each comparison. Would this make sense? Are there any drawbacks to this that we should be aware of?

Cook'scutoff DESeq2 • 2.2k views
@mikelove

This all makes sense, and it is true that Cook's outlying-ness looks across all samples regardless of the contrast used in results().

You can either use separate objects, or you could turn off the automatic outlier flagging by setting cooksCutoff=FALSE in results() and use custom code, e.g.:

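# Matrix of Cook's distances (genes x samples) stored by DESeq()
cooks <- assays(dds)[["cooks"]]
# Turn off automatic flagging; `...` stands for your contrast arguments
res <- results(dds, cooksCutoff=FALSE, ...)
# relevantSamples and threshold must be defined by you for each comparison
res$numOutliers <- rowSums( cooks[ , relevantSamples ] > threshold )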
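To make the custom-code route more concrete, here is a minimal base R sketch on a toy matrix standing in for assays(dds)[["cooks"]]. The helper name countComparisonOutliers, the group labels t1 to t4 and the toy values are all made up for illustration. The threshold follows the default described in the results() help page (the 0.99 quantile of the F(p, m - p) distribution, with p coefficients and m samples), but which p and m are appropriate for your design is an assumption you should check.

```r
# Hypothetical helper (not part of DESeq2): for each gene, count the samples
# belonging to one pairwise comparison whose Cook's distance exceeds a threshold.
countComparisonOutliers <- function(cooks, groups, levelsCompared, threshold) {
  relevantSamples <- groups %in% levelsCompared
  rowSums(cooks[, relevantSamples, drop = FALSE] > threshold, na.rm = TRUE)
}

# Toy stand-in for assays(dds)[["cooks"]]: 3 genes x 12 samples (4 time points x 3 replicates)
groups <- factor(rep(c("t1", "t2", "t3", "t4"), each = 3))
cooks <- matrix(0.1, nrow = 3, ncol = 12,
                dimnames = list(paste0("gene", 1:3), paste0(groups, "_", 1:3)))
cooks["gene1", "t1_2"] <- 50   # extreme sample in t1
cooks["gene2", "t4_1"] <- 50   # extreme sample in t4

# Threshold in the style of the documented default: 0.99 quantile of F(p, m - p).
# Assumption: p = number of time points (one coefficient each), m = all samples.
p <- nlevels(groups)
m <- ncol(cooks)
threshold <- qf(0.99, p, m - p)

# Outlier counts restricted to the samples in the t1 vs t2 comparison
numOutliers_t1_t2 <- countComparisonOutliers(cooks, groups, c("t1", "t2"), threshold)
```

With a real object you would pass assays(dds)[["cooks"]] and the time-point column of colData(dds) instead of the toy inputs, and then store the result in res$numOutliers for that comparison. In this toy case gene2 is not counted for t1 vs t2, because its extreme sample belongs to t4.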
