Question

Calculate Cook's cutoff per comparison in DESeq2


nikostr • 0


We have the results of an RNA-seq experiment with 4 time points and 3 biological replicates, and we are doing pairwise comparisons of the time points. We note that we get the same number of outliers for each comparison. This, and the fact that there is a single matrix of Cook's distances, leads us to understand that the Cook's distance is calculated only once for each sample and gene, and that a gene is considered an outlier in all comparisons if its Cook's distance is too large in any one sample. This means that a gene may be flagged as an outlier in a specific comparison even if none of the outlying samples is part of that comparison. It also means that a gene may be called differentially expressed with a passing padj even when the result is driven entirely by a single extreme data point, as long as similar values are found in other time points.

We assume that the way to bypass this would be to split the input data into separate objects, one for each comparison. Would this make sense? Are there any drawbacks to this that we should be aware of?

Cook'scutoff DESeq2 • 2.2k views

updated 4.2 years ago by Michael Love 43k • written 4.2 years ago by nikostr • 0

score 2 · Accepted Answer · 2020-11-16

This all makes sense, and it is true that Cook's outlying-ness looks across all samples regardless of the contrast used in results().

You can either use separate objects, or turn off the automatic outlier flagging with cooksCutoff=FALSE and use custom code, e.g.:

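# Per-gene, per-sample Cook's distances stored by DESeq()
cooks <- assays(dds)[["cooks"]]
# Disable the automatic flagging, which looks at all samples
res <- results(dds, ..., cooksCutoff=FALSE)
# Per gene, count the samples in this comparison whose Cook's distance exceeds the cutoff
res$numOutliers <- rowSums( cooks[ , relevantSamples ] > threshold )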
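
To make `relevantSamples` and `threshold` concrete, here is a minimal base-R sketch on a toy matrix standing in for `assays(dds)[["cooks"]]`. The sample names, the `time` factor and the injected values are made up for illustration. The cutoff follows what I understand to be the default described in `?results`, the 0.99 quantile of an F(p, m - p) distribution, with p assumed to be 4 for a design of `~ time` with four levels and m the number of samples.

```r
# Toy stand-in for assays(dds)[["cooks"]]: 5 genes x 12 samples
# (4 time points x 3 replicates). All names and values are hypothetical.
time <- factor(rep(c("t1", "t2", "t3", "t4"), each = 3))
set.seed(1)
cooks <- matrix(runif(5 * 12), nrow = 5,
                dimnames = list(paste0("gene", 1:5),
                                paste0(time, "_rep", 1:3)))
cooks["gene2", "t4_rep1"] <- 50  # extreme value in a t4 sample
cooks["gene4", "t1_rep2"] <- 50  # extreme value in a t1 sample

# Cutoff in the style of the DESeq2 default: 0.99 quantile of F(p, m - p)
m <- ncol(cooks)     # number of samples
p <- nlevels(time)   # assumed number of model coefficients for ~ time
threshold <- qf(0.99, p, m - p)

# Samples that take part in the t1 vs t2 comparison
relevantSamples <- time %in% c("t1", "t2")

# Flag based on all samples (contrast-independent)
flaggedAnywhere <- rowSums(cooks > threshold) > 0
# Count based only on the samples in this comparison
numOutliers <- rowSums(cooks[, relevantSamples] > threshold)
```

Here gene2 is flagged when all samples are considered, but its extreme value sits in a t4 sample, so its count for t1 vs t2 is zero. With real data you would take `cooks` from the fitted object and decide yourself what to do with genes where `numOutliers > 0`, for example filtering them out of that comparison's table.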