Question

Cook's distance outlier detection

hpapoli • 0

@hpapoli-24100

Sweden

cook's distance

Hello,

I'm working with natural populations, where higher within-group variance in gene expression is expected. In the figure below, the first 5 samples belong to species 1 and the last 3 samples belong to species 2.

Here, all species 2 samples have higher Cook's distances than the species 1 samples. The summary() of the results table is as follows:

out of 13647 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up)       : 956, 7%
LFC < 0 (down)     : 416, 3%
outliers [1]       : 631, 4.6%
low counts [2]     : 1058, 7.8%

So I am seeing several hundred outliers (631 genes). What would be the best way to move forward?

Thanks in advance

Posted 4.1 years ago by hpapoli

score 2 · Accepted Answer · 2021-03-16

Michael Love 43k

@mikelove

United States

Cook's distances below 1 are not very extreme; it's the ones above 1 that we are typically interested in looking into.
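For comparison, the threshold that results() actually uses to flag a gene is set by its cooksCutoff argument. Per the DESeq2 documentation, the default is the 0.99 quantile of the F(p, m - p) distribution, where p is the number of model coefficients (including the intercept) and m is the number of samples. A minimal base R sketch, assuming the design is ~ species with two levels and the 8 samples described in the question:

```r
# Assumed design: ~ species (intercept + one species coefficient)
m <- 8  # number of samples (5 species 1 + 3 species 2)
p <- 2  # number of model coefficients, including the intercept

# Default Cook's cutoff used by results(): 0.99 quantile of F(p, m - p)
cooks_cutoff <- qf(0.99, df1 = p, df2 = m - p)
print(round(cooks_cutoff, 2))
# [1] 10.92
```

If the design has more terms (for example, an additional hybrid group), p and m change and so does the cutoff.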

I would recommend looking at some of the filtered genes with plotCounts() to see whether they actually contain outliers. You can find them by selecting the genes with pvalue = NA. This will help you assess what type of count distribution is being flagged as containing outliers.
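A minimal sketch of that selection, assuming res is the output of results(dds). results() also sets the p-value to NA for genes with zero counts in every sample (baseMean of 0), so the sketch additionally requires a nonzero baseMean. The small data frame is a hypothetical stand-in for res so the logic runs without DESeq2, and the plotCounts() call in the comment assumes a colData column named "species".

```r
# Hypothetical stand-in for a DESeq2 results table (only the columns used here)
res <- data.frame(
  baseMean = c(250, 0, 80, 1200, 15),
  pvalue   = c(0.003, NA, NA, 0.4, NA),
  padj     = c(0.02, NA, NA, 0.7, NA),
  row.names = c("geneA", "geneB", "geneC", "geneD", "geneE")
)

# Cook's-flagged genes: p-value set to NA although the gene has counts
outlier_genes <- rownames(res)[is.na(res$pvalue) & res$baseMean > 0]
print(outlier_genes)
# [1] "geneC" "geneE"

# With a real DESeqDataSet, inspect each gene's counts by group, e.g.:
# for (g in head(outlier_genes)) plotCounts(dds, gene = g, intgroup = "species")
```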

Often they can be filtered out with a simple count filter, e.g.

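# keep genes with a count of at least 10 in at least 3 samples
# (3 matches the smallest group here: species 2 has 3 samples)
keep <- rowSums(counts(dds) >= 10) >= 3
dds <- dds[keep,]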
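To see what that filter keeps, here is the same expression applied to a small hypothetical count matrix (4 genes by 8 samples, base R only); with a DESeqDataSet, counts(dds) supplies the matrix instead.

```r
# Hypothetical counts: 4 genes x 8 samples (5 species 1, then 3 species 2)
cts <- matrix(c(
  12, 15, 30, 11, 20,  0,  1,  0,   # >= 10 in 5 samples -> kept
   0,  0,  1,  0,  2, 40, 55, 38,   # >= 10 in 3 samples -> kept
   3,  0, 95,  2,  1,  0,  4,  1,   # one large count only -> dropped
   5,  8,  9,  6,  7,  9,  8,  4    # never reaches 10 -> dropped
), nrow = 4, byrow = TRUE,
dimnames = list(paste0("gene", 1:4), paste0("s", 1:8)))

# Keep genes with a count of at least 10 in at least 3 samples
keep <- rowSums(cts >= 10) >= 3
print(keep)
```

The third gene shows the kind of pattern (a single large count in an otherwise low row) that tends to get a high Cook's distance; the filter removes it before testing.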

Posted 4.1 years ago by Michael Love

Thank you very much. I filtered the data as suggested. One more question: I also have a hybrid between the two species, and in that case the outliers might be interesting, since the ovaries of hybrids are dysfunctional. How could I extract the reported outliers from the results table? Thanks again!

Posted 4.1 years ago by hpapoli

See the vignette section "Access to all calculated values".
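Following that vignette section, the per-sample Cook's distances are stored after DESeq() in assays(dds)[["cooks"]], a genes by samples matrix. A minimal sketch of listing the gene/sample pairs above the default cutoff, using a small hypothetical matrix in place of the real assay so it runs in base R, and assuming a ~ species design (p = 2 coefficients). DESeq2 also keeps a per-gene maximum in mcols(dds)$maxCooks, computed over samples in groups with at least 3 replicates.

```r
# Hypothetical Cook's distances: 3 genes x 8 samples.
# With DESeq2 this would be: cooks <- assays(dds)[["cooks"]]
cooks <- matrix(c(
  0.10, 0.20, 0.05, 0.30, 0.10, 0.40, 0.20, 0.30,
  0.20, 0.10, 0.10, 0.20, 0.30, 14.5, 0.60, 0.50,
  0.30, 0.20, 0.40, 0.10, 0.20, 0.90, 1.20, 0.80
), nrow = 3, byrow = TRUE,
dimnames = list(paste0("gene", 1:3), paste0("s", 1:8)))

# Assumed design ~ species: m samples, p = 2 coefficients
m <- ncol(cooks)
p <- 2
cutoff <- qf(0.99, p, m - p)

# Per-gene maximum, comparable to mcols(dds)$maxCooks when all groups have >= 3 samples
max_cooks <- apply(cooks, 1, max)

# Gene/sample pairs whose Cook's distance exceeds the cutoff
hits <- which(cooks > cutoff, arr.ind = TRUE)
outliers <- data.frame(
  gene   = rownames(cooks)[hits[, "row"]],
  sample = colnames(cooks)[hits[, "col"]],
  cooks  = cooks[hits]
)
print(outliers)
```

For the hybrid question, the sample column shows directly whether the flagged counts come from particular individuals.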

Posted 4.1 years ago by Michael Love

Really great, thank you!

Posted 4.1 years ago by hpapoli