Question

Counts worry: DESeq2 DEGs come from genes with mostly zero counts across groups

hermanapis (@hermanapis-20080)

I have been asked to look for differential expression in a colleague's data set. The data set consists of 24 RNA-seq samples in 4 groups of 6.

I used DESeq2 with all default settings, and I am a little concerned about my results.

We find very few differentially expressed genes (2-4 per comparison), and those we do find seem to come from genes with all zeros in one group versus counts in perhaps 2 samples of the other group, for example when comparing the treatment group (named BOTH, 6 samples) with the control group (named CLEAN, 6 samples). [picture of results]

Is it valid to report these genes when perhaps only 2 samples are driving the differential expression between the groups?

Tags: deseq2, DE, DEG

Written 6.2 years ago by hermanapis; last updated 6.2 years ago by Michael Love.

Are the differences in the gene expression biologically plausible? I think it makes sense if a gene is turned on/off depending on the environment/condition.

I am not familiar with DESeq2, but edgeR and limma both have a robust setting that minimises the effect of outliers on the DGE analysis.

Have you tried doing PCA plots for your samples?
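A PCA of the samples can show whether the few samples with high counts also stand apart globally. Within DESeq2 this is usually done with `vst()` followed by `plotPCA()`; the sketch below is a base-R stand-in that needs no Bioconductor packages. It assumes `log2(count + 1)` as a crude substitute for the VST and uses a simulated count matrix (200 genes, 6 CLEAN and 6 BOTH samples, made-up effect sizes) rather than real data.

```r
set.seed(1)

# Simulated counts (hypothetical): 200 genes x 12 samples, 6 CLEAN then 6 BOTH.
# The first 20 genes have a 4-fold higher mean in BOTH.
group <- rep(c("CLEAN", "BOTH"), each = 6)
grp <- factor(group)
mu <- matrix(50, nrow = 200, ncol = 12)
mu[1:20, group == "BOTH"] <- 200
counts <- matrix(rpois(length(mu), mu), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("s", 1:12)))

# log2(count + 1): a crude stand-in for DESeq2's vst()
logc <- log2(counts + 1)

# prcomp() expects samples in rows, so transpose the genes x samples matrix
pca <- prcomp(t(logc))
pct <- round(100 * pca$sdev^2 / sum(pca$sdev^2), 1)

# One point per sample on PC1 vs PC2, coloured by group
plot(pca$x[, 1], pca$x[, 2], col = as.integer(grp), pch = 19,
     xlab = paste0("PC1 (", pct[1], "%)"),
     ylab = paste0("PC2 (", pct[2], "%)"))
legend("topright", legend = levels(grp),
       col = seq_along(levels(grp)), pch = 19)
```

With real data, samples that cluster by group on PC1 suggest a consistent treatment effect, whereas one or two samples far from the rest would be worth a closer look.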

Reply by mikhael.manurung, 6.2 years ago

Just a note about terminology: I'd argue those counts are not outliers. If you have 3/6 samples with a high count, which are the outliers, the 0s or the high counts?

Reply by Michael Love, 6.2 years ago

score 1 · Answer 1 · 2019-03-04

Michael Love (@mikelove)

So arguably, these genes do show some difference across the conditions, in that you have, e.g., 2-3 samples out of 6 with high counts versus all zeros in the other group. It's hard for a statistical method not to find these differences.
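To check how many of the reported DEGs show this pattern, one could count, per gene, how many samples in each group have a nonzero count. The sketch below does this in base R on a made-up two-gene matrix; the helper `nonzero_by_group` and the cutoff of 3 samples are hypothetical choices, and with real data the inputs would be `counts(dds)` and the grouping column of `colData(dds)`.

```r
# Hypothetical helper: number of samples per group with a nonzero count, per gene
nonzero_by_group <- function(counts, group) {
  group <- factor(group)
  out <- matrix(0, nrow = nrow(counts), ncol = nlevels(group),
                dimnames = list(rownames(counts), levels(group)))
  for (g in levels(group)) {
    out[, g] <- rowSums(counts[, group == g, drop = FALSE] > 0)
  }
  out
}

# Made-up counts for two genes across 6 CLEAN and 6 BOTH samples
group <- rep(c("CLEAN", "BOTH"), each = 6)
counts <- rbind(
  geneA = c(0, 0, 0, 0, 0, 0, 45, 0, 0, 30, 0, 0),
  geneB = c(12, 15, 9, 20, 11, 14, 25, 30, 22, 28, 19, 27)
)

nz <- nonzero_by_group(counts, group)
print(nz)

# Flag the pattern described above: all zeros in one group and at most
# 3 samples with counts in the other (the cutoff 3 is an arbitrary choice)
one_group_zero <- apply(nz, 1, function(x) min(x) == 0 && max(x) <= 3)
print(one_group_zero)
```

Here geneA (zeros in CLEAN, two nonzero samples in BOTH) is flagged and geneB is not.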

If you want to remove such genes manually, you could use a simple filter:

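library(DESeq2)
# n = minimum number of samples that must have a count of at least 10
keep <- rowSums(counts(dds) >= 10) >= n
dds <- dds[keep,]   # drop genes that fail the filter
dds <- DESeq(dds)   # refit on the filtered genes
# ...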

This requires at least n samples to have a count of 10 or higher. Above you are saying that n = 3 is too few, so you could increase it to 4, or even 6.
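The effect of the choice of n can be previewed on a plain count matrix before running `DESeq()`. The sketch below applies the same `rowSums` rule to a made-up matrix of 3 genes across 12 samples (hypothetical values, not data from this thread); the wrapper `passes_filter` is a hypothetical name, and with a DESeqDataSet the matrix would be `counts(dds)`.

```r
# Made-up raw counts: 3 genes x 12 samples (6 CLEAN, 6 BOTH)
counts <- rbind(
  sparse2 = c(0, 0, 0, 0, 0, 0, 45, 0, 0, 30, 0, 0),           # 2 samples >= 10
  sparse3 = c(0, 0, 0, 0, 0, 0, 45, 12, 0, 30, 0, 0),          # 3 samples >= 10
  broad   = c(12, 15, 9, 20, 11, 14, 25, 30, 22, 28, 19, 27)   # 11 samples >= 10
)

# Same rule as the filter above: at least n samples with a count of 10 or higher
passes_filter <- function(counts, n, min_count = 10) {
  rowSums(counts >= min_count) >= n
}

# Which genes survive for several choices of n (rows = genes, columns = n)
ns <- c(3, 4, 6)
kept <- sapply(ns, function(n) passes_filter(counts, n))
colnames(kept) <- paste0("n=", ns)
print(kept)
```

The gene with counts in only two samples is dropped at every n, the gene with three samples at 10 or more survives only at n = 3, and the broadly expressed gene (one sample at 9) passes at all three settings. Note that the rule ignores the group labels, so it removes sparse genes without looking at which condition the counts fall in.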

Answer by Michael Love, 6.2 years ago

Dr. Love,

Thank you for taking the time to respond and for providing the filter code. Have you encountered this situation before, or seen it reported in other studies? I don't want to overstate anything about these genes if it is just a fluke of the small sample size and the underlying genetic variation in our samples, but it would be nice to dig into these genes if they are truly valid indicators of our treatment. It is frustrating because the only DEGs in the study all match this pattern of 0 counts in one group and nonzero counts in the other (usually in fewer than 3 samples).