Question

DESeq2 significant results for genes with highly dispersed counts across samples


Guillaume Robert • 0

@guillaume-robert-18902

France/Nantes/Inovarion

Hi all,

Sorry for this naive question, but I haven't been able to find an answer anywhere.

I'm using DESeq2 to find DE genes between two conditions in public RNA-seq data, with 19 "non-responder" samples and 22 "responder" samples.

After the DE analysis, I checked the per-sample read counts of the genes with the lowest adjusted p-values.

I often see genes whose counts are low in most samples, except for a few samples in one condition with very high counts, which I guess is what drives the gene to be called DE.

I'm questioning the biological relevance of these results, and I wonder whether I missed or misunderstood something that would avoid this kind of result.

Here is my code:

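library(DESeq2)
# Build the dataset; a character "condition" column is converted to a factor
dds <- DESeqDataSetFromMatrix(countData = count_table, colData = coldata, design = ~ condition)
# Pre-filter: drop genes with fewer than 10 reads in total across all samples
keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]
# Use the responders ("R") as the reference level
dds$condition <- relevel(dds$condition, ref = "R")
# Estimate size factors and dispersions, then fit and run Wald tests
dds <- DESeq(dds)
# Extract results; alpha tunes independent filtering for a 0.05 FDR cutoff
res05 <- results(dds, alpha=0.05)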

Thanks in advance for any input!
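One way to look at the per-sample counts behind a top-ranked gene, as described in the question, is sketched below. It is a minimal, self-contained sketch: simulated data from `makeExampleDESeqDataSet()` (two groups, "A" and "B") stands in for the poster's `dds`, and the names `dds_demo`, `res_demo`, `top_gene` and `max_cooks` are illustrative only. It prints the normalized counts for the gene with the smallest adjusted p-value, plus that gene's largest per-sample Cook's distance, which DESeq2 stores after `DESeq()` as a measure of how strongly a single sample influences the fit.

```r
library(DESeq2)

# Simulated stand-in for the real data (assumption): 1000 genes, 12 samples,
# two conditions "A" and "B". Replace dds_demo with your own fitted dds.
set.seed(1)
dds_demo <- makeExampleDESeqDataSet(n = 1000, m = 12, betaSD = 1)
dds_demo <- DESeq(dds_demo, quiet = TRUE)
res_demo <- results(dds_demo, alpha = 0.05)

# Gene with the smallest adjusted p-value (which.min ignores NA)
top_gene <- rownames(res_demo)[which.min(res_demo$padj)]

# Normalized per-sample counts for that gene, labelled by condition
cnt <- plotCounts(dds_demo, gene = top_gene, intgroup = "condition",
                  returnData = TRUE)
print(cnt)

# Largest Cook's distance per gene: a large value means one sample
# strongly influences that gene's fitted coefficients
max_cooks <- apply(assays(dds_demo)[["cooks"]], 1, max)
print(max_cooks[top_gene])
```

Dropping `returnData = TRUE` draws the same counts as a plot, which makes the "few very high samples in one group" pattern easy to spot.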

Written 5.7 years ago by Guillaume Robert • last updated 5.7 years ago by Michael Love

score 2 · Accepted Answer · 2019-06-13


Michael Love 43k

@mikelove

United States

So if you have an accumulation of high counts in one group, then that's some evidence that the null hypothesis of LFC = 0 is not true. That's what you get with a p-value computed against a point null.
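To make the point-null distinction concrete, here is a minimal sketch, again using simulated data from `makeExampleDESeqDataSet()` as an assumed stand-in for a real `dds`. The default `results()` call tests against LFC = 0, while `results()` with `lfcThreshold` and `altHypothesis = "greaterAbs"` tests whether |LFC| exceeds a chosen threshold instead. The threshold of 1 (a 2-fold change) is an arbitrary illustrative choice, not a recommendation.

```r
library(DESeq2)

# Simulated stand-in data (assumption); replace with your own dds
set.seed(1)
dds_demo <- makeExampleDESeqDataSet(n = 1000, m = 12, betaSD = 1)
dds_demo <- DESeq(dds_demo, quiet = TRUE)

# Default Wald test: null hypothesis is the point LFC = 0
res_point <- results(dds_demo, alpha = 0.05)

# Threshold test: null hypothesis is |LFC| <= 1 (illustrative threshold)
res_thresh <- results(dds_demo, lfcThreshold = 1,
                      altHypothesis = "greaterAbs", alpha = 0.05)

# Number of genes passing padj < 0.05 under each null
n_point  <- sum(res_point$padj  < 0.05, na.rm = TRUE)
n_thresh <- sum(res_thresh$padj < 0.05, na.rm = TRUE)
cat("significant vs LFC = 0:  ", n_point, "\n")
cat("significant vs |LFC| > 1:", n_thresh, "\n")
```

Under the threshold null, a gene needs evidence that its effect is larger than the threshold, not merely non-zero.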

You may, however, find it useful to shrink the LFCs with lfcShrink and then see whether a cutoff on the effect size helps. See the lfcShrink sections of the vignette. This is the area where we have spent the most time developing methods lately, e.g., applying apeglm and ashr to DESeq2.
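A minimal sketch of the shrink-then-cut workflow, under these assumptions: simulated data from `makeExampleDESeqDataSet()` replaces the real `dds`, the `apeglm` package is installed, and the |LFC| > 1 cutoff is purely illustrative. With the demo data the coefficient is named `"condition_B_vs_A"`; for the poster's data, `resultsNames(dds)` shows the actual name, which will end in `_vs_R` after the `relevel()` call.

```r
library(DESeq2)

# Simulated stand-in data (assumption); replace with your own dds
set.seed(1)
dds_demo <- makeExampleDESeqDataSet(n = 1000, m = 12, betaSD = 1)
dds_demo <- DESeq(dds_demo, quiet = TRUE)

# List coefficient names to pick the one to shrink
print(resultsNames(dds_demo))

# apeglm shrinkage of the condition LFC (requires the apeglm package)
res_shr <- lfcShrink(dds_demo, coef = "condition_B_vs_A",
                     type = "apeglm", quiet = TRUE)

# Keep genes that are significant AND whose shrunken |LFC| exceeds 1
# (which() drops rows where padj is NA)
sig <- res_shr[which(res_shr$padj < 0.05 &
                     abs(res_shr$log2FoldChange) > 1), ]
print(nrow(sig))
```

Shrinkage pulls noisy LFC estimates toward zero, so a gene whose large raw fold change rests on a few extreme samples tends to end up with a smaller shrunken LFC. `type = "ashr"` is an alternative estimator and needs the ashr package instead.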