Question

DESeq2 - why genes with very large variance between reps show up as significantly changed?

0

Entering edit mode

ninova • 0

@ninova-23327

Last seen 4.8 years ago

I was asked to run DESeq2 on RNA-seq libraries, 2 replicas x 2 conditions (I know it is not ideal but that's how the data is). I notice that there is a small subset of about 20 genes which only showed up in one replica of one condition (and I think this is due to contamination in the second replica):

raw reads:

cond1.1 cond1.2 cond2.1 cond2.2
geneA      0       0       0    1376

after deseq normalization:

counts(dds, normalized=T)["geneA",]
cond1.1  cond1.2  cond2.1  cond2.2 
0.000    0.000    0.000 1265.967

For the rest of the genes the values are pretty consistent in each pair of replicates. I thought that this gene won't be considered significantly changed, but in fact it shows up in the result as significant. Here is how I run the analysis

baseMean log2FoldChange    lfcSE     stat       pvalue         padj    
gene A 316.49167       24.38369 4.784902 5.095965 3.469687e-07 7.367278e-06

Here's how I run the analysis, just default parameters

design<-data.frame(row.names=colnames(countsTable),condition=c(rep("cond1",2), rep("cond2",2)))
dds<-DESeqDataSetFromMatrix(countData = countsTable, colData=design, design = ~condition)
dds<-DESeq(dds)
results(dds)

I am not a statistician but biologist and I don't understand well the math behind the DESeq2. I was wondering if there is some parameter I can change in the DESeq that will help to increase the stringency of the analysis and exclude these genes, or with two replicates there is nothing that can be done?

Thank you in advance, and I apologize in advance if similar question was answered before!

deseq2 • 720 views

ADD COMMENT • link 4.8 years ago ninova • 0

score 0 · Answer 1 · 2020-04-13

0

Entering edit mode

swbarnes2 ★ 1.4k

@swbarnes2-14086

Last seen 1 hour ago

San Diego

If 50% of your samples for one condition show no counts, and 100% of the samples for the other condition have counts...on the face of it, that's a difference between the groups. It might be biologically uninteresting to you, because it's an artifact, but I don't think the software wants to make that call.

ADD COMMENT • link 4.8 years ago swbarnes2 ★ 1.4k