I was asked to run DESeq2 on RNA-seq libraries, 2 replicas x 2 conditions (I know it is not ideal but that's how the data is). I notice that there is a small subset of about 20 genes which only showed up in one replica of one condition (and I think this is due to contamination in the second replica):
raw reads:
cond1.1 cond1.2 cond2.1 cond2.2
geneA 0 0 0 1376
after deseq normalization:
counts(dds, normalized=T)["geneA",]
cond1.1 cond1.2 cond2.1 cond2.2
0.000 0.000 0.000 1265.967
For the rest of the genes the values are pretty consistent in each pair of replicates. I thought that this gene won't be considered significantly changed, but in fact it shows up in the result as significant. Here is how I run the analysis
baseMean log2FoldChange lfcSE stat pvalue padj
gene A 316.49167 24.38369 4.784902 5.095965 3.469687e-07 7.367278e-06
Here's how I run the analysis, just default parameters
design<-data.frame(row.names=colnames(countsTable),condition=c(rep("cond1",2), rep("cond2",2)))
dds<-DESeqDataSetFromMatrix(countData = countsTable, colData=design, design = ~condition)
dds<-DESeq(dds)
results(dds)
I am not a statistician but biologist and I don't understand well the math behind the DESeq2. I was wondering if there is some parameter I can change in the DESeq that will help to increase the stringency of the analysis and exclude these genes, or with two replicates there is nothing that can be done?
Thank you in advance, and I apologize in advance if similar question was answered before!