Hi, could someone please suggest a probable reason for the following contradiction I see with DESeq2?
DESeq2 reports a high logFC for a gene, but the median expression of that gene across conditions, in both the normalized counts (obtained from DESeq2) and the raw count matrix, does not differ as reported. For example, DESeq2 reports a logFC of 5, yet I see literally 0 difference between the median expression in groupA and groupB.
The code is taken straight from the tutorial:
```
library(DESeq2)

# Sample table: one row per patient, with the treatment as the condition factor
DF.CD = data.frame(condition = factor(treat))
rownames(DF.CD) = as.character(patient_ID)
all.equal(colnames(matrix), rownames(DF.CD))  # [1] TRUE

DDS = DESeqDataSetFromMatrix(countData = matrix, colData = DF.CD, design = ~ condition)
DDS.ALL = DESeq(DDS, test = "Wald", fitType = "parametric")
RES = results(DDS.ALL, contrast = c("condition", "groupA", "groupB"), alpha = 0.05)
```
I thought the matrix might be the issue, so I looked at the difference in both the raw counts and the normalized counts. Both show the same trend.
Thanks much for helping,
Hi Michael, thanks very much for the reply. Please find the image of the DEG in raw counts, normalized counts, and DESeq2 plotCounts.
Here is what the result for the gene looks like:
It looks like the LFC is explained by four samples with large normalized counts (100-5000) in the positive group. So it makes sense to me and is not surprising.
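To illustrate the point, here is a minimal base-R sketch with made-up counts (not the poster's data). It assumes that the reported LFC roughly tracks the ratio of group means rather than medians, which is what you would expect from a model fitted to mean counts. The vectors `group_a` and `group_b` and the 0.5 pseudocount are hypothetical choices for the illustration only.

```r
# Toy normalized counts for one gene (invented numbers for illustration).
# Four samples in group A carry large counts; everything else is near zero.
group_a <- c(0, 0, 0, 0, 0, 0, 100, 500, 2000, 5000)
group_b <- c(0, 0, 0, 0, 0, 0, 0, 1, 2, 3)

# The medians are identical, so a median-based comparison shows no difference.
median_diff <- median(group_a) - median(group_b)

# A mean-based log2 ratio is dominated by the four large samples.
# The 0.5 pseudocount is an arbitrary choice to avoid dividing by zero.
log2_mean_ratio <- log2((mean(group_a) + 0.5) / (mean(group_b) + 0.5))

print(median_diff)
print(log2_mean_ratio)
```

The median difference is zero while the mean-based log2 ratio is large, which is the same pattern as in the question.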
If you want to test for differences with less sensitivity to such large-count samples, you can use methods such as SAMseq or Swish, which implement Wilcoxon testing (Swish is designed for input from Salmon). Alternatively, testing on the log2 scale, as is done in limma-voom, would likely deprioritize such genes.
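As a rough sketch of why these alternatives behave differently, the base-R snippet below uses the same kind of made-up counts. It is not SAMseq, Swish, or limma-voom themselves: `wilcox.test` only stands in for the rank-based idea, and the `log2(x + 1)` transform is an assumed simplification of working on the log2 scale. All variable names are hypothetical.

```r
# Toy normalized counts for one gene (invented numbers for illustration).
group_a <- c(0, 0, 0, 0, 0, 0, 100, 500, 2000, 5000)
group_b <- c(0, 0, 0, 0, 0, 0, 0, 1, 2, 3)

# Rank-based test: only the ordering of the samples matters, so a count of
# 5000 has no more influence than any other value ranked above the rest.
# exact = FALSE because the data contain ties.
rank_test <- wilcox.test(group_a, group_b, exact = FALSE)
rank_p <- rank_test$p.value

# Mean-based log2 ratio on the count scale (0.5 is an arbitrary pseudocount).
raw_lfc <- log2((mean(group_a) + 0.5) / (mean(group_b) + 0.5))

# Difference of means taken after a log2 transform: the large counts are
# compressed, so the four high samples contribute far less.
log_scale_diff <- mean(log2(group_a + 1)) - mean(log2(group_b + 1))

print(rank_p)
print(raw_lfc)
print(log_scale_diff)
```

On these toy numbers the log2-scale difference comes out much smaller than the count-scale log2 ratio, which is the sense in which such a gene would be ranked lower.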