DESeq2 reports high logFC but not in matrix
Raj • 0
@raj-9784

Hi, could someone please suggest a probable reason for the following contradiction I see with DESeq2?

DESeq2 reports a high log fold change (logFC) for a gene, but the difference in that gene's median expression between conditions, in both the normalized counts (obtained from DESeq2) and the raw counts in the matrix, does not match what is reported. For example, DESeq2 reports a logFC of 5, yet I see literally 0 difference between the median expression of groupA and groupB.

The code used is taken straight from the tutorial:

```
# Sample table: one row per sample, named by patient ID
DF.CD = data.frame(condition = factor(treat))
rownames(DF.CD) = as.character(patient_ID)
all.equal(colnames(matrix), rownames(DF.CD)) # [1] TRUE

# Build the dataset, fit the model, and extract the groupA vs groupB contrast
DDS = DESeqDataSetFromMatrix(countData = matrix, colData = DF.CD, design = ~ condition)
DDS.ALL = DESeq(DDS, test = "Wald", fitType = "parametric")
RES = results(DDS.ALL, contrast = c("condition", "groupA", "groupB"), alpha = 0.05)
```

I thought the matrix might be the issue, so I looked at the difference in both raw counts and normalized counts. Both show the same trend.

Thanks much for helping,

@mikelove

Can you post plotCounts for the gene with large LFC?


Hi Michael, thanks very much for the reply. Please find the image at DEG in raw counts, normalized counts and DESeq2 plotCounts

Here is what the result for the gene looks like:

> RES.PosNeg[which(RES.PosNeg$log2FoldChange > 5 & RES.PosNeg$padj < 0.05),]
log2 fold change (MLE): condition Positive vs Negative 
Wald test p-value: condition Positive vs Negative 
DataFrame with 1 row and 6 columns
                         baseMean   log2FoldChange            lfcSE
                       <numeric>        <numeric>        <numeric>
ENSG00000173237 12.3322353283969 5.04851575572932 1.08757304864703
                            stat               pvalue                padj
                       <numeric>            <numeric>           <numeric>
ENSG00000173237 4.64200152992926 3.45050305598895e-06 0.00185415246358606

It looks like the LFC is explained by four samples with large normalized counts (100-5000) in the positive group. So this makes sense to me, and is not surprising.
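To illustrate why the medians and the LFC can disagree, here is a minimal base R sketch. The counts are made up for illustration (they are not the poster's data), and the ratio of group means with a pseudocount is only a rough stand-in for the DESeq2 estimate, which comes from a negative binomial GLM and tracks group means far more closely than group medians.

```r
# Hypothetical normalized counts for one gene (invented values, 12 samples per group).
negative <- c(0, 0, 1, 0, 2, 0, 1, 0, 0, 1, 0, 0)
positive <- c(0, 0, 1, 0, 0, 0, 0, 0, 100, 450, 1200, 5000)

# Most samples in both groups are near zero, so the medians are identical.
median_diff <- median(positive) - median(negative)

# A mean-based log2 fold change is dominated by the four large samples.
# The pseudocount (an assumption of this sketch) only avoids dividing by zero.
pseudo <- 0.5
lfc_means <- log2((mean(positive) + pseudo) / (mean(negative) + pseudo))

print(median_diff)
print(lfc_means)
```

With data shaped like this, the median difference is zero while the mean-based log2 fold change is large, which is the same pattern as in the plotCounts image.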

If you want to test for differences with less sensitivity to such large-count samples, you can use methods such as SAMseq or Swish, which implement Wilcoxon testing (Swish is designed for input from Salmon). Alternatively, testing on the log2 scale, as is done in limma-voom, would likely deprioritize such genes.
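As a rough illustration of the two ideas above, the sketch below uses base R stand-ins on made-up counts: `wilcox.test` for a rank-based comparison and a difference of means on the log2 scale. This is not SAMseq, Swish, or limma-voom, which add resampling, inferential replicates, or precision weights respectively. The data and the pseudocount of 1 are assumptions of the sketch.

```r
# Hypothetical normalized counts for one gene (invented values, 12 samples per group).
negative <- c(0, 0, 1, 0, 2, 0, 1, 0, 0, 1, 0, 0)
positive <- c(0, 0, 1, 0, 0, 0, 0, 0, 100, 450, 1200, 5000)

# Rank-based test: each large sample contributes only its rank, not its magnitude.
# exact = FALSE requests the normal approximation, since the data contain ties.
wilcox_p <- wilcox.test(positive, negative, exact = FALSE)$p.value

# Log-scale comparison: the log2 transform compresses the large counts before averaging.
log_pos <- log2(positive + 1)
log_neg <- log2(negative + 1)
log_scale_diff <- mean(log_pos) - mean(log_neg)

# For comparison, the log2 ratio of raw-scale means (pseudocount avoids division by zero).
raw_scale_lfc <- log2((mean(positive) + 0.5) / (mean(negative) + 0.5))

print(wilcox_p)
print(c(log_scale = log_scale_diff, raw_scale = raw_scale_lfc))
```

On data like these, the log-scale difference is much smaller than the raw-scale fold change, which is the sense in which log-scale testing gives less weight to a few extreme samples.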
