Question

BaseMean and Log2FC values seem weird...

Anne • 0

I have a DESeq2 object that has 116 samples grouped into 4 conditions (A, B, C, D).


dds <- DESeqDataSetFromMatrix(Countdata, colData = meta, design = ~ Condition)
dds <- DESeq(dds)
res <- results(dds, contrast = c("Condition", "B", "A")) # B is case, A is control

I looked at the gene "HBA1" in the results table. Its baseMean is 28.96456 and its log2FoldChange is 2.842622.

My questions are as follows.

Why can't I reproduce the baseMean (28.96456) by averaging the normalized counts for "HBA1", either across all 116 samples or across only the samples in conditions A and B? The following code calculates the mean of the normalized counts across samples and gives 50.03482. Could you explain the difference between the baseMean in the results table and the value this code produces?
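```
library(dplyr)
library(tidyr)
library(tibble)
# Normalized counts as a data frame (genes in rows, samples in columns)
ct <- counts(dds, normalized = TRUE) %>% as.data.frame()
# Reshape the HBA1 row to long format and attach the sample metadata
tmp <- ct %>% rownames_to_column("gene") %>%
  filter(gene == "HBA1") %>%
  gather(-gene, key = "sample", value = "gene.exp") %>%
  inner_join(meta, by = "sample")
mean(tmp$gene.exp)
```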

Since the log2FC value is 2.8, HBA1 expression should be higher in B than in A. However, when I plotted the normalized counts of the samples in conditions A and B as a CDF or box plot, I saw the opposite (overall HBA1 expression looked higher in A than in B).

Can you explain why the DEseq2 result comes out this way?

For condition A, the average of normalized counts for HBA1 is 35.57144. For condition B, the average of normalized counts for HBA1 is 45.91127.

The log2FC value (B/A) cannot really be 2.8.

Look forward to hearing from you!


updated 2.8 years ago by ATpoint ★ 4.6k • written 2.8 years ago by Anne • 0

score 0 · Answer 1 · 2022-05-06

The baseMean is the average of the normalized counts per gene across all samples in the object, regardless of the contrast you test. If you compute that manually and it disagrees, you have to check your code. See the original source: https://github.com/mikelove/DESeq2/blob/master/R/core.R#L2128
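The check described here can be sketched in a few lines. This is only an illustration on simulated data: `makeExampleDESeqDataSet()` stands in for the real 116-sample object, and the gene name `gene1` and the variable names are placeholders, not taken from the question. The last two lines are an assumed sanity check, on the guess that a join against the metadata could drop or duplicate samples.

```r
library(DESeq2)

set.seed(1)
# Simulated stand-in for the real data (assumption: 500 genes, 12 samples)
dds <- makeExampleDESeqDataSet(n = 500, m = 12)
dds <- DESeq(dds, quiet = TRUE)
res <- results(dds)

# baseMean recomputed by hand: row means of normalized counts over ALL samples
manual_base_mean <- rowMeans(counts(dds, normalized = TRUE))

# Should be TRUE when the manual calculation matches the results table
all.equal(unname(manual_base_mean), res$baseMean)

# Sanity check for a join-based workflow: exactly one value per sample
norm_one_gene <- counts(dds, normalized = TRUE)["gene1", ]
length(norm_one_gene) == ncol(dds)
```

If the long-format table built for a single gene has more or fewer rows than `ncol(dds)`, the manual mean is being taken over a different set of values than baseMean.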

As for the log2 fold change, it is not the naive ratio of mean normalized counts; it is a moderated estimate. Please read the paper for what that means: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-014-0550-8
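For comparison, the naive figure can be worked out from the group means reported in the question. This is plain arithmetic on those numbers, not what DESeq2 computes internally; the variable names are made up for the illustration.

```r
# Group means of normalized counts for HBA1, as reported in the question
mean_A <- 35.57144
mean_B <- 45.91127

# Naive log2 ratio of group means (B over A)
naive_lfc <- log2(mean_B / mean_A)
round(naive_lfc, 2)   # 0.37

# Fold change implied by the reported log2FoldChange of 2.842622
implied_fold <- 2^2.842622
round(implied_fold, 1)   # 7.2
```

Both values are positive, so they agree on direction (higher in B), but they differ in size.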

> Since the log2FC value is 2.8, it indicates that HBA1 expression is likely to be higher in B compared to A.

> For condition A, the average of normalized counts for HBA1 is 35.57144. For condition B, the average of normalized counts for HBA1 is 45.91127.

These two statements agree: expression is higher in B, so I do not see the problem. Use plotCounts() to check visually. If your manual code disagrees, again, revise it and make sure the groups are not mixed up.
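A minimal sketch of the visual check, again on simulated data rather than the real object. The gene `gene1` and the column `condition` come from `makeExampleDESeqDataSet()`; for the data in the question they would presumably be `"HBA1"` and `"Condition"`.

```r
library(DESeq2)

set.seed(1)
# Simulated stand-in; its colData column is named "condition"
dds <- makeExampleDESeqDataSet(n = 500, m = 12)
dds <- DESeq(dds, quiet = TRUE)

# Plot normalized counts per group for one gene
plotCounts(dds, gene = "gene1", intgroup = "condition")

# Same values as a data frame (columns: count, condition)
d <- plotCounts(dds, gene = "gene1", intgroup = "condition", returnData = TRUE)

# Group labels come straight from colData, so they cannot be mixed up by a join
identical(as.character(d$condition), as.character(colData(dds)$condition))

# Per-group means; plotCounts adds a pseudocount of 0.5 by default
tapply(d$count, d$condition, mean)
```

Because `plotCounts()` takes the grouping directly from the object's `colData`, comparing its output with a hand-built table is a quick way to spot samples assigned to the wrong condition.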