I have a DESeq2 object with 116 samples grouped into 4 conditions (A, B, C, D).
dds <- DESeqDataSetFromMatrix(countData = Countdata, colData = meta, design = ~ Condition)
dds <- DESeq(dds)
res <- results(dds, contrast = c("Condition", "B", "A")) # B is case, A is control
I looked at the gene "HBA1" in the results table. Its baseMean is 28.96456 and its log2FoldChange is 2.842622.
My questions are as follows.
- Why can't I get the same baseMean (28.96456) by averaging the normalized counts for "HBA1", either across all 116 samples or across only the samples in conditions A and B?
The following code calculates the mean of the normalized counts across samples and gives 50.03482. Could you explain why the baseMean in the results table differs from this value?
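library(tidyverse) # provides %>%, rownames_to_column, filter, gather, inner_join
ct <- counts(dds, normalized = TRUE) %>% as.data.frame()
tmp <- ct %>%
  rownames_to_column("gene") %>%
  filter(gene == "HBA1") %>%
  gather(-gene, key = "sample", value = "gene.exp") %>%
  inner_join(meta, by = "sample")
mean(tmp$gene.exp)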
- Since the log2FC is 2.8, HBA1 expression should be higher in B than in A. However, when I plotted the normalized counts of the samples in conditions A and B as a CDF or box plot, I saw the opposite (overall HBA1 expression looked higher in A than in B).
Can you explain why the DESeq2 result comes out this way?
For condition A, the average of normalized counts for HBA1 is 35.57144. For condition B, the average of normalized counts for HBA1 is 45.91127.
Given these averages, the log2FC (B/A) cannot really be 2.8.
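For reference, here is the naive fold change I would expect from the two group averages quoted above. This is only a rough sanity check in base R; the variable names are mine, and I realize it is not how DESeq2 estimates log2FoldChange from its fitted model.

```r
# Group averages of normalized counts for HBA1, copied from the values above
mean_A <- 35.57144  # condition A (control)
mean_B <- 45.91127  # condition B (case)

# Naive log2 fold change (B over A) computed directly from the group means
naive_log2fc <- log2(mean_B / mean_A)
print(naive_log2fc)
```

This comes out at roughly 0.37, far below the 2.842622 reported in the results table.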
Look forward to hearing from you!