Hello DESeq2 community,
Could you please tell me whether I am interpreting the LFC calculation right.
Raw counts for geneX:
condition1 condition1 condition1 condition1 condition1 condition2 condition2 condition2 condition2
sample1 sample2 sample3 sample4 sample5 sample6 sample7 sample8 sample9
geneX 60 57 69 36 41 134 114 42 19
DESeq2 Normalized counts for geneX (stored in normaizedcounts):
condition1 condition1 condition1 condition1 condition1 condition2 condition2 condition2 condition2
sample1 sample2 sample3 sample4 sample5 sample6 sample7 sample8 sample9
geneX 72.0426614733 61.9232190902 61.5316071546 45.8332912714 28.5786292082 116.8379135112 111.1193128782 54.2858927513 14.8136746784
> log2(mean(normaizedcounts[c(rownames(colData(dds))[colData(dds)$condition=="condition1"])])/mean(normaizedcounts[c(rownames(colData(dds))[colData(dds)$condition=="condition2"])]))
[1] -0.4601916
> res = results(dds, alpha=0.05, contrast = c("condition","condition1","condition2"))
> res["geneX",]
log2 fold change (MLE): condition condition1 vs condition2
Wald test p-value: condition condition1 vs condition2
DataFrame with 1 row and 6 columns
baseMean log2FoldChange lfcSE stat pvalue padj
<numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
geneX 429.1487 2.304769 0.6485406 3.553777 0.0003797409 0.01325116
The DESeq2 calculated LFC is 2.3 while calculating the log2 mean of normalized counts in the groups is -0.46. I wanted to know whether the LFC shrinkage is making the difference here. I checked the variance of each group. I found that the variance of "condition2" group for this gene is very high. So is this high difference in LFC calculation because DESeq2 shrinks the LFC of "condition2" group to zero by a higher degree than that of "condition1" samples?
> var(normaizedcounts[,c(rownames(colData(dds))[colData(dds)$condition=="condition1"])]["geneX",])
[1] 289.4949
> var(normaizedcounts[,c(rownames(colData(dds))[colData(dds)$condition=="condition2"])]["geneX",])
[1] 2368.106
Using DESeq2_1.16.1 version.
Thanks,
Yes, there are more conditions. Actually, there are about 45 samples belonging to different conditions. Also, my design has more covariates.
If it is not LFC shrinkage, what could be the reason behind difference in DESeq2 LFC (2.3) and manually calculated LFC?
The LFC is not the ratio of the mean of normalized counts when additional variables are in the design. It's a bit more complex to explain, but this is how you get language in research papers that say "controlling for the variables...". I'd recommend discussing this meaning more in depth with a statistical collaborator if you are interested.
Thank you very much, Michael.