Question

DESEQ2 LFC calculation

0

Entering edit mode

amalthomas111 • 0

@amalthomas111-13189

Last seen 3.7 years ago

United States

Hello DESeq2 community,

Could you please tell me whether I am interpreting the LFC calculation right.

Raw counts for geneX:
condition1 condition1 condition1 condition1 condition1 condition2 condition2 condition2 condition2
sample1 sample2 sample3 sample4 sample5 sample6 sample7 sample8 sample9
geneX 60 57 69 36 41 134 114 42 19

DESeq2 Normalized counts for geneX (stored in normaizedcounts):
condition1 condition1 condition1 condition1 condition1 condition2 condition2 condition2 condition2
sample1 sample2 sample3 sample4 sample5 sample6 sample7 sample8 sample9
geneX 72.0426614733 61.9232190902 61.5316071546 45.8332912714 28.5786292082 116.8379135112 111.1193128782 54.2858927513 14.8136746784

> log2(mean(normaizedcounts[c(rownames(colData(dds))[colData(dds)$condition=="condition1"])])/mean(normaizedcounts[c(rownames(colData(dds))[colData(dds)$condition=="condition2"])]))
[1] -0.4601916

> res = results(dds, alpha=0.05, contrast = c("condition","condition1","condition2"))
> res["geneX",]

log2 fold change (MLE): condition condition1 vs condition2
Wald test p-value: condition condition1 vs condition2
DataFrame with 1 row and 6 columns
baseMean log2FoldChange lfcSE stat pvalue padj
<numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
geneX 429.1487 2.304769 0.6485406 3.553777 0.0003797409 0.01325116

The DESeq2 calculated LFC is 2.3 while calculating the log2 mean of normalized counts in the groups is -0.46. I wanted to know whether the LFC shrinkage is making the difference here. I checked the variance of each group. I found that the variance of "condition2" group for this gene is very high. So is this high difference in LFC calculation because DESeq2 shrinks the LFC of "condition2" group to zero by a higher degree than that of "condition1" samples?

> var(normaizedcounts[,c(rownames(colData(dds))[colData(dds)$condition=="condition1"])]["geneX",])
[1] 289.4949
> var(normaizedcounts[,c(rownames(colData(dds))[colData(dds)$condition=="condition2"])]["geneX",])
[1] 2368.106

Using DESeq2_1.16.1 version.

Thanks,

deseq2 • 2.7k views

ADD COMMENT • link 7.4 years ago amalthomas111 • 0

Michael Love · Answer 1 · 2017-10-27

0

Entering edit mode

Michael Love 43k

@mikelove

Last seen 5 days ago

United States

I'm not sure you've got the same gene there. The base mean is 429, but with the normalized counts you gave it should be 63. Are there more conditions? Note that there is no LFC shrinkage here: it says "log2 fold change (MLE)"

ADD COMMENT • link 7.4 years ago Michael Love 43k

0

Entering edit mode

Yes, there are more conditions. Actually, there are about 45 samples belonging to different conditions. Also, my design has more covariates.

> dds  = DESeqDataSetFromMatrix(countData= df, colData= design.matrix, ~ culture + dissociation +  celllinetargetorgan)
> dds =  DESeq(dds)

> res = results(dds, alpha=0.05, contrast = c("celllinetargetorgan","Brx50brai","Brx50H"))
> normalizedcounts = counts(dds, normalized=T)

> normalizedcounts[,c(rownames(colData(dds))[colData(dds)$celllinetargetorgan=="Brx50brai"])]["ENSG00000141655",]
X50brai.56.3 X50brai.58.3 X50brai.83.2 X50brai.84.2 X50brai.59.3 
    72.04266     61.92322     61.53161     45.83329     28.57863

> normalizedcounts[,c(rownames(colData(dds))[colData(dds)$celllinetargetorgan=="Brx50H"])]["ENSG00000141655",]
    X50HR      X50H  X50HD.R1  X50HD.R2 
 54.28589  14.81367 116.83791 111.11931

> res["ENSG00000141655",]
log2 fold change (MLE): celllinetargetorgan Brx50brai vs Brx50H 
Wald test p-value: celllinetargetorgan Brx50brai vs Brx50H 
DataFrame with 1 row and 6 columns
                 baseMean log2FoldChange     lfcSE      stat       pvalue       padj
                <numeric>      <numeric> <numeric> <numeric>    <numeric>  <numeric>
ENSG00000141655  429.1487       2.304769 0.6485406  3.553777 0.0003797409 0.01325116

> log2(mean(normalizedcounts[,c(rownames(colData(dds))[colData(dds)$celllinetargetorgan=="Brx50brai"])]["ENSG00000141655",]) / mean(normalizedcounts[,c(rownames(colData(dds))[colData(dds)$celllinetargetorgan=="Brx50H"])]["ENSG00000141655",]))
[1] -0.4601916

> var(normalizedcounts[,c(rownames(colData(dds))[colData(dds)$celllinetargetorgan=="Brx50brai"])]["ENSG00000141655",])
[1] 289.4949
> var(normalizedcounts[,c(rownames(colData(dds))[colData(dds)$celllinetargetorgan=="Brx50H"])]["ENSG00000141655",])
[1] 2368.106

If it is not LFC shrinkage, what could be the reason behind difference in DESeq2 LFC (2.3) and manually calculated LFC?

ADD REPLY • link updated 7.4 years ago by Michael Love 43k • written 7.4 years ago by amalthomas111 • 0

1

Entering edit mode

The LFC is not the ratio of the mean of normalized counts when additional variables are in the design. It's a bit more complex to explain, but this is how you get language in research papers that say "controlling for the variables...". I'd recommend discussing this meaning more in depth with a statistical collaborator if you are interested.