Question

Question on the fold change estimate

0

Entering edit mode

ting.chen.lrx • 0

@ea24864c

Last seen 3.3 years ago

United States

Dear Michael,

I have a raw count dataset with three conditions (A,B,C) and each condition has 2-3 biological replicates. I calculated the normalized counts (not log2 transformed) in DESeq2, and for instance the following lists the normalized count of a gene for each replicate.

#        A1        A2        A3        B1        B2        B3       C1       C2  
#108.74237   86.54679  109.39501  562.57695  570.28352  481.73985 1537.51446 1571.32569

I tried to estimate the fold change between C and A simply by calculating the ratio. That is, the mean of normalized counts in C divided by the mean of normalized counts in A is 15.31.

I also tried transforming the raw counts through the rlog method (set blind = FALSE), and the following gives the rlog values for the same gene in each replicate:

#        A1        A2        A3        B1        B2        B3       C1       C2 
# 7.956540  7.925518  7.983866  9.034676  9.043481  8.896545 10.052179 10.091817

I estimated the log2 fold change (C vs A) based on the rlog values, that, the mean of rlog values in C minus that in A. The resulting fold change estimate will be 4.34, much less than 15.31 above.

I understand that DESeq2 shrinks the log fold change to take care of genes of low counts, so it’s not simply the ratio of normalized counts. I also calculated the shrunken fold change (that is, 15.24) in DESeq2 and compared that to the one (15.31) I calculated previously, and they were very close.

After looking at the manual and paper, I thought log fold change shrinkage and the rlog method were conceptually consistent since they both tried to minimize the differences among samples of genes with low counts. But I am wondering why the fold change estimate using the rlog values will turn out to be so different. I would appreciate it very much that you would give any comments or correct my thought.

Also, the goal of my analysis to identify expression patterns of genes among the three conditions. I think transformed data from either rlog or vst would be a better choice to be used for the hierarchical clustering of the genes as the dependence of variance on the mean is removed and this might lead to more accurate clustering than using the normalized counts (without variance stabilization). Wondering if my understanding is correct.

Thanks so much for your help in advance!

DESeq2 • 3.9k views

ADD COMMENT • link updated 3.3 years ago by Michael Love 43k • written 3.4 years ago by ting.chen.lrx • 0

score 0 · Answer 1 · 2021-09-30

I estimated the log2 fold change (C vs A) based on the rlog values, that, the mean of rlog values in C divided by that in A. The resulting fold change estimate will be 4.34, much less than 15.31 above.

rlog is on the log2 scale, so you should subtract if you wanted to compare.

See the vignette for details on the arguments, you can use e.g. ashr with contrast to compare pairs of conditions.

I understand that DESeq2 shrinks the log fold change to take care of genes of low counts, so it’s not simply the ratio of normalized counts. I also calculated the shrunken fold change (that is, 15.24) in DESeq2 and compared that to the one (15.31) I calculated previously, and they were very close.

Yes, we recommend using lfcShrink with either apeglm or ashr. Ashr would allow you to do pairwise comparisons here using contrast.

Also, the goal of my analysis to identify expression patterns of genes among the three conditions. I think transformed data from either rlog or vst would be a better choice to be used for the hierarchical clustering of the genes as the dependence of variance on the mean is removed and this might lead to more accurate clustering than using the normalized counts (without variance stabilization). Wondering if my understanding is correct.

Yes, I'd recommend VST then clustering. We prefer VST a bit over rlog now, but both are options. VST is faster.

score 0 · Answer 2 · 2021-09-30

If you want to kind of replicate the fold change on your own, take the normalized counts, (not rlogged) get the ratio of the averages of your groups, and compare that to the unshrunken fold change. Unless you've got a whopper of another element in your design, the two values should be quite close.

rlogged and vst values are not used in log fold calculations.