Greetings,
I have been following the DESeq2 vignette to analyze a large number of RNA-seq samples. The size factors are comparable across samples. While investigating the various data transformations for visualization purposes, I found the VST (with blind = FALSE) more effective at stabilizing the variance over the mean than the log2-transformed normalized counts (with a pseudocount of +1). rlog seems to just keep running, which I assume is due to the large number of samples. Since I plan to incorporate even more samples, I was going to stick with the VST.
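For reference, here is a minimal sketch of the comparison I am describing, assuming a DESeqDataSet `dds` that has already been run through `DESeq()` (the object name and the design variable are placeholders of my own):

```r
library(DESeq2)
library(vsn)  # for meanSdPlot

# Variance-stabilizing transformation; blind = FALSE reuses the
# dispersions fitted with the design instead of re-estimating them blind.
vsd <- vst(dds, blind = FALSE)

# log2 of the size-factor-normalized counts with a pseudocount of 1;
# normTransform(dds) is equivalent to log2(counts(dds, normalized = TRUE) + 1).
ntd <- normTransform(dds)

# rlog is the third option, but it is slow with many samples:
# rld <- rlog(dds, blind = FALSE)

# Mean-SD plots to judge how well each transform stabilizes the variance.
meanSdPlot(assay(vsd))
meanSdPlot(assay(ntd))
```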
However, I noticed that with these particular data the transformation appears to eliminate all the zero values from my counts matrix. Comparison with the normalized counts suggests that these zeros are simply being scaled up to a common value (~3.5 in every case). Samples with higher counts are also scaled up, as would be expected, and biologically the results appear consistent between the log2-transformed normalized counts and the VST counts.
Heatmaps of specific genes of interest look highly similar between the two, the inter-sample distances make sense for both, and PCA shows that the samples group in a nearly identical and meaningful way regardless of the input used.
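The checks above were along these lines; a sketch assuming the same `dds` object, with "condition" standing in for a column of `colData(dds)`:

```r
library(DESeq2)
library(pheatmap)

vsd <- vst(dds, blind = FALSE)

# Sample-to-sample distances on the transformed values
sampleDists <- dist(t(assay(vsd)))
pheatmap(as.matrix(sampleDists))

# PCA of the samples, colored by a design variable
# ("condition" is a placeholder for an actual colData column)
plotPCA(vsd, intgroup = "condition")
```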
I found a previous post with a similar issue, though it wasn't definitively answered whether this is acceptable. I've gone through the vignette and the DESeq2 paper looking for insight, but I'm still not sure I fully understand what's happening here.
Thanks for your time.
Poor choice of words: I meant that samples with higher counts are still higher after the transformation. I was not correctly visualizing what the VST does to the lower counts. This helps a lot. Thanks very much, Michael.