Normalization by variance stabilizing transformation (VST)
@6c372dab
Sweden

Hello!

I am a bit confused about the normalization that the DESeq2 functions varianceStabilizingTransformation() and vst() perform in addition to the actual variance stabilization. My understanding is that dividing by size factors (which are calculated automatically?) corrects for both library size and library composition. But the reference manual specifically states that it corrects for library size and says nothing about library composition. Is there something I'm missing here? I plan to use the variance-stabilized data for PCA and heatmap plotting.

Finally, am I correct in assuming that the design parameter only affects the variance stabilization in the vst() function, not the additional normalization? That is, do the subsetting and stabilization happen first, with the data then normalized as in varianceStabilizingTransformation()?

thanks! Kris

library composition vst DESeq2 Normalization • 3.9k views
@mikelove
United States

Can you say what you mean by library composition? Can you give an example of what you want to control for?

The design is only used for estimating the parameters of the transformation (it is needed to assess the amount of within-group dispersion). Afterwards the same transformation is applied to all samples, so the sample grouping is not used when applying the transformation. Sample group information is also not used in the size factor calculation.


By library composition I mean correcting for genes whose expression differs vastly between samples, or that are expressed only in certain samples. As I understand it, if a specific gene X is much more highly expressed in sample A than in sample B and you correct only for sequencing depth (by calculating CPM, for example), the remaining genes in sample A will appear to be much less expressed than in sample B, where gene X has not taken up such a big share of the counts. In reality, the only DE gene might be gene X.

I believe this is what the median of ratios method does, but I might be wrong.
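
To make the scenario above concrete, here is a minimal Python sketch using only the standard library. The counts are made up for illustration, and the `cpm` helper is a hypothetical function written for this example, not part of DESeq2 or any other package. Genes g1 to g4 have identical counts in both samples, and only gene X differs.

```python
# Toy counts, invented for illustration: g1..g4 are identical in both
# samples, only gene X is higher in sample A.
counts_a = {"g1": 100, "g2": 200, "g3": 300, "g4": 400, "X": 9000}
counts_b = {"g1": 100, "g2": 200, "g3": 300, "g4": 400, "X": 1000}


def cpm(counts):
    """Counts per million: scale each count by the sample's total count."""
    total = sum(counts.values())
    return {gene: c * 1e6 / total for gene, c in counts.items()}


cpm_a = cpm(counts_a)
cpm_b = cpm(counts_b)

# Print CPM side by side; g1..g4 look lower in A although their raw
# counts are the same in both samples.
for gene in counts_a:
    print(gene, round(cpm_a[gene]), round(cpm_b[gene]))
```

Because gene X inflates the total count of sample A, dividing by that total pushes the CPM of g1 to g4 in sample A below their CPM in sample B, which is the artifact described in the post.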


DESeq2 (and DESeq, and other methods on Bioconductor) uses a robust estimator for scaling counts that won't be affected by these situations. We use the median feature (in terms of its log fold change, LFC, to a reference) to compute the size factor for a sample, rather than just using the ratio of the sample's total count to the total count of a reference.
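
The idea of taking the median feature relative to a reference can be sketched in plain Python. This is a simplified illustration, not DESeq2's actual code: the `size_factors` function and the toy counts are assumptions made for this example, the reference is taken to be the per-gene geometric mean across samples, and genes with a zero count in any sample are simply skipped.

```python
import math
from statistics import median


def size_factors(samples):
    """Median-of-ratios size factors (simplified sketch).

    samples: list of per-sample count lists, all in the same gene order.
    """
    n_genes = len(samples[0])
    # Reference: log geometric mean of each gene across samples.
    # Genes with a zero count in any sample are left out (None).
    log_ref = []
    for g in range(n_genes):
        values = [s[g] for s in samples]
        if all(v > 0 for v in values):
            log_ref.append(sum(math.log(v) for v in values) / len(values))
        else:
            log_ref.append(None)
    factors = []
    for s in samples:
        # Log ratio of each usable gene to the reference; take the median.
        log_ratios = [
            math.log(s[g]) - log_ref[g]
            for g in range(n_genes)
            if log_ref[g] is not None
        ]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Toy counts, invented for illustration: only the last gene differs.
sample_a = [100, 200, 300, 400, 9000]
sample_b = [100, 200, 300, 400, 1000]

factors = size_factors([sample_a, sample_b])
total_ratio = sum(sample_a) / sum(sample_b)
print(factors, total_ratio)
```

In this toy case the single outlying gene moves the total-count ratio to 5, while the median log ratio sits on the unchanged genes, so both size factors come out at about 1 and the unchanged genes stay comparable after scaling.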


Ok, that settles it. Thanks a lot!
