Standardization of rlog transformed counts before PCA
bbao • 0
@8a90e79d
Last seen 2 days ago
United States

Hi, I have a question about visualizing RNA-seq samples with PCA. In the DESeq2 vignette, count data transformed with rlog or vst are passed directly to prcomp for PCA, without further scaling or standardization. What is the reasoning for not scaling or standardizing the transformed counts?

One thought I had, in argument against further standardization, is that there is no distortion of the variance due to unit differences since all gene expression levels are in counts.

Out of curiosity, I ran PCA on the rlog transformed counts as well as the (rlog + standardized) counts, and the overall sample clustering patterns remain the same, but the loadings change. I'd like to do some functional analysis of the loadings, so I just want to make sure I'm analyzing the correct PCA plot. Thanks!

RNASeq DESeq2 Visualization • 181 views
@mikelove
Last seen 1 day ago
United States

VST and rlog contain scaling for sequencing depth.
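
As a rough illustration of what scaling for sequencing depth means, here is a minimal sketch of median-of-ratios size factors, which is my understanding of the default approach in DESeq2's estimateSizeFactors. It is plain Python rather than R, the count matrix is made up, and it only covers the depth-correction part; the variance-stabilizing part of vst and rlog is not shown.

```python
import math
import statistics

# Hypothetical raw counts: rows are genes, columns are samples.
# Sample 2 is sample 1 sequenced exactly twice as deeply.
counts = [
    [10, 20],
    [20, 40],
    [30, 60],
]

def size_factors(count_matrix):
    """Median-of-ratios size factors, one per sample (column)."""
    n_samples = len(count_matrix[0])
    log_ratios = [[] for _ in range(n_samples)]
    for gene in count_matrix:
        # Genes with a zero count have no finite geometric mean; skip them.
        if min(gene) <= 0:
            continue
        log_geo_mean = statistics.fmean([math.log(c) for c in gene])
        for j, c in enumerate(gene):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # The median ratio per sample is that sample's size factor.
    return [math.exp(statistics.median(r)) for r in log_ratios]

sf = size_factors(counts)
# Dividing by the size factor puts all samples on a common depth scale.
normalized = [[c / sf[j] for j, c in enumerate(gene)] for gene in counts]
print("size factors:", sf)
print("normalized counts:", normalized)
```

In this toy case the two samples end up with identical normalized counts, because they differ only in depth. That is a separate question from per-gene Z-scoring, which changes how much each gene contributes to the PCA.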

ATpoint ★ 4.6k
@atpoint-13662
Last seen 12 hours ago
Germany

I guess you mean scaling in the sense of Z-scoring? There are plenty of posts about scaling before PCA, both here on Bioconductor and in other communities such as StackExchange and CrossValidated. There is no general rule or right/wrong here; scaling (or not) simply weights genes differently in PCA. For example, the Bioconductor world (e.g. the single-cell infrastructure here) typically does not scale data before PCA, while alternative frameworks such as Seurat definitely do, and iirc ScanPy does as well. The choice is yours.
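
To make the "weights genes differently" point concrete, here is a minimal sketch in plain Python (not R, and not what prcomp does internally). It finds the first principal component by power iteration on the gene-gene covariance matrix, with and without Z-scoring. The two-gene matrix and the function names are hypothetical, chosen only for illustration.

```python
import math
import statistics

# Hypothetical transformed expression values: rows are samples, columns are genes.
# Gene 0 varies a lot between samples; gene 1 follows the same pattern with a small range.
expr = [
    [2.0, 5.0],
    [10.0, 5.5],
    [3.0, 5.1],
    [11.0, 5.7],
]

def prepare(matrix, scale):
    """Center each gene; if scale is True, also divide by its standard deviation."""
    prepared = []
    for col in zip(*matrix):
        mean = statistics.fmean(col)
        sd = statistics.stdev(col) if scale else 1.0
        prepared.append([(x - mean) / sd for x in col])
    return [list(row) for row in zip(*prepared)]

def first_pc_loadings(matrix, scale=False, iterations=200):
    """Loadings of the first principal component via power iteration."""
    x = prepare(matrix, scale)
    n, p = len(x), len(x[0])
    cov = [[sum(row[a] * row[b] for row in x) / (n - 1) for b in range(p)]
           for a in range(p)]
    v = [1.0] + [0.5] * (p - 1)  # arbitrary non-zero starting vector
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(t * t for t in w))
        v = [t / norm for t in w]
    return v

raw_loadings = first_pc_loadings(expr, scale=False)
scaled_loadings = first_pc_loadings(expr, scale=True)
print("unscaled PC1 loadings:", raw_loadings)
print("z-scored PC1 loadings:", scaled_loadings)
```

In this toy data the high-variance gene dominates the unscaled PC1, while after Z-scoring both genes get the same weight. The sample ordering along PC1 is similar either way, but the loadings are not, which matches what the question describes. In R the corresponding switch is the `scale.` argument of prcomp, which is FALSE by default.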
