Question

Better to use rlog or vst transformation for data quality assessment of differential expression data based on size factors?


agdif ▴ 10

@agdif-16034


Hello,

I finished running DESeq2 on my dataset that includes 6 timepoints and 3 biological replicates per timepoint. I would like to run a PCA analysis for quality assessment but am unsure which count transformation method I should use, vst or rlog.

I calculated the size factors for my dataset (below). Based on these, sequencing depth does not appear to vary much across the samples: a large variation would be a dynamic range of size factors of roughly 4 or more (≳ 4, mentioned in Love, Huber, and Anders, 2014). K013 does have a smaller size factor than the other samples, but the range still does not exceed a factor of 4.

sizeFactors(dds)

X12HPA_J022   1.2875797
X12HPA_J024   1.7052146
X12HPA_J050   0.9460303
X1DPA_K001    1.1828260
X1DPA_K011    1.0955579
X1DPA_K121    0.7791666 
X2DPA_K013    0.4708761
X2DPA_K015    1.0936920
X2DPA_K021    1.2141511
X3DPA_K012    0.7602281
X3DPA_K014    1.0525988
X3DPA_K023    0.8606639
X4DPA_K040    0.7807291
X4DPA_K080    1.0124977
X4DPA_K120    1.3053801
X5DPA_K010    0.9436090
X5DPA_K020    1.1898311
X5DPA_K030    1.1898311
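The dynamic range mentioned above is simply the ratio of the largest to the smallest size factor. A minimal R sketch of that check, hard-coding the values printed above (on a real DESeqDataSet you would take them from `sizeFactors(dds)` instead); the variable names `sf` and `sf_range` are illustrative only:

```r
# Size factors as printed above; on a real object use: sf <- sizeFactors(dds)
sf <- c(
  X12HPA_J022 = 1.2875797, X12HPA_J024 = 1.7052146, X12HPA_J050 = 0.9460303,
  X1DPA_K001  = 1.1828260, X1DPA_K011  = 1.0955579, X1DPA_K121  = 0.7791666,
  X2DPA_K013  = 0.4708761, X2DPA_K015  = 1.0936920, X2DPA_K021  = 1.2141511,
  X3DPA_K012  = 0.7602281, X3DPA_K014  = 1.0525988, X3DPA_K023  = 0.8606639,
  X4DPA_K040  = 0.7807291, X4DPA_K080  = 1.0124977, X4DPA_K120  = 1.3053801,
  X5DPA_K010  = 0.9436090, X5DPA_K020  = 1.1898311, X5DPA_K030  = 1.1898311
)

# Dynamic range: largest size factor divided by the smallest
sf_range <- max(sf) / min(sf)
print(round(sf_range, 2))  # 3.62

# Samples at the two extremes of the range
names(which.min(sf))  # "X2DPA_K013"
names(which.max(sf))  # "X12HPA_J024"
```

For these values the range is about 3.6 (K013 vs. J024), below the factor of 4 cited in the question and well under the roughly 10-fold range discussed in the answer.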

In this case, would it be better to use rlog to account for the differences in sequencing depth, or would vst be okay? Thanks!

DESeq2 quality assessment vst rlog transformation

Written 6.4 years ago by agdif; updated 6.4 years ago by Michael Love.


If it helps, I don't think there is usually much difference between the two. Personally, I use VST because it is a lot faster.

Comment by chris86, 6.4 years ago

score 2 · Answer 1 · 2018-06-07


Michael Love 42k

@mikelove


United States

In the DESeq2 paper's simulations, we saw a slight improvement with rlog when the size factors spanned a wide range, e.g. 10-fold from smallest to largest, and even then VST did pretty well. The range you show is not a problem.

I myself use vst() exclusively these days, because it is fast and it is more robust to outliers when the sample size is very large. rlog() now gives a warning if the sample size is greater than 30.
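A minimal sketch of the vst-then-PCA quality check, assuming DESeq2 is installed. Since the poster's `dds` is not available, it simulates a stand-in with `makeExampleDESeqDataSet()` (18 samples, matching 6 timepoints x 3 replicates); the `time` column and its `T1`..`T6` labels are hypothetical placeholders for the real sample annotation. `blind = TRUE` is used because the goal is unsupervised QC; `rlog()` takes the same argument if you want to compare the two.

```r
library(DESeq2)

# Hypothetical stand-in for the poster's dds: simulated counts, 18 samples
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 5000, m = 18)
# Hypothetical annotation: 6 timepoints x 3 replicates
dds$time <- factor(rep(paste0("T", 1:6), each = 3))
dds <- estimateSizeFactors(dds)

# Variance stabilizing transformation, ignoring the design (blind QC)
vsd <- vst(dds, blind = TRUE)

# The slower alternative, for comparison:
# rld <- rlog(dds, blind = TRUE)

# PCA on the most variable genes (plotPCA uses ntop = 500 by default);
# returnData = TRUE gives the coordinates instead of a plot
pca_data <- plotPCA(vsd, intgroup = "time", returnData = TRUE)
print(head(pca_data))

# Percentage of variance explained by PC1 and PC2
print(round(100 * attr(pca_data, "percentVar"), 1))

# Draw the plot (a ggplot object)
print(plotPCA(vsd, intgroup = "time"))
```

With the real `dds`, replicates of the same timepoint would be expected to group together; a sample that sits apart from its replicates is worth a closer look.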