Question

subsetting VST counts from one dataset

A ▴ 60

@a-14337

United Kingdom

Hi,

I have generated a DESeq data set, run the model and generated VST counts (VST code as follows):

vsd <- vst(dds, blind = TRUE)

## Remove batch effects from vsd

library(limma)
mat <- assay(vsd)
mat <- removeBatchEffect(mat, batch = vsd$Batch)
assay(vsd) <- mat

I then write out the VST counts table:
write.csv(etc)

I use the whole counts table for a downstream ML exercise, which works well.

However, I also need to subset this main VST counts table into two tables (ages 1 and 2, and ages 3, 4, 5 and 6), run the same ML exercise on each, and compare the results. This works fine, but my question is:

Is this a valid approach to subsetting the data frame, given that the VST counts were generated with ALL the data and the variance stabilisation therefore considers ALL samples?

Or is it more valid to subset the DESeq2 object into ages 1 and 2 and ages 3, 4, 5 and 6, and generate VST counts for each subset individually?

I am wondering how much the counts will change if I subset the object first and generate VST counts per subset, versus generating VST counts from ALL samples and then just subsetting that matrix.

thanks!

deseq2 rna-seq • 2.0k views

ADD COMMENT • link updated 4.9 years ago by Michael Love 43k • written 4.9 years ago by A ▴ 60

score 2 · Answer 1 · 2020-05-06

Michael Love 43k

@mikelove

United States

I wouldn't worry about subsetting the VST. All it did was examine the mean-variance trend over all genes and samples, and produce an approximately variance-stabilizing transformation.
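If you want to see how much the values change on your own data, one option is to compute both versions and compare them directly. The following is only a rough sketch on simulated counts: `makeExampleDESeqDataSet()` stands in for the real `dds`, and the `age` column and the `young` index are hypothetical names made up for illustration. The design is set to `~ 1` up front, which is what `blind = TRUE` uses internally anyway, so that subsetting cannot leave a design factor with an empty level.

```r
library(DESeq2)

# Simulated counts for illustration only; replace with your own dds
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 5000, m = 12)
design(dds) <- ~ 1                      # blind = TRUE ignores the design anyway
dds$age <- factor(rep(1:6, each = 2))   # hypothetical age groups 1..6

# Samples belonging to ages 1 and 2
young <- dds$age %in% c(1, 2)

# Option A: VST fitted on all samples, then subset the matrix
vsd_all <- vst(dds, blind = TRUE)
mat_from_all <- assay(vsd_all)[, young]

# Option B: subset the object first, then fit the VST on the subset only
vsd_sub <- vst(dds[, young], blind = TRUE)
mat_from_sub <- assay(vsd_sub)

# Compare the two versions of the same genes and samples
cor(as.vector(mat_from_all), as.vector(mat_from_sub))
summary(as.vector(mat_from_all - mat_from_sub))
```

A correlation close to 1 and small differences would suggest that the choice makes little practical difference for the downstream analysis, but that is something to check on the real data rather than assume from a simulation.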

But if you are concerned about this, you could just use the shifted logarithm, normTransform(), with a pseudocount of 5 or so. You can check with meanSdPlot, as we do in the workflow, whether the pseudocount is large enough to stabilize the variance of your counts.
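A minimal sketch of that alternative, again on simulated counts rather than real data (`makeExampleDESeqDataSet()` is only a stand-in for your own `dds`, and the pseudocount of 5 is the rough starting value mentioned above, not a tuned choice):

```r
library(DESeq2)
library(vsn)  # provides meanSdPlot()

# Simulated counts for illustration only; replace with your own dds
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 5000, m = 12)
dds <- estimateSizeFactors(dds)

# Shifted logarithm: log2(normalized count + pc)
ntd <- normTransform(dds, f = log2, pc = 5)

# Plot per-gene standard deviation against the rank of the mean;
# a roughly flat trend line suggests the variance is stabilized
meanSdPlot(assay(ntd))
```

If the trend still rises noticeably at the low-count end, trying a larger `pc` and re-plotting is a reasonable next step. Unlike the VST, the shifted log does not fit a mean-variance trend across samples, although the size factors are still estimated from whichever samples are in the object.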

ADD COMMENT • link 4.9 years ago Michael Love 43k

Wonderful, thank you so much for the quick response. I think I will continue to generate VST counts across all samples and simply subset the bigger matrix after it has been generated.

I will also do the exploratory analysis as you describe.

thanks again!

ADD REPLY • link 4.9 years ago A ▴ 60