Hi,
I have generated a DESeq2 data set, run the model, and generated VST counts (VST code as follows):
vsd <- vst(dds, blind = TRUE)
## Remove batch effects from vsd
library(limma)
mat <- assay(vsd)
mat <- removeBatchEffect(mat, batch = vsd$Batch)
assay(vsd) <- mat
and then write out a VST counts table:
write.csv(etc)
I use the whole counts table for a downstream ML exercise, which works well.
However, I also need to subset this main VST counts table into two tables (ages 1 and 2) and (ages 3, 4, 5, 6), run the same ML exercise on each, and compare the results. This works fine, but my question is:
Is this a valid approach to subsetting the data frame, given that the VST counts were generated from ALL the data and the variance stabilisation therefore considers ALL samples?
Or is it more valid to subset the DESeq2 object into ages 1 and 2 and ages 3, 4, 5, 6, and generate VST counts for each subset individually?
I am wondering how much the counts will change if I subset first and generate VST counts per subset, versus generating VST counts from ALL samples and then just subsetting that matrix.
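To make the comparison concrete, here is a rough sketch of the two alternatives on simulated data. It is not my real analysis: the data come from DESeq2's makeExampleDESeqDataSet(), the column name Age and the gene and sample counts are assumptions for illustration, and the batch-correction step is left out.

```r
library(DESeq2)

# Simulated stand-in for the real data set (assumption: 3000 genes, 12 samples).
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 3000, m = 12)
dds$Age <- rep(1:6, each = 2)  # hypothetical age groups 1..6, two samples each

# Approach A: VST on all samples, then subset the resulting matrix.
vsd_all <- vst(dds, blind = TRUE)
keep <- dds$Age %in% c(1, 2)
mat_a <- assay(vsd_all)[, keep]

# Approach B: subset the DESeq2 object first, then VST on the subset only.
vsd_sub <- vst(dds[, keep], blind = TRUE)
mat_b <- assay(vsd_sub)

# Per-sample correlation between the two versions of the same samples.
per_sample_cor <- diag(cor(mat_a, mat_b))
print(per_sample_cor)

# Distribution of absolute differences in the transformed values.
print(summary(as.vector(abs(mat_a - mat_b))))
```

The same two steps could be repeated with Age %in% 3:6 for the second subset.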
Thanks!
Wonderful, thank you so much for the quick response. I think I will continue to generate VST counts across all samples and simply subset the bigger matrix after it has been generated.
I will also do the exploratory analysis as you describe.
Thanks again!