Hi all,
I was wondering if somebody could clarify something for me regarding the variance stabilising transformation (VST) and batch correction, and how to subsequently extract a matrix of batch-corrected VST counts.
I have run an experiment as follows:
dds <- DESeqDataSetFromMatrix(countData = countdata, colData = sampledata, design = ~ Organ + Extraction + Age)
dds <- DESeq(dds, reduced = ~ Organ + Extraction, test = "LRT", parallel = TRUE)
Extraction is the batch (for this run there are only two batches, 1 and 2), and I only want to see genes that are DE as a result of Age. Organ and batch (Extraction) effects are therefore included in the reduced model. I am happy with the inclusion of Extraction in the reduced model; I cannot see any clear batch-related effects when plotting the PCA, and the batches are well mixed.
I would like to do further downstream analysis outside DESeq2, however, and so need a log- or VST-transformed counts table for this analysis. Although these effects are modelled within DESeq2, does running:
vsd <- vst(dds)
and then extracting the counts from this take the batch effects across samples into account? Or does the variance stabilisation automatically take care of this?
If not, is there a way to extract a VST of counts with batch effects accounted for?
My downstream analysis is machine learning classification. I have been using a VST counts matrix until now, which has caused no real issues during classification across all samples, ages, etc. However, I will soon have an additional 3 batches, so I want to make sure these effects are minimised as far as possible in the future.
Many thanks!!
Hi,
Have a look at this post: https://support.bioconductor.org/p/62954/
Thank you! So by running
removeBatchEffect
in limma, the mean shifts are removed in the same way they would be when including batch in the reduced model? Is this a correct interpretation of Michael's comment? So then, would:
produce the counts matrix I am after?
Yes, we actually have it in the FAQ now:
http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#why-after-vst-are-there-still-batches-in-the-pca-plot
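For readers who cannot follow the link, the approach can be sketched roughly as below. This is a minimal sketch, not the exact code from this thread: the helper name `batch_corrected_vst` is hypothetical, and the defaults assume the column names from the question (`Extraction` as the batch, `Organ` and `Age` as the variables to preserve). `blind = FALSE` is also an assumption here, chosen so that the transformation uses the design already set on `dds`.

```r
library(DESeq2)  # vst(); also attaches SummarizedExperiment for assay() and colData()
library(limma)   # removeBatchEffect()

# Hypothetical helper: returns a genes x samples matrix of VST values with
# the batch mean shift removed. `keep` lists the variables whose effects
# should be preserved (not the batch itself).
batch_corrected_vst <- function(dds, batch_col = "Extraction", keep = ~ Organ + Age) {
  # Variance stabilising transformation; this alone does not remove batch effects
  vsd <- vst(dds, blind = FALSE)
  mat <- assay(vsd)

  # Design matrix of the effects to protect from removal
  mm <- model.matrix(keep, data = as.data.frame(colData(vsd)))

  # Subtract the fitted batch effect from the transformed values
  removeBatchEffect(mat, batch = colData(vsd)[[batch_col]], design = mm)
}
```

With the objects from the question this would be called as `mat <- batch_corrected_vst(dds)`. The resulting matrix is intended for visualisation and downstream methods such as clustering or classification; for differential expression testing, keep the batch term in the DESeq2 design instead.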
Brilliant! Thank you so much! I will proceed like this!
Is the code above for generating the new counts matrix OK?
Thanks!
I said yes and it's also what's listed in the link I sent, no?
You did, apologies. I wasn't sure whether the yes referred to the whole statement or just to the interpretation about mean shift removal! Thanks again!