Dear list,
I am analyzing a RNAseq dataset with a very obvious batch-effect using the DESeq2 (v1.4.5) R package. I account for the batch-effect in my model and in terms of differential expression it really improves the analysis. Now, I would like to visualize the data by principal component analysis and hierarchical clustering, which requires the adjusted values. However, I cannot figure out how to get the batch-corrected values. Thanks you for the help!
Best,
Salah
With all due respec to Gordon Smyth and other authors of limma which is a great package, in my experience the empirical Bayes moderated regression in ComBat (package sva) simply works better than the simple linear model implemented in limma (at least last time I checked limma).
Hi Michael,
In our data, I see a strong batch effect when I just do a straight up rlog transformation. However, remove the batch effects as you suggested (removeBatchEffect()) seems to work. Now, if I want to still look at differentially expressed features, should I just account for the batch effect by adding it as another factor in the design or should I somehow use the batch effect removed values? Personally, I think it will make sense to add the batch effect as a linear factor (design = ~Sample+Batch) and adjust the results for it instead of subtracting the batch effect. But honestly I am not sure.
I know you guys don't suggest using rlog/vst transformed values for differential expression analysis, but if I have to use the batch effect removed data (rlog transformed), then how could I do that?
I hope my question makes sense and isn't that stupid.
Thanks,
Praful
hi Praful,
Yes, you should just add batch as another factor in the design.
We do not have any kind of functionality in DESeq2 for differential expression for non-count data. Using the batch within the count-based GLM is the recommended approach, as we have information about the statistical properties of the raw counts.
Hi Michael,
Thank you for the prompt response.
Praful
Hi, Michael --
Can you explain how what you are suggesting here relates to the use of the 'blind' argument in the rlog and varianceStabilizingTransformation functions? Specifically, and supposing I have included a batch term in my DESeq2 model, if I do rlog(blind=T) followed by removeBatchEffect() to remove mean shifts, how would that differ from just doing rlog(blind=F)? Would not the latter also remove mean shifts due to the batch effect?
Thanks for your help.
Bob
they are fairly unrelated.
'blind' only makes a difference in the estimation of the global shape of mean-dispersion trend (whether or not the dispersions which inform this trend are aware of the sample grouping or not).
either way, the sample information is not *directly* used by the transformation.
there is some more description of blind in the vignette and the man pages for the transformations.