Dear all,
I have performed a differential expression analysis using:
dds <- DESeqDataSetFromMatrix(countData = cts, colData = colData, design = ~ Type + batch)
dds <- DESeq(dds)
Even though I included "batch" in the design formula, I still see a batch effect.
I read this:
"Why after VST are there still batches in the PCA plot? The transformations implemented in DESeq2, vst and rlog, compute a variance stabilizing transformation which is roughly similar to putting the data on the log2 scale, while also dealing with the sampling variability of low counts. It uses the design formula to calculate the within-group variability (if blind=FALSE) or the across-all-samples variability (if blind=TRUE). It does not use the design to remove variation in the data. It therefore does not remove variation that can be associated with batch or other covariates (nor does DESeq2 have a way to specify which covariates are nuisance and which are of interest). It is possible to visualize the transformed data with batch variation removed, using the removeBatchEffect function from limma. This simply removes any shifts in the log2-scale expression data that can be explained by batch. The paradigm for this operation for designs with balanced batches would be: For unbalanced batches (e.g. the condition groups are not distributed balanced across batches), the design argument should be used, see ?removeBatchEffect in the limma package for details."
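The code that the quoted passage refers to did not survive the copy and paste. As a rough sketch of what that step can look like (not necessarily the vignette's exact code): removeBatchEffect is applied to the vst-transformed matrix, not to the object returned by DESeq(), and only for visualization. The simulated cts and colData below, the level names "A"/"B" and "b1".."b3", and the variable names mat, mm and mat_adj are assumptions made so the example runs on its own.

```r
library(DESeq2)
library(limma)

set.seed(1)
# Simulated stand-in for the real data (assumption): 2000 genes, 12 samples,
# two Type levels balanced across three batches.
colData <- data.frame(
  Type  = factor(rep(c("A", "B"), times = 6)),
  batch = factor(rep(c("b1", "b2", "b3"), each = 4)),
  row.names = paste0("s", 1:12)
)
# Gene-specific batch multipliers, so normalization alone cannot absorb them.
batch_fx <- matrix(2^rnorm(2000 * 3, sd = 1), ncol = 3)
mu <- 100 * batch_fx[, as.integer(colData$batch)]
cts <- matrix(rnbinom(length(mu), mu = mu, size = 10), ncol = 12,
              dimnames = list(paste0("g", 1:2000), rownames(colData)))

dds <- DESeqDataSetFromMatrix(countData = cts, colData = colData,
                              design = ~ Type + batch)

# Variance stabilizing transformation; blind = FALSE uses the design only to
# estimate within-group variability, it does not remove the batch shifts.
vsd <- vst(dds, blind = FALSE)

# Remove batch shifts from the log2-scale matrix while protecting Type.
mat <- assay(vsd)
mm <- model.matrix(~ Type, colData(vsd))
mat_adj <- limma::removeBatchEffect(mat, batch = vsd$batch, design = mm)

# Put the adjusted matrix back for plotting only (PCA, heatmaps).
assay(vsd) <- mat_adj
plotPCA(vsd, intgroup = c("Type", "batch"))
```

The adjusted matrix is meant for plots. The DE test itself still runs on the raw counts with batch in the design.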
So I tried running removeBatchEffect after the code above, but it still seems to have some issues.
Is running removeBatchEffect after dds <- DESeq(dds) the right approach?
Thank you, Bine
Thank you for your quick response.
I have created a heatmap and the data seems to be clustering by Batch...
What else could I do apart from adding "batch" as a variable in the design?
Thank you, Bine
Ok, the fact that the heatmap clusters by batch is not a problem for the DE results.
So it doesn't seem there is a problem here.
One suggestion if you are interested is to give ComBat-Seq a try: https://academic.oup.com/nargab/article/2/3/lqaa078/5909519.
It maps count data to batch-adjusted count data, and in most cases (see the paper) it does a reasonable job of preserving the false positive rate (unlike the original ComBat, which sometimes led to problems here). After ComBat-Seq you could check in your clustering whether the batch effect has been removed, and then move on to DESeq2 with the adjusted count matrix.
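A minimal sketch of that workflow, assuming the ComBat_seq function from the sva package. The simulated cts and colData, the level names, and the variable names adjusted, dds_adj and res are assumptions for illustration. Using design = ~ Type after the adjustment is also an assumption here, since the thread does not say which design to use with the adjusted counts.

```r
library(sva)
library(DESeq2)

set.seed(1)
# Simulated stand-in for the real data (assumption): 500 genes, 12 samples,
# two Type levels balanced across three batches.
colData <- data.frame(
  Type  = factor(rep(c("A", "B"), times = 6)),
  batch = factor(rep(c("b1", "b2", "b3"), each = 4)),
  row.names = paste0("s", 1:12)
)
batch_fx <- matrix(2^rnorm(500 * 3, sd = 1), ncol = 3)
mu <- 100 * batch_fx[, as.integer(colData$batch)]
cts <- matrix(rnbinom(length(mu), mu = mu, size = 10), ncol = 12,
              dimnames = list(paste0("g", 1:500), rownames(colData)))

# Batch-adjust the raw counts; `group` tells ComBat-Seq which biological
# variable to preserve while removing batch.
adjusted <- ComBat_seq(cts, batch = colData$batch, group = colData$Type)

# Check clustering on the adjusted counts first (e.g. vst + plotPCA), then
# run DESeq2 on the adjusted count matrix.
dds_adj <- DESeqDataSetFromMatrix(countData = adjusted, colData = colData,
                                  design = ~ Type)
dds_adj <- DESeq(dds_adj)
res <- results(dds_adj)
```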