DESeq2 batch correction best practice.
94133 • 0
@94133-14305
Last seen 4.5 years ago
USA, Stanford

I have a clear batch effect that's caused by sequencing paired-end vs. single-end on different days. I'd like to correct for this in the DESeq2 analysis as suggested in the vignette ("If there is unwanted variation present in the data (e.g. batch effects) it is always recommend to correct for this").

I added the sequencer batch effect to the design but only see a very modest change in the PCA plot that's produced. Does this truly reflect the change in the design?

ddsTxi <- DESeqDataSetFromTximport(txi, colData = samples, design = ~ sequencer + condition)

vsd <- vst(ddsTxi, blind = FALSE)
plotPCA(vsd, returnData=FALSE, intgroup="replicate")

I then estimated the batch effect with RUVSeq and with limma, and both do a decent job of correcting it, judging by the PCA plots. What's best practice here? I'm thinking of using PCA to choose the best batch correction. Is there a recommended vignette for adding limma/RUVSeq correction to the design?

Thank you kindly for your time.

deseq2 limma RUVSeq • 23k views

Some practical advice: if the only difference is single-end vs. paired-end, then simply take only the R1 fastq file from the paired-end data and treat it as single-end. Repeat the mapping/quantification and be done with it. This would eliminate that batch. You can always treat paired-end data as single-end, but obviously not vice versa. Is this the only source of batch effect here? It would be helpful to add some more details, like the PCA plot and a table describing all relevant metadata of this experiment.


Thanks for this suggestion! Yes, this is certainly something that I should do. I suspect that PE vs. SE is not the only batch, because the libraries were prepared at different times and sequenced on different machines etc., but it will be interesting to compare the results.

@mikelove
Last seen 1 day ago
United States

“I added the sequencer batch effect to the design but only see a very modest change in the PCA plot that's produced. Does this truly reflect the change in the design?”

The vignette has a frequently asked question section and your question is answered there.
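
For readers without the vignette open: the relevant FAQ entry explains that vst() and rlog() use the design only to estimate within-group variability (when blind = FALSE) and do not remove variation associated with the design variables, so the batch remains visible in the PCA. For visualization it suggests limma's removeBatchEffect() on the transformed values. Below is a minimal sketch of that pattern, not a definitive recipe. The simulated data set, the injected batch effect, and the name vsd_corrected are assumptions made for illustration; only the sequencer and condition column names come from the question.

```r
library(DESeq2)
library(limma)

set.seed(1)
# Simulated stand-in for the real data (assumption): 2000 genes, 12 samples,
# with a 'condition' factor created by makeExampleDESeqDataSet().
dds <- makeExampleDESeqDataSet(n = 2000, m = 12, betaSD = 1)
# Hypothetical batch labels, balanced across the two conditions.
dds$sequencer <- factor(rep(c("SE", "PE"), times = 6))

# Inject a gene-specific multiplicative batch effect into the PE samples.
cts <- counts(dds)
pe <- dds$sequencer == "PE"
cts[, pe] <- round(cts[, pe] * 2^rnorm(nrow(cts), sd = 1))
storage.mode(cts) <- "integer"
counts(dds) <- cts
design(dds) <- ~ sequencer + condition

# The design is used for dispersion estimation only; the batch stays in the data.
vsd <- vst(dds, blind = FALSE)

# Remove the batch from the transformed values, protecting 'condition'
# by passing it through the 'design' argument.
vsd_corrected <- vsd
assay(vsd_corrected) <- removeBatchEffect(
  assay(vsd),
  batch = vsd$sequencer,
  design = model.matrix(~ condition, data = as.data.frame(colData(vsd)))
)

# PCA coordinates before and after, for plotting or side-by-side comparison.
pca_before <- plotPCA(vsd, intgroup = c("condition", "sequencer"), returnData = TRUE)
pca_after <- plotPCA(vsd_corrected, intgroup = c("condition", "sequencer"), returnData = TRUE)
```

The corrected matrix is meant for plots and other exploratory work. For the differential expression test itself, the raw counts stay as they are and the batch is handled by the ~ sequencer + condition design.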


Mike, thanks for your reply. It's not clear to me how removeBatchEffect compares to including a DESeq2 batch effect variable. Is it just that removeBatchEffect is a good estimate of how DESeq2 handles the batch effect? I've scoured the vignette and online and just haven't seen this explained, but I'm probably just not understanding. Thanks again.


Yes, I think I understand that part, but that's not my question. How does the result of DESeq2 variation removal in the counts from variables in the design compare to removeBatchEffect?


Let's talk about these two options:

1) DESeq() with ~batch + condition
2) PCA plot of data across condition after having run removeBatchEffect

These are conceptually similar, but there are differences. (1) uses counts and a GLM, while (2) works on transformed values (approximately log2-scaled counts). (1) estimates the contributions of batch and condition simultaneously, while (2) first removes the batch variation and then plots the points colored by condition. However, if you use the design argument in removeBatchEffect, it becomes more similar to (1), in that it estimates the batch and condition effects simultaneously.
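
To make the "simultaneous vs. batch first" distinction concrete, here is a small base-R sketch on a single simulated gene with log2-scale values. It uses lm() as a stand-in: it is neither DESeq2's GLM nor limma's actual code, and the sample layout, effect sizes, and variable names are all assumptions chosen for illustration. Batch and condition are deliberately unbalanced, which is where the two approaches diverge.

```r
set.seed(42)

# Hypothetical layout: 12 samples, batch and condition partially confounded.
batch <- factor(c(rep("SE", 6), rep("PE", 6)))
condition <- factor(c(rep("ctrl", 4), rep("trt", 2),
                      rep("ctrl", 2), rep("trt", 4)))

# One gene on the log2 scale: true batch shift 1.5, true condition effect 2.
y <- 8 + 1.5 * (batch == "PE") + 2 * (condition == "trt") +
  rnorm(12, sd = 0.05)

# Simultaneous estimation: batch and condition in one linear model.
fit_joint <- lm(y ~ batch + condition)
effect_joint <- unname(coef(fit_joint)["conditiontrt"])

# Batch first: subtract per-batch means without regard to condition,
# then estimate the condition effect from what is left.
y_batch_removed <- y - ave(y, batch) + mean(y)
fit_sequential <- lm(y_batch_removed ~ condition)
effect_sequential <- unname(coef(fit_sequential)["conditiontrt"])

print(c(joint = effect_joint, sequential = effect_sequential))
```

In this layout the batch-first estimate is pulled toward zero, because part of the condition difference is absorbed into the batch means, while the joint fit stays close to the simulated effect of 2. Passing the condition through the design argument of removeBatchEffect is what avoids that absorption.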
