I have a clear batch effect that's caused by sequencing paired-end vs. single-end on different days. I'd like to correct for this in the DESeq2 analysis as suggested in the vignette ("If there is unwanted variation present in the data (e.g. batch effects) it is always recommend to correct for this").
I added the sequencer batch effect to the design but only see a very modest change in the PCA plot that's produced. Does this truly reflect the change in the design?
ddsTxi <- DESeqDataSetFromTximport(txi, colData = samples, design = ~ sequencer + condition)
vsd <- vst(ddsTxi, blind = FALSE)
plotPCA(vsd, returnData=FALSE, intgroup="replicate")
I then estimated the batch effect with RUVSeq and limma, and judging by the PCA plots both do a decent job of removing it. What's best practice here? I'm thinking of using PCA to decide which batch correction works best. Is there a recommended vignette for adding a limma/RUVSeq correction to the design?
Thank you kindly for your time.
Some practical advice: if the only difference is single-end vs. paired-end, simply take the R1 fastq file from the paired-end data and treat it as single-end. Repeat the mapping/quantification and be done with it. This would eliminate that part of the batch effect. You can always treat paired-end data as single-end, but obviously not vice versa. Is this the only source of batch effect here? It would be helpful to add some more details, such as the PCA plot and a table describing all relevant metadata of this experiment.
Thanks for this suggestion! Yes, this is certainly something that I should do. I suspect that PE vs. SE is not the only batch effect, because the libraries were prepared at different times and sequenced on different machines, etc., but it will be interesting to compare the results.
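On the original PCA question: putting sequencer in the design changes how DESeq() tests for condition, but vst(..., blind = FALSE) only uses the design when estimating dispersions, so the transformed values (and the PCA) still contain the batch signal. The DESeq2 vignette suggests limma::removeBatchEffect on the transformed matrix for visualization only. Below is a minimal sketch of that idea, not a definitive recipe. It uses simulated counts from makeExampleDESeqDataSet as a stand-in for ddsTxi, and the alternating SE/PE assignment is made up for illustration, so the simulated data carries no real batch signal.

```r
library(DESeq2)
library(limma)

# Simulated stand-in for ddsTxi (assumption: 2000 genes, 12 samples).
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 2000, m = 12)

# Hypothetical batch labels, balanced across the two conditions.
dds$sequencer <- factor(rep(c("SE", "PE"), times = 6))
design(dds) <- ~ sequencer + condition

# The design is used for dispersion estimation only; batch is NOT removed here.
vsd <- vst(dds, blind = FALSE)

# Remove the sequencer effect from the transformed values, for plotting only.
# The design matrix protects the condition effect from being removed as well.
mat <- assay(vsd)
mm <- model.matrix(~ condition, colData(vsd))
assay(vsd) <- limma::removeBatchEffect(mat, batch = vsd$sequencer, design = mm)

plotPCA(vsd, intgroup = c("condition", "sequencer"))
```

The corrected matrix is meant for PCA and heatmaps. For the differential expression itself, keep running DESeq() on the raw counts with ~ sequencer + condition in the design rather than on batch-corrected values.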