Combat: Removing multiple batch effects

0

Entering edit mode

Guest User ★ 13k

@guest-user-4897

Last seen 10.6 years ago

Hi, I am attempting to remove two known batch effects from my Illumina BeadChip expression data. I have the following experimental design: ID Condition Batch Parents 1 wt 1 1 2 ko 1 1 3 wt 1 2 4 ko 1 2 5 wt 2 3 6 ko 2 3 These are 3 pairs of sibling mice each with different parents. Within each pair of siblings I have a wt and ko. On top of this, samples were prepared in 2 separate batches (batch1 = first 4 samples). After background correction and normalisation in limma my MDS plots shows that siblings group together and the first 4 samples are split from the final 2. I would like to adjust my expression data for both of these effects to test for differential expression between wt and ko groups. ComBat seems to be a good choice however I am struggling to figure out the best approach for multiple batches. I'd appreciate if anyone could give me some advice on how to set up ComBat in this context. Thanks in advance for your help Shaun Webb University of Edinburgh -- output of sessionInfo(): R version 3.1.1 (2014-07-10) Platform: x86_64-pc-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=C [4] LC_COLLATE=C LC_MONETARY=C LC_MESSAGES=C [7] LC_PAPER=C LC_NAME=C LC_ADDRESS=C [10] LC_TELEPHONE=C LC_MEASUREMENT=C LC_IDENTIFICATION=C attached base packages: [1] splines parallel grid stats graphics grDevices [7] utils datasets methods base other attached packages: [1] pamr_1.54.1 [2] survival_2.37-7 [3] cluster_1.15.2 [4] bladderbatch_1.2.0 [5] statmod_1.4.20 [6] sva_3.10.0 [7] mgcv_1.7-29 [8] nlme_3.1-117 [9] corpcor_1.6.6 [10] illuminaMousev2.db_1.22.1 [11] org.Mm.eg.db_2.14.0 [12] DESeq2_1.4.5 [13] RcppArmadillo_0.4.320.0 [14] Rcpp_0.11.2 [15] DiffBind_1.10.1 [16] GenomicAlignments_1.0.1 [17] BSgenome_1.32.0 [18] Rsamtools_1.16.0 [19] Biostrings_2.32.0 [20] limma_3.20.4 [21] XVector_0.4.0 [22] TxDb.Hsapiens.UCSC.hg19.knownGene_2.14.0 [23] GenomicFeatures_1.16.2 [24] lumi_2.16.0 [25] methyAnalysis_1.6.0 [26] org.Hs.eg.db_2.14.0 [27] RSQLite_0.11.4 [28] DBI_0.2-7 [29] AnnotationDbi_1.26.0 [30] Biobase_2.24.0 [31] GenomicRanges_1.16.3 [32] GenomeInfoDb_1.0.2 [33] IRanges_1.22.6 [34] BiocGenerics_0.10.0 loaded via a namespace (and not attached): [1] AnnotationForge_1.6.1 BBmisc_1.6 [3] BatchJobs_1.2 BiocInstaller_1.14.2 [5] BiocParallel_0.6.1 Formula_1.1-1 [7] Gviz_1.8.3 Hmisc_3.14-4 [9] KernSmooth_2.23-12 MASS_7.3-33 [11] Matrix_1.1-4 R.methodsS3_1.6.1 [13] RColorBrewer_1.0-5 RCurl_1.95-4.1 [15] VariantAnnotation_1.10.1 XML_3.98-1.1 [17] affy_1.42.2 affyio_1.32.0 [19] amap_0.8-12 annotate_1.42.0 [21] base64_1.1 beanplot_1.1 [23] biomaRt_2.20.0 biovizBase_1.12.1 [25] bitops_1.0-6 brew_1.0-6 [27] bumphunter_1.4.2 caTools_1.17 [29] codetools_0.2-8 colorspace_1.2-4 [31] dichromat_2.0-0 digest_0.6.4 [33] doRNG_1.6 edgeR_3.6.2 [35] fail_1.2 foreach_1.4.2 [37] gdata_2.13.3 genefilter_1.46.1 [39] geneplotter_1.42.0 genoset_1.16.2 [41] gplots_2.13.0 gtools_3.4.1 [43] illuminaio_0.6.0 iterators_1.0.7 [45] lattice_0.20-29 latticeExtra_0.6-26 [47] locfit_1.5-9.1 matrixStats_0.10.0 [49] mclust_4.3 methylumi_2.10.0 [51] minfi_1.10.2 multtest_2.20.0 [53] munsell_0.4.2 nleqslv_2.2 [55] nor1mix_1.1-4 pkgmaker_0.22 [57] plyr_1.8.1 preprocessCore_1.26.1 [59] registry_0.2 reshape_0.8.5 [61] rngtools_1.2.4 rtracklayer_1.24.1 [63] scales_0.2.4 sendmailR_1.1-2 [65] siggenes_1.38.0 stats4_3.1.1 [67] stringr_0.6.2 tools_3.1.1 [69] xtable_1.7-3 zlibbioc_1.10.0 -- Sent via the guest posting facility at bioconductor.org.

limma limma • 2.6k views

ADD COMMENT • link updated 10.7 years ago by Ryan C. Thompson ★ 7.9k • written 10.7 years ago by Guest User ★ 13k

0

Entering edit mode

Peter Langfelder ★ 3.0k

@peter-langfelder-4469

Last seen 6 months ago

United States

I believe limma has functionality for removing batch effects, if you prefer that. For ComBat, you would simply use arguments batch = <your sample="" annotation="">$Batch and mod = model.matrix(~1, data = <your sample="" annotation="">). Replace <your sample="" annotation=""> by the name of the variable that holds the sample annotation data frame you have shown in your message. Since your condition is exactly independent of batch and you have so few samples, I would not include it as a covariate. I am not sure how well ComBat would work with just 2 samples in a batch, since it attempts to equalize not just location (think mean) but also scale (standard deviation) of each variable, for which 2 samples is the barest theoretical minimum. HTH, Peter On Fri, Aug 1, 2014 at 8:00 AM, Shaun Webb [guest] <guest at="" bioconductor.org=""> wrote: > > Hi, I am attempting to remove two known batch effects from my Illumina BeadChip expression data. I have the following experimental design: > > ID Condition Batch Parents > 1 wt 1 1 > 2 ko 1 1 > 3 wt 1 2 > 4 ko 1 2 > 5 wt 2 3 > 6 ko 2 3 > > These are 3 pairs of sibling mice each with different parents. Within each pair of siblings I have a wt and ko. On top of this, samples were prepared in 2 separate batches (batch1 = first 4 samples). > > After background correction and normalisation in limma my MDS plots shows that siblings group together and the first 4 samples are split from the final 2. > > I would like to adjust my expression data for both of these effects to test for differential expression between wt and ko groups. ComBat seems to be a good choice however I am struggling to figure out the best approach for multiple batches. I'd appreciate if anyone could give me some advice on how to set up ComBat in this context. > > Thanks in advance for your help > Shaun Webb > University of Edinburgh > > -- output of sessionInfo(): > > R version 3.1.1 (2014-07-10) > Platform: x86_64-pc-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=C > [4] LC_COLLATE=C LC_MONETARY=C LC_MESSAGES=C > [7] LC_PAPER=C LC_NAME=C LC_ADDRESS=C > [10] LC_TELEPHONE=C LC_MEASUREMENT=C LC_IDENTIFICATION=C > > attached base packages: > [1] splines parallel grid stats graphics grDevices > [7] utils datasets methods base > > other attached packages: > [1] pamr_1.54.1 > [2] survival_2.37-7 > [3] cluster_1.15.2 > [4] bladderbatch_1.2.0 > [5] statmod_1.4.20 > [6] sva_3.10.0 > [7] mgcv_1.7-29 > [8] nlme_3.1-117 > [9] corpcor_1.6.6 > [10] illuminaMousev2.db_1.22.1 > [11] org.Mm.eg.db_2.14.0 > [12] DESeq2_1.4.5 > [13] RcppArmadillo_0.4.320.0 > [14] Rcpp_0.11.2 > [15] DiffBind_1.10.1 > [16] GenomicAlignments_1.0.1 > [17] BSgenome_1.32.0 > [18] Rsamtools_1.16.0 > [19] Biostrings_2.32.0 > [20] limma_3.20.4 > [21] XVector_0.4.0 > [22] TxDb.Hsapiens.UCSC.hg19.knownGene_2.14.0 > [23] GenomicFeatures_1.16.2 > [24] lumi_2.16.0 > [25] methyAnalysis_1.6.0 > [26] org.Hs.eg.db_2.14.0 > [27] RSQLite_0.11.4 > [28] DBI_0.2-7 > [29] AnnotationDbi_1.26.0 > [30] Biobase_2.24.0 > [31] GenomicRanges_1.16.3 > [32] GenomeInfoDb_1.0.2 > [33] IRanges_1.22.6 > [34] BiocGenerics_0.10.0 > > loaded via a namespace (and not attached): > [1] AnnotationForge_1.6.1 BBmisc_1.6 > [3] BatchJobs_1.2 BiocInstaller_1.14.2 > [5] BiocParallel_0.6.1 Formula_1.1-1 > [7] Gviz_1.8.3 Hmisc_3.14-4 > [9] KernSmooth_2.23-12 MASS_7.3-33 > [11] Matrix_1.1-4 R.methodsS3_1.6.1 > [13] RColorBrewer_1.0-5 RCurl_1.95-4.1 > [15] VariantAnnotation_1.10.1 XML_3.98-1.1 > [17] affy_1.42.2 affyio_1.32.0 > [19] amap_0.8-12 annotate_1.42.0 > [21] base64_1.1 beanplot_1.1 > [23] biomaRt_2.20.0 biovizBase_1.12.1 > [25] bitops_1.0-6 brew_1.0-6 > [27] bumphunter_1.4.2 caTools_1.17 > [29] codetools_0.2-8 colorspace_1.2-4 > [31] dichromat_2.0-0 digest_0.6.4 > [33] doRNG_1.6 edgeR_3.6.2 > [35] fail_1.2 foreach_1.4.2 > [37] gdata_2.13.3 genefilter_1.46.1 > [39] geneplotter_1.42.0 genoset_1.16.2 > [41] gplots_2.13.0 gtools_3.4.1 > [43] illuminaio_0.6.0 iterators_1.0.7 > [45] lattice_0.20-29 latticeExtra_0.6-26 > [47] locfit_1.5-9.1 matrixStats_0.10.0 > [49] mclust_4.3 methylumi_2.10.0 > [51] minfi_1.10.2 multtest_2.20.0 > [53] munsell_0.4.2 nleqslv_2.2 > [55] nor1mix_1.1-4 pkgmaker_0.22 > [57] plyr_1.8.1 preprocessCore_1.26.1 > [59] registry_0.2 reshape_0.8.5 > [61] rngtools_1.2.4 rtracklayer_1.24.1 > [63] scales_0.2.4 sendmailR_1.1-2 > [65] siggenes_1.38.0 stats4_3.1.1 > [67] stringr_0.6.2 tools_3.1.1 > [69] xtable_1.7-3 zlibbioc_1.10.0 > > -- > Sent via the guest posting facility at bioconductor.org. > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD COMMENT • link 10.7 years ago Peter Langfelder ★ 3.0k

0

Entering edit mode

Ryan C. Thompson ★ 7.9k

@ryan-c-thompson-5618

Last seen 6 months ago

Icahn School of Medicine at Mount Sinai…

Hi Shaun, If you are just testing fir differential expression between wt and ko, there is no need to preprocess your data to remove the batch effects. Simply include them in your design when you fit your model, and they will be corrected for when testing Condition for differential expression. Also, your two batch effects are exactly aliased (every sibship belongs to exactly one batch), so you should not include both effects in the model, just Parents. This is not a problem unless you want to evaluate the relative effects of Batch and Parents on expression. If all you want to do is correct for both of them, it doesn't matter that they are aliased. So, your design formula would be something like "~Condition + Parents", and then you would test the Condition coefficient for differential expression. For specific code examples, I highly recommend the limma users guide, which gives examples of exactly this kind of situation. Lastly, it's not clear from what you posted whether Batch and Parents are stored as numeric or factor variables. You should ensure that they are factors before you do anything else, otherwise R will erroneously treat them as numerical covariates. -Ryan On 08/01/2014 08:00 AM, Shaun Webb [guest] wrote: > Hi, I am attempting to remove two known batch effects from my Illumina BeadChip expression data. I have the following experimental design: > > ID Condition Batch Parents > 1 wt 1 1 > 2 ko 1 1 > 3 wt 1 2 > 4 ko 1 2 > 5 wt 2 3 > 6 ko 2 3 > > These are 3 pairs of sibling mice each with different parents. Within each pair of siblings I have a wt and ko. On top of this, samples were prepared in 2 separate batches (batch1 = first 4 samples). > > After background correction and normalisation in limma my MDS plots shows that siblings group together and the first 4 samples are split from the final 2. > > I would like to adjust my expression data for both of these effects to test for differential expression between wt and ko groups. ComBat seems to be a good choice however I am struggling to figure out the best approach for multiple batches. I'd appreciate if anyone could give me some advice on how to set up ComBat in this context. > > Thanks in advance for your help > Shaun Webb > University of Edinburgh > > -- output of sessionInfo(): > > R version 3.1.1 (2014-07-10) > Platform: x86_64-pc-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=C > [4] LC_COLLATE=C LC_MONETARY=C LC_MESSAGES=C > [7] LC_PAPER=C LC_NAME=C LC_ADDRESS=C > [10] LC_TELEPHONE=C LC_MEASUREMENT=C LC_IDENTIFICATION=C > > attached base packages: > [1] splines parallel grid stats graphics grDevices > [7] utils datasets methods base > > other attached packages: > [1] pamr_1.54.1 > [2] survival_2.37-7 > [3] cluster_1.15.2 > [4] bladderbatch_1.2.0 > [5] statmod_1.4.20 > [6] sva_3.10.0 > [7] mgcv_1.7-29 > [8] nlme_3.1-117 > [9] corpcor_1.6.6 > [10] illuminaMousev2.db_1.22.1 > [11] org.Mm.eg.db_2.14.0 > [12] DESeq2_1.4.5 > [13] RcppArmadillo_0.4.320.0 > [14] Rcpp_0.11.2 > [15] DiffBind_1.10.1 > [16] GenomicAlignments_1.0.1 > [17] BSgenome_1.32.0 > [18] Rsamtools_1.16.0 > [19] Biostrings_2.32.0 > [20] limma_3.20.4 > [21] XVector_0.4.0 > [22] TxDb.Hsapiens.UCSC.hg19.knownGene_2.14.0 > [23] GenomicFeatures_1.16.2 > [24] lumi_2.16.0 > [25] methyAnalysis_1.6.0 > [26] org.Hs.eg.db_2.14.0 > [27] RSQLite_0.11.4 > [28] DBI_0.2-7 > [29] AnnotationDbi_1.26.0 > [30] Biobase_2.24.0 > [31] GenomicRanges_1.16.3 > [32] GenomeInfoDb_1.0.2 > [33] IRanges_1.22.6 > [34] BiocGenerics_0.10.0 > > loaded via a namespace (and not attached): > [1] AnnotationForge_1.6.1 BBmisc_1.6 > [3] BatchJobs_1.2 BiocInstaller_1.14.2 > [5] BiocParallel_0.6.1 Formula_1.1-1 > [7] Gviz_1.8.3 Hmisc_3.14-4 > [9] KernSmooth_2.23-12 MASS_7.3-33 > [11] Matrix_1.1-4 R.methodsS3_1.6.1 > [13] RColorBrewer_1.0-5 RCurl_1.95-4.1 > [15] VariantAnnotation_1.10.1 XML_3.98-1.1 > [17] affy_1.42.2 affyio_1.32.0 > [19] amap_0.8-12 annotate_1.42.0 > [21] base64_1.1 beanplot_1.1 > [23] biomaRt_2.20.0 biovizBase_1.12.1 > [25] bitops_1.0-6 brew_1.0-6 > [27] bumphunter_1.4.2 caTools_1.17 > [29] codetools_0.2-8 colorspace_1.2-4 > [31] dichromat_2.0-0 digest_0.6.4 > [33] doRNG_1.6 edgeR_3.6.2 > [35] fail_1.2 foreach_1.4.2 > [37] gdata_2.13.3 genefilter_1.46.1 > [39] geneplotter_1.42.0 genoset_1.16.2 > [41] gplots_2.13.0 gtools_3.4.1 > [43] illuminaio_0.6.0 iterators_1.0.7 > [45] lattice_0.20-29 latticeExtra_0.6-26 > [47] locfit_1.5-9.1 matrixStats_0.10.0 > [49] mclust_4.3 methylumi_2.10.0 > [51] minfi_1.10.2 multtest_2.20.0 > [53] munsell_0.4.2 nleqslv_2.2 > [55] nor1mix_1.1-4 pkgmaker_0.22 > [57] plyr_1.8.1 preprocessCore_1.26.1 > [59] registry_0.2 reshape_0.8.5 > [61] rngtools_1.2.4 rtracklayer_1.24.1 > [63] scales_0.2.4 sendmailR_1.1-2 > [65] siggenes_1.38.0 stats4_3.1.1 > [67] stringr_0.6.2 tools_3.1.1 > [69] xtable_1.7-3 zlibbioc_1.10.0 > > -- > Sent via the guest posting facility at bioconductor.org. > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD COMMENT • link 10.7 years ago Ryan C. Thompson ★ 7.9k

Login before adding your answer.