limma - Considering Batch Effect

0

Entering edit mode

Nathan Haigh ▴ 120

@nathan-haigh-2307

Last seen 10.6 years ago

I know this topic has arisen a few times from searching the mail archive, but I'm still not clear on what/how to do this. Here is my experimental setup: Array Param1 Param2 Rep Batch 1 H 4 1 1 2 H 4 2 2 3 H 4 3 2 4 H 20 1 1 5 H 20 2 2 6 H 20 3 2 7 H 37 1 1 8 H 37 2 2 9 H 37 3 2 10 L 4 1 1 11 L 4 2 2 12 L 4 3 2 13 L 20 1 1 14 L 20 2 2 15 L 20 3 2 16 L 37 1 1 17 L 37 2 2 18 L 37 3 2 All rep 1's were processed in batch 1 and reps 2 and 3 were processes in batch 2. From using GeneSpring, and conducting a cluster analysis on experimental parameters, there is a clear batch effect. In GeneSpring, I normalised the arrays, such that gene1 from array1 was divided by the median of gene1 over arrays from batch1. Likewise, gene1 from array2 was divided by the median of gene1 across arrays in batch2. These successfully removed the batch effect. I'd like to be able to remove the batch effect doing something similar, or accounting for it in limma. In essence, I want to remove the variability in the reps caused by the batch effect. Could anyone explain how this can/should be done? Thanks Nathan

limma GeneSpring limma GeneSpring • 1.5k views

ADD COMMENT • link updated 17.7 years ago by James W. MacDonald 68k • written 17.7 years ago by Nathan Haigh ▴ 120

0

Entering edit mode

James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 1 day ago

United States

Hi Nathan, Nathan Haigh wrote: > I know this topic has arisen a few times from searching the mail > archive, but I'm still not clear on what/how to do this. > > Here is my experimental setup: > Array Param1 Param2 Rep Batch > 1 H 4 1 1 > 2 H 4 2 2 > 3 H 4 3 2 > 4 H 20 1 1 > 5 H 20 2 2 > 6 H 20 3 2 > 7 H 37 1 1 > 8 H 37 2 2 > 9 H 37 3 2 > 10 L 4 1 1 > 11 L 4 2 2 > 12 L 4 3 2 > 13 L 20 1 1 > 14 L 20 2 2 > 15 L 20 3 2 > 16 L 37 1 1 > 17 L 37 2 2 > 18 L 37 3 2 > > All rep 1's were processed in batch 1 and reps 2 and 3 were processes in > batch 2. From using GeneSpring, and conducting a cluster analysis on > experimental parameters, there is a clear batch effect. In GeneSpring, I > normalised the arrays, such that gene1 from array1 was divided by the > median of gene1 over arrays from batch1. Likewise, gene1 from array2 was > divided by the median of gene1 across arrays in batch2. These > successfully removed the batch effect. I'd like to be able to remove the > batch effect doing something similar, or accounting for it in limma. > > In essence, I want to remove the variability in the reps caused by the > batch effect. > > Could anyone explain how this can/should be done? Sure. Just add the Batch effect to your design matrix. For instance, if you want to model the differences in Param1, you would do something like: > design <- model.matrix(~0+factor(rep(1:2, each=9))+factor(rep(rep(1:2,1:2),6))) > colnames(design) <- c("H","L","Batch") > design H L Batch 1 1 0 0 2 1 0 1 3 1 0 1 4 1 0 0 5 1 0 1 6 1 0 1 7 1 0 0 8 1 0 1 9 1 0 1 10 0 1 0 11 0 1 1 12 0 1 1 13 0 1 0 14 0 1 1 15 0 1 1 16 0 1 0 17 0 1 1 18 0 1 1 attr(,"assign") [1] 1 1 2 attr(,"contrasts") attr(,"contrasts")$`factor(rep(1:2, each = 9))` [1] "contr.treatment" attr(,"contrasts")$`factor(rep(rep(1:2, 1:2), 6))` [1] "contr.treatment" Then you can do the contrast between H and L. Best, Jim > Thanks > Nathan > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- James W. MacDonald, MS Biostatistician UMCCC cDNA and Affymetrix Core University of Michigan 1500 E Medical Center Drive 7410 CCGC Ann Arbor MI 48109 734-647-5623

ADD COMMENT • link 17.7 years ago James W. MacDonald 68k

0

Entering edit mode

James MacDonald wrote: > Hi Nathan, > > > Nathan Haigh wrote: >> I know this topic has arisen a few times from searching the mail >> archive, but I'm still not clear on what/how to do this. >> >> Here is my experimental setup: >> Array Param1 Param2 Rep Batch >> 1 H 4 1 1 >> 2 H 4 2 2 >> 3 H 4 3 2 >> 4 H 20 1 1 >> 5 H 20 2 2 >> 6 H 20 3 2 >> 7 H 37 1 1 >> 8 H 37 2 2 >> 9 H 37 3 2 >> 10 L 4 1 1 >> 11 L 4 2 2 >> 12 L 4 3 2 >> 13 L 20 1 1 >> 14 L 20 2 2 >> 15 L 20 3 2 >> 16 L 37 1 1 >> 17 L 37 2 2 >> 18 L 37 3 2 >> >> All rep 1's were processed in batch 1 and reps 2 and 3 were processes in >> batch 2. From using GeneSpring, and conducting a cluster analysis on >> experimental parameters, there is a clear batch effect. In GeneSpring, I >> normalised the arrays, such that gene1 from array1 was divided by the >> median of gene1 over arrays from batch1. Likewise, gene1 from array2 was >> divided by the median of gene1 across arrays in batch2. These >> successfully removed the batch effect. I'd like to be able to remove the >> batch effect doing something similar, or accounting for it in limma. >> >> In essence, I want to remove the variability in the reps caused by the >> batch effect. >> >> Could anyone explain how this can/should be done? > > Sure. Just add the Batch effect to your design matrix. For instance, > if you want to model the differences in Param1, you would do something > like: > > > design <- model.matrix(~0+factor(rep(1:2, > each=9))+factor(rep(rep(1:2,1:2),6))) > > colnames(design) <- c("H","L","Batch") > > design > H L Batch > 1 1 0 0 > 2 1 0 1 > 3 1 0 1 > 4 1 0 0 > 5 1 0 1 > 6 1 0 1 > 7 1 0 0 > 8 1 0 1 > 9 1 0 1 > 10 0 1 0 > 11 0 1 1 > 12 0 1 1 > 13 0 1 0 > 14 0 1 1 > 15 0 1 1 > 16 0 1 0 > 17 0 1 1 > 18 0 1 1 > attr(,"assign") > [1] 1 1 2 > attr(,"contrasts") > attr(,"contrasts")$`factor(rep(1:2, each = 9))` > [1] "contr.treatment" > > attr(,"contrasts")$`factor(rep(rep(1:2, 1:2), 6))` > [1] "contr.treatment" > > Then you can do the contrast between H and L. > > Best, > > Jim > > > Hi Jim, Thanks very much for your help! I'm still trying to get my head around the creation of the design matrix - it doesn't seem too intuitive to a newbie. Although I'm sure all will become clear as soon as the penny drops! Just to clarify something about the experimental design and hopefully help me understand the construction of the design matrix. Param2 is a temperature treatment, while Param1 is a "sample type". So it makes sense to do contrasts between H.20 and H.4 to get cold-shock genes in H samples, H.20 and H.37 to get heat-shock genes in H samples. How would I set up the matrix for this? Also, in the context of this experimental design, what does a contrast between H and L actually show? Cheers Nathan

ADD REPLY • link 17.7 years ago Nathan Haigh ▴ 120

0

Entering edit mode

Hi Nathan, Nathan S. Haigh wrote: > James MacDonald wrote: >> Hi Nathan, >> >> >> Nathan Haigh wrote: >>> I know this topic has arisen a few times from searching the mail >>> archive, but I'm still not clear on what/how to do this. >>> >>> Here is my experimental setup: >>> Array Param1 Param2 Rep Batch >>> 1 H 4 1 1 >>> 2 H 4 2 2 >>> 3 H 4 3 2 >>> 4 H 20 1 1 >>> 5 H 20 2 2 >>> 6 H 20 3 2 >>> 7 H 37 1 1 >>> 8 H 37 2 2 >>> 9 H 37 3 2 >>> 10 L 4 1 1 >>> 11 L 4 2 2 >>> 12 L 4 3 2 >>> 13 L 20 1 1 >>> 14 L 20 2 2 >>> 15 L 20 3 2 >>> 16 L 37 1 1 >>> 17 L 37 2 2 >>> 18 L 37 3 2 >>> >>> All rep 1's were processed in batch 1 and reps 2 and 3 were processes in >>> batch 2. From using GeneSpring, and conducting a cluster analysis on >>> experimental parameters, there is a clear batch effect. In GeneSpring, I >>> normalised the arrays, such that gene1 from array1 was divided by the >>> median of gene1 over arrays from batch1. Likewise, gene1 from array2 was >>> divided by the median of gene1 across arrays in batch2. These >>> successfully removed the batch effect. I'd like to be able to remove the >>> batch effect doing something similar, or accounting for it in limma. >>> >>> In essence, I want to remove the variability in the reps caused by the >>> batch effect. >>> >>> Could anyone explain how this can/should be done? >> Sure. Just add the Batch effect to your design matrix. For instance, >> if you want to model the differences in Param1, you would do something >> like: >> >>> design <- model.matrix(~0+factor(rep(1:2, >> each=9))+factor(rep(rep(1:2,1:2),6))) >>> colnames(design) <- c("H","L","Batch") >>> design >> H L Batch >> 1 1 0 0 >> 2 1 0 1 >> 3 1 0 1 >> 4 1 0 0 >> 5 1 0 1 >> 6 1 0 1 >> 7 1 0 0 >> 8 1 0 1 >> 9 1 0 1 >> 10 0 1 0 >> 11 0 1 1 >> 12 0 1 1 >> 13 0 1 0 >> 14 0 1 1 >> 15 0 1 1 >> 16 0 1 0 >> 17 0 1 1 >> 18 0 1 1 >> attr(,"assign") >> [1] 1 1 2 >> attr(,"contrasts") >> attr(,"contrasts")$`factor(rep(1:2, each = 9))` >> [1] "contr.treatment" >> >> attr(,"contrasts")$`factor(rep(rep(1:2, 1:2), 6))` >> [1] "contr.treatment" >> >> Then you can do the contrast between H and L. >> >> Best, >> >> Jim >> >> >> > > Hi Jim, > > Thanks very much for your help! I'm still trying to get my head around > the creation of the design matrix - it doesn't seem too intuitive to a > newbie. Although I'm sure all will become clear as soon as the penny drops! You really need to read the Limma User's Guide. In addition, you might look at a linear modeling textbook. This is a perennial subject on this listserv, as setting up a design matrix (especially for 2-color stuff) is non-trivial if you know nothing of linear modeling. > > Just to clarify something about the experimental design and hopefully > help me understand the construction of the design matrix. Param2 is a > temperature treatment, while Param1 is a "sample type". So it makes > sense to do contrasts between H.20 and H.4 to get cold-shock genes in H > samples, H.20 and H.37 to get heat-shock genes in H samples. How would I > set up the matrix for this? Also, in the context of this experimental > design, what does a contrast between H and L actually show? It gets pretty complicated if you set up a design matrix with both Param1 and Param2 and a batch effect. It would take more time and effort than I am willing to expend to figure out the contrasts matrix. A simpler (if slightly less powerful) approach if you just want to compare the H samples is to set up your design matrix with just Param2 and a batch effect, and then you can simply compare things directly. Something like > Param2 <- factor(rep(c(4,20,37), each=3)) > batch <- factor(rep(rep(1:2,1:2),3)) > design <- model.matrix(~0+Param2+batch) > design Param24 Param220 Param237 batch2 1 1 0 0 0 2 1 0 0 1 3 1 0 0 1 4 0 1 0 0 5 0 1 0 1 6 0 1 0 1 7 0 0 1 0 8 0 0 1 1 9 0 0 1 1 attr(,"assign") [1] 1 1 1 2 attr(,"contrasts") attr(,"contrasts")$Param2 [1] "contr.treatment" attr(,"contrasts")$batch [1] "contr.treatment" Then your contrasts matrix would be > contrast <- makeContrasts(Param220-Param24, Param220-Param237, levels=design) > contrast Contrasts Levels Param220 - Param24 Param220 - Param237 Param24 -1 0 Param220 1 1 Param237 0 -1 batch2 0 0 The comparison of H and L that I mentioned in the previous email would simply tell you the difference between the H and L samples without regard to the Param2. In other words, you would be fitting a t-test comparing all H samples to all L samples (adjusted for batch). Best, Jim > > Cheers > Nathan > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- James W. MacDonald, M.S. Biostatistician Affymetrix and cDNA Microarray Core University of Michigan Cancer Center 1500 E. Medical Center Drive 7410 CCGC Ann Arbor MI 48109 734-647-5623

ADD REPLY • link 17.7 years ago James W. MacDonald 68k

Login before adding your answer.