Batch effect in limma
2
0
Entering edit mode
Yuan Hao ▴ 240
@yuan-hao-3658
Last seen 10.3 years ago
United States
Dear list, I got 12 Affymetrix arrays for 4 RNA samples, 3 replicates(from 3 batches individually) for each sample. I want to look at the differential expression between these samples. While after clustering, we found obvious batch effect within the data, so we decided to add the batch effect into the linear model. I set up the design matrix as followings, but we can't find any differentially expressed gene that were expected, so would someone help me to have a look whether there is any problem with my design and contrast matrix? Thank you very much in advance: Array Sample Batch 1 -/- 3 2 +/+ 3 3 -/+ 1 4 +/+ 1 5 -/+ 3 6 +/- 3 7 -/- 1 8 +/- 1 9 -/- 2 10 +/- 2 11 +/+ 2 12 -/+ 2 > S <- c("-/-","+/+","-/+","+/+","-/ +","+/-","-/-","+/-","-/-","+/-","+/+","-/+") > S <- factor(S, level = c("+/+","-/+","+/-","-/-")) > B <- c(3,3,1,1,3,3,1,1,2,2,2,2) > B <- factor(B, level = c(1,2,3)) > design <- model.matrix(~0+S+B) > colnames(design)<- c("sample1","sample2","sample3","sample4","Batch3","Batch2") > design sample1 sample2 sample3 sample4 Batch3 Batch2 1 0 0 0 1 1 0 2 1 0 0 0 1 0 3 0 1 0 0 0 0 4 1 0 0 0 0 0 5 0 1 0 0 1 0 6 0 0 1 0 1 0 7 0 0 0 1 0 0 8 0 0 1 0 0 0 9 0 0 0 1 0 1 10 0 0 1 0 0 1 11 1 0 0 0 0 1 12 0 1 0 0 0 1 attr(,"assign") [1] 1 1 1 1 2 2 attr(,"contrasts") attr(,"contrasts")$TS [1] "contr.treatment" attr(,"contrasts")$BE [1] "contr.treatment" > fit<-lmFit(eset.gcrma,design) > contrast.matrix<-makeContrasts(a=sample1-sample2,b=sample1- sample3,c=sample2-sample4,levels=design) > contrast.matrix Contrasts Levels a b c sample1 1 1 0 sample2 -1 0 1 sample3 0 -1 0 sample4 0 0 -1 Batch3 0 0 0 Batch2 0 0 0 > fit2<-contrasts.fit(fit,contrast.matrix) > fit2<-eBayes(fit2) Cheers, Yuan
Clustering Clustering • 2.8k views
ADD COMMENT
0
Entering edit mode
@james-w-macdonald-5106
Last seen 2 hours ago
United States
Hi Yuan, Yuan Hao wrote: > Dear list, > > I got 12 Affymetrix arrays for 4 RNA samples, 3 replicates(from 3 > batches individually) for each sample. I want to look at the > differential expression between these samples. While after clustering, > we found obvious batch effect within the data, so we decided to add the > batch effect into the linear model. I set up the design matrix as > followings, but we can't find any differentially expressed gene that > were expected, so would someone help me to have a look whether there is > any problem with my design and contrast matrix? Thank you very much in > advance: I don't see any problems here. When you say you can't find any differentially expressed gene that were expected, is that to imply that you _did_ find differentially expressed genes, but not those that you expected? Or do you mean that there are no significantly differentially expressed genes? Also note that you will need to specify a contrast with topTable() if you want the genes for a particular comparison. Best, Jim > > Array Sample Batch > 1 -/- 3 > 2 +/+ 3 > 3 -/+ 1 > 4 +/+ 1 > 5 -/+ 3 > 6 +/- 3 > 7 -/- 1 > 8 +/- 1 > 9 -/- 2 > 10 +/- 2 > 11 +/+ 2 > 12 -/+ 2 > > > S <- > c("-/-","+/+","-/+","+/+","-/+","+/-","-/-","+/-","-/-","+/-","+/+", "-/+") > > S <- factor(S, level = c("+/+","-/+","+/-","-/-")) > > B <- c(3,3,1,1,3,3,1,1,2,2,2,2) > > B <- factor(B, level = c(1,2,3)) > > design <- model.matrix(~0+S+B) > > > colnames(design)<-c("sample1","sample2","sample3","sample4","Batch3" ,"Batch2") > > > design > > sample1 sample2 sample3 sample4 Batch3 Batch2 > > 1 0 0 0 1 1 0 > > 2 1 0 0 0 1 0 > > 3 0 1 0 0 0 0 > > 4 1 0 0 0 0 0 > > 5 0 1 0 0 1 0 > > 6 0 0 1 0 1 0 > > 7 0 0 0 1 0 0 > > 8 0 0 1 0 0 0 > > 9 0 0 0 1 0 1 > > 10 0 0 1 0 0 1 > > 11 1 0 0 0 0 1 > > 12 0 1 0 0 0 1 > > attr(,"assign") > > [1] 1 1 1 1 2 2 > > attr(,"contrasts") > > attr(,"contrasts")$TS > > [1] "contr.treatment" > > > > attr(,"contrasts")$BE > > [1] "contr.treatment" > > > fit<-lmFit(eset.gcrma,design) > > > > contrast.matrix<-makeContrasts(a=sample1-sample2,b=sample1-sample3,c =sample2-sample4,levels=design) > > > > contrast.matrix > > Contrasts > > Levels a b c > > sample1 1 1 0 > > sample2 -1 0 1 > > sample3 0 -1 0 > > sample4 0 0 -1 > > Batch3 0 0 0 > > Batch2 0 0 0 > > > fit2<-contrasts.fit(fit,contrast.matrix) > > > fit2<-eBayes(fit2) > > > Cheers, > Yuan > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor -- James W. MacDonald, M.S. Biostatistician Douglas Lab University of Michigan Department of Human Genetics 5912 Buhl 1241 E. Catherine St. Ann Arbor MI 48109-5618 734-615-7826
ADD COMMENT
0
Entering edit mode
@james-w-macdonald-5106
Last seen 2 hours ago
United States
Please don't take things off-list! Yuan Hao wrote: > Hi James, > > Thank you very much for your reply! I got absolutely no significantly > differentially expressed genes at all! If by looking at the vennDiagram, > you can only see zero in all of the three contrasts, which is totally > against our expectation, so I am wondering whether it is the problem of > my design and/contrast matrices. You can see from my contrast.matrix, I > saw lines "Batch3" and "Batch2" are all '0' in there, which make me feel > wondering whether or not the model taking consideration of the batch > effect? The model is defined by your design matrix, not the contrasts matrix. By fitting the model you have defined here, what you are saying is that the batch effect is simply a difference in the mean expression for the samples in different batches, which can be accounted for by subtracting the differences between batch 2 and 1 from batch 2, and the differences between batch 3 and 1 from batch 3. Since the model has removed the differences between batches, the contrasts matrix doesn't need to do anything with the batch coefficients. The assumption of different means might not be a valid assumption, however. It might be that the different batches have different means as well as different variances. In this case you would want to account for both by fitting a linear mixed model, using the duplicateCorrelation function to estimate the correlation structure. If you search the archives of this list for something like 'batch' and 'duplicateCorrelation' you should be able to find an indication how to do this. I know Gordon has written more than one email on the subject. There may also be something in the limma User's Guide. Best, Jim > > I used another method "ComBat" to try to get rid of the batch effect. > After that, I got about 2,000 DE genes for my contrast A(in total there > are over 50,000 genes), around 500 DE genes for contrast C, but still > nothing for contrast B, which is against our expectation. I checked the > top 10 adj.p.value in this case for contrast B, and found all of them > are around 0.07 and the weird thing is they are exactly the same. > > Cheers, > Yuan > > > On 3 Sep 2009, at 14:27, James W. MacDonald wrote: > >> Hi Yuan, >> >> Yuan Hao wrote: >>> Dear list, >>> I got 12 Affymetrix arrays for 4 RNA samples, 3 replicates(from 3 >>> batches individually) for each sample. I want to look at the >>> differential expression between these samples. While after >>> clustering, we found obvious batch effect within the data, so we >>> decided to add the batch effect into the linear model. I set up the >>> design matrix as followings, but we can't find any differentially >>> expressed gene that were expected, so would someone help me to have a >>> look whether there is any problem with my design and contrast matrix? >>> Thank you very much in advance: >> >> I don't see any problems here. When you say you can't find any >> differentially expressed gene that were expected, is that to imply >> that you _did_ find differentially expressed genes, but not those that >> you expected? Or do you mean that there are no significantly >> differentially expressed genes? >> >> Also note that you will need to specify a contrast with topTable() if >> you want the genes for a particular comparison. >> >> >> Best, >> >> Jim >> >> >>> Array Sample Batch >>> 1 -/- 3 >>> 2 +/+ 3 >>> 3 -/+ 1 >>> 4 +/+ 1 >>> 5 -/+ 3 >>> 6 +/- 3 >>> 7 -/- 1 >>> 8 +/- 1 >>> 9 -/- 2 >>> 10 +/- 2 >>> 11 +/+ 2 >>> 12 -/+ 2 >>> > S <- >>> c("-/-","+/+","-/+","+/+","-/+","+/-","-/-","+/-","-/-","+/-","+/+ ","-/+") >>> >>> > S <- factor(S, level = c("+/+","-/+","+/-","-/-")) >>> > B <- c(3,3,1,1,3,3,1,1,2,2,2,2) >>> > B <- factor(B, level = c(1,2,3)) >>> > design <- model.matrix(~0+S+B) >>> > >>> colnames(design)<-c("sample1","sample2","sample3","sample4","Batch 3","Batch2") >>> > design >>> sample1 sample2 sample3 sample4 Batch3 Batch2 >>> 1 0 0 0 1 1 0 >>> 2 1 0 0 0 1 0 >>> 3 0 1 0 0 0 0 >>> 4 1 0 0 0 0 0 >>> 5 0 1 0 0 1 0 >>> 6 0 0 1 0 1 0 >>> 7 0 0 0 1 0 0 >>> 8 0 0 1 0 0 0 >>> 9 0 0 0 1 0 1 >>> 10 0 0 1 0 0 1 >>> 11 1 0 0 0 0 1 >>> 12 0 1 0 0 0 1 >>> attr(,"assign") >>> [1] 1 1 1 1 2 2 >>> attr(,"contrasts") >>> attr(,"contrasts")$TS >>> [1] "contr.treatment" >>> attr(,"contrasts")$BE >>> [1] "contr.treatment" >>> > fit<-lmFit(eset.gcrma,design) >>> > >>> contrast.matrix<-makeContrasts(a=sample1-sample2,b=sample1-sample3 ,c=sample2-sample4,levels=design) >>> > contrast.matrix >>> Contrasts >>> Levels a b c >>> sample1 1 1 0 >>> sample2 -1 0 1 >>> sample3 0 -1 0 >>> sample4 0 0 -1 >>> Batch3 0 0 0 >>> Batch2 0 0 0 >>> > fit2<-contrasts.fit(fit,contrast.matrix) >>> > fit2<-eBayes(fit2) >>> Cheers, >>> Yuan >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at stat.math.ethz.ch >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> -- >> James W. MacDonald, M.S. >> Biostatistician >> Douglas Lab >> University of Michigan >> Department of Human Genetics >> 5912 Buhl >> 1241 E. Catherine St. >> Ann Arbor MI 48109-5618 >> 734-615-7826 > -- James W. MacDonald, M.S. Biostatistician Douglas Lab University of Michigan Department of Human Genetics 5912 Buhl 1241 E. Catherine St. Ann Arbor MI 48109-5618 734-615-7826
ADD COMMENT

Login before adding your answer.

Traffic: 624 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6