Question

Please advice

0

Entering edit mode

Naomi Altman ★ 6.0k

@naomi-altman-380

Last seen 4.0 years ago

United States

Dear Mali, The reasons that we advise people to keep their questions on the mailing list are that there are more people available to assist you with your questions and everyone can benefit from the answers. I do not have time to go through every contrast that you set up. The treatment matrix including the batch is correct. The first few contrasts are correct. The batch effect removes the mean value for each batch. I did not check the second analysis. Regards, Naomi At 07:05 AM 3/7/2011, mali salmon wrote: >Dear Naomi >First I would like to apologize for writing you directly. >My name is Mali Salmon, I am a researcher in Tel Hashomer hospital in Israel. >I am currently analyzing microarray data using limma package, and I >would like to have some advice from a statistician who has an >experience with microarray analysis using Limma and bioconductor. >I was searching the mailing list, and saw your name as one who >assist many people with their analysis. >I have a simple factorial design which I build the design and >contrast matrix for, and I just want to be sure that what I did is >correct, I would be very grateful is you could give me your opinion. > >I have two factors, one is strain with three levels (C,D,S), and a >treatment factor with two levels: treated (DSR) and untreated (NO), >I also have batch effect which I included in the matrix > > > targets > > FileName Treatment Strain batch >1 NO2_C NO C 1 >2 DSR2_C DSR C 1 >3 NO2_D NO D 1 >4 DSR2_D DSR D 1 >5 NO2_S NO S 1 >6 DSR2_S DSR S 1 >7 NO3_C NO C 2 >8 DSR3_C DSR C 2 >9 NO3_D NO D 2 >10 DSR3_D DSR D 2 >11 NO3_S NO S 2 >12 DSR3_S DSR S 2 > >I am trying to follow example 8.7 in the user guide (factorial design) >In order to build the design matrix this is what I did: > >TS <- paste(targets$Treatment, targets$Strain, sep=".") >TS <- factor(TS, unique(TS)) >design<-model.matrix(~0+TS) >colnames(design) <- levels(TS) >batch<-c(0,0,0,0,0,0,1,1,1,1,1,1) >design<-cbind(design,batch) > > > design > NO.C DSR.C NO.D DSR.D NO.S DSR.S batch >1 1 0 0 0 0 0 0 >2 0 1 0 0 0 0 0 >3 0 0 1 0 0 0 0 >4 0 0 0 1 0 0 0 >5 0 0 0 0 1 0 0 >6 0 0 0 0 0 1 0 >7 1 0 0 0 0 0 1 >8 0 1 0 0 0 0 1 >9 0 0 1 0 0 0 1 >10 0 0 0 1 0 0 1 >11 0 0 0 0 1 0 1 >12 0 0 0 0 0 1 1 > > >Now the questions I would like to answer are: >1. what is the main effect of the strain (here I want to ignore the >treatment, just to find the difference between different strains) >2. what is the main effect of the treatment (genes that change their >expression because of treatment, again I want to ignore the strain here). >3. genes that are respond to treatment in each strain + interaction > >Below is how I created the contrast.matrix, the first three >contrasts are to answer the first question, contrast number 4 is to >answer question 2, and the last 6 contrasts to answer question 3 > >cont.matrix <- makeContrasts( > DvsC=(DSR.D+NO.D)-(DSR.C+NO.C), > > SvsC=(DSR.S+NO.S)-(DSR.C+NO.C), > DvsS=(DSR.D+NO.D)-(DSR.S+NO.S), > DSRvNO=(DSR.C+DSR.D+DSR.S)-(NO.C+NO.D+NO.S), > C.DSRvsNO=DSR.C-NO.C, > D.DSRvsNO=DSR.D-NO.D, > S.DSRvsNO=DSR.S-NO.S, > DiffDvsC=(DSR.D-NO.D)-(DSR.C-NO.C), > DiffSvsC=(DSR.S-NO.S)-(DSR.C-NO.C), > DiffSvsD=(DSR.S-NO.S)-(DSR.D-NO.D), > levels=design) >fit2 <- contrasts.fit(fit, cont.matrix) >fit2 <- eBayes(fit2) > >Is this matrix correct? >The batch is included in the design matrix, is that mean that the >batch will be removed? > >One of the comparisons I am interested with is the different between >strains (first question). Would you suggest to do a separate >"several group" analysis (example 8.6 in limma user guide) instead >of the one indicated above? >The analysis for answer the first question would be: > >design.Strain <- model.matrix(~ 0+factor(c(1,1,2,2,3,3,1,1,2,2,3,3))) >colnames(design.Strain) <- c("C", "D", "S") >#correct for batch effect >batch<-c(0,0,0,0,0,0,1,1,1,1,1,1) >design.Strain<-cbind(design.Strain,batch) >fit.Strain <- lmFit(selDataMatrix, design.Strain) >contrast.matrix.Strain <- makeContrasts(D-C, S-D, S-C, levels=design.Strain) >fit2.Strain <- contrasts.fit(fit.Strain, contrast.matrix.Strain) >fit2.Strain <- eBayes(fit2.Strain) > >I have tried the two approaches, and I got different list of >differential expressed genes, so which one is preferred? > >Looking forward to your reply, and apologising for bothering you >Thank you >Mali > >

Microarray GO limma Microarray GO limma • 1.3k views

ADD COMMENT • link updated 14.1 years ago by mali salmon ▴ 370 • written 14.1 years ago by Naomi Altman ★ 6.0k

score 0 · Answer 1 · 2011-03-09

Dear Mali, I have a bit more time now. I think I answered your first question in my last email. Limma does not directly do a "main effects" and interactions type of F-test, although you can can force the issue by using the F.p.value and several different sets of contrasts. The second approach is not valid. The t-test denominator will be incorrect because you did not account for batch and treatment. Best wishes, Naomi Regarding At 07:05 AM 3/7/2011, you wrote: >Dear Naomi >First I would like to apologize for writing you directly. >My name is Mali Salmon, I am a researcher in Tel Hashomer hospital in Israel. >I am currently analyzing microarray data using limma package, and I >would like to have some advice from a statistician who has an >experience with microarray analysis using Limma and bioconductor. >I was searching the mailing list, and saw your name as one who >assist many people with their analysis. >I have a simple factorial design which I build the design and >contrast matrix for, and I just want to be sure that what I did is >correct, I would be very grateful is you could give me your opinion. > >I have two factors, one is strain with three levels (C,D,S), and a >treatment factor with two levels: treated (DSR) and untreated (NO), >I also have batch effect which I included in the matrix > > > targets > > FileName Treatment Strain batch >1 NO2_C NO C 1 >2 DSR2_C DSR C 1 >3 NO2_D NO D 1 >4 DSR2_D DSR D 1 >5 NO2_S NO S 1 >6 DSR2_S DSR S 1 >7 NO3_C NO C 2 >8 DSR3_C DSR C 2 >9 NO3_D NO D 2 >10 DSR3_D DSR D 2 >11 NO3_S NO S 2 >12 DSR3_S DSR S 2 > >I am trying to follow example 8.7 in the user guide (factorial design) >In order to build the design matrix this is what I did: > >TS <- paste(targets$Treatment, targets$Strain, sep=".") >TS <- factor(TS, unique(TS)) >design<-model.matrix(~0+TS) >colnames(design) <- levels(TS) >batch<-c(0,0,0,0,0,0,1,1,1,1,1,1) >design<-cbind(design,batch) > > > design > NO.C DSR.C NO.D DSR.D NO.S DSR.S batch >1 1 0 0 0 0 0 0 >2 0 1 0 0 0 0 0 >3 0 0 1 0 0 0 0 >4 0 0 0 1 0 0 0 >5 0 0 0 0 1 0 0 >6 0 0 0 0 0 1 0 >7 1 0 0 0 0 0 1 >8 0 1 0 0 0 0 1 >9 0 0 1 0 0 0 1 >10 0 0 0 1 0 0 1 >11 0 0 0 0 1 0 1 >12 0 0 0 0 0 1 1 > > >Now the questions I would like to answer are: >1. what is the main effect of the strain (here I want to ignore the >treatment, just to find the difference between different strains) >2. what is the main effect of the treatment (genes that change their >expression because of treatment, again I want to ignore the strain here). >3. genes that are respond to treatment in each strain + interaction > >Below is how I created the contrast.matrix, the first three >contrasts are to answer the first question, contrast number 4 is to >answer question 2, and the last 6 contrasts to answer question 3 > >cont.matrix <- makeContrasts( > DvsC=(DSR.D+NO.D)-(DSR.C+NO.C), > > SvsC=(DSR.S+NO.S)-(DSR.C+NO.C), > DvsS=(DSR.D+NO.D)-(DSR.S+NO.S), > DSRvNO=(DSR.C+DSR.D+DSR.S)-(NO.C+NO.D+NO.S), > C.DSRvsNO=DSR.C-NO.C, > D.DSRvsNO=DSR.D-NO.D, > S.DSRvsNO=DSR.S-NO.S, > DiffDvsC=(DSR.D-NO.D)-(DSR.C-NO.C), > DiffSvsC=(DSR.S-NO.S)-(DSR.C-NO.C), > DiffSvsD=(DSR.S-NO.S)-(DSR.D-NO.D), > levels=design) >fit2 <- contrasts.fit(fit, cont.matrix) >fit2 <- eBayes(fit2) > >Is this matrix correct? >The batch is included in the design matrix, is that mean that the >batch will be removed? > >One of the comparisons I am interested with is the different between >strains (first question). Would you suggest to do a separate >"several group" analysis (example 8.6 in limma user guide) instead >of the one indicated above? >The analysis for answer the first question would be: > >design.Strain <- model.matrix(~ 0+factor(c(1,1,2,2,3,3,1,1,2,2,3,3))) >colnames(design.Strain) <- c("C", "D", "S") >#correct for batch effect >batch<-c(0,0,0,0,0,0,1,1,1,1,1,1) >design.Strain<-cbind(design.Strain,batch) >fit.Strain <- lmFit(selDataMatrix, design.Strain) >contrast.matrix.Strain <- makeContrasts(D-C, S-D, S-C, levels=design.Strain) >fit2.Strain <- contrasts.fit(fit.Strain, contrast.matrix.Strain) >fit2.Strain <- eBayes(fit2.Strain) > >I have tried the two approaches, and I got different list of >differential expressed genes, so which one is preferred? > >Looking forward to your reply, and apologising for bothering you >Thank you >Mali > >

score 0 · Answer 2 · 2011-03-10

Wow thanks Naomi for the clear explanation, Best regards Mali On Thu, Mar 10, 2011 at 1:07 PM, Naomi Altman <naomi@stat.psu.edu> wrote: > Dear Mali, > I do not know how much statistics you know. In a regular ANOVA routine, it > would not make much difference whether you include batch in the design > matrix (as a fixed effect) or as a variance component (random effect). In > Limma, it does make a difference, because Limma makes the assumption that > the intraclass correlation is the same for every gene. You will not get the > same p-values, but there is really no statistical way to tell which model is > "better". > > I do not use decideTests. Since you started by fitting the full model and > then fit the contrasts (which is how I do it, too) you can include as many > contrasts as you want. The t-tests for the contrasts use the MSE from the > full model and give the same t-values and p-values regardless of how many > you include. Usually, we do multiple comparisons adjustments for each > contrast separately, so that results do not depend on which contrasts you > compute. If you do some type of ensemble adjustment for multiple > comparisons, it might make a difference to which genes are selected. > > Regards, > Naomi > > > > At 12:36 AM 3/10/2011, you wrote: > > Thanks a lot Naomi for your help. There are two more things that I'm not > sure about regarding the batch effect. Is it enough to include the batch in > the design matrix for it to be corrected, or do I need to correct it using > "duplicateCorrelation" (fit <- lmFit(..., block=..., > correlation=dupcor$consensus)? > Also in few posts in the mailing lists there is a recommendation to include > in the contrast matrix only those comparisons one is really interested with. > What is the effect of including more/unnecessary contrasts? would it change > something to the results I get from decideTests? > Thanks again > Mali > > On Thu, Mar 10, 2011 at 3:04 AM, Naomi Altman <naomi@stat.psu.edu> wrote: > Dear Mali, > I have a bit more time now. I think I answered your first question in my > last email. Limma does not directly do a "main effects" and interactions > type of F-test, although you can can force the issue by using the F.p.value > and several different sets of contrasts. > > The second approach is not valid. The t-test denominator will be incorrect > because you did not account for batch and treatment. > > Best wishes, > Naomi > > > Regarding > > > At 07:05 AM 3/7/2011, you wrote: > Dear Naomi > First I would like to apologize for writing you directly. > My name is Mali Salmon, I am a researcher in Tel Hashomer hospital in > Israel. > I am currently analyzing microarray data using limma package, and I would > like to have some advice from a statistician who has an experience with > microarray analysis using Limma and bioconductor. > I was searching the mailing list, and saw your name as one who assist many > people with their analysis. > I have a simple factorial design which I build the design and contrast > matrix for, and I just want to be sure that what I did is correct, I would > be very grateful is you could give me your opinion. > > I have two factors, one is strain with three levels (C,D,S), and a > treatment factor with two levels: treated (DSR) and untreated (NO), I also > have batch effect which I included in the matrix > > > targets > > FileName Treatment Strain batch > 1 NO2_C NO C 1 > 2 DSR2_C DSR C 1 > 3 NO2_D NO D 1 > 4 DSR2_D DSR D 1 > 5 NO2_S NO S 1 > 6 DSR2_S DSR S 1 > 7 NO3_C NO C 2 > 8 DSR3_C DSR C 2 > 9 NO3_D NO D 2 > 10 DSR3_D DSR D 2 > 11 NO3_S NO S 2 > 12 DSR3_S DSR S 2 > > I am trying to follow example 8.7 in the user guide (factorial design) > In order to build the design matrix this is what I did: > > TS <- paste(targets$Treatment, targets$Strain, sep=".") > TS <- factor(TS, unique(TS)) > design<-model.matrix(~0+TS) > colnames(design) <- levels(TS) > batch<-c(0,0,0,0,0,0,1,1,1,1,1,1) > design<-cbind(design,batch) > > > design > NO.C DSR.C NO.D DSR.D NO.S DSR.S batch > 1 1 0 0 0 0 0 0 > 2 0 1 0 0 0 0 0 > 3 0 0 1 0 0 0 0 > 4 0 0 0 1 0 0 0 > 5 0 0 0 0 1 0 0 > 6 0 0 0 0 0 1 0 > 7 1 0 0 0 0 0 1 > 8 0 1 0 0 0 0 1 > 9 0 0 1 0 0 0 1 > 10 0 0 0 1 0 0 1 > 11 0 0 0 0 1 0 1 > 12 0 0 0 0 0 1 1 > > > Now the questions I would like to answer are: > 1. what is the main effect of the strain (here I want to ignore the > treatment, just to find the difference between different strains) > 2. what is the main effect of the treatment (genes that change their > expression because of treatment, again I want to ignore the strain here). > 3. genes that are respond to treatment in each strain + interaction > > Below is how I created the contrast.matrix, the first three contrasts are > to answer the first question, contrast number 4 is to answer question 2, and > the last 6 contrasts to answer question 3 > > cont.matrix <- makeContrasts( > DvsC=(DSR.D+NO.D)-(DSR.C+NO.C), > > SvsC=(DSR.S+NO.S)-(DSR.C+NO.C), > DvsS=(DSR.D+NO.D)-(DSR.S+NO.S), > DSRvNO=(DSR.C+DSR.D+DSR.S)-(NO.C+NO.D+NO.S), > C.DSRvsNO=DSR.C-NO.C, > D.DSRvsNO=DSR.D-NO.D, > S.DSRvsNO=DSR.S-NO.S, > DiffDvsC=(DSR.D-NO.D)-(DSR.C-NO.C), > DiffSvsC=(DSR.S-NO.S)-(DSR.C-NO.C), > DiffSvsD=(DSR.S-NO.S)-(DSR.D-NO.D), > levels=design) > fit2 <- contrasts.fit(fit, cont.matrix) > fit2 <- eBayes(fit2) > > Is this matrix correct? > The batch is included in the design matrix, is that mean that the batch > will be removed? > > One of the comparisons I am interested with is the different between > strains (first question). Would you suggest to do a separate "several group" > analysis (example 8.6 in limma user guide) instead of the one indicated > above? > The analysis for answer the first question would be: > > design.Strain <- model.matrix(~ 0+factor(c(1,1,2,2,3,3,1,1,2,2,3,3))) > colnames(design.Strain) <- c("C", "D", "S") > #correct for batch effect > batch<-c(0,0,0,0,0,0,1,1,1,1,1,1) > design.Strain<-cbind(design.Strain,batch) > fit.Strain <- lmFit(selDataMatrix, design.Strain) > contrast.matrix.Strain <- makeContrasts(D-C, S-D, S-C, > levels=design.Strain) > fit2.Strain <- contrasts.fit(fit.Strain, contrast.matrix.Strain) > fit2.Strain <- eBayes(fit2.Strain) > > I have tried the two approaches, and I got different list of differential > expressed genes, so which one is preferred? > > Looking forward to your reply, and apologising for bothering you > Thank you > Mali > > > > > > > [[alternative HTML version deleted]]