Limma Dual Color Agilent : Mixed model or not?
1
0
Entering edit mode
@gordon-smyth
Last seen just now
WEHI, Melbourne, Australia
Dear Guillaume, Not all questions have brief answers! However see the short paper http://www.statsci.org/smyth/pubs/ISI2005-116.pdf which is referenced on the help page for lmscFit. The paragraph on page 2 starting "Replicated arrays" explains why I would recommend a standard log-ratio analysis for the model you have fitted. In general, the separate channel analysis gives more DE genes than a standard log-ratio analysis because it makes more model assumptions (of common intra-spot correlation) and recovers information from the A-values which is otherwise ignored. An additional reason in your case is that you are imputing data for the separate channel analysis, so there are more non NA data values being used than for the log-ratio analysis. For the model you have fitted, there is actually no information in the A-values to recover, so I would not do it. Your targets frame seems to include other variables (RepNum and Polarity) not used in the analysis, but I assume you have good reasons to ignore them. Best wishes Gordon > Date: Tue, 29 Nov 2011 16:10:02 +0100 > From: Guillaume Meurice <guillaume.meurice at="" igr.fr=""> > To: Bioconductor mailing list <bioconductor at="" r-project.org=""> > Subject: [BioC] Limma Dual Color Agilent : Mixed model or not ? > Content-Type: text/plain > > Dear all, > > I have the project with the following design (dual color, with dye swap): > > Cy5 Cy3 Status RepNum Polarity > R1 SI31_vs_R1 CTRL+ R1_SI31 R1_CTRL R 1 + > R1 SI31_vs_R1 CTRL- R1_CTRL R1_SI31 R 1 - > R2 SI31_vs_R2 CTRL+ R2_SI31 R2_CTRL R 2 + > R2 SI31_vs_R2 CTRL- R2_CTRL R2_SI31 R 2 - > S1 SI31_vs_S1 CTRL+ S1_SI31 S1_CTRL S 1 + > S1 SI31_vs_S1 CTRL- S1_CTRL S1_SI31 S 1 - > S3 SI31_vs_S3 CTRL+ S3_SI31 S3_CTRL S 3 + > S3 SI31_vs_S3 CTRL- S3_CTRL S3_SI31 S 3 - > R3 SI31_vs_R3 CTRL+ R3_SI31 R3_CTRL R 3 + > R3 SI31_vs_R3 CTRL- R3_CTRL R3_SI31 R 3 - > S2 SI31_vs_S2 CTRL+ S2_SI31 S2_CTRL S 2 + > S2 SI31_vs_S2 CTRL- S2_CTRL S2_SI31 S 2 - > > > I want to compare R versus S. > > The first thing I've done is the following (where X is my processed log2(ratio) matrix): > ========================================================== > status = factor(target$Status) > dye = rep(c(1,-1),times = 6) > design = model.matrix(~0+dye+status) > colnames(design) = c("Dye","R","S") > fit <- lmFit(X, design) > cont.matrix <- makeContrasts("R-S",levels=design) > fit2 <- contrasts.fit(fit, cont.matrix) > fit2 <- eBayes(fit2) > res1 = topTable(fit2, adjust="BH", number = Inf, coef="R-S") > ========================================================== > > which raise 93 transcript significantly differentially expressed. > > > > > Then, I've tried to use the mixed model approach which raise about 4000 transcripts : > > ========================================================== > MAmixed = normalizeBetweenArrays(RG,method = "Aquantile") > # replace NA using KNN > MnoNa = gestionNA_knn(MAmixed$M) > AnoNa = gestionNA_knn(MAmixed$A) > > MAmixed$M = MnoNa > MAmixed$A = AnoNa > > t2 = targetsA2C(target) > u <- unique(t2$Status) > f <- factor(t2$Status, levels=u) > design <- model.matrix(~0+f) > colnames(design) <- u > corfit <- intraspotCorrelation(MAmixed, design) > fit <- lmscFit(MAmixed, design, correlation=corfit$consensus) > cont.matrix <- makeContrasts("R-S",levels=design) > fit2 <- contrasts.fit(fit, cont.matrix) > fit2 <- eBayes(fit2) > res2 = topTable(fit2, adjust="BH", number = Inf) > ========================================================== > > > > > I was wondering which approach better fit the design of this project. > > I would also be very grateful if you could (briefly) explain why there is such a difference ? > > Many thanks by advance. > -- > Guillaume ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}
limma limma • 1.3k views
ADD COMMENT
0
Entering edit mode
@gordon-smyth
Last seen just now
WEHI, Melbourne, Australia
Dear Guillaume, My apologies, I seem to have misread your experimental design. Reading your reply and looking more closely at your targets frame, I now see that your design is a two-group problem with replicate-specific controls and technical dye-swaps. Right? Let's leave the dye-swaps aside for the moment. From the point of view of a separate channel analysis, the two-group problem with replicate- specific controls is similar to a common reference. Extra information can be recovered from the A-values, but the amount of information is not large, up to about 8% of the log-ratio information (see bottom of second page of the ISI 2005 paper). However the technical dye-swaps are an issue that needs to addressed. The lmscFit() analysis "thinks" you have 12 biological replicates in total, whereas you actually only have six. Hence I would recommend you do a log-ratio analysis, and handle the technical replicates using the duplicateCorrelation() method outlined in the middle of page 39 of the limma User's Guide (Section 8.2 on technical replication). Best wishes Gordon On Thu, 1 Dec 2011, Guillaume Meurice wrote: > Dear Gordon, > > >> Not all questions have brief answers! However see the short paper >> >> http://www.statsci.org/smyth/pubs/ISI2005-116.pdf >> >> which is referenced on the help page for lmscFit. The paragraph on >> page 2 starting "Replicated arrays" explains why I would recommend a >> standard log-ratio analysis for the model you have fitted. > > many thanks for your answer. The paper you've pointed seems to answer my > questions (even if I need to go further into mathematical formula ^^). > > So you would recommand a standard log-ratio approach, even in case of > the unconnected design ? The RNA sources is not the same for all > replicated array (3 biological replicats for each cell lines) : > > R.mut vs R.Ctrl (3 biological replicats, with dye-swap) > S.mut vs S.Ctrl (3 biological replicats, with dye-swap) > > I want to have the DEG for the following contrast : > > (R.mut - R.Ctrl) - (S.mut - S.Ctrl) > > Since I was facing an unconnected design, I though that is was better to > use mixed model, but I agree that log-ratio approach is much > straightforward. > >> In general, the separate channel analysis gives more DE genes than a >> standard log-ratio analysis because it makes more model assumptions (of >> common intra-spot correlation) and recovers information from the >> A-values which is otherwise ignored. An additional reason in your case >> is that you are imputing data for the separate channel analysis, so >> there are more non NA data values being used than for the log-ratio >> analysis. For the model you have fitted, there is actually no >> information in the A-values to recover, so I would not do it. > > For the mixed model, I was misataken. So I've compared the R.mut vs > R.ctrl (3 replicates) using log-ratio approach and the same contrast > using log-intensity approach. The second approach raise a few more DEG > >> Your targets frame seems to include other variables (RepNum and >> Polarity) not used in the analysis, but I assume you have good reasons >> to ignore them. > > totally, RepNum stand for the replicat number (3 for cell line R and 3 > for cell line S), and polarity is useful to identify dye-swap arrays. > > Best whishes > -- > Guillaume > > >> >> Best wishes >> Gordon >> >>> Date: Tue, 29 Nov 2011 16:10:02 +0100 >>> From: Guillaume Meurice <guillaume.meurice at="" igr.fr=""> >>> To: Bioconductor mailing list <bioconductor at="" r-project.org=""> >>> Subject: [BioC] Limma Dual Color Agilent : Mixed model or not ? >>> Content-Type: text/plain >>> >>> Dear all, >>> >>> I have the project with the following design (dual color, with dye swap): >>> >>> Cy5 Cy3 Status RepNum Polarity >>> R1 SI31_vs_R1 CTRL+ R1_SI31 R1_CTRL R 1 + >>> R1 SI31_vs_R1 CTRL- R1_CTRL R1_SI31 R 1 - >>> R2 SI31_vs_R2 CTRL+ R2_SI31 R2_CTRL R 2 + >>> R2 SI31_vs_R2 CTRL- R2_CTRL R2_SI31 R 2 - >>> S1 SI31_vs_S1 CTRL+ S1_SI31 S1_CTRL S 1 + >>> S1 SI31_vs_S1 CTRL- S1_CTRL S1_SI31 S 1 - >>> S3 SI31_vs_S3 CTRL+ S3_SI31 S3_CTRL S 3 + >>> S3 SI31_vs_S3 CTRL- S3_CTRL S3_SI31 S 3 - >>> R3 SI31_vs_R3 CTRL+ R3_SI31 R3_CTRL R 3 + >>> R3 SI31_vs_R3 CTRL- R3_CTRL R3_SI31 R 3 - >>> S2 SI31_vs_S2 CTRL+ S2_SI31 S2_CTRL S 2 + >>> S2 SI31_vs_S2 CTRL- S2_CTRL S2_SI31 S 2 - >>> >>> >>> I want to compare R versus S. >>> >>> The first thing I've done is the following (where X is my processed log2(ratio) matrix): >>> ========================================================== >>> status = factor(target$Status) >>> dye = rep(c(1,-1),times = 6) >>> design = model.matrix(~0+dye+status) >>> colnames(design) = c("Dye","R","S") >>> fit <- lmFit(X, design) >>> cont.matrix <- makeContrasts("R-S",levels=design) >>> fit2 <- contrasts.fit(fit, cont.matrix) >>> fit2 <- eBayes(fit2) >>> res1 = topTable(fit2, adjust="BH", number = Inf, coef="R-S") >>> ========================================================== >>> >>> which raise 93 transcript significantly differentially expressed. >>> >>> >>> >>> >>> Then, I've tried to use the mixed model approach which raise about 4000 transcripts : >>> >>> ========================================================== >>> MAmixed = normalizeBetweenArrays(RG,method = "Aquantile") >>> # replace NA using KNN >>> MnoNa = gestionNA_knn(MAmixed$M) >>> AnoNa = gestionNA_knn(MAmixed$A) >>> >>> MAmixed$M = MnoNa >>> MAmixed$A = AnoNa >>> >>> t2 = targetsA2C(target) >>> u <- unique(t2$Status) >>> f <- factor(t2$Status, levels=u) >>> design <- model.matrix(~0+f) >>> colnames(design) <- u >>> corfit <- intraspotCorrelation(MAmixed, design) >>> fit <- lmscFit(MAmixed, design, correlation=corfit$consensus) >>> cont.matrix <- makeContrasts("R-S",levels=design) >>> fit2 <- contrasts.fit(fit, cont.matrix) >>> fit2 <- eBayes(fit2) >>> res2 = topTable(fit2, adjust="BH", number = Inf) >>> ========================================================== >>> >>> >>> >>> >>> I was wondering which approach better fit the design of this project. >>> >>> I would also be very grateful if you could (briefly) explain why there is such a difference ? >>> >>> Many thanks by advance. >>> -- >>> Guillaume ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}
ADD COMMENT
0
Entering edit mode
> Dear Guillaume, > > My apologies, I seem to have misread your experimental design. Reading your reply and looking more closely at your targets frame, I now see that your design is a two-group problem with replicate- specific controls and technical dye-swaps. Right? You're totally right. > Let's leave the dye-swaps aside for the moment. From the point of view of a separate channel analysis, the two-group problem with replicate-specific controls is similar to a common reference. Extra information can be recovered from the A-values, but the amount of information is not large, up to about 8% of the log-ratio information (see bottom of second page of the ISI 2005 paper). I was wondering how did you assess this amount of information of 8 % ? > However the technical dye-swaps are an issue that needs to addressed. The lmscFit() analysis "thinks" you have 12 biological replicates in total, whereas you actually only have six. Hence I would recommend you do a log-ratio analysis, and handle the technical replicates using the duplicateCorrelation() method outlined in the middle of page 39 of the limma User's Guide (Section 8.2 on technical replication). Perfect, this is exactly what I've done using your previous advices. Many thanks again. Best regards -- Guillaume > > Best wishes > Gordon > > On Thu, 1 Dec 2011, Guillaume Meurice wrote: > >> Dear Gordon, >> >> >>> Not all questions have brief answers! However see the short paper >>> >>> http://www.statsci.org/smyth/pubs/ISI2005-116.pdf >>> >>> which is referenced on the help page for lmscFit. The paragraph on page 2 starting "Replicated arrays" explains why I would recommend a standard log-ratio analysis for the model you have fitted. >> >> many thanks for your answer. The paper you've pointed seems to answer my questions (even if I need to go further into mathematical formula ^^). >> >> So you would recommand a standard log-ratio approach, even in case of the unconnected design ? The RNA sources is not the same for all replicated array (3 biological replicats for each cell lines) : >> >> R.mut vs R.Ctrl (3 biological replicats, with dye-swap) >> S.mut vs S.Ctrl (3 biological replicats, with dye-swap) >> >> I want to have the DEG for the following contrast : >> >> (R.mut - R.Ctrl) - (S.mut - S.Ctrl) >> >> Since I was facing an unconnected design, I though that is was better to use mixed model, but I agree that log-ratio approach is much straightforward. >> >>> In general, the separate channel analysis gives more DE genes than a standard log-ratio analysis because it makes more model assumptions (of common intra-spot correlation) and recovers information from the A-values which is otherwise ignored. An additional reason in your case is that you are imputing data for the separate channel analysis, so there are more non NA data values being used than for the log-ratio analysis. For the model you have fitted, there is actually no information in the A-values to recover, so I would not do it. >> >> For the mixed model, I was misataken. So I've compared the R.mut vs R.ctrl (3 replicates) using log-ratio approach and the same contrast using log-intensity approach. The second approach raise a few more DEG >> >>> Your targets frame seems to include other variables (RepNum and Polarity) not used in the analysis, but I assume you have good reasons to ignore them. >> >> totally, RepNum stand for the replicat number (3 for cell line R and 3 for cell line S), and polarity is useful to identify dye-swap arrays. >> >> Best whishes >> -- >> Guillaume >> >> >>> >>> Best wishes >>> Gordon >>> >>>> Date: Tue, 29 Nov 2011 16:10:02 +0100 >>>> From: Guillaume Meurice <guillaume.meurice at="" igr.fr=""> >>>> To: Bioconductor mailing list <bioconductor at="" r-project.org=""> >>>> Subject: [BioC] Limma Dual Color Agilent : Mixed model or not ? >>>> Content-Type: text/plain >>>> >>>> Dear all, >>>> >>>> I have the project with the following design (dual color, with dye swap): >>>> >>>> Cy5 Cy3 Status RepNum Polarity >>>> R1 SI31_vs_R1 CTRL+ R1_SI31 R1_CTRL R 1 + >>>> R1 SI31_vs_R1 CTRL- R1_CTRL R1_SI31 R 1 - >>>> R2 SI31_vs_R2 CTRL+ R2_SI31 R2_CTRL R 2 + >>>> R2 SI31_vs_R2 CTRL- R2_CTRL R2_SI31 R 2 - >>>> S1 SI31_vs_S1 CTRL+ S1_SI31 S1_CTRL S 1 + >>>> S1 SI31_vs_S1 CTRL- S1_CTRL S1_SI31 S 1 - >>>> S3 SI31_vs_S3 CTRL+ S3_SI31 S3_CTRL S 3 + >>>> S3 SI31_vs_S3 CTRL- S3_CTRL S3_SI31 S 3 - >>>> R3 SI31_vs_R3 CTRL+ R3_SI31 R3_CTRL R 3 + >>>> R3 SI31_vs_R3 CTRL- R3_CTRL R3_SI31 R 3 - >>>> S2 SI31_vs_S2 CTRL+ S2_SI31 S2_CTRL S 2 + >>>> S2 SI31_vs_S2 CTRL- S2_CTRL S2_SI31 S 2 - >>>> >>>> >>>> I want to compare R versus S. >>>> >>>> The first thing I've done is the following (where X is my processed log2(ratio) matrix): >>>> ========================================================== >>>> status = factor(target$Status) >>>> dye = rep(c(1,-1),times = 6) >>>> design = model.matrix(~0+dye+status) >>>> colnames(design) = c("Dye","R","S") >>>> fit <- lmFit(X, design) >>>> cont.matrix <- makeContrasts("R-S",levels=design) >>>> fit2 <- contrasts.fit(fit, cont.matrix) >>>> fit2 <- eBayes(fit2) >>>> res1 = topTable(fit2, adjust="BH", number = Inf, coef="R-S") >>>> ========================================================== >>>> >>>> which raise 93 transcript significantly differentially expressed. >>>> >>>> >>>> >>>> >>>> Then, I've tried to use the mixed model approach which raise about 4000 transcripts : >>>> >>>> ========================================================== >>>> MAmixed = normalizeBetweenArrays(RG,method = "Aquantile") >>>> # replace NA using KNN >>>> MnoNa = gestionNA_knn(MAmixed$M) >>>> AnoNa = gestionNA_knn(MAmixed$A) >>>> >>>> MAmixed$M = MnoNa >>>> MAmixed$A = AnoNa >>>> >>>> t2 = targetsA2C(target) >>>> u <- unique(t2$Status) >>>> f <- factor(t2$Status, levels=u) >>>> design <- model.matrix(~0+f) >>>> colnames(design) <- u >>>> corfit <- intraspotCorrelation(MAmixed, design) >>>> fit <- lmscFit(MAmixed, design, correlation=corfit$consensus) >>>> cont.matrix <- makeContrasts("R-S",levels=design) >>>> fit2 <- contrasts.fit(fit, cont.matrix) >>>> fit2 <- eBayes(fit2) >>>> res2 = topTable(fit2, adjust="BH", number = Inf) >>>> ========================================================== >>>> >>>> >>>> >>>> >>>> I was wondering which approach better fit the design of this project. >>>> >>>> I would also be very grateful if you could (briefly) explain why there is such a difference ? >>>> >>>> Many thanks by advance. >>>> -- >>>> Guillaume > > ______________________________________________________________________ > The information in this email is confidential and inte...{{dropped:6}}
ADD REPLY

Login before adding your answer.

Traffic: 532 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6