Limma User's Guide Example of design matrices


Mike White ▴ 30

@mike-white-1682


An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20060426/5a014050/attachment.pl



kfbargad@ehu.es ▴ 270

@kfbargadehues-1528


Dear Mike,

I will leave the explanations to the experts, but as a beginner I found two books very useful in understanding linear models and contrasts:

Introductory Statistics with R, by Peter Dalgaard
Design and Analysis of Experiments, by Douglas C. Montgomery

HTH,
David

> I am working my way through the Limma User's Guide and had a question
> about the design matrices for the example in section 8.4 (2 groups,
> same reference).
> I understand the difference between the two design matrices in terms
> of what you can extract directly from the linear model and what has
> to be obtained by contrasts and how you directly construct the
> matrices using cbind as in the manual. I have two questions, one of
> which may be trivial (i.e., stupid), and the other not. I will preface
> this by admitting that my knowledge of statistics beyond the very
> basics is relatively weak.
>
> The non-trivial question:
>
> I realize that more than one design matrix can be set up to analyze
> the same set of data (as in the example), and that similar results
> should be obtainable with each design. If you are eventually
> obtaining the same information from each design (i.e., identifying
> differentially expressed genes) what is the benefit of one design
> over the other: could one design produce a different level of
> statistical confidence that a given set of genes is differentially
> regulated? Is there any rule of thumb for choosing one design matrix
> over another?
>
> The trivial (?) question:
>
> I set up the two types of design matrices using the factor Group and
> the model.matrix function as in the manual:
>
> > Group <- factor(c("WT","WT","MU","MU","MU"), levels=c("WT","MU"))
> > Group
> [1] WT WT MU MU MU
> Levels: WT MU
> > design <- model.matrix(~Group)
> > design
>   (Intercept) GroupMU
> 1           1       0
> 2           1       0
> 3           1       1
> 4           1       1
> 5           1       1
> attr(,"assign")
> [1] 0 1
> attr(,"contrasts")
> attr(,"contrasts")$Group
> [1] "contr.treatment"
>
> > design2 <- model.matrix(~0+Group)
> > design2
>   GroupWT GroupMU
> 1       1       0
> 2       1       0
> 3       0       1
> 4       0       1
> 5       0       1
> attr(,"assign")
> [1] 1 1
> attr(,"contrasts")
> attr(,"contrasts")$Group
> [1] "contr.treatment"
>
> I have not been able to find a clear explanation of what the tilde
> (~) does in model.matrix to produce the design matrix, especially in
> the context of "~0+Group." Any idea as to where I can get an
> explanation of how this works? (The 2445-page R manual wasn't any
> help!).
>
> Thanks for your help!
>
> Mike White
>
> Michael M. White, Ph.D.
> Department of Pharmacology & Physiology
> MS #488
> Drexel University College of Medicine
> 245 N. 15th Street
> Philadelphia, PA 19102-1192
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor


James W. MacDonald 67k

@james-w-macdonald-5106


Mike White wrote:
> I am working my way through the Limma User's Guide and had a question
> about the design matrices for the example in section 8.4 (2 groups,
> same reference).
> I understand the difference between the two design matrices in terms
> of what you can extract directly from the linear model and what has
> to be obtained by contrasts and how you directly construct the
> matrices using cbind as in the manual. I have two questions, one of
> which may be trivial (i.e., stupid), and the other not. I will preface
> this by admitting that my knowledge of statistics beyond the very
> basics is relatively weak.
>
> The non-trivial question:
>
> I realize that more than one design matrix can be set up to analyze
> the same set of data (as in the example), and that similar results
> should be obtainable with each design. If you are eventually
> obtaining the same information from each design (i.e., identifying
> differentially expressed genes) what is the benefit of one design
> over the other: could one design produce a different level of
> statistical confidence that a given set of genes is differentially
> regulated? Is there any rule of thumb for choosing one design matrix
> over another?

The results will be the same for any reasonably specified design matrix. However, what the resulting parameter estimates represent, and how you make comparisons, will be different.

Really, the only rule of thumb that I know is to use whatever design matrix makes the most sense to you. For instance, I almost always use a cell means model (a design matrix without an intercept term). The downside of doing that is you cannot make any comparisons without specifying contrasts (which you might be able to do with a factor effects model, where there is an intercept). The upside for me is that I don't have to figure out each time which level is being used as the baseline.
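The equivalence described above can be checked for a single gene in base R. This is only a minimal sketch: the expression values in `y` are invented for illustration, and `lm` stands in for the per-gene fit that limma would do. Both parameterizations give the same fitted values and the same MU - WT difference; only the meaning of the coefficients changes.

```r
# Hypothetical log-expression values for one gene (invented, not from the thread)
y <- c(1.0, 1.2, 2.1, 1.9, 2.3)
Group <- factor(c("WT", "WT", "MU", "MU", "MU"), levels = c("WT", "MU"))

# Factor effects model: (Intercept) is the WT mean, GroupMU is MU - WT
fit1 <- lm(y ~ Group)

# Cell means model: one coefficient per group mean
fit2 <- lm(y ~ 0 + Group)

coef(fit1)
coef(fit2)

# With the cell means model, MU - WT has to be formed as an explicit contrast
contrast <- c(GroupWT = -1, GroupMU = 1)
diff2 <- sum(contrast * coef(fit2))

# Same difference and same fitted values under either design
all.equal(unname(coef(fit1)["GroupMU"]), diff2)
all.equal(fitted(fit1), fitted(fit2))
```

Both `all.equal` calls should return TRUE, since the two design matrices span the same column space.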
As an example, using the two design matrices below, the first model is a factor effects model where WT is used as the baseline, so the second coefficient gives the difference between MU and WT. For this you don't need a contrast, and for this simple comparison it is probably easier. If you had two factors and were interested in the interaction, then you would have to do the algebra to figure out the contrasts.

The second model simply computes the mean for each factor level (hence "cell means model"), so you have to explicitly compute the contrast of interest. However, in this case it would be easier (IMO) to figure out an interaction if you have two factors.

> The trivial (?) question:
>
> I set up the two types of design matrices using the factor Group and
> the model.matrix function as in the manual:
>
> > Group <- factor(c("WT","WT","MU","MU","MU"), levels=c("WT","MU"))
> > Group
> [1] WT WT MU MU MU
> Levels: WT MU
> > design <- model.matrix(~Group)
> > design
>   (Intercept) GroupMU
> 1           1       0
> 2           1       0
> 3           1       1
> 4           1       1
> 5           1       1
> attr(,"assign")
> [1] 0 1
> attr(,"contrasts")
> attr(,"contrasts")$Group
> [1] "contr.treatment"
>
> > design2 <- model.matrix(~0+Group)
> > design2
>   GroupWT GroupMU
> 1       1       0
> 2       1       0
> 3       0       1
> 4       0       1
> 5       0       1
> attr(,"assign")
> [1] 1 1
> attr(,"contrasts")
> attr(,"contrasts")$Group
> [1] "contr.treatment"
>
> I have not been able to find a clear explanation of what the tilde
> (~) does in model.matrix to produce the design matrix, especially in
> the context of "~0+Group." Any idea as to where I can get an
> explanation of how this works? (The 2445-page R manual wasn't any
> help!).

The tilde is used to specify a model, separating the right hand side (explanatory variables) from the left hand side (dependent variable).
So if you were fitting a model as above, but for just one gene, you would do something like

    lm(gene_expression_values ~ Group)

However, when you are using model.matrix, you are only specifying the right hand side of that equation (i.e., the design matrix), so you just use the tilde followed by your explanatory variables.

As for '~ 0 + Group' versus '~ Group', the first means that you don't want an intercept term, whereas the second means you do (as that is the default). For a more complete explanation, see ?formula.

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623
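For anyone who wants to see the formula behaviour directly, here is a short base-R sketch. The variable names (`with_intercept`, `no_intercept`, `no_intercept2`) are just illustrative. It builds the design matrix from a one-sided formula with and without the intercept, and shows that `- 1` is an alternative spelling of `0 +`, as documented in ?formula.

```r
Group <- factor(c("WT", "WT", "MU", "MU", "MU"), levels = c("WT", "MU"))

# One-sided formulas: only the right-hand side is needed for a design matrix
with_intercept <- model.matrix(~ Group)      # intercept included by default
no_intercept   <- model.matrix(~ 0 + Group)  # intercept removed

# "- 1" is another way to drop the intercept
no_intercept2 <- model.matrix(~ Group - 1)

colnames(with_intercept)  # "(Intercept)" "GroupMU"
colnames(no_intercept)    # "GroupWT" "GroupMU"
all.equal(no_intercept, no_intercept2)

# The parsed formula records whether an intercept is present
attr(terms(~ Group), "intercept")      # 1
attr(terms(~ 0 + Group), "intercept")  # 0
```

The factor level order set in `levels=` decides which group becomes the baseline in the intercept version, which is why WT is absorbed into `(Intercept)` here.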
