unbalanced factorial design

Christian Landry ▴ 30

@christian-landry-620

Hi,

I am analyzing microarray data from a factorial design, with treatments and genotypes. I would like to know whether any of you have experience with Bioconductor packages for mixed-model ANOVA that can deal with an unbalanced design, i.e. not the same number of replicates for each treatment-genotype combination. I am particularly interested in the interaction between treatment and genotype, so I know that the design is an issue. Any references?

Thanks in advance,

Christian

Microarray • 1.6k views

updated 20.8 years ago by Kenny Ye ▴ 100 • written 20.8 years ago by Christian Landry ▴ 30

Kenny Ye ▴ 100

@kenny-ye-92

I am not sure whether Bioconductor includes any functions for mixed-effects models. Several R packages handle mixed-effects models; the most complete one is nlme. But SAS PROC MIXED is probably the better way to go. In my opinion, slight imbalance does not affect your inference very much. A good reference is Milliken and Johnson, Analysis of Messy Data, Volume I.

Kenny

Kenny Ye
Assistant Professor
Department of Applied Math and Statistics
SUNY at Stony Brook
Stony Brook, New York 11794-3600
Phone (631)632-9344
Fax (631)632-8490

On Wed, 4 Feb 2004, Christian Landry wrote:
> I am analyzing microarray data from a factorial design, with treatments and genotypes. [...]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor@stat.math.ethz.ch
> https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor

written 20.8 years ago by Kenny Ye ▴ 100
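
For readers who want to try the nlme route mentioned above, here is a minimal sketch (not code from the thread) of a mixed model for a single gene in an unbalanced 2 x 2 treatment-by-genotype layout. The data are simulated, and the `batch` random effect, the cell sizes, the effect sizes, and all variable names are assumptions made purely for illustration.

```r
library(nlme)  # recommended package that ships with R

set.seed(1)

# Hypothetical unbalanced 2 x 2 layout: unequal replicates per cell.
cells <- data.frame(
  treatment = c("control", "control", "treated", "treated"),
  genotype  = c("wt", "mut", "wt", "mut"),
  n         = c(5, 3, 4, 6)
)
dat <- cells[rep(seq_len(nrow(cells)), cells$n), c("treatment", "genotype")]
dat$treatment <- factor(dat$treatment)
dat$genotype  <- factor(dat$genotype)

# Assumed random effect: each sample belongs to one of 6 batches.
dat$batch <- factor(rep(1:6, length.out = nrow(dat)))

# Simulated log-expression for one gene, with a built-in interaction effect.
batch_effect <- rnorm(6, sd = 0.5)
dat$expr <- 8 +
  0.5 * (dat$treatment == "treated") +
  0.3 * (dat$genotype == "wt") +
  1.0 * (dat$treatment == "treated" & dat$genotype == "wt") +
  batch_effect[as.integer(dat$batch)] +
  rnorm(nrow(dat), sd = 0.3)

# Replicates per treatment-genotype cell (unequal by construction).
table(dat$treatment, dat$genotype)

# Fixed effects: both main effects plus interaction; random intercept per batch.
fit <- lme(expr ~ treatment * genotype, random = ~ 1 | batch, data = dat)

# F-tests for each term given all the others.
anova(fit, type = "marginal")
```

Because the interaction is the last term in the model, its test is the same under sequential and marginal testing; it is the main-effect tests whose interpretation depends on term order and factor coding when the design is unbalanced.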

> I am not sure whether Bioconductor includes any functions for mixed-effects models. Several R packages handle mixed-effects models; the most complete one is nlme.

It is not too difficult to run gene-specific mixed-effects models using the combination of esApply (in Biobase) and lme (in nlme). The non-trivial part is to properly specify the function (esApply parameter FUN) to invoke through esApply. The design will be derivable from information in the phenoData component. All variables in phenoData are visible to the FUN for esApply, so the model formula can be specified fairly naturally, thanks to the environment manipulations provided in esApply (by RG).

With appropriately structured experimental designs in which expression might vary smoothly but nonlinearly as a function of some design variable, nlme models may be of interest to fit through esApply as well.

So the question "does Bioconductor include functions for ... modeling" often has a negative answer: we don't aim to have functions for all conceivable approaches to modeling bioinformatic data. We prefer to have interfaces that allow existing functions in R to be reused conveniently, and at the option of the analyst, in the bioinformatic context.

written 20.8 years ago by Vincent J. Carey, Jr. 6.7k
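
The gene-by-gene approach described above can be sketched with base R's `apply` over the rows of an expression matrix, which is roughly the role esApply plays for an expression set. This is an illustrative sketch, not code from the thread: the simulated `expr_mat` and `pheno` objects stand in for the expression values and the phenoData, and `fit_one_gene`, the `batch` variable, and the model formula are hypothetical choices.

```r
library(nlme)

set.seed(2)

# Hypothetical stand-ins for the expression matrix and the sample annotation.
n_genes <- 5
pheno <- data.frame(
  treatment = factor(rep(c("control", "treated"), each = 8)),
  genotype  = factor(rep(c("wt", "mut"), times = 8)),
  batch     = factor(rep(1:4, each = 2, times = 2))
)
expr_mat <- matrix(
  rnorm(n_genes * nrow(pheno), mean = 8),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)), NULL)
)

# Fit one gene and return the p-value of the treatment:genotype term.
# Returns NA if lme fails, so one bad gene does not stop the whole run.
fit_one_gene <- function(y, pheno) {
  d <- cbind(pheno, expr = y)
  tryCatch({
    fit <- lme(expr ~ treatment * genotype, random = ~ 1 | batch, data = d)
    anova(fit)["treatment:genotype", "p-value"]
  }, error = function(e) NA_real_)
}

# One model per row (gene) of the matrix.
p_interaction <- apply(expr_mat, 1, fit_one_gene, pheno = pheno)
p_interaction
```

Wrapping the lme call in a small function of one expression vector keeps the per-experiment model in a single place, and the same function could be passed as FUN to esApply with the design variables taken from phenoData.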

I find that the simplest thing to do is to write my own function that includes the appropriate call to lme. That way I do not need to worry about grabbing components from complicated objects and passing arguments to lme. I do have to write my own calling function for each experiment, but that takes only a few minutes.

--Naomi

At 10:15 PM 2/4/2004, Vincent Carey wrote:
> It is not too difficult to run gene-specific mixed-effects models using the combination of esApply (in Biobase) and lme (in nlme). [...]

Naomi S. Altman
Associate Professor
Bioinformatics Consulting Center
Dept. of Statistics
Penn State University
University Park, PA 16802-2111
814-865-3791 (voice)
814-863-7114 (fax)
814-865-1348 (Statistics)

written 20.8 years ago by Naomi Altman ★ 6.0k

Running lme on a single probe set takes about 20 minutes of computing time on my PC. I'm running R on Windows, which I know can run into memory management problems, but this problem appears to be completely CPU bound.

The model that I'm fitting has repeated measures: 6 Visits per Subject, for 20 Subjects. Subjects have a Treatment attribute, and I'm interested in the fully saturated model with Visit, Treatment and interaction effects. The call to lme() I use is something like this:

    lme(fixed = Expr ~ Visit:Treatment,
        random = ~ Visit | Subject)

The results appear OK, but it takes 20 minutes to run. Am I doing something wrong? Can you all use lme() on 20,000 probe sets and live to talk about it?

Thanks for any insight into this problem.

-francois

written 20.8 years ago by Francois Collin ▴ 130
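
One thing that may be worth checking with a call like the one above (this is an assumption, since the thread does not say how Visit was coded): if Visit is a factor with 6 levels, `random = ~ Visit | Subject` asks lme for a general 6 x 6 covariance matrix of random effects, i.e. 21 variance-covariance parameters, estimated from only 20 subjects. The sketch below, on simulated data with made-up names and effect sizes, counts those parameters and times a simpler random-intercept fit for comparison. It uses `Visit * Treatment`, which R expands to both main effects plus their interaction.

```r
library(nlme)

set.seed(3)

# Simulated stand-in: 20 subjects, 6 visits each, treatment fixed per subject.
dat <- expand.grid(Visit = factor(1:6), Subject = factor(1:20))
dat$Treatment <- factor(ifelse(as.integer(dat$Subject) <= 10, "A", "B"))
subject_effect <- rnorm(20, sd = 1)
dat$Expr <- 8 + subject_effect[as.integer(dat$Subject)] +
  rnorm(nrow(dat), sd = 0.5)

# Random-effect covariance parameters implied by random = ~ Visit | Subject
# when Visit is a factor: q random effects per subject, q * (q + 1) / 2
# variances and covariances.
q <- ncol(model.matrix(~ Visit, data = dat))
n_cov_params <- q * (q + 1) / 2
n_cov_params

# Simpler alternative (whether it is adequate depends on the data):
# a single random intercept per subject.
timing <- system.time(
  fit <- lme(Expr ~ Visit * Treatment, random = ~ 1 | Subject, data = dat)
)
timing["elapsed"]
```

If a random-intercept fit on the real data runs quickly, that would point to the random-effects structure, rather than R or Windows, as the source of the long run time.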

I believe that it is CPU bound. Although the algorithm in nlme is very computationally intensive, I was surprised that it takes 20 minutes for such a small problem. You might want to try SAS PROC MIXED and see how long it takes. If you send me your Rdata file, I will try it on my machine.

Kenny

On Wed, 25 Feb 2004, Francois Collin wrote:
> Running lme on a single probe set takes about 20 minutes computing time on my PC. [...]

written 20.8 years ago by Kenny Ye ▴ 100

> Can you all use lme() on 20,000 probe sets and live to talk about it?

It depends on how much patience you have!

I have e-talked with Doug Bates about this. He says that a much faster version of lme will soon be available. But lme can be very slow. I generally break the data up into smaller sets and run one set per night. If it takes 20 minutes per run, you need to consider using a multiprocessor Unix system and splitting the sets among the systems.

--Naomi

At 03:13 PM 2/25/2004, Francois Collin wrote:
> Running lme on a single probe set takes about 20 minutes computing time on my PC. [...]

written 20.8 years ago by Naomi Altman ★ 6.0k
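
The "break the data up into smaller sets" strategy could look roughly like the sketch below. It is not code from the thread: the data are simulated, and `chunk_size`, `visit_slope`, and `run_chunk` are hypothetical names. Each chunk is processed with plain `lapply` here; in practice each chunk could instead be handed to a separate R process or machine, with its results saved to disk.

```r
library(nlme)

set.seed(4)

# Hypothetical stand-in data: 12 probe sets, 20 subjects x 3 visits.
pheno <- expand.grid(Visit = 1:3, Subject = factor(1:20))
expr_mat <- matrix(
  rnorm(12 * nrow(pheno), mean = 8),
  nrow = 12,
  dimnames = list(paste0("probe", 1:12), NULL)
)

# Split the probe-set row indices into chunks of at most chunk_size rows.
chunk_size <- 5
idx <- seq_len(nrow(expr_mat))
chunks <- split(idx, ceiling(idx / chunk_size))

# Fixed-effect slope for Visit for one probe set; NA if the fit fails.
visit_slope <- function(y) {
  d <- cbind(pheno, Expr = y)
  tryCatch(
    unname(fixef(lme(Expr ~ Visit, random = ~ 1 | Subject, data = d))["Visit"]),
    error = function(e) NA_real_
  )
}

# Process one chunk of rows; this is the unit of work to distribute.
run_chunk <- function(rows) {
  apply(expr_mat[rows, , drop = FALSE], 1, visit_slope)
}

# Run the chunks one after another and stitch the results back together.
results <- unlist(unname(lapply(chunks, run_chunk)))
length(chunks)
head(results)
```

Keeping each chunk independent means a failed or interrupted run only costs that chunk, which matters when a single fit takes minutes rather than milliseconds.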

Naomi Altman <naomi@stat.psu.edu> writes:

> Can you all use lme() on 20,000 probe sets and live to talk about it?
>
> It depends on how much patience you have!
>
> I have e-talked with Doug Bates about this. He says that a much faster version of lme will soon be available.

However, "soon" should be understood in the context of Dave Balsinger's comment that "Programmers spend 95% of their time being '95% done'."

written 20.8 years ago by Douglas Bates ▴ 180
