Question

choosing explanatory variables for linear model in limma

0

Entering edit mode

Yannick Wurm ▴ 220

@yannick-wurm-2314

Last seen 10.5 years ago

Hello Jim & List, how would you go about doing model selection using two-color data? (since your response variable is actually a ratio) I'm actually surprised that no "formal/mathematical" linear model is written in the limma Users Guide... any comments? Kind regards, Yannick On 2009-09-03 13:33:0, Jim Macdonald wrote: > Hi Andre, > > If you want to do model selection, then limma is probably not the tool > for the job. > > Instead, what I would do would be to choose some (one, five, ten, > whatever) genes and use lm() for the model selection process. That way > you can do all the conventional model selection steps, and once you are > satisfied with the model you have chosen you can go back to limma and > fit the model on all the genes. > > Best, > > Jim > > Andre J. Aberer wrote: > > Dear list members, > > > > short version of my question: > > How can I determine, whether it improves the model quality of a linear > > model (in limma), when I introduce additional explanatory variables? Is > > there an equivalent to feature selection (as in machine learning) for > > choosing the explanatory variables? > > > > The complete story: > > We analyse a dataset of about ninety single channel microarray chips and > > we want to search for differentially expressed genes and enriched gene > > sets. The chips are annotated with information (at least 20 factors, > > could be extended to 50) like the organ from which the RNA was > > extracted, the experimenter that did the lab work, the labelling kit she > > used and a huge amount of features describing e.g. the genotype of the > > individual or different aspects of the disease. > > > > We would like to build one linear model (resp. one design matrix) with > > all of the factors of interest mentioned above as explanatory variables > > in order to test various contrasts. Of course, we have to include all > > the variables that we possibly want to test in the linear model. But > > what about the ``technical'' factors like the ``labelling kit'' that was > > used? One never might want to test a contrast using this explanatory > > variable, however the net chip intensity could be influenced by a > > technical factor like this. So how can I determine, if it makes sense to > > include this variable? > > > > I am using the standard procedure as described in the limma guide: > > designMatrix <- model.matrix(~0 + var1 + var2, data=someTable) > > fitBoth <- lmFit(eset, designMatrix) > > where var1 and var2 are variables like ``diseaseOutcome'' and > > ``labellingKit''. > > > > We thought, that maybe an anova table could help us here, showing us the > > influences of var1 and var2. As far as I read > > (e.g. http://data.princeton.edu/R/linearModels.html) the anvoa function > > can be simply applied to a lm object or can be used to compare two lm > > instances. Of course, in that case it is only applied to one linear model > > and not one per gene as in the limma setting. > > So, if I try anova for one or two limma fit objects (MArrayLM), R > > complains that there is no applicable method and other anova variants > > (like anova.lm) do not work neither. This holds as well, when I want to > > do an anova for just one extracted linear model for one gene > > (like anova(lmFit[1,])). > > > > Our ultimo ratio so far is, to build a design matrix with and another > > one without a certain explanatory variable. Then we would determine the > > top DEGs and compare for each DEG their fitted linear models in an anova > > table. Finally we could check for how many of the top DEGs the > > additional variable would make a difference. > > However, this does not seem to be the golden path...or are we completely > > on the wrong track? > > > > -- > James W. MacDonald, M.S. > Biostatistician > Douglas Lab > University of Michigan > Department of Human Genetics > 5912 Buhl > 1241 E. Catherine St. > Ann Arbor MI 48109-5618 > 734-615-7826 [[alternative HTML version deleted]]

GO limma PROcess GO limma PROcess • 1.0k views

ADD COMMENT • link 15.2 years ago Yannick Wurm ▴ 220