> Date: Sun, 2 Jan 2005 14:05:15 -0800 (PST)
> From: "Fangxin Hong" <fhong@salk.edu>
> Subject: [BioC] A question about Limma
> To: bioconductor@stat.math.ethz.ch
> Message-ID: <1867.66.75.240.64.1104703515.squirrel@66.75.240.64>
>
> Hi Bioconductor users;
> I have a general question about limma model.
> In the limma package, one linear model is usually applied to all genes, and
> the error variances from all genes are modified simultaneously. What if some
> factor, for example one main effect, is significant only for some genes,
> and we want to identify genes based on the significance of another main
> effect (the one of interest)? What is the best way to do it? Currently I
> just leave this factor in the model, which is applied to all genes,
That's what I do: leave all terms in the model for all the genes. I don't
see a strong case for doing a separate model selection process for every
gene.
> but this
> might under-estimate the total number of genes on which the effect of
> interest is significant.
Why do you think so? The only disadvantage of keeping a non-significant
term in the model is a reduction in residual degrees of freedom, with some
consequent loss of power, but this disadvantage is mitigated by the
empirical Bayes moderation process.

Perhaps someday someone will work out a model selection theory for
massively parallel regression situations like microarray experiments, but
there isn't such a theory now. It seems safer to me to have the same model
for every gene, keeping all the 'a priori' important predictors in the
model.
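To put the degrees-of-freedom argument in numbers, here is a minimal Python sketch (not limma code) of limma-style empirical Bayes moderation: each gene's residual variance s_sq on d residual df is shrunk toward a prior variance s0_sq worth d0 df, and the moderated t-statistic then has d + d0 df. The prior values d0 = 4 and s0_sq = 0.05, and the gene variance s_sq = 0.2, are made-up illustrative numbers; limma estimates the prior from all the genes.

```python
# Minimal sketch of limma-style empirical Bayes variance moderation.
# Assumption: the prior df (d0), prior variance (s0_sq) and gene variance
# (s_sq) below are made-up illustrative values; limma estimates the prior
# from all genes.


def moderated_variance(s_sq: float, d: int, s0_sq: float, d0: float) -> float:
    """Shrink a gene's residual variance s_sq (on d residual df) toward the
    prior variance s0_sq (worth d0 df) by a df-weighted average."""
    return (d0 * s0_sq + d * s_sq) / (d0 + d)


def moderated_df(d: int, d0: float) -> float:
    """Degrees of freedom of the moderated t-statistic: residual plus prior df."""
    return d + d0


if __name__ == "__main__":
    d0, s0_sq = 4.0, 0.05  # hypothetical prior df and prior variance
    s_sq = 0.2  # hypothetical residual variance, held fixed for simplicity
    for d in (3, 2):  # residual df without / with an extra, unneeded term
        print(
            f"residual df {d}: "
            f"moderated variance {moderated_variance(s_sq, d, s0_sq, d0):.4f}, "
            f"moderated df {moderated_df(d, d0):.0f}, "
            f"unmoderated df kept {d / 3:.2f}, "
            f"moderated df kept {moderated_df(d, d0) / moderated_df(3, d0):.2f}"
        )
```

Going from 3 to 2 residual df throws away a third of a gene's own variance information, but with d0 = 4 it removes only one seventh of the df behind the moderated statistic, which is the sense in which moderation softens the cost of an unneeded term.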
Gordon
> I am sorry if this question has been asked and answered here before; I
> couldn't find it by searching the archive. Any comment, suggestion or
> experience is appreciated.
>
> Fangxin
> --
> Fangxin Hong, Ph.D.
> Plant Biology Laboratory
> The Salk Institute
> 10010 N. Torrey Pines Rd.
> La Jolla, CA 92037
> E-mail: fhong@salk.edu
40% sounds to me like a lot of genes. I keep it in. Not even the strongest effect will be significant for every gene. And non-significance doesn't mean the effect is zero.
Whether you keep a nuisance effect in also depends on the size of the experiment. With many arrays, definitely keep it in. With very few arrays, the cost of estimating a nuisance parameter is relatively greater. Where's the cutoff? I don't know; only experience will tell. I am currently thinking the cutoff for a dye effect with two-color replicated dye-swap data is around 4 arrays, depending, obviously, on the technology.
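The point about small experiments can be made concrete with simple arithmetic. The sketch below (plain Python, not limma code) builds a hypothetical replicated dye-swap design, with one column for the effect of interest that flips sign on dye-swapped arrays and an optional constant column for a dye effect, and counts residual degrees of freedom as arrays minus coefficients. The function names are illustrative only.

```python
# Hypothetical sketch (plain Python, not limma code): how much residual df a
# dye-effect column costs in a replicated dye-swap design of n arrays.


def dye_swap_design(n_arrays: int, with_dye: bool) -> list[list[int]]:
    """One row per array. Column 1 is the effect of interest: +1 on forward
    arrays, -1 on dye-swapped arrays. The optional column 2 is a dye effect,
    which stays +1 because it does not flip sign under a dye swap."""
    rows = []
    for j in range(n_arrays):
        sign = 1 if j % 2 == 0 else -1
        rows.append([sign, 1] if with_dye else [sign])
    return rows


def residual_df(design: list[list[int]]) -> int:
    """Arrays minus coefficients; assumes full column rank, which holds for
    the balanced designs above when n_arrays is even and at least 2."""
    return len(design) - len(design[0])


if __name__ == "__main__":
    for n in (2, 4, 8, 16):
        d_without = residual_df(dye_swap_design(n, with_dye=False))
        d_with = residual_df(dye_swap_design(n, with_dye=True))
        lost = 1 - d_with / d_without
        print(f"{n:2d} arrays: residual df {d_without} -> {d_with} ({lost:.0%} lost)")
```

With 4 arrays the dye term costs a third of the residual df (3 down to 2); with 16 arrays it costs one fifteenth (15 down to 14); with only 2 arrays it leaves no residual df at all.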
Gordon