Arne.Muller@aventis.com
Hello,
A while ago I posted "programming problem: running many ANOVAs" (I
actually got a very sophisticated reply, too sophisticated for me :-(
...). Following up on that posting, I came across another problem with
linear models.
I usually run a simple linear model including all my factors
(dose, time, batch) for each probe set on the array, i.e. I construct
and run more than 12,000 linear models and ANOVAs. The model could be:
Value ~ batch + time + dose
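A minimal sketch of this one-model-per-probe-set approach, assuming base R's `lm()` and `anova()`. The data are simulated; the three probe-set IDs are hypothetical and the factor levels are borrowed from the sample rows in this post (the real experiment presumably has more time and dose levels).

```r
# Sketch: one linear model + ANOVA per probe set, on simulated data.
set.seed(1)
genes <- c("100001_at", "100002_at", "100003_at")   # hypothetical probe-set IDs
design <- expand.grid(batch = c("NEW", "OLD", "PRG"),
                      time  = c("04h", "24h"),
                      dose  = c("000mM", "025mM"),
                      rep   = 1:2)                   # 24 arrays per probe set
dat <- do.call(rbind, lapply(genes, function(g)
  data.frame(design, gene = g, Value = rnorm(nrow(design), mean = 6))))
dat$gene <- factor(dat$gene)

# Split by probe set and fit Value ~ batch + time + dose to each piece
fits   <- lapply(split(dat, dat$gene),
                 function(d) lm(Value ~ batch + time + dose, data = d))
anovas <- lapply(fits, anova)

# One p-value for the dose effect per probe set
dose_p <- sapply(anovas, function(a) a["dose", "Pr(>F)"])
dose_p
```

With 12,000 probe sets the same `lapply()` simply runs 12,000 small fits, each on a 24 x 5 model matrix in this simulated layout.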
I was thinking about running just a single linear model that includes
the probes (actually the probe sets, i.e. the genes):
Value ~ gene + batch + time + dose + gene*batch + gene*time +
gene*dose
The gene (probe set) interacts with each main effect.
The actual data frame would look like this:
Value batch time dose gene
5.225589 NEW 24h 000mM 100001_at
5.207835 NEW 24h 000mM 100001_at
4.138210 NEW 24h 000mM 100001_at
7.253535 OLD 24h 000mM 100001_at
...
4.018591 PRG 04h 025mM 100001_at
7.205778 PRG 04h 000mM 100001_at
8.191978 NEW 24h 000mM 100002_at
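A sketch of what the combined model amounts to, on simulated data shaped like the data frame above (hypothetical probe-set IDs, factor levels taken from the sample rows). In R, `gene * (batch + time + dose)` is shorthand for `gene + batch + time + dose + gene:batch + gene:time + gene:dose`. Under these assumptions the fitted values are identical to those of separate per-gene fits, because every gene gets its own intercept and its own batch, time and dose effects; what changes is that the residual variance and its degrees of freedom are pooled across all genes, which is what `summary()` of the big fit would use for its standard errors and p-values.

```r
# Sketch: single combined model with gene-by-factor interactions (simulated data).
set.seed(1)
genes <- c("100001_at", "100002_at", "100003_at")   # hypothetical probe-set IDs
design <- expand.grid(batch = c("NEW", "OLD", "PRG"),
                      time  = c("04h", "24h"),
                      dose  = c("000mM", "025mM"),
                      rep   = 1:2)
dat <- do.call(rbind, lapply(genes, function(g)
  data.frame(design, gene = g, Value = rnorm(nrow(design), mean = 6))))
dat$gene <- factor(dat$gene)

# One big model: each gene gets its own intercept and its own factor effects
big <- lm(Value ~ gene * (batch + time + dose), data = dat)

# The same mean model fitted separately per probe set
small <- lapply(split(dat, dat$gene),
                function(d) lm(Value ~ batch + time + dose, data = d))

# Put the per-gene fitted values back into the row order of dat and compare
fitted_small <- unsplit(lapply(small, fitted), dat$gene)
all.equal(unname(fitted(big)), unname(fitted_small))

# Residual degrees of freedom: pooled in the big model, separate per gene
df.residual(big)             # 72 observations - 15 coefficients = 57
sapply(small, df.residual)   # 19 per probe set
```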
I'm absolutely not sure about this. There are several problems:
1. What about the degrees of freedom? They're huge.
2. I don't know how to interpret summary(fit).
3. It's computationally impossible (on my machine) ;-( ...
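A back-of-the-envelope sketch of why points 1 and 3 come up, assuming a dense model matrix of doubles. The number of arrays (24) and the number of coefficients per gene (5: intercept, 2 batch, 1 time and 1 dose contrast) are hypothetical values chosen for illustration; only the ~12,000 probe sets come from the post.

```r
# Rough size and residual df of the single combined model.
n_genes         <- 12000  # "> 12,000" probe sets, from the post
n_arrays        <- 24     # hypothetical number of arrays (rows per probe set)
n_coef_per_gene <- 5      # hypothetical: intercept + 2 batch + 1 time + 1 dose contrast

n_rows   <- n_genes * n_arrays           # one row per probe set per array
n_cols   <- n_genes * n_coef_per_gene    # each gene gets its own set of coefficients
df_resid <- n_rows - n_cols              # pooled residual degrees of freedom
gib      <- n_rows * n_cols * 8 / 1024^3 # 8 bytes per double in a dense matrix

c(rows = n_rows, cols = n_cols, df = df_resid, GiB = round(gib, 1))
```

Under these assumptions the dense model matrix alone needs about 129 GiB, while each separate per-gene fit only handles a 24 x 5 matrix with 19 residual degrees of freedom.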
I'm more interested in whether anybody here has already tried this
seriously, i.e. worked on the statistical theory and the biological
interpretation.
Kind regards,
Arne
--
Arne Muller, Ph.D.
Toxicogenomics, Aventis Pharma
arne dot muller domain=aventis com