I’m hoping to find a simple, cookbook method for determining what experimental conditions most affect the expression level of a single gene. I can do that with a generic anova, but I’d like to take advantage of the more sophisticated and task-appropriate models in limma.
I’m generally getting my data from GEO. Consider the expression of estrogen receptor beta (ESRRB/223858_at) in GEO dataset GDS4011, using the HG-U133_Plus_2 chip. (or http://www.ncbi.nlm.nih.gov/geoprofiles/93486818 )
I can get that data with
> geo.profile <- getURL("http://www.ncbi.nlm.nih.gov/geo/gds/getDatum.cgi?uid=93486818")
…do a little rearranging, and test with a vanilla anova:
> summary(aov(data = rearranged.geo.profile, X1556156_at~disease.state+gender+other))
Df Sum Sq Mean Sq F value Pr(>F)
disease.state 5 11.52 2.3043 4.218 0.00216 **
gender 1 0.33 0.3279 0.600 0.44121
other 2 1.56 0.7797 1.427 0.24720
> sort(with(rearranged.geo.profile,tapply(X1556156_at, disease.state, mean)))
subgroup: G4 subgroup: G3
3.925902 4.062805
subgroup: WNT subgroup: N/A
4.205237 4.219620
subgroup: SHH outlier subgroup: SHH
4.383280 5.117186
How can I get to a similar point with limma? When I’ve used limma up to this point, I’ve started with the GEO2R templates provided by NCBI, so can I find the relationship between disease state and ESRRB expression in my fit2 object?
> fit <- lmFit(gset, design)
> cont.matrix <- makeContrasts(G1-G0, levels=design)
> fit2 <- contrasts.fit(fit, cont.matrix)
> fit2 <- eBayes(fit2, 0.01)
Have you read through the limmaUser'sGuide?
I have started. Are there particular sections that you recommend for me?