what conditions affect my gene (with limma?)
3
0
Entering edit mode
mamillerpa • 0
@mamillerpa-8668
Last seen 7.0 years ago
United States

I’m hoping to find a simple, cookbook method for determining what experimental conditions most affect the expression level of a single gene.  I can do that with a generic anova, but I’d like to take advantage of the more sophisticated and task-appropriate models in limma.

I’m generally getting my data from GEO.  Consider the expression of estrogen receptor beta (ESRRB/223858_at) in GEO dataset GDS4011, using the HG-U133_Plus_2 chip.  (or http://www.ncbi.nlm.nih.gov/geoprofiles/93486818 )

I can get that data with

> geo.profile <- getURL("http://www.ncbi.nlm.nih.gov/geo/gds/getDatum.cgi?uid=93486818")

…do a little rearranging, and test with a vanilla anova:

> summary(aov(data = rearranged.geo.profile, X1556156_at~disease.state+gender+other))

Df Sum Sq Mean Sq F value  Pr(>F) 
disease.state  5  11.52  2.3043   4.218 0.00216 **
gender         1   0.33  0.3279   0.600 0.44121 
other          2   1.56  0.7797   1.427 0.24720

> sort(with(rearranged.geo.profile,tapply(X1556156_at, disease.state, mean)))
         subgroup: G4          subgroup: G3
             3.925902              4.062805
        subgroup: WNT         subgroup: N/A
             4.205237              4.219620
subgroup: SHH outlier         subgroup: SHH
             4.383280              5.117186 

How can I get to a similar point with limma?  When I’ve used limma up to this point, I’ve started with the GEO2R templates provided by NCBI, so can I find the relationship between disease state and ESRRB expression in my fit2 object?

> fit <- lmFit(gset, design)
> cont.matrix <- makeContrasts(G1-G0, levels=design)
> fit2 <- contrasts.fit(fit, cont.matrix)
> fit2 <- eBayes(fit2, 0.01)

 

limma geo experimental design • 2.3k views
ADD COMMENT
0
Entering edit mode

Have you read through the limmaUser'sGuide?

ADD REPLY
0
Entering edit mode

I have started.  Are there particular sections that you recommend for me?

ADD REPLY
3
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 4 hours ago
The city by the bay

As Steve suggests, the limma user's guide is a good source of information. Check out section 9.3 for how to perform ANOVA-like contrasts involving multiple groups. For a more concrete example, if you want to check whether there is a difference in expression between any of your disease states, you can do something like this:

design <- model.matrix(~disease.state+gender+other)
fit <- lmFit(gset, design) 
fit <- eBayes(fit)
topTable(fit, coef=2:5)

The 2nd to 5th coefficients should represent log-fold changes of each disease state over the "reference" disease state (probably G3, if we go by alphabetical order in the disease.state vector). Dropping them all at once represents the null hypothesis that all groups have expression equal to that of G3 - and, thus, equal to each other. Genes that reject this null will be those that have some response to disease state. You can do the same for gender and other, probably by setting coef=6 and coef=7:8, respectively. This should give you the effect of each factor for each gene, à la aov.

Note that you'll need to account for multiple testing across many genes, so you should be looking at the BH-adjusted p-value rather than the raw p-value for each gene. Obviously, the raw p-values themselves will be different between aov and limma, as the latter performs moderation, i.e., sharing information between genes to stabilize the variance estimates and to increase power.

ADD COMMENT
0
Entering edit mode

That got me off to a good start.  I also used this page for guidance:  http://shiulab.plantbiology.msu.edu/wiki/index.php/Protocol:GEO:Expression_analysis.

Now I have to 

  1. figure how to automate this for multiple experiments with different conditions and column names
  2. be more careful about my assumptions, like Gordon suggested
ADD REPLY
1
Entering edit mode
@gordon-smyth
Last seen 5 hours ago
WEHI, Melbourne, Australia

Actually aov() doesn't in general do what you think it does. It outputs a sequential anova table. Unless the experimental design is perfectly balanced, you will find the SS for each factors in the anova table will change depending on the order that you put the factors into the aov model formula. So you can't interpret the p-values as the relative importance of each term.

ADD COMMENT
1
Entering edit mode
@gordon-smyth
Last seen 5 hours ago
WEHI, Melbourne, Australia

In general, it is better to judge the importance of each factor while adjusting for all other factors. The limma tests do this, but aov() doesn't, unless the design is balanced.

ADD COMMENT
0
Entering edit mode

Great.  I dind't know that per se, but I'm operating on the assumption that I am less likely to misuse limma than I am to misuse aov.

ADD REPLY

Login before adding your answer.

Traffic: 409 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6