Question

limma -- posthoc analysis when design factor has 3 levels

1

Entering edit mode

mforde84 ▴ 20

@mforde84-12150

Last seen 7.7 years ago

Hi,

I'm analysing some microarray and rnaseq data for gene set enrichment analysis. I have three independent sample groups. I'm using GSVA to do the enrichment analysis then using limma to fit a model to the GSVA enrichment scores.

library(GSVA)
library(limma)
p <- 10 ## number of genes
n <- 45 ## number of samples
nGrp1 <- 15 ## number of samples in group 1
nGrp2 <- 15 ## number of samples in group 2
nGrp3 <- 15 ## number of samples in group 2
## consider three disjoint gene sets
geneSets <- list(set1=paste("g", 1:3, sep=""),
set2=paste("g", 4:6, sep=""),
set3=paste("g", 7:10, sep=""))
## sample data from a normal distribution with mean 0 and st.dev. 1
y <- matrix(rnorm(n*p), nrow=p, ncol=n,
dimnames=list(paste("g", 1:p, sep="") , paste("s", 1:n, sep="")))
## genes in set1 are expressed at higher levels in the last 10 samples
y[geneSets$set1, (nGrp1+1):n] <- y[geneSets$set1, (nGrp1+1):n] + 2
## build design matrix
design <- cbind(intercept=1, groups=c(rep(0, nGrp1), rep(1, nGrp2), rep(2, nGrp3)))
## estimate GSVA enrichment scores for the three sets
gsva_es <- gsva(y, geneSets, mx.diff=1)$es.obs
## fit linear model to the GSVA enrichment scores
fit <- lmFit(gsva_es, design)
## estimate moderated t-statistics
fit <- eBayes(fit)
## set1 is differentially expressed
topTable(fit, coef="groups")

This will give me the test-wise enrichment values and p-value, however I would like to do all additional group comparisons, e.g., between group 1vs2, 1vs3, and 2vs3. Is there a way to go about doing this with limma?

limma post hoc • 1.6k views

ADD COMMENT • link updated 8.3 years ago by Aaron Lun ★ 28k • written 8.3 years ago by mforde84 ▴ 20

score 2 · Answer 1 · 2017-01-11

I don't know about the GSVA part, but as for the limma code, your current design matrix is probably wrong. This is because it encodes the group IDs as a real-valued covariate rather than as a factor. For example, if you have a gene that is differentially expressed between groups, your current design expects that the log-fold change between groups 1 and 2 is the same as the log-fold change between groups 2 and 3. I doubt this is what you intended - rather, you should be doing something like this to set up your design:

groups <- factor(rep(1:3, c(nGrp1, nGrp2, nGrp3))
design <- model.matrix(~0 + groups)

Each coefficient of design represents the average (log-)expression in each group. It's simple to then use makeContrasts and to formulate comparisons between pairs of groups (or to do an ANOVA), followed by contrasts.fit and eBayes to test those contrasts. I'll refer you to the documentation and the user's guide for more details, but as an example:

con <- makeContrasts(groups1 - groups2, levels=design) # DE between groups 1 and 2
fit2 <- contrasts.fit(fit, con)
fit2 <- eBayes(fit2)
topTable(fit2)