Question

Limma validity for only hundreds of genes/metabolites

iglezer

Dear all,

Thanks a lot for supporting such nice packages. I would like to know whether limma's model-fitting functions can be used with a smaller set of genes, or with metabolites quantified by liquid chromatography. I use limma on gene lists with thousands of genes and have never used it with fewer features. I wonder whether the moderated t-test is valid with only hundreds of features measured in small sample sets.

For instance: 2 or more groups, 4 biological replicates each, and 150 genes/metabolites. Can we use empirical Bayes moderation in this situation? If so, that would be great, since limma is an excellent tool for dealing with unequal variances and deviations from normality in cases like this.

Thanks.

Tags: limma, metabolite data, moderated t-test

Written 7.6 years ago by iglezer; last updated 7.6 years ago by Aaron Lun.
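For reference, the setup described in the question might be coded roughly as follows. This is a minimal sketch, assuming three hypothetical groups (A, B, C) with 4 biological replicates each and 150 features; the data are simulated log2 intensities standing in for real, already log-transformed and normalized LC measurements, and the group and feature names are made up for illustration.

```r
library(limma)
set.seed(42)  # arbitrary seed, for reproducibility

# Hypothetical layout: 3 groups (A, B, C) x 4 biological replicates, 150 features
group <- factor(rep(c("A", "B", "C"), each = 4))
nfeatures <- 150

# Simulated log2 intensities standing in for normalized LC measurements
y <- matrix(rnorm(nfeatures * length(group), mean = 20, sd = 1),
            nrow = nfeatures,
            dimnames = list(paste0("feature", seq_len(nfeatures)), NULL))

# One coefficient per group mean
design <- model.matrix(~0 + group)
colnames(design) <- levels(group)

# Pairwise comparisons of interest
contr <- makeContrasts(B - A, C - A, C - B, levels = design)

fit <- lmFit(y, design)
fit2 <- contrasts.fit(fit, contr)
fit2 <- eBayes(fit2)

# Estimated prior df: how much variance information is borrowed across features
fit2$df.prior
# Moderated F-test over all three contrasts
topTable(fit2, number = 5)
# Moderated t-test for the first contrast (B - A)
topTable(fit2, coef = 1, number = 5)
```

Each feature here has 12 - 3 = 9 residual df; the empirical Bayes step adds fit2$df.prior on top of that when the variances across the 150 features look consistent with a common prior.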

Aaron Lun

To add to Ryan's answer: some simulations suggest that limma behaves well with 150 features:

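```r
library(limma)

# Two groups of 4 samples; coefficient 2 is the group difference
design <- model.matrix(~c(1,1,1,1,0,0,0,0))
ngenes <- 150
p.out <- scale.out <- df.out <- list()
for (i in 1:1000) {
    # True variances from a scaled inverse chi-squared distribution (d0 = 10, s0^2 = 1)
    s2 <- 10/rchisq(ngenes, 10)
    # Null data: no true difference between the groups
    y <- matrix(rnorm(ngenes*8, sd=sqrt(s2)), ncol=8)
    fit <- lmFit(y, design)
    fit <- eBayes(fit)
    p.out[[i]] <- fit$p.value[,2]
    scale.out[[i]] <- fit$s2.prior
    df.out[[i]] <- fit$df.prior
}
hist(unlist(p.out))
```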

This gives a uniform distribution of p-values, as expected under the null hypothesis. The result also holds when the shrinkage parameters (fit$s2.prior and fit$df.prior) are not precisely estimated, which was a pleasant surprise to me. My guess is that the true values of the shrinkage parameters are not important, as long as the empirical distribution of variances within each iteration is modelled well. This makes sense, as the variances are just nuisance parameters when the aim is to detect differential expression.
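To put a number on "uniform", here is a smaller self-contained version of the same null simulation (200 iterations instead of 1000, with an arbitrary seed). It is only a sketch: it reports the fraction of p-values below 0.05, which should be close to 0.05 if the test is calibrated, and the spread of the per-iteration fit$df.prior estimates, which are noisy with only 150 features.

```r
library(limma)
set.seed(2017)  # arbitrary seed

# Same null setup: 2 groups x 4 replicates, 150 features,
# true variances from a scaled inverse chi-squared distribution with 10 df
design <- model.matrix(~c(1,1,1,1,0,0,0,0))
ngenes <- 150
niter <- 200

pvals <- numeric(0)
df.prior <- numeric(niter)
for (i in seq_len(niter)) {
    s2 <- 10/rchisq(ngenes, 10)
    y <- matrix(rnorm(ngenes*8, sd=sqrt(s2)), ncol=8)
    fit <- eBayes(lmFit(y, design))
    pvals <- c(pvals, fit$p.value[,2])
    df.prior[i] <- fit$df.prior
}

# Empirical type I error rate at the 0.05 level
mean(pvals < 0.05)
# Spread of the estimated prior df (the simulated value is 10; Inf is possible)
summary(df.prior)
```

A type I error rate near 0.05 despite widely varying df.prior estimates is consistent with the point above: the variances are nuisance parameters, so imprecise shrinkage parameters need not spoil calibration.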


iglezer

Thanks Aaron.

score 2 · Accepted Answer · 2017-07-15

Ryan C. Thompson

I recall Gordon saying that the empirical Bayes variance squeezing used by limma could, in theory, work with as few as 4 genes. A few hundred should therefore be fine.
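As an illustration of squeezing with very few genes, the sketch below applies limma's squeezeVar() to just 4 hypothetical sample variances, each with 6 residual df (both numbers are assumptions chosen for the example). Each posterior variance is the weighted average (d·s² + d0·s0²)/(d + d0), where d0 is the estimated prior df and s0² the estimated prior variance.

```r
library(limma)
set.seed(4)  # arbitrary seed

# Hypothetical sample variances for only 4 genes, each with 6 residual df
ngenes <- 4
d <- 6
s2 <- rchisq(ngenes, df = d) / d

# Empirical Bayes squeezing of the 4 variances toward a common prior value
sq <- squeezeVar(s2, df = d)
sq$var.prior  # estimated prior variance s0^2
sq$df.prior   # estimated prior df d0 (Inf if the variances look homogeneous)
cbind(raw = s2, squeezed = sq$var.post)
```

With so few genes the prior df estimate is very uncertain, which is where the df cap described in the next reply matters.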


limma works with any number of genes. With only one or two genes, it performs linear modelling without empirical Bayes (EB) moderation.

In my lab, we routinely use limma on PCR data with as few as half a dozen genes. limma is careful never to use more degrees of freedom (df) than would be obtained by pooling the genewise variances, and this prevents EB from overstating what can be learned from the gene ensemble.
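The df cap can be seen in the eBayes() output. This is a minimal sketch, assuming hypothetical PCR-style data for 6 genes with 2 groups of 3 replicates on a log scale: each gene has 4 residual df, so pooling all 6 genewise variances would give 24 df, and fit$df.total (residual df plus prior df) should never exceed that.

```r
library(limma)
set.seed(6)  # arbitrary seed

# Hypothetical PCR-style data: 6 genes, 2 groups x 3 replicates, log-scale values
group <- factor(rep(c("ctrl", "trt"), each = 3))
design <- model.matrix(~group)
y <- matrix(rnorm(6 * 6, mean = 25), nrow = 6)

fit <- eBayes(lmFit(y, design))

# df from pooling all genewise variances: 6 genes x 4 residual df = 24
df.pooled <- sum(fit$df.residual)
df.pooled
# Total df per gene (residual + prior), capped at the pooled df
fit$df.total
```

Even if the estimated prior df is infinite, each gene's moderated t-statistic is referred to a t-distribution with at most 24 df here.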