Limma validity for only hundreds of genes/metabolites
iglezer • 0
@iglezer-13501

Dear all,

Thanks a lot for supporting such nice packages. I would like to know whether limma's fit functions can be used with a smaller set of genes, or with metabolites quantified by liquid chromatography. I use limma on gene lists with thousands of genes, and have never used it with fewer features. I wonder if the moderated t-test can be used with only hundreds of features measured in small sample sets.

For instance: 2 or more groups, 4 biological replicates each, 150 genes/metabolites. Can we use empirical Bayes moderation in this situation? If yes, that would be great, since limma provides an excellent tool to overcome unequal variances and deviations from normality in cases like that.

Tks.

limma metabolite data moderated t-test • 1.6k views
@ryan-c-thompson-5618
Icahn School of Medicine at Mount Sinai…

I think I recall an instance where Gordon said that the empirical Bayes squeezing employed by limma could theoretically work with as few as 4 genes. I believe a few hundred should be fine.


limma actually works on any number of genes at all. With just one or two genes, it will do linear modelling without empirical Bayes (EB) moderation.

In my lab, we use limma routinely on PCR data with as few as half a dozen genes. limma is careful to never use more df than one would get by pooling the genewise variances, and this prevents EB from overstating what can be learned from the gene ensemble.
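For readers who want the formulas behind this, the moderation and the df cap can be sketched as follows. This is my understanding of the standard limma model, not something spelled out in the thread; the symbols ($s_g^2$, $d_g$, $s_0^2$, $d_0$, $v_g$, $G$) are notation chosen here for illustration, with $s_0^2$ and $d_0$ corresponding to fit$s2.prior and fit$df.prior.

```latex
% Moderated variance for feature g: a weighted average of the
% prior variance s_0^2 (d_0 df) and the residual variance s_g^2 (d_g df).
\tilde{s}_g^2 = \frac{d_0 s_0^2 + d_g s_g^2}{d_0 + d_g}

% Moderated t-statistic for a coefficient with unscaled variance v_g.
\tilde{t}_g = \frac{\hat{\beta}_g}{\tilde{s}_g \sqrt{v_g}}

% Degrees of freedom used for the p-value, capped at the pooled
% residual df over all G features.
\mathrm{df}_g = \min\left( d_0 + d_g,\; \sum_{k=1}^{G} d_k \right)
```

In the example from the question (2 groups of 4 replicates, 150 features), each feature has $d_g = 8 - 2 = 6$ residual df, so the cap is $150 \times 6 = 900$. With half a dozen genes and the same design, the cap would be $6 \times 6 = 36$, which is where it starts to matter if $d_0$ is estimated to be very large.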


Thanks! As far as I can tell, using limma with these small sets makes sense.

Aaron Lun ★ 28k
@alun
The city by the bay

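To add to Ryan's answer, some testing suggests that, with 150 features, limma does okay:

library(limma)

# Two groups of 4 replicates; the second coefficient is the group effect.
design <- model.matrix(~c(1,1,1,1,0,0,0,0))
ngenes <- 150
p.out <- scale.out <- df.out <- list()
for (i in 1:1000) {
    # True variances from a scaled inverse chi-squared distribution (10 df).
    s2 <- 10/rchisq(ngenes, 10)
    # Null data: no true difference between the groups.
    y <- matrix(rnorm(ngenes*8, sd=sqrt(s2)), ncol=8)
    fit <- lmFit(y, design)
    fit <- eBayes(fit)
    p.out[[i]] <- fit$p.value[,2]
    # Keep the estimated shrinkage parameters from each iteration.
    scale.out[[i]] <- fit$s2.prior
    df.out[[i]] <- fit$df.prior
}
hist(unlist(p.out))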

... which gives a uniform distribution of p-values, as expected under the null hypothesis. This result also holds if the shrinkage parameters (fit$s2.prior and fit$df.prior) are not precisely estimated, which was a pleasant surprise to me. I guess that the true values of the shrinkage parameters are not important, as long as the empirical variance distribution within each iteration is modelled well. That makes sense, as the variances are just nuisance parameters when the aim is to detect differential expression.
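If you would rather check this numerically than by eye, something like the following should work. It is only a sketch: it repeats the null simulation above with fewer iterations, and the seed, the iteration count and the names niter, p.all and df.prior are arbitrary choices of mine, not part of the original test.

```r
library(limma)

set.seed(1)   # arbitrary seed, only for reproducibility
design <- model.matrix(~c(1,1,1,1,0,0,0,0))
ngenes <- 150
niter <- 200  # fewer iterations than above, to keep this quick

p.all <- vector("list", niter)
df.prior <- numeric(niter)
for (i in seq_len(niter)) {
    # Same null simulation as above: true prior df of 10, prior variance of 1.
    s2 <- 10/rchisq(ngenes, 10)
    y <- matrix(rnorm(ngenes*8, sd=sqrt(s2)), ncol=8)
    fit <- eBayes(lmFit(y, design))
    p.all[[i]] <- fit$p.value[,2]
    df.prior[i] <- fit$df.prior
}
p.all <- unlist(p.all)

# Observed type I error rate at a few thresholds; each should be
# close to its nominal threshold if the p-values are uniform.
sapply(c(0.01, 0.05, 0.1), function(a) mean(p.all <= a))

# Spread of the estimated prior df across iterations (true value is 10).
summary(df.prior)
```

The first output shows how close the error rates are to nominal, and the second gives an idea of how imprecisely fit$df.prior is estimated from only 150 features.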


Thanks Aaron.
