Question

One-condition limma test

0

Entering edit mode

Marek Gierlinski ▴ 30

@mgierlinski-7369

Last seen 4 months ago

United Kingdom

Consider an experiment (e.g. proteomics) producing ratios. Our null hypothesis is that each ratio is one. In this particular example, I have a data table with n rows (peptides) and m columns (replicates). I would like to use limma to do a differential expression of each peptide versus a constant of one. This should be a generalisation of a one-sample t-test.

I'm not quite sure how to build data and design matrix for such test. With only one condition I struggle to write a linear equation.

What I did, and I have a feeling is not a correct approach, is to add a column of ones to my data and marked it as a separate condition, as in this generated example.

library(limma)

set.seed(1)
ndat <- 10
nrep <- 3

# rows progressively diverging from the null hypothesis
tab <- t(sapply(1:ndat, function(i) rnorm(nrep, mean=1 + 0.5 * (i - 1)/ndat, sd=0.1)))
# add a reference column of 1
dat <- cbind(1, tab)

meta <- data.frame(condition = c("1", rep("A", nrep)))
design <- model.matrix(~condition, meta)

# limma test
fit <- limma::lmFit(dat, design)
ebay <- limma::eBayes(fit)
res <- limma::topTable(ebay, coef="conditionA", adjust="BH", sort.by="none")

This is probably not quite right as it corresponds, at a peptide level, to a two-sample t-test, where the first sample contains only one element (1). This certainly does not correspond to a one-sample t-test against mu = 1.

An independent one-sample t-test on these data does this:

pt <- apply(tab, 1, function(x) t.test(x, mu=1, alternative="two.sided")$p.value)
plot(log10(res$P.Value), log10(pt))

If you re-do the above example for nrep=30, you will see that the limma p-values are much more conservative than those from a one-sample t-test. Since data are normal and the number of replicates is large, I would expect similar results from any test.

Can someone think of a better way of doing a one-condition test against a constant with limma?

limma t-test • 2.2k views

ADD COMMENT • link updated 7.5 years ago by James W. MacDonald 68k • written 7.5 years ago by Marek Gierlinski ▴ 30

score 2 · Answer 1 · 2017-10-09

2

Entering edit mode

Ryan C. Thompson ★ 7.9k

@ryan-c-thompson-5618

Last seen 6 months ago

Icahn School of Medicine at Mount Sinai…

Limma typically works with data on a log scale, so I think you would want to work with log ratios. If you take the log of your data, then your null hypothesis becomes logratio = 0. You can test this hypothesis by fitting an intercept-only model (i.e. ~1) and testing whether the sole coefficient is nonzero.

dat <- log2(tab)
design <- cbind(Intercept=rep(1, ncol(dat)))
fit <- lmFit(dat, design)
fit <- eBayes(fit)
topTable(fit, coef="Intercept")

If you have a more complicated setup in which each observation is a log-ratio rather than an abundance measurement, you can use limma's modelMatrix (note: different from model.matrix) function, which was designed for 2-color arrays.

ADD COMMENT • link 7.5 years ago Ryan C. Thompson ★ 7.9k

0

Entering edit mode

This makes sense. Certainly, log-ratios are more symmetric (as mentioned by James below) and the null hypothesis of zero allows for the intercept-only model. Thanks!

ADD REPLY • link 7.5 years ago Marek Gierlinski ▴ 30

score 2 · Answer 2 · 2017-10-09

Ratios aren't symmetric around 1, so you shouldn't be using those data as input anyway. Instead you should take logs (probably base 2, for interpretability), which will make your ratios symmetric, and will also make the mean under the null equal to zero, in which case you can proceed directly.

And do note that the p-values won't be directly comparable anyway, due to the fact that you are doing empirical Bayes moderation of the variance estimates. Plus you aren't adjusting the t-stat p-values for multiplicity, but you are adjusting those for limma.

> tab <- log2(tab)

> pt <- apply(tab, 1, function(x) t.test(x, alternative="two.sided")$p.value)

> fit <- lmFit(dat)
> res <- eBayes(fit)

## convert to 'regular' t-stats
> tstat.ord <- res$coef/res$st.dev.unscaled/res$sigma

> data.frame(Regular = pt, limma = 2*pt(-abs(tstat.ord), 2), limmaModerated = topTable(res, adjust = "none", sort.by = "none")$P.Value)
        Regular           limma      limmaModerated
1  0.2988531584 0.2988531584   0.2163826978
2  0.3437222974 0.3437222974   0.1863790403
3  0.0018106705 0.0018106705   0.0020170729
4  0.0525491928 0.0525491928   0.0076232404
5  0.2742862056 0.2742862056   0.1154152635
6  0.0102604941 0.0102604941   0.0007851966
7  0.0004774883 0.0004774883   0.0001029651
8  0.0544338857 0.0544338857   0.0058874218
9  0.0024360677 0.0024360677   0.0001232390
10 0.0134288607 0.0134288607   0.0006044474