It is pretty easy to home-brew your own power analysis, especially given that you already have some "pilot" data.
The key to making it realistic is to pull out the empirical Bayes statistics from your current dataset.
df0 <- median(fit$df.prior) # median, just in case you used robust=TRUE
s2 <- fit$s2.prior # might be a vector if trend=TRUE
ameans <- fit$Amean
Now, let's say you want to see the effect of doubling the number of samples in each group. For simplicity, I'll only deal with two groups. We generate a design matrix where the second coefficient represents the effect of interest.
grouping <- gl(2, 20)
nlibs <- length(grouping)
new.design <- model.matrix(~grouping)
We now do some stuff to generate a pretend dataset with similar parameters to the pilot:
ngenes <- length(ameans)
# Getting true variances for each gene
sigma <- sqrt(s2 / rchisq(ngenes, df=df0))
# Generating the residual effects.
resid.effects <- matrix(rnorm(ngenes*nlibs, sd=sigma), nrow=ngenes)
resid.effects[,seq_len(ncol(new.design))] <- 0
# Generating the residual matrix.
resids <- t(qr.qty(qr(new.design), t(resid.effects)))
# Adding back the abundances.
new.y <- resids + ameans
# Adding a specified effect for a random 10% of genes.
# In this case, a log-fold change of 1.5.
chosen <- sample(ngenes, round(ngenes/10))
effect <- sample(c(-1.5, 1.5), length(chosen), replace=TRUE)
new.y[chosen,] <- new.y[chosen,] + outer(effect, new.design[,2], "*")
Then we can analyze new.y
with limma again, and see how many of the DE genes are detected:
new.fit <- lmFit(new.y, new.design)
new.fit <- eBayes(new.fit)
table(decideTests(new.fit)[chosen,2])
Rinse and repeat until you get an experimental design with your desired detection rate. You can also tune the generation of the new dataset if you think your genes of interest are low or high-abundance (in which case, you just sample
from the relevant interval).
I should add that the above simulation is not entirely correct, as the addition of the effect will change the average abundance of the simulated genes in new.y
. I'm not entirely sure how to avoid that for an arbitrary design matrix; I guess one could define something like:
new.design2 <- sweep(new.design, 2, colMeans(new.design), "-")
new.design2[,"(Intercept)"] <- 1
... and use that instead, which ensures that only the intercept needs to be specified to determine the average abundance.
For the log-fold change, I would just go for 1, as this is the minimum I would consider for further experimental work. For the number of genes - this depends on how similar your biological groups are. Perhaps there are only 100 DEGs; maybe there are thousands. If you knew that already, you wouldn't need to do any experiments.
Unless you're answering your own question, don't use "Add answer". Use the "ADD COMMENT" button instead.