Hello,
I am testing what condition Limma can not be used and do some simulation (as below). I find that no matter one-color or two-color microarray, if only have one replicate, Limma can be used to get the result.
But if no any replicate, then Limma refuses analysis in the step : fit2 <- eBayes(fit2)
My question is:
If in all arrays, just one replicate, may I use Limma to get reasonal result for Differentially Expressed Genes (DEG) even for several group, since Limma does can run?
If one replicate is not enough, how many replicates are needed to get reasonal result?
Thanks,
Kevin
##################################
# one-color simulation
##################################
library(limma)
# good Targets
# simulate Targets frame for one-color
gr <- c("c", "t1", "t3", "t2", "c") ################ just "c" is replicate
# gr <- c("c", "t1", "t2") # wrong, no any replicate
Targets <- data.frame(group = gr)
rownames(Targets) <- paste0("a", 1:length(gr))
Targets
# simulate expression matrix according to Targets frame
expMat <- matrix(log2(runif(500 * length(gr))), nrow = 500, ncol = length(gr))
colnames(expMat) <- paste0("a", 1:length(gr))
expMat[1:3, ]
# extract groups
gf <- factor(Targets$group)
gf
# design matrix
design <- model.matrix(~ 0 + gf)
design
colnames(design) <- levels(gf)
design
# first linear model
fit <- lmFit(expMat, design)
# make contrast matrix
treat <- setdiff(levels(gf), "c")
ct <- paste0(treat, "-", "c")
cm <- makeContrasts(contrasts = ct, levels = design)
cm
# fit contrast matrix
fit2 <- contrasts.fit(fit, cm)
fit2
##################################
# it gives wrong info if any
##################################
# final model
fit2 <- eBayes(fit2)
##################################
# extract Differentially Expressed Genes (DEG)
for (i in 1:length(ct)) {
print(topTable(fit2, coef = i))
}
#######################################
# two-color simulation
#######################################
library(limma)
# good Targets
# simulate Targets frame for two-color
# just c("c", "t1") and c("t1", c) is one replicate with dye-swap
cy3cy5 <- rbind(
c("c", "t1"),
c("c", "t2"),
c("t1", "c"),
c("t3", "c")
)
# wrong, no any replicate
# cy3cy5 <- rbind(
# c("c", "t1"),
# c("c", "t2"),
# c("t3", "c"),
# c("t4", "c")
# )
colnames(cy3cy5) <- c("cy3", "cy5")
numArray <- nrow(cy3cy5)
Targets <- data.frame(Cy3 = cy3cy5[, "cy3"], Cy5 = cy3cy5[, "cy5"])
rownames(Targets) <- paste0("a", 1:numArray)
Targets
# simulate expression matrix according to Targets frame
expMat <- matrix(log2(runif(500 * numArray)), nrow = 500, ncol = numArray)
colnames(expMat) <- paste0("a", 1:numArray)
expMat[1:3, ]
# design matrix
design <- modelMatrix(Targets, ref = "c")
design
# first linear model
fit <- lmFit(expMat, design)
# make contrast matrix
treat <- setdiff(as.vector(cy3cy5), "c")
ct <- paste0(treat)
cm <- makeContrasts(contrasts = ct, levels = design)
cm
# fit contrast matrix
fit2 <- contrasts.fit(fit, cm)
fit2
##################################
# it gives wrong info if any
##################################
# final model
fit2 <- eBayes(fit2)
##################################
# extract Differentially Expressed Genes (DEG)
for (i in 1:length(ct)) {
print(topTable(fit2, coef = i))
}
Thanks Gordon. I modified the statements "...Limma refuses analysis in the step" according to you. Yes. I am trying to learning other people's work. They did analysis like this: If microarray is two-color: they identified DEG just using 2-fold change, no matter if there exists replicates. If microarray is one-color: if there are at least two replicates in each group, then they used SAM (significance analysis of microarray), otherwise, they just used 2-fold change to obtain DEG.
I would like to know: if it is better to using the combined 2-fold change and p-values than just 2-fold change when existing replicates, even just one replicate?
Thanks,
Kevin
I almost always use p-value, not combined with fold change. Please read the warning about combining fold change and p-value on the topTable help page: ?topTable.
Regarding the "other people's work", I suspect that their decisions relate more to the limitations of SAM (it can't analyse two colour arrays unless all common reference and it needs a minimum number of arrays) rather than what would one ideally want to do.
Thanks Gordon. Will follow your suggestions.