Hi there,
I analyzed my single-end RNA-seq data (Illumina) with limma. I have two treatments (treatment and mock) and three time points (1h, 9h and 24h) each with 3 replicates (total 18 libraries). I tried two models - one including the replicates (Model1) and another excluding the replicates (Model 2). Rest of the analysis is similar for both designs. Unfortunately both give very different results. Model1 where I included the replicates in the model gives 2000 more differentially expressed genes than Model2. My understanding is the limma analysis itself takes variance in the replicates in to account and doesn't need to include replicates in the model as in Model1. Accordance with that edgeR analysis without any model fitting and model fitting as in Model3 gives similar results to Model3. Could someone explain why is this difference and please advice which is better. Please see the code below
Thank you
*****************************
#design of linear model
****************************
#####Model1
#Define the variables
treatment = factor(rep(c('01_mock', '02_trt'), times=9))
time = factor(rep(c('01_1h', '02_9h', '03_24h'), each=2, times=3))
replicate = factor(rep(c("r1", "r2", "r3"), each=6))
comb.names = paste(treatment, time, replicate, sep='_')
design_M1 = model.matrix( ~ C(treatment:time) -1 + replicate)
colnames(design_M1) = c("mock_1h", "trt_1h" , "mock_9h", "trt_9h", "mock_24h", "trt_24h", "rep2", "rep3")
rownames(design_M1) = comb.names
##############
####Model2
##Same variables and and comb.names used and only the design is different
#This time didn't include the replicates in the model
design_M2 = model.matrix( ~ C(time:treatment) -1)
colnames(design_M2) = c("mock_1h", "trt_1h" , "mock_9h", "trt_9h", "mock_24h", "trt_24h")
rownames(design_M2) = comb.names
##############Model3
# bit different
# designed with a sample info file which looked like this
# Sample Treat Time
# 1 mk_1h_r1 mock 1h
# 2 mk_1h_r2 mock 1h
# 3 mk_1h_r3 mock 1h
# 4 mk_9h_r1 mock 9h
# 5 mk_9h_r2 mock 9h
# 6 mk_9h_r3 mock 9h
Groups <- factor(paste(targets$Treat, targets$Time, sep="."))
cbind (targets, Group=Groups)
design <- model.matrix(~0+Groups)
colnames(design) <- levels(Groups)
##############################################
#Rest was performed the same way as follows for both models
#############################################################
##############
## voom transformation
#####################
lib.size = colSums(data)*nf$samples$norm.factors
v <- voom(dat.mat, design, lib.size=lib.size)
###############
## fitting the model
##############
fit = lmFit (v, design)
###################
#specify the contrasts
contrast.matrix <- makeContrasts (trt_1h - mock_1h, trt_9h - mock_9h, trt_24h - mock_24h, levels=design)
fit = contrasts.fit (fit, contrast.matrix)
fit.pval = 2*pt( abs(fit$coef / fit$stdev.unscaled / fit$sigma), fit$df.residual, lower.tail=F )
# empirical Bayes for variance shrinkage
eb.fit = eBayes(fit)
#########################
#qValues
###########
q.val = qvalue(eb.fit$p.val)$qvalues
q.val_trt_mock_1h = q.val [, 'trt_1h - mock_1h']
q.val_trt_mock_9h = q.val [, 'trt_9h - mock_9h']
q.val_trt_mock_24h = q.val [, 'trt_24h - mock_24h']
#Number of DEGs using Model1
sum(q.va_trt_mock_1h < 0.01)
[1] 6029
sum(q.val_trt_mock_9h < 0.01)
[1] 6008
sum(q.val_trt_mock_24h < 0.01)
[1] 3785
#Number of DEGs using Model 2 where the replicates were not included in the model
sum(q.val_trt_mock_1h < 0.01)
[1] 3882
sum(q.val_trt_mock_9h < 0.01)
[1] 3905
sum(q.val_trt_mock_24h < 0.01)
[1] 2385
#Number of DEGs using Model 3 variables defined with a sample info file==similar to Model2
sum(q.val_trt_mock_1h < 0.01)
[1] 3882
sum(q.val_trt_mock_9h < 0.01)
[1] 3905
sum(q.val_trt_mock_24h < 0.01)
[1] 2385
####################END of the file###########
Thank you very much Aaron. It makes sense.
Sha