Dear BioC community,
I am seeking advice on analyzing a complex RNA-seq dataset using edgeR and limma. My factors are similar to that of the limma user guide 9.7, bit if all subjects had both normal and diseased conditions and we wanted to analyze both conditions and treatment in the same contrast. I've read the edgeR and limma user guides, the team's Law et al 2018 paper, and many posts on bioc and biostars and haven't found this analysis question answered.
Experiment Details representative of my data (but with n=15 donors)
covariate_data <- data.frame(
SampleID = c("S1", "S2", "S3", "S4", "S5", "S6", "S7", "S8", "S9", "S10", "S11", "S12", "S13", "S14", "S15", "S16", "S17", "S18", "S19", "S20", "S21", "S22", "S23", "S24"),
Donor = rep(c("D1", "D2", "D3"), each = 8),
Stimulus = rep(c("Control", "StimA", "StimB", "StimC"), times = 6),
Temperature = rep(c("Low", "Low", "Low", "Low", "High", "High", "High", "High"), times = 3),
Batch = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2)
)
> covariate_data
SampleID Donor Stimulus Temperature Batch
S1 D1 Control Low 1
S2 D1 StimA Low 1
S3 D1 StimB Low 1
S4 D1 StimC Low 1
S5 D1 Control High 1
S6 D1 StimA High 1
S7 D1 StimB High 1
S8 D1 StimC High 1
S9 D2 Control Low 1
S10 D2 StimA Low 1
S11 D2 StimB Low 1
S12 D2 StimC Low 1
S13 D2 Control High 1
S14 D2 StimA High 1
S15 D2 StimB High 1
S16 D2 StimC High 1
S17 D3 Control Low 2
S18 D3 StimA Low 2
S19 D3 StimB Low 2
S20 D3 StimC Low 2
S21 D3 Control High 2
S22 D3 StimA High 2
S23 D3 StimB High 2
S24 D3 StimC High 2
Objective: We aim to determine whether the change in gene expression induced by a specific stimulus differs significantly between two temperatures, controlling for the paired nature of the samples. Specifically I would like to compare for all stimuli:
(Stimulus at high temp - Control at high temp) vs (Stimulus at low temp - Control at low temp)
After following the steps in the Law et al 2018 paper, bioc post 59700, and the linked Liu eet al 2025 paper, this is my proposed analysis:
#Create group for filtering
meta$Group <- paste(meta$Stimulus,meta$Temperature, sep=".")
#Create DGElist
y <- DGEList(counts, group = meta$Group)
#Filter genes that are in (n = smallest group size) samples
keep <- filterByExpr(y, group=y$samples$group)
y <- y[keep, ,keep.lib.sizes=FALSE]
#Calculate norm factors
y <- calcNormFactors(y, method = "TMM")
# Sample design matrix
design <- model.matrix(~ 0 + Batch + Donor + Group)
#Voom converts the read counts to log2CPM w/ associated weights, ready for lm
v <- voomWithQualityWeights(y, design)
corfit <- duplicateCorrelation(v, design, block = anno$Donor)
v <- voomWithQualityWeights(y, design, block = anno$Donor, correlation = corfit$consensus)
vfit <- lmFit(v, design, block = anno$Donor, correlation = corfit$consensus)
# Contrasts of interest
contrast.matrix <- makeContrasts(
DiffTempA = (StimulusA.TempHigh - NegControl.TempHigh) - (StimulusA.TempLow - NegControl.TempLow),
DiffTempB = (StimulusB.TempHigh - NegControl.TempHigh) - (StimulusB.TempLow - NegControl.TempLow),
DiffTempC = (StimulusC.TempHigh - NegControl.TempHigh) - (StimulusC.TempLow - NegControl.TempLow)
levels = design
)
#Then compute these contrasts and moderated t-tests
fit2 <- contrasts.fit(vfit, my.contrasts)
efit <- eBayes(fit2)
Questions
- Does my contrast setup correctly capture the temperature-specific effects for the stimulus of interest?
- Is this accurately accounting for the repeated measures from donors and batch effect?
Thank you for your help!