Question

Limma - Complex 2-way multi-level contrast with paired samples

0

Entering edit mode

Elizabeth • 0

@2ee56b52

Last seen 11 days ago

France

Dear BioC community,

I am seeking advice on analyzing a complex RNA-seq dataset using edgeR and limma. My factors are similar to that of the limma user guide 9.7, bit if all subjects had both normal and diseased conditions and we wanted to analyze both conditions and treatment in the same contrast. I've read the edgeR and limma user guides, the team's Law et al 2018 paper, and many posts on bioc and biostars and haven't found this analysis question answered.

Experiment Details representative of my data (but with n=15 donors)

covariate_data <- data.frame(
  SampleID = c("S1", "S2", "S3", "S4", "S5", "S6", "S7", "S8", "S9", "S10", "S11", "S12", "S13", "S14", "S15", "S16", "S17", "S18", "S19", "S20", "S21", "S22", "S23", "S24"),
  Donor = rep(c("D1", "D2", "D3"), each = 8),
  Stimulus = rep(c("Control", "StimA", "StimB", "StimC"), times = 6),
  Temperature = rep(c("Low", "Low", "Low", "Low", "High", "High", "High", "High"), times = 3),
  Batch = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2)
)

> covariate_data
   SampleID Donor Stimulus Temperature Batch
       S1    D1  Control         Low     1
       S2    D1    StimA         Low     1
       S3    D1    StimB         Low     1
       S4    D1    StimC         Low     1
       S5    D1  Control        High     1
       S6    D1    StimA        High     1
       S7    D1    StimB        High     1
       S8    D1    StimC        High     1
       S9    D2  Control         Low     1
      S10    D2    StimA         Low     1
      S11    D2    StimB         Low     1
      S12    D2    StimC         Low     1
      S13    D2  Control        High     1
      S14    D2    StimA        High     1
      S15    D2    StimB        High     1
      S16    D2    StimC        High     1
      S17    D3  Control         Low     2
      S18    D3    StimA         Low     2
      S19    D3    StimB         Low     2
      S20    D3    StimC         Low     2
      S21    D3  Control        High     2
      S22    D3    StimA        High     2
      S23    D3    StimB        High     2
      S24    D3    StimC        High     2

Objective: We aim to determine whether the change in gene expression induced by a specific stimulus differs significantly between two temperatures, controlling for the paired nature of the samples. Specifically I would like to compare for all stimuli:

(Stimulus at high temp - Control at high temp) vs (Stimulus at low temp - Control at low temp)

After following the steps in the Law et al 2018 paper, bioc post 59700, and the linked Liu eet al 2025 paper, this is my proposed analysis:

#Create group for filtering
meta$Group <- paste(meta$Stimulus,meta$Temperature, sep=".")

#Create DGElist
y <- DGEList(counts, group = meta$Group)

#Filter genes that are in (n = smallest group size) samples
keep <- filterByExpr(y, group=y$samples$group)
y <- y[keep, ,keep.lib.sizes=FALSE]

#Calculate norm factors
y <- calcNormFactors(y, method = "TMM")

# Sample design matrix
design <- model.matrix(~ 0 + Batch + Donor + Group)

#Voom converts the read counts to log2CPM w/ associated weights, ready for lm
v <- voomWithQualityWeights(y, design)
corfit <- duplicateCorrelation(v, design, block = anno$Donor)
v <- voomWithQualityWeights(y, design, block = anno$Donor, correlation = corfit$consensus)
vfit <- lmFit(v, design, block = anno$Donor, correlation = corfit$consensus)

# Contrasts of interest
contrast.matrix <- makeContrasts(
  DiffTempA = (StimulusA.TempHigh - NegControl.TempHigh) - (StimulusA.TempLow - NegControl.TempLow),
  DiffTempB = (StimulusB.TempHigh - NegControl.TempHigh) - (StimulusB.TempLow - NegControl.TempLow),
  DiffTempC = (StimulusC.TempHigh - NegControl.TempHigh) - (StimulusC.TempLow - NegControl.TempLow)
  levels = design
)

#Then compute these contrasts and moderated t-tests
fit2 <- contrasts.fit(vfit, my.contrasts)
efit <- eBayes(fit2)

Questions

Does my contrast setup correctly capture the temperature-specific effects for the stimulus of interest?
Is this accurately accounting for the repeated measures from donors and batch effect?

Thank you for your help!

edgeR contrastmatrix limma • 1.1k views

ADD COMMENT • link 5 weeks ago • updated 4 weeks ago Elizabeth • 0

score 2 · Accepted Answer · 2025-03-12

If you are fitting donor as a fixed effect you do not also need to fit as a random effect. The limma package fits a generalized least squares (GLS) model if you specify a correlation, which incorporates the correlation structure into the model fit. But if you have complete cases and can fit donor as a fixed effect, the donor coefficient is orthogonal to treatment and temperature, so the consensus correlation should be quite close to zero. It won't hurt to include it, but it's not necessary.

In addition, if you were to fit a GLS, you should use voomLmFit which will automatically iterate between fitting the voom and duplicateCorrelation steps for you, plus handle genes with lots of zeros better.

Also, if you fit a fixed effect for donor, you will not also be able to fit a batch effect, because the donors are all nested in batch, so the two will contain redundant information.

> covariate_data <- data.frame(
  SampleID = c("S1", "S2", "S3", "S4", "S5", "S6", "S7", "S8", "S9", "S10", "S11", "S12", "S13", "S14", "S15", "S16", "S17", "S18", "S19", "S20", "S21", "S22", "S23", "S24"),
  Donor = rep(c("D1", "D2", "D3"), each = 8),
  Stimulus = rep(c("Control", "StimA", "StimB", "StimC"), times = 6),
  Temperature = rep(c("Low", "Low", "Low", "Low", "High", "High", "High", "High"), times = 3),
  Batch = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2)
)
> library(limma)
## add group
> covariate_data$Group <- with(covariate_data, paste(Stimulus, Temperature, sep = "_"))
> design <- model.matrix(~0 + Batch + Donor + Group, covariate_data)
> is.fullrank(design)
[1] FALSE

So you cannot fit the model you describe. Instead you want to remove batch, and switch the order of Group and Donor

> gooddesign <- model.matrix(~ 0 + Group + Donor, covariate_data)
> colnames(gooddesign)
 [1] "GroupControl_High"
 [2] "GroupControl_Low" 
 [3] "GroupStimA_High"  
 [4] "GroupStimA_Low"   
 [5] "GroupStimB_High"  
 [6] "GroupStimB_Low"   
 [7] "GroupStimC_High"  
 [8] "GroupStimC_Low"   
 [9] "DonorD2"          
[10] "DonorD3"          
> is.fullrank(gooddesign)
[1] TRUE

And if you do that, then

Yes
Yes, but you don't need the duplicateCorrelation step or the block and correlation arguments to voomLmFit