Question

Limma design/contrast correct?

0

Entering edit mode

rasmus.agren • 0

@rasmusagren-19725

Last seen 6.2 years ago

Hi,

I wonder if the design/contrast I'm using in Limma is the best one for my experiment. In particular, I'm not sure of how duplicateCorrelation works here. I though I'd post the design here and hopefully someone can give me some hints.

The research question is quite straight forward, but there are some complications with the design. The experiment studies treated/untreated (male and female) with a drug and then before/after an intervention. The questions we'd like to answer are:

How does the transcriptional profile differ between treated/untreated?
Are there genes that respond to the intervention differently in treated/untreated? This is the most relevant question.
Are there sex differences in the response to the intervention?

Then there are two complications:

The samples were taken in 5 batches, and those are confounded with sex (3 for male and two for female). The female samples were also generated by another hospital and much later, so the batch effect between (1/2/3) and (4/5) is larger than between 1/2/3 or 4/5.
For male, both samples for a subject (i.e. the before/after intervention samples) are analysed in the same batch. For female, the samples have been randomized to batch, so there are some subjects which have their "before" sample in one batch and their "after" sample in another.
We need to account for a number of covariates; age, BMI and so on. Importantly, we'd like to account for a type of medication that's only being used by a subset of the treated subjects (not the same treatment that the study is about). All these factors are properties of subjects rather than samples.

My current reasoning is that we cannot say anything about sex differences, since it's confounded by batch. Blocking by subject would be fine if we were only interested in the effect of the intervention, but then we cannot say anything about the interaction effect with treated/untreated (since they would be colinear). It would also fail to handle the batch effect for those subjects where the before/after samples were in different batches. Also, we'd like to quantify the effect of the covariates as a sanity check. Based on this, the only solution I could see was to view subject as a random effect. The model I'm using is:

design <- model.matrix(~ 0 + factor(intervention) + factor(treated_untreated) + factor(batch) + bmi + factor(other_medication))
corfit <- duplicateCorrelation(exprs(es), design, block = pData(es)$Subject)
fit <- lmFit(exprs(es), design, block = pData(es)$Subject, correlation = corfit$consensus)

Note that other_medication is a confounding medication that I'd like to remove the effect of. It's FALSE for all untreated and some of the treated. I then test for differentially expressed genes with contrast, e.g.

contrast <- makeContrasts(interventionAfter - interventionBefore, levels=design)
fit2 <- contrasts.fit(fit, contrast)
fit2 <- eBayes(fit2, trend = TRUE)

Does this make sense? Will other_medication be corrected for as I'd like even though there are no untreated subjects who are on that drug or will it get some of the "treated signal"? Are there any complications with using duplicateCorrelaction like this when some paired samples are split between batches? Thanks a lot!

limma microarray design paired mixed model • 3.6k views

ADD COMMENT • link updated 6.2 years ago by Aaron Lun ★ 28k • written 6.2 years ago by rasmus.agren • 0

0

Entering edit mode

It really helps if you post some example code to generate something close to your design. Otherwise it's hard to tell from words. Did each patient receive the drug and the intervention, and samples were collected before and after each one? Were all patients treated with the drug, but only some of them got the intervention?

Help us out and do something like:

batch <- rep(1:5, each=10) 
sex <- rep(c("M", "M", "M", "F", "F"), each=10)
drug <- ???
intervention <- ???
patient_id <- ???

... to give us an idea of what you're really talking about. Words are cheap, but code is real.

ADD REPLY • link 6.2 years ago Aaron Lun ★ 28k

0

Entering edit mode

Sure, it would look something like this (skipping all confounding variables other than batch). patient_id would be the random factor that is called pData(es)$Subject in my initial post.

# 5 batches, 4 subjects per batch, 2 samples per subject
batch <- rep(1:5, each = 8)

# First 3 batches are male, last 2 are female
sex <- c(rep("M", 8*3), rep("F", 8*2))

# Half of the subjects are on the drug
drug <- rep(c("Yes", "Yes", "No", "No"), 8*5/4)

# All subjects have measurements before and after an intervention
intervention <- rep(c("Before", "After"), 8*5/2)

patient_id <- c(rep(paste0("pat", 1:12), each = 2), # All males (12 subjects) have paired samples in the same batch
  rep(paste0("pat", 13:14), each = 2), # The same is true for 2 of the females in batch 4
  paste0("pat", 15:18), paste0("pat", 18:15), # 4 female subjects are split between batch 4 and 5.
  rep(paste0("pat", 19:20), each = 2)) # 2 the females in batch 5 have paired samples in the same batch

# Note that sex isn't used here
design <- model.matrix(~ 0 + factor(intervention) + factor(drug) + factor(batch))

ADD REPLY • link 6.2 years ago rasmus.agren • 0

score 3 · Accepted Answer · 2019-02-04

I would recommend doing something like this:

g <- factor(paste0(sex, drug, intervention))
b <- factor(batch)
design <- model.matrix(~0 + g + b) # plus other confounding covariates
design <- design[,setdiff(colnames(design), "b4")] # get to full rank

... and block on patient_id in duplicateCorrelation(). The first 8 coefficients represent the average log-expression of each combination of sex/drug/intervention, while the remaining coefficients represent the uninteresting batch effects. This approach avoids making additive assumptions about how the sex/drug/intervention factors interact with each other.

How does the transcriptional profile differ between treated/untreated?

If you want to test for treated/untreated differences in females before interventions, you would do:

con <- makeContrasts(gFYesBefore -  gFNoBefore, levels=design)

If you want to test for treated/untreated differences in females, averaging across their intervention status, you would do:

con <- makeContrasts((gFYesAfter + gFYesBefore)/2 -  
    (gFNoAfter + gFNoBefore)/2, levels=design)

There's at least a few other ways to cut this particular pie. You could do an ANOVA with:

con <- makeContrasts(
    gFYesAfter - gFNoAfter, # post-intervention differences
    gFYesBefore - gFNoBefore, # pre-intervention differences
    levels=design)

... and look for significant genes where the log-fold change was in the same direction in both contrasts. Or you could do separate tests for each contrast and intersect the resulting lists of DE genes that change in the same direction. This logic can be extended to the males as well, giving you a total of 4 contrasts to intersect.

The choice of testing strategy depends on how stringent you want to be. A simple treated versus untreated comparison no longer exists if there are interactions between drug treatment, the intervention and sex. (And if there aren't... well, your remaining two questions won't have very interesting answers.)

In all of these contrasts, we're always comparing within each sex. Thus, there are no problems with the fact that batch is fully confounded with sex.

Are there genes that respond to the intervention differently in treated/untreated? This is the most relevant question.

Easy, two-way interactions. Here we're looking for differences in the intervention response with/without treatment in females:

con <- makeContrasts((gFYesAfter -  gFYesBefore) - (gFNoAfter -  gFNoBefore), levels=design)

The same applies for males, I won't bother writing it. Again, if you want to combine these results across males and females, you'll have to decide whether you want to average the effects, use an ANOVA, or intersect the DE lists.

As before, we're comparing within each sex, so the confounding batch is not a problem.

Are there sex differences in the response to the intervention?

Yet another two-way interaction. For differences in intervention response between sexes for the drug-treated group:

con <- makeContrasts((gFYesAfter -  gFYesBefore) - (gMYesAfter -  gMYesBefore), levels=design)

... and same again for the untreated group. Now, this does involve comparing between sexes, but actually, we don't have to worry about the fact that batch and sex are confounded. This is because we are computing the intervention response within each sex, and then comparing those responses between sexes. Any batch effect cancels out when we compute the response, making this contrast valid.

Will other_medication be corrected for as I'd like even though there are no untreated subjects who are on that drug or will it get some of the "treated signal"?

This should be okay, provided the subjects on this other medication are not all of the male subjects (or all of the female subjects).

Are there any complications with using duplicateCorrelaction like this when some paired samples are split between batches?

No, this is fine.