Hi,
I wonder if the design/contrast I'm using in Limma is the best one for my experiment. In particular, I'm not sure of how duplicateCorrelation
works here. I though I'd post the design here and hopefully someone can give me some hints.
The research question is quite straight forward, but there are some complications with the design. The experiment studies treated/untreated (male and female) with a drug and then before/after an intervention. The questions we'd like to answer are:
How does the transcriptional profile differ between treated/untreated?
Are there genes that respond to the intervention differently in treated/untreated? This is the most relevant question.
Are there sex differences in the response to the intervention?
Then there are two complications:
The samples were taken in 5 batches, and those are confounded with sex (3 for male and two for female). The female samples were also generated by another hospital and much later, so the batch effect between (1/2/3) and (4/5) is larger than between 1/2/3 or 4/5.
For male, both samples for a subject (i.e. the before/after intervention samples) are analysed in the same batch. For female, the samples have been randomized to batch, so there are some subjects which have their "before" sample in one batch and their "after" sample in another.
We need to account for a number of covariates; age, BMI and so on. Importantly, we'd like to account for a type of medication that's only being used by a subset of the treated subjects (not the same treatment that the study is about). All these factors are properties of subjects rather than samples.
My current reasoning is that we cannot say anything about sex differences, since it's confounded by batch. Blocking by subject would be fine if we were only interested in the effect of the intervention, but then we cannot say anything about the interaction effect with treated/untreated (since they would be colinear). It would also fail to handle the batch effect for those subjects where the before/after samples were in different batches. Also, we'd like to quantify the effect of the covariates as a sanity check. Based on this, the only solution I could see was to view subject as a random effect. The model I'm using is:
design <- model.matrix(~ 0 + factor(intervention) + factor(treated_untreated) + factor(batch) + bmi + factor(other_medication))
corfit <- duplicateCorrelation(exprs(es), design, block = pData(es)$Subject)
fit <- lmFit(exprs(es), design, block = pData(es)$Subject, correlation = corfit$consensus)
Note that other_medication
is a confounding medication that I'd like to remove the effect of. It's FALSE for all untreated and some of the treated. I then test for differentially expressed genes with contrast, e.g.
contrast <- makeContrasts(interventionAfter - interventionBefore, levels=design)
fit2 <- contrasts.fit(fit, contrast)
fit2 <- eBayes(fit2, trend = TRUE)
Does this make sense? Will other_medication
be corrected for as I'd like even though there are no untreated subjects who are on that drug or will it get some of the "treated signal"? Are there any complications with using duplicateCorrelaction
like this when some paired samples are split between batches? Thanks a lot!
It really helps if you post some example code to generate something close to your design. Otherwise it's hard to tell from words. Did each patient receive the drug and the intervention, and samples were collected before and after each one? Were all patients treated with the drug, but only some of them got the intervention?
Help us out and do something like:
... to give us an idea of what you're really talking about. Words are cheap, but code is real.
Sure, it would look something like this (skipping all confounding variables other than batch).
patient_id
would be the random factor that is calledpData(es)$Subject
in my initial post.