Question

limma: how to use pairing and adjust for covariates

1

Entering edit mode

gregory.l.stone ▴ 10

@gregorylstone-12225

Last seen 6.5 years ago

I am interested in looking at the differences in the effect of a condition on each sex. The contrast I am interested in is (Female.post - Female.pre) - (Male.post - Male.pre). I am uncertain as to how best incorporate pairing into my study and adjust for covariates.

One way I have tried to incorporate pairing is to randomize the patient effect by using duplicateCorrelation(). My design is therefore:

f = paste(dge$samples$sex, dge$samples$condition, sep = ".")
f = factor(f, levels = c("Male.pre", "Male.post", "Female.pre", "Female.post"))

d = model.matrix(~ 0 + f + age + dm + axc, colData)

cont = makeContrasts(M.PvP="fMale.post-fMale.pre", F.PvP="fFemale.post-fFemale.pre", sexDiff="(fFemale.post-fFemale.pre) - (fMale.post-fMale.pre)", levels = d)

where colData is my list of covariates, dge&samples is the colData object in my dge object containing all sample an count info, M.PvP is the effect of condition on males, F.PvP is the effect of condition of females, and sexDiff is the contrast of interest (the difference in the effect of condition on each sex). I get around 274 genes for the contrast of interest.

However, I am concerned that using duplicateCorrelation() in addition to adjusting for covariates in the design matrix is over-adjusting and assumes a linear relationship between my outcome and my covariates.

So another strategy I have tried is to block for the pairing effect in the design matrix as such:

design <- model.matrix(~ patient + sex:condition, colData)
design <- design[,!grepl("pre", colnames(design))]
colnames(design) <- sub(":", "_", colnames(design))
con <- makeContrasts(sexDiff = "sexFemale_conditionpost - sexMale_conditionpost", levels=design)

However, when I go to do my fit, I get the error "In contrasts.fit(fit, con) : row names of contrasts don't match col names of coefficients". Additionally, I only get 6 genes in my contrast of interest.

What strategy should I be going with? Thank you so much for taking the time to help!

limma limma-voom limma paired analysis limma contrast matrix • 3.8k views

ADD COMMENT • link updated 8.1 years ago by Aaron Lun ★ 28k • written 8.1 years ago by gregory.l.stone ▴ 10

0

Entering edit mode

If you got an error in your second approach, how did you still manage to test for DE genes with that contrast? In any case, provide a minimum working example that can recapitulate the error.

ADD REPLY • link 8.1 years ago Aaron Lun ★ 28k

0

Entering edit mode

When I make my contrast matrix (following con <- makeContrasts(sexDiff = "sexFemale_conditionpost - sexMale_conditionpost", levels=design)), I get the following message: Warning message: In makeContrasts(sexDiff = "sexFemale_conditionpost - sexMale_conditionpost", : Renaming (Intercept) to Intercept

When I make my fit based on my contrast matrix (fit = lmfit(voom, design) fit2 = contrasts.fit(fit, con)) I get the following:

Warning message:
In contrasts.fit(fit, con) :
row names of contrasts don't match col names of coefficients

Limma still allows me to create these objects

ADD REPLY • link 8.1 years ago gregory.l.stone ▴ 10

0

Entering edit mode

That's a warning, not an error.

ADD REPLY • link 8.1 years ago Aaron Lun ★ 28k

score 1 · Answer 1 · 2017-03-27

1

Entering edit mode

Aaron Lun ★ 28k

@alun

Last seen 17 hours ago

The city by the bay

I would go with your second approach, i.e., block on patient in the design matrix. duplicateCorrelation is somewhat anticonservative so the greater number of DE genes in your first approach is not a surprise. There are also a number of assumptions (e.g., homogeneity of patient effects) and compromises (e.g., using the same consensus correlation for all genes) involved in using duplicateCorrelation, in addition to the linearity issues you've already mentioned. All of this is avoided with the second approach, albeit with some loss of power as information is spent on estimating the patient effects. Fewer DE genes, perhaps, but better safe than sorry.

ADD COMMENT • link 8.1 years ago Aaron Lun ★ 28k

0

Entering edit mode

When I block on patient in the design I now find I get the following message when normalizing with voom:

> v = voomWithQualityWeights(dge, design, plot=FALSE)
Coefficients not estimable: sexMale.conditionpost sexFemale.conditionpost

The same message prints repeatedly. I have samples for each condition, though. How can I correct this?

ADD REPLY • link 8.1 years ago gregory.l.stone ▴ 10

0

Entering edit mode

It seems like you haven't removed the pre* coefficients in the second design matrix. If that's not the case, please post a minimum working example that reproduces the warning.

ADD REPLY • link 8.1 years ago Aaron Lun ★ 28k

0

Entering edit mode

You're right! I apologize for my carelessness. However, if I am not specifying an intercept, don't I need the pre terms in my design so that I can make a contrast like the following: con <- makeContrasts(sexDiff = "(sexFemale_conditionpost - sexFemale_conditionpre) - (sexMale_conditionpost - sexMale_conditionpre)", levels=design_noInt)?

ADD REPLY • link 8.1 years ago gregory.l.stone ▴ 10

0

Entering edit mode

No. Once you've pruned the design matrix, the interaction terms are the post/pre log-fold changes.

ADD REPLY • link 8.1 years ago Aaron Lun ★ 28k

0

Entering edit mode

So in the case when my design matrix is ~ patient + sex:condition, I should remove the pre terms from the design and then construct my contrast matrix: makeContrasts(sexDiff = "sexFemale_conditionpost - sexMale_conditionpost", levels=design)

Now, in the case when my design matrix is ~ 0 + patient + sex:condition, I also should remove the pre terms and construct the same contrast matrix.

I'm now confused as to what the difference is between the designs ~ patient + sex:condition and ~ 0 + patient + sex:condition. I'm sure I have misunderstood something you've said along the way. I apologize for my confusion, and thank you so much for being so generous with your time!

ADD REPLY • link 8.1 years ago gstone2 • 0

0

Entering edit mode

The only difference (in the pruned designed matrices) is that, if you have an intercept, it represents the average log-expression in the first patient, and all other patient terms represent the log-fold change relative to the first patient. Without an intercept, each patient term directly represents the average log-expression in that patient. In both cases, the interpretation of the (interesting) interaction terms is unchanged.

ADD REPLY • link 8.1 years ago Aaron Lun ★ 28k

0

Entering edit mode

Ok thank you, that makes a lot of sense. So just that I am clear, in both cases, sexMale_conditionpost represents the log-fold change from pre to post in males?

ADD REPLY • link 8.1 years ago gstone2 • 0

0

Entering edit mode

I would assume so. Check the design matrix to be sure; you should see entries of 1 for post-treatment male samples, and zero for everything else.

ADD REPLY • link 8.1 years ago Aaron Lun ★ 28k

0

Entering edit mode

Ok great. Thank you very much for all of your help!

ADD REPLY • link 8.1 years ago gstone2 • 0

0

Entering edit mode

I am implementing the design you've recommended and am concerned that I am not getting the contrast I am interested in. The columns Male.post and Female.post in my design matrix have a 1 for all male and female samples, respectively, taken post condition. When I contrast these, Female.post-Male.post, I initially thought that the contrast I was doing was (Female.post-Female.pre)-(Male.post-Male.pre), but the interaction between sex and pre samples were removed to make the design full-rank, making me believe that I am getting the change between the sexes at time point post, not the change between the sexes between time points pre->post, which is what I'm interested in. Am I interpreting this incorrectly? My design is ~ patient + sex:condition

Is this a situation where I would use a nested factorial design? If so how would I go about doing that?

ADD REPLY • link 8.1 years ago gstone2 • 0

0

Entering edit mode

Male.post represents the log-fold change of post over pre in males. Female.post represents the log-fold change of post over pre in females. The contrast Female.post - Male.post tests the null hypothesis that the post/pre log-fold changes are the same between males and females. If this is not clear, I can only suggest that you consult a local statistician to help with your understanding.

ADD REPLY • link 8.1 years ago Aaron Lun ★ 28k