Question

Two-way interaction, linear redundancy

1

Entering edit mode

Sindre ▴ 110

@sindre-6193

Last seen 4.6 years ago

I want to conduct an (example) experiment. I have 3 skinny rats and 3 fat rats. First, I give them a standardized meal and will take blood samples to measure levels of protein "X" before and after this meal. Second, all rats undergoes weight loss for 1 week. Third, after weight loss, I again give them a standardized meal and will take blood samples to measure levels of protein "X" before and after this meal.

Here is the design:

rats <- factor(rep(rep(1:3,each=2),4)) # 3 rats in each group
weightloss <- factor(rep(c("Start", "End"), each=12), levels = c("Start","End"))
meal <- factor(rep(c("Before", "After"), 12),  levels = c("Before","After"))
group <- rep(factor(rep(c("Skinny", "Fat"), levels = c("Skinny","Fat"), each=6)),2)

There are a few questions I want to answer:
1. The response in protein "X" levels in skinny mice before weight loss: Simple paired design
2. The response in protein "X" levels in skinny mice after weight loss: Simple paired design
3. The difference in response in protein "X" levels in skinny mice before vs. after weight loss: Interaction
4. The response in protein "X" levels in fat mice before weight loss: Simple paired design
5. The response in protein "X" levels in fat mice after weight loss: Simple paired design
6. The difference in response in protein "X" levels in fat mice before vs. after weight loss: Interaction
7. The difference in response in protein "X" levels in fat mice before vs. after weight loss different from the difference in response in protein "X" levels in skinny mice before vs. after weight loss: interaction of the interactions

I have no problems with comparisons 1:6, only 7 gives me trouble. Here is my attempt:

design <- model.matrix(~0+group:meal:weightloss+rats)
colnames(design)[4:11] <- gsub(colnames(design)[4:11], pattern=":", replacement=".")
library("caret")
findLinearCombos(design) # Column 11 gives me a problem

# The intuitive way is to write this interaction of interactions formula as follows:
((groupFat.mealAfter.weightlossEnd-groupFat.mealBefore.weightlossEnd)-((groupFat.mealAfter.weightlossStart-groupFat.mealBefore.weightlossStart))-((groupSkinny.mealAfter.weightlossEnd-groupSkinny.mealBefore.weightlossEnd)-(groupSkinny.mealAfter.weightlossStart-groupSkinny.mealBefore.weightlossStart)))

However, I am not quite sure how to re-write it to remove the linear redundant column 11? Could it be like this:

design <- design[,!grepl("Before", colnames(design))] # get to full rank

# Then the interaction of interactions is the following:
(groupFat.mealAfter.weightlossEnd-groupSkinny.mealAfter.weightlossEnd)-(groupFat.mealAfter.weightlossStart-groupSkinny.mealAfter.weightlossStart)

Correct?

limma caret lm lmer lme4 • 1.2k views

ADD COMMENT • link updated 5.7 years ago by Aaron Lun ★ 28k • written 5.7 years ago by Sindre ▴ 110

score 3 · Accepted Answer · 2019-07-23

I wonder what makes people inclined to use interaction terms, when their lives could be so much easier.

G <- paste(group, meal, weightloss, sep="_")

# This is what your rats really are, Skinny "1" != Fat "1"
true.rats <- paste0(group, rats)

G <- relevel(factor(G), "Fat_Before_Start") # to ensure model.matrix behaves
design <- model.matrix(~0 + true.rats + G)
design <- design[,!grepl("Before_Start", colnames(design))] # get to full rank

The first 6 columns represent rat-specific blocking factors - more specifically, the fitted log-expression of each rat before the meal at the start of the experiment. The remaining coefficients represent the average log-fold change of each treatment relative to the before-start baseline.

(By relevel'ing G, and then removing the other Before_Start coefficient from design, we shaped the design matrix to achieve this interpretation for the coefficients. Technically, we could have done ~ 0 + G + true.rats, but I have deliberately formulated it in this manner to ensure that the coefficients of interest are nested within the true.rats levels. This means that any comparisons between G coefficients are comparisons of responses. The ~0 + G + true.rats approach invites direct comparisons between G coefficients that represent contrasts of average log-expression levels in the first skinny rat to the first fat rat - this is unlikely to be what was intended.)

At this point, your contrasts are easy, so I'll cut to the chase.

The difference in response in protein "X" levels in fat mice before vs. after weight loss different from the difference in response in protein "X" levels in skinny mice before vs. after weight loss

I will omit the makeContrasts call for clarity.

# Response in fat at the start of the week.
GFat_After_Start

# Response in fat at the end of the week.
GFat_After_End - GFat_Before_End

# Difference in response in fat between the start and end of the week.
(GFat_After_End - GFat_Before_End) - GFat_After_Start

# Difference in differences in response between fat and skinny.
((GFat_After_End - GFat_Before_End) - GFat_After_Start)
  - (GSkinny_After_End - GSkinny_Before_End) - GSkinny_After_Start)

Probably best not to use the phrasing "after weight loss", because that's easily confused with "after the meal".