Question

Combining repeated measurements and duplicateCorrelation in LIMMA

2

Entering edit mode

Jeff ▴ 20

@acaa643d

Last seen 3.9 years ago

Canada

I have two populations (normal and diseased). For each subject I have two measurements (baseline and 3 months later).

My primary interest is the difference in expression between normal and diseased subjects, regardless of time-point.

Using the LIMMA user guide I have used sections 9.5.2 and 9.7 to guide me.

Questions:

Looking at the code I wrote below, I have a feeling the contrasts matrix I created is not actually measuring the difference in expression between normal and diseased subjects. As I said earlier, my primary interest is the difference in expression between normal and diseased subjects, regardless of time-point. I think the contrasts matrix I currently have set-up will end up telling me the genes that respond differently in expression between healthy and diseased subjects over time - this is not what i want - I just want to know which genes are differently expressed between healthy and diseased regardless of the changing of time. I was hoping to get advice on correct way of conducting the analysis for my purposes.
Is it possible to use a continuous phenotype and still perform the analysis? For instance I may not be satisfied with normal vs diseased and may want to use for instance viral cells/mL as my primary predictor. In that case can I just run these lines of code?

Treat <- targets$viral_concentration

design <- model.matrix(~Treat)

corfit <- duplicateCorrelation(eset,design,block=targets$Subject)

fit <- lmFit(eset,design,block=targets$Subject,correlation=corfit$consensus)

fit <- eBayes(fit)

topTable(fit, coef=”Treat”)

Thank you in advance!

...

Subject          Condition      time_point
1                 disease        0_months
1                 disease        3_months
2                 normal         0_months
2                 normal         3_months
3                 disease        0_months
3                 disease        3_months
...


#From 9.7 in the LIMMA user guide 
Treat <- factor(paste(targets$Condition,targets$time_point,sep="."))
design <- model.matrix(~0+Treat)
colnames(design) <- levels(Treat)

#estimate the correlation between measurements made on the same subject
corfit <- duplicateCorrelation(eset,design,block=targets$Subject)
corfit$consensus

# inter-subject correlation is input into the linear model fit:
fit <- lmFit(eset,design,block=targets$Subject,correlation=corfit$consensus)

# Comparing diseased and normal subjects, using 9.5.2 in LIMMA user guide, 
**##I have a feeling this is the incorrect way of comparing diseased and normal subjects**##**
 cm <- makeContrasts( DiseasedvsNormal = (disease.3_months-disease.0_months) - (normal.3_months-normal.0_months), levels=design)

# compute these contrasts and moderated t-tests
 fit2 <- contrasts.fit(fit, cm)
fit2 <- eBayes(fit2)

topTable(fit2, coef="DiseasedvsNormal")

limma • 1.8k views

ADD COMMENT • link updated 3.9 years ago by James W. MacDonald 68k • written 3.9 years ago by Jeff ▴ 20

0

Entering edit mode

Hi everyone, Im the OP, I believe I have figured out the answer to question 1 and question 2. However, I would still appreciate input from LIMMA experts out there.

Question 1. If I'm interested in differential expression between healthy and diseased patients regardless of the time-point, my analysis would simply become an ordinary two group comparison but I would still incorporate patient as a random effect via duplicateCorrelation. So

Treat <- factor(targets$Condition)

design <- model.matrix(~0+Treat)

corfit <- duplicateCorrelation(eset,design,block=targets$Subject)

fit <- lmFit(eset,design,block=targets$Subject,correlation=corfit$consensus)

cm <- makeContrasts( DiseasedvsNormal = disease-normal, levels=design)

fit2 <- contrasts.fit(fit, cm)

fit2 <- eBayes(fit2)

topTable(fit2, coef="DiseasedvsNormal")

Question 2 - I have found an interesting answer from Aaron Lun in the past (Further clarification on when not to use duplicateCorrelation with technical replicates (RNA-seq)), whereby if I want to use a continuous predictor and duplicateCorrelation this situation resembles "Repeated samples with different interesting predictors". i guess I just want expert opinion if my code below is methodologically sound.

Treat <- targets$viral_concentration

design <- model.matrix(~Treat)

corfit <- duplicateCorrelation(eset,design,block=targets$Subject)

fit <- lmFit(eset,design,block=targets$Subject,correlation=corfit$consensus)

fit <- eBayes(fit)

topTable(fit, coef=”Treat”)

ADD REPLY • link 3.9 years ago Mcgee • 0

score 1 · Answer 1 · 2021-06-09

It's not clear what you mean by differential expression regardless of the time point. That could mean different things. But anyway, I am not sure I see a reason to use duplicateCorrelation or to block on subject for this study.

This part:

cm <- makeContrasts( DiseasedvsNormal = (disease.3_months-disease.0_months) - (normal.3_months-normal.0_months), levels=design)

Is an interaction, and is intended to find genes that are affected differently by time depending on the subject status. And this automatically accounts for subject-specific differences (because you are subtracting the two timepoints, within subject). This definitely doesn't select genes that are different at any time point - it finds genes where the differences in the parentheses are different. This can encompass many things. No change for disease, big change for normal, no change for normal, big change for disease, smaller changes for each disease status with different directions, etc. You could do

cm <- makeContrasts(DiseasedvsNormal = (disease.3_months + disease.0_months)/2 - (normal.3_months + normal.0_months)/2, levels = design)

Which will test that the average expression in diseased is different than the average in normal, which is one way of finding differential expression regardless of time point. Or you could do

cm <- makeContrasts(DiseasedvsNormal3mo = disease.3_months  - normal.3_months,  DiseasedvsNormal0mo = disease.0_months -  normal.0_months, levels = design)

which will test for any differences at either time. And then you could just select genes that are significant in either contrast, or do an F-test for both contrasts simultaneously. Although the latter tends to be less powerful and not exactly the same as the former.

And finally you could do what you suggest in your reply, which is to ignore time completely, and use duplicateCorrelation to account for repeated measures. If you do this you are assuming that time really doesn't make any difference, because you will end up treating the diseased and normal samples at different times as if they are indistinguishable. Which may be a reasonable assumption, but I would tend to check that assumption first.