limma: correcting for a correlation between contrasts
1
1
Entering edit mode
@alex-gutteridge-2935
Last seen 10.2 years ago
United States
I have an experimental design (or rather question) that goes beyond my usual limma knowledge. I am hoping there is some elegant way to answer it that someone here can point me towards. The experimental design is straightforward: one control group (Ctrl) and two treatment groups (T1 & T2) with three replicates each: S1 Ctrl S2 Ctrl S3 Ctrl S4 T1 S5 T1 S6 T1 S7 T2 S8 T2 S9 T2 It is clear from some basic exploratory analysis that the effect of T2 is very similar to T1, but has a slightly reduced response. I.e. plotting log fold changes for the Ctrl - T1 contrast vs the Ctrl - T2 contrast gives a clear linear correlation but with slope <1. This is expected given what we know about the biology of the system, but we suspect there are some genes that will be specific for the T2 treatment. My first instinct is to setup a contrast for limma like: (Ctrl - T1) - (Ctrl - T2), but because the T2 response is across the board less than T1 this ends up just highlighting that fact rather than pulling out the T2 'specific' response. Hopefully this code explains things visually with some simulated data: T1.response = rnorm(100,0,1) T2.response = T1.response * 0.5 + rnorm(100,0,0.2) T1.response[101:110] = rnorm(10,0,0.2) T2.response[101:110] = rnorm(10,1,0.2) plot(T1.response,T2.response,xlim=c(-2,2),ylim=c(-2,2),col=c(rep("gree n",100),rep("black",10)),pch=16) abline(0,1,col="blue") abline(-0.5,1,lty=2,col="blue") abline(0.5,1,lty=2,col="blue") abline(model,col="red") abline(model$coefficients[1]+0.5,model$coefficients[2],lty=2,col="red" ) abline(model$coefficients[1]-0.5,model$coefficients[2],lty=2,col="red" ) The (Ctrl - T1) - (Ctrl - T2) contrast tends to pull out the T2 specific genes (black) which I want, but also those genes outside the blue tramlines that are simply not responding as strongly to T2 as T1 (which I don't want). In the real data, those genes out-number the really T2 specific genes considerably and dominate in any downstream GSEA type analysis. Is there a way to setup the contrast instead so limma corrects for the correlation between the two datasets and looks for genes with large residuals from the red (fitted) line rather than the blue? -- Alex Gutteridge
limma limma • 1.6k views
ADD COMMENT
0
Entering edit mode
@gordon-smyth
Last seen 10 minutes ago
WEHI, Melbourne, Australia

Dear Alex,

The contrast (Ctrl-T1)-(Ctrl-T2) is just equal to T2-T1, it finds genes for which T2 and T1 are not identical.

If you want T2 specific genes, would you not simply look for genes that are DE for T2 - Ctrl but not for T1 - Ctrl?

Best wishes
Gordon

ADD COMMENT
0
Entering edit mode

Yes, of course you're right T2-T1 is the simpler (correct) way to express that contrast. Muddled thinking on my part.

Your second question gets to the heart of my problem: I know that wherever I set the thresholds for determining DE there will be genes that are DE for T2 - Ctrl, but not for T1 - Ctrl, but where if the magnitude of the T1 response were simply a little stronger they *would* be DE. Put another way, if we assume a linear dose response for the system to T1, which other data suggests is not an unreasonable assumption here, an increase in the T1 concentration would, for many, but not all, genes produce the same response as T2 (or vice-versa a decrease in T2 concentration would produce the same response as T1). I want to get those genes where this is *not* true.

Put one more (final!) way: Imagine T1 and T2 are actually the same compound but applied at 1uM and 2uM concentrations and that there is linear dose response to the concentration of this compound for all genes. Looking for DE will produce lots of genes between those two conditions, but fundamentally it is the same response. I want to correct for the difference in magnitude and so would hope for *no* reported DE in this (imaginary!) experiment.

Rather than go back and tell the experimentalists to tweak their T1/T2 concentrations until the responses are equivalent, I feel there must be a smart way (using some assumptions for sure) to model this in. One simple approach that had occurred to me would be to just multiply all the T1 logFCs by a common factor that produces a T1 response that *overall* is equivalent to the T2 response. I can get that factor by simple linear regression of the two responses, but I'm not sure where in the limma workflow it makes sense (if at all) to try to shove that correction in. This test code and simulated data seems to do what I want (return 0 DE genes in the final analysis). Is it valid to do this?

ctrl     = rnorm(100,10,1)
response = rnorm(100,0,1)

t1   = ctrl + response
t2   = ctrl + response * 2

exprs = matrix(c(ctrl,ctrl+rnorm(100,0,0.1),ctrl+rnorm(100,0,0.1),
                  t1,  t1+rnorm(100,0,0.1),  t1+rnorm(100,0,0.1),
                  t2,  t2+rnorm(100,0,0.1),
t2+rnorm(100,0,0.1)),ncol=9)

groups = factor(c(rep("C",3),rep("T1",3),rep("T2",3)))

design = model.matrix(~0+groups)
colnames(design) = levels(groups)

contrasts = makeContrasts(T1 - C, T2 - C, levels=design)

fit = lmFit(exprs,design)
fit2 = contrasts.fit(fit,contrasts)
fit3 = eBayes(fit2)

model = lm(fit3$coefficients[,2] ~ fit3$coefficients[,1])

contrasts = makeContrasts(T1 - C, T2 - C, (T1 -
C)*model$coefficients[2] - (T2 - C), levels=design)

fit = lmFit(exprs,design)
fit2 = contrasts.fit(fit,contrasts)
fit3 = eBayes(fit2)

topTable(fit3, coef=4)
ADD REPLY
1
Entering edit mode

Dear Alex,

The strong way to get genes specific to T2 is to use three conditions. Find those genes that are:

   DE for T2 - Ctrl,
   Not DE for T1- Ctrl, and
   DE for T2-T1

This ensures that all the chosen genes are clearly specific.  It chooses only genes that have clearly larger responses to T2 than to T1.  The theshold for DE can be chosen fairly loose for the third condition.

This is of course a general procedure, and does not take account of any assumed linear relationship between T2 and T1.

> Put another way, if we assume a linear dose response for the system to
> T1, which other data suggests is not an unreasonable assumption here, an
> increase in the T1 concentration would, for many, but not all, genes
> produce the same response as T2 (or vice-versa a decrease in T2
> concentration would produce the same response as T1). I want to get
> those genes where this is *not* true.

Well, your plot of logFC for T2 vs logFC for T1 is already a good start. You could a least squares regression of T2 on T1 through zero:

   logFC.T1 <- fit3$coefficients[,1]
   logFC.T2 <- fit3$coefficients[,2]
   fit <- lm(logFC.T2 ~ 0 + logFC.T1)

It is important to force the regression through zero.  Suppose the slope coef(fit) is 0.7, for example.  Then the contrast you probably want to test is:

   cont <- makeContrasts(T2-C - 0.7*(T1-C), levels=design)
   fit2 <- contrasts.fit(fit,cont)
   fit2 <- eBayes(fit2)

This will find genes that don't fit the linear regression line relating the T2 response to the T1 response.

Best wishes
Gordon

ADD REPLY
1
Entering edit mode

Dear Alex,

As a further thought, it would be worthwhile to run genas() on your data:

genas(fit3, plot=TRUE)

This is because the experimental design is such that logFC.T2 and logFC.T1 would show a correlation of about 0.5 even if neither T2 and T1 had any true differential expression. This is because both of the contrasts are relative to the same control, a fact that induces a technical correlation in the observed fold changes. The genas() function is able to deconvolve the technical and true biological components of the correlation between two contrasts.

Best wishes
Gordon

ADD REPLY

Login before adding your answer.

Traffic: 858 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6