Entering edit mode
I have an experimental design (or rather question) that goes beyond my
usual limma knowledge. I am hoping there is some elegant way to answer
it that someone here can point me towards. The experimental design is
straightforward: one control group (Ctrl) and two treatment groups (T1
&
T2) with three replicates each:
S1 Ctrl
S2 Ctrl
S3 Ctrl
S4 T1
S5 T1
S6 T1
S7 T2
S8 T2
S9 T2
It is clear from some basic exploratory analysis that the effect of T2
is very similar to T1, but has a slightly reduced response. I.e.
plotting log fold changes for the Ctrl - T1 contrast vs the Ctrl - T2
contrast gives a clear linear correlation but with slope <1. This is
expected given what we know about the biology of the system, but we
suspect there are some genes that will be specific for the T2
treatment.
My first instinct is to setup a contrast for limma like: (Ctrl - T1) -
(Ctrl - T2), but because the T2 response is across the board less than
T1 this ends up just highlighting that fact rather than pulling out
the
T2 'specific' response.
Hopefully this code explains things visually with some simulated data:
T1.response = rnorm(100,0,1)
T2.response = T1.response * 0.5 + rnorm(100,0,0.2)
T1.response[101:110] = rnorm(10,0,0.2)
T2.response[101:110] = rnorm(10,1,0.2)
plot(T1.response,T2.response,xlim=c(-2,2),ylim=c(-2,2),col=c(rep("gree
n",100),rep("black",10)),pch=16)
abline(0,1,col="blue")
abline(-0.5,1,lty=2,col="blue")
abline(0.5,1,lty=2,col="blue")
abline(model,col="red")
abline(model$coefficients[1]+0.5,model$coefficients[2],lty=2,col="red"
)
abline(model$coefficients[1]-0.5,model$coefficients[2],lty=2,col="red"
)
The (Ctrl - T1) - (Ctrl - T2) contrast tends to pull out the T2
specific genes (black) which I want, but also those genes outside the
blue tramlines that are simply not responding as strongly to T2 as T1
(which I don't want). In the real data, those genes out-number the
really T2 specific genes considerably and dominate in any downstream
GSEA type analysis. Is there a way to setup the contrast instead so
limma corrects for the correlation between the two datasets and looks
for genes with large residuals from the red (fitted) line rather than
the blue?
--
Alex Gutteridge
Yes, of course you're right T2-T1 is the simpler (correct) way to express that contrast. Muddled thinking on my part.
Your second question gets to the heart of my problem: I know that wherever I set the thresholds for determining DE there will be genes that are DE for T2 - Ctrl, but not for T1 - Ctrl, but where if the magnitude of the T1 response were simply a little stronger they *would* be DE. Put another way, if we assume a linear dose response for the system to T1, which other data suggests is not an unreasonable assumption here, an increase in the T1 concentration would, for many, but not all, genes produce the same response as T2 (or vice-versa a decrease in T2 concentration would produce the same response as T1). I want to get those genes where this is *not* true.
Put one more (final!) way: Imagine T1 and T2 are actually the same compound but applied at 1uM and 2uM concentrations and that there is linear dose response to the concentration of this compound for all genes. Looking for DE will produce lots of genes between those two conditions, but fundamentally it is the same response. I want to correct for the difference in magnitude and so would hope for *no* reported DE in this (imaginary!) experiment.
Rather than go back and tell the experimentalists to tweak their T1/T2 concentrations until the responses are equivalent, I feel there must be a smart way (using some assumptions for sure) to model this in. One simple approach that had occurred to me would be to just multiply all the T1 logFCs by a common factor that produces a T1 response that *overall* is equivalent to the T2 response. I can get that factor by simple linear regression of the two responses, but I'm not sure where in the limma workflow it makes sense (if at all) to try to shove that correction in. This test code and simulated data seems to do what I want (return 0 DE genes in the final analysis). Is it valid to do this?
Dear Alex,
The strong way to get genes specific to T2 is to use three conditions. Find those genes that are:
DE for T2 - Ctrl,
Not DE for T1- Ctrl, and
DE for T2-T1
This ensures that all the chosen genes are clearly specific. It chooses only genes that have clearly larger responses to T2 than to T1. The theshold for DE can be chosen fairly loose for the third condition.
This is of course a general procedure, and does not take account of any assumed linear relationship between T2 and T1.
> Put another way, if we assume a linear dose response for the system to
> T1, which other data suggests is not an unreasonable assumption here, an
> increase in the T1 concentration would, for many, but not all, genes
> produce the same response as T2 (or vice-versa a decrease in T2
> concentration would produce the same response as T1). I want to get
> those genes where this is *not* true.
Well, your plot of logFC for T2 vs logFC for T1 is already a good start. You could a least squares regression of T2 on T1 through zero:
It is important to force the regression through zero. Suppose the slope coef(fit) is 0.7, for example. Then the contrast you probably want to test is:
This will find genes that don't fit the linear regression line relating the T2 response to the T1 response.
Best wishes
Gordon
Dear Alex,
As a further thought, it would be worthwhile to run genas() on your data:
This is because the experimental design is such that logFC.T2 and logFC.T1 would show a correlation of about 0.5 even if neither T2 and T1 had any true differential expression. This is because both of the contrasts are relative to the same control, a fact that induces a technical correlation in the observed fold changes. The genas() function is able to deconvolve the technical and true biological components of the correlation between two contrasts.
Best wishes
Gordon