Question

EdgeR - Complex design - contrasts/coefficients

0

Entering edit mode

D ▴ 10

@d-16116

Last seen 4.1 years ago

UK

Hi, I have a question about contrasts for more complex GLM models in edgeR.

Taking the section 3.5 example as a starting point

> targets <- data.frame(Disease = factor(rep(c("Healthy","Disease1","Disease2"), each = 6)),
+                       Patient = factor(rep(c(1:9), each=2)),
+                       Treatment = factor(rep(c("None","Hormone"), 9)))

> Patient <- gl(3,2,length = 18)
> Disease <- factor(targets$Disease, levels = c("Healthy","Disease1","Disease2"))
> Treatment <- factor(targets$Treatment, levels = c("None","Hormone"))

> data.frame(Disease, Patient, Treatment)
    Disease Patient Treatment
1   Healthy       1      None
2   Healthy       1   Hormone
3   Healthy       2      None
4   Healthy       2   Hormone
5   Healthy       3      None
6   Healthy       3   Hormone
7  Disease1       1      None
8  Disease1       1   Hormone
9  Disease1       2      None
10 Disease1       2   Hormone
11 Disease1       3      None
12 Disease1       3   Hormone
13 Disease2       1      None
14 Disease2       1   Hormone
15 Disease2       2      None
16 Disease2       2   Hormone
17 Disease2       3      None
18 Disease2       3   Hormone

> design <- model.matrix(~Disease+Disease:Patient+Disease:Treatment)
> colnames(design)
 [1] "(Intercept)"                      "DiseaseDisease1"                  "DiseaseDisease2"                  "DiseaseHealthy:Patient2"          "DiseaseDisease1:Patient2"        
 [6] "DiseaseDisease2:Patient2"         "DiseaseHealthy:Patient3"          "DiseaseDisease1:Patient3"         "DiseaseDisease2:Patient3"         "DiseaseHealthy:TreatmentHormone" 
[11] "DiseaseDisease1:TreatmentHormone" "DiseaseDisease2:TreatmentHormone"

My question is how to configure the contrasts/coefficients chosen to find the equivalent of this:

(DiseaseDisease1:TreatmentHormone - DiseaseDisease1:TreatmentNone) - (DiseaseHealthy:TreatmentHormone - DiseaseHealthy:TreatmentNone)

i.e. those genes where the change of expression induced by the hormone is different between Disease1 and healthy.

There is a line in the example which says it will find genes that respond differently to the hormone in disease1 vs healthy patients, so sounds as though it is perfect, however I am not familiar enough with GLMs to decide if it is doing what I want:

qlf <- glmQLFTest(fit,contrast = c(0,0,0,0,0,0,0,0,0,-1,1,0)

This would appear to be performing:

DiseaseDisease1:TreatmentHormone - DiseaseHealthy:TreatmentHormone

Rather than the difference in the response to the hormone, is it not just finding the difference in the treated Disease1 vs the treated healthy? So any changes between disease and healthy without the hormone are not accounted for - I suspect this is due to me not understanding what the intercept represents and how it is used in the contrasts.

Thanks in advance!!!

On a related note is there a good resource for learning how to fit GLMs - so I can understand how the coefficients work with and without intercepts etc? I have read the sections in the edgeR and Limma manuals which are very informative but I don't understand the concepts behind how the various models have been fit and contrasts/coefficients chosen.

edger GLM • 2.3k views

ADD COMMENT • link updated 5.6 years ago by Yunshun Chen ▴ 900 • written 5.6 years ago by D ▴ 10

score 5 · Accepted Answer · 2019-09-17

5

Entering edit mode

Yunshun Chen ▴ 900

@yunshun-chen-5451

Last seen 5 months ago

Australia

To construct a proper contrast, you need to understand the meaning of each coefficient first.

In this particular design matrix, the (Intercept) represents the Healthy Patient 1 with no treatment. The coefficient DiseaseDisease1 represents the difference between Disease1 and Healthy for Patient 1 with no treatment. Similarly for DiseaseDisease2.

The coefficient DiseaseHealthy:Patient2 represents the difference between Patient 2 and Patient 1 under the healthy non-treatment condition. Similarly for DiseaseDisease1:Patient2, DiseaseDisease2:Patient2, DiseaseHealthy:Patient3, DiseaseDisease1:Patient3 and DiseaseDisease2:Patient3.

Finally, the coefficient DiseaseHealthy:TreatmentHormone represents the difference between Hormone treatment and non treatment under the healthy condition, and DiseaseDisease1:TreatmentHormone represents such difference under Disease1.

So that's why the contrast:

DiseaseDisease1:TreatmentHormone - DiseaseHealthy:TreatmentHormone

finds genes that respond differently to the hormone in disease1 vs healthy patients.

ADD COMMENT • link 5.6 years ago Yunshun Chen ▴ 900

0

Entering edit mode

Thanks for your very informative answer!

For future reference, do you know of a resource which further explores how to deduce the meanings of the coefficients, such that when planning future experiments I do not need to ask for further clarification?

Kind regards and many thanks,

Dean

ADD REPLY • link 5.6 years ago D ▴ 10

1

Entering edit mode

Then you should refer to this link for in-depth reading on contrast matrix.

ADD REPLY • link 5.6 years ago mikhael.manurung ▴ 280

1

Entering edit mode

There isn't a good reference at the level you're looking for, which is why we try so hard to explain things explicitly in the limma and edgeR User Guides. If you don't understand at all the role of intercepts in R model formulae, then it would be best to start with Section 9.2 in the limma User's Guide rather than a very complex design like the one above. The same principle used in Section 9.2 is applied over and over again in complex designs. The edgeR Section 3.5 formula is quite advanced and quite specialized and I've never seen a statistical textbook discuss anything like it.

In general, we recommend you use the group-means parametrization that has explicit contrast interpretations. We pushed this approach in limma and edgeR because it is otherwise so easy to misinterpret model formula in R. Again, though, it is not something that is discussed in statistical textbooks.

ADD REPLY • link 5.6 years ago Gordon Smyth 52k