Question

Overlapping treatments in the design matrix

jhj89 ▴ 10

Hi,

I am trying to create another design matrix with 4 groups in mind:

1. DiseaseA+DiseaseB, 2. DiseaseA, 3. DiseaseB, 4. Control.

I can create an appropriate design matrix for several analyses of interest, such as finding differentially expressed genes between any two groups. However, what I want to find is the genes that are differentially expressed between Disease A and Disease B, while incorporating the DiseaseA+DiseaseB group into this comparison. I have tried various approaches, such as building the design matrix from a table like the one below:

DiseaseA | DiseaseB
1 | 1
1 | 0
0 | 1
0 | 0

but it is not doing what I thought it would do. Any suggestions will be very helpful.

Thank you.

edger design matrix • 1.2k views

ADD COMMENT • link updated 9.2 years ago by Aaron Lun ★ 28k • written 9.2 years ago by jhj89 ▴ 10

score 3 · Accepted Answer · 2016-02-08

Aaron Lun ★ 28k

Well, you need to make up your mind as to what you want to do. If you want to compare between any two groups, then use a one-way layout where each disease status (control, disease A only, B only, or A + B) forms its own group. As an example, here's how I would do it if the target information were present in has.A and has.B:

has.A <- rep(c("Y", "N"), each=4)
has.B <- rep(rep(c("Y", "N"), each=2), 2)
groups <- factor(paste0("A", has.A, ".B", has.B))
design <- model.matrix(~0 + groups)
colnames(design) <- levels(groups)

You can then compare between any set of groups, e.g., between disease A and B using makeContrasts:

library(limma)  # makeContrasts() comes from limma (also attached when edgeR is loaded)
con <- makeContrasts(AY.BN - AN.BY, levels=design)

If you want to use the A + B information in the comparison between A and B, then you need an additive model:

design <- model.matrix(~ has.A + has.B)
con <- makeContrasts(has.AY - has.BY, levels=design)

However, this assumes that the effects of A and B are additive in the group with both diseases. One can easily imagine that this will not be the case in reality, e.g., due to epistatic effects or other interactions between the disease states. If the assumption is violated, you will probably get a decrease in power because the dispersion estimate gets inflated by the poor fit. This may end up being more damaging than just leaving the A + B samples out of the comparison in the first place.
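To see why a violated additivity assumption hurts, here is a minimal sketch using an ordinary linear model on made-up log-expression values for a single gene. It is only a stand-in for edgeR's negative binomial GLM, and the values in `y` are invented for illustration: the A + B group is deliberately set well above what additivity would predict.

```r
# Same sample layout as above: 2 replicates per group.
has.A <- rep(c("Y", "N"), each = 4)
has.B <- rep(rep(c("Y", "N"), each = 2), 2)
groups <- factor(paste0("A", has.A, ".B", has.B))

# Hypothetical log-expression for one gene (invented values).
# Group means: A+B = 12, A only = 7, B only = 6, control = 5.
# Additivity would predict 5 + 2 + 1 = 8 for A+B, so an interaction is present.
y <- c(12.1, 11.9, 7.1, 6.9, 6.1, 5.9, 5.1, 4.9)

fit.additive <- lm(y ~ has.A + has.B)  # forces A+B = control + A effect + B effect
fit.oneway   <- lm(y ~ 0 + groups)     # each group gets its own mean

# Residual standard deviation: a rough analogue of the dispersion estimate.
sd.additive <- summary(fit.additive)$sigma
sd.oneway   <- summary(fit.oneway)$sigma
print(c(additive = sd.additive, oneway = sd.oneway))
```

The additive fit has the larger residual spread because the interaction it cannot represent is pushed into the residuals. In a count-based analysis, that misfit would show up as an inflated dispersion estimate.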

In short, I would go for the one-way layout. It's more flexible in terms of the comparisons you want to do, and also in terms of avoiding the additivity assumptions in the other model. That said, I hope the design matrix in your original post is just an example, and that you actually have replicates within each group, otherwise you'll end up with other problems.

ADD COMMENT • link 9.2 years ago Aaron Lun ★ 28k

Thank you for your answer Aaron.

Yes, the above is just an example. And yes, your point about the effects not merely being additive is valid. I will keep that in mind.

ADD REPLY • link 9.2 years ago jhj89 ▴ 10

Hi Aaron,

This might be a very dumb question, but what happens if I build the design matrix and contrast below?

design <- model.matrix(~0+has.A+has.B)

and make a contrast:

con <- makeContrasts(has.AY - has.BY, levels = design)

The reason I'm asking is that, with a design matrix like design <- model.matrix(~ has.A + has.B), I'm not entirely sure what the Intercept term refers to. Does it refer to the baseline, i.e., has.AN and has.BN? It's slightly confusing to me.

Thanks.

ADD REPLY • link 9.2 years ago jhj89 ▴ 10

The intercept represents the average expression of the control group. If you make the design matrix without an intercept, the first column has.AN represents the average expression of the control group; the second column has.AY represents the average expression of the A-only group; and the third column has.BY represents the additive effect of B over the control/over the A-only group. I don't think the contrast you've specified above makes any sense, though, because you're asking whether the expression in the A-only group is equal to the log-fold change of B.
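As a rough illustration of these coefficient meanings, the sketch below fits both parametrizations with an ordinary linear model to invented, perfectly additive log-expression values (control = 5, A adds 2, B adds 1). The numbers and the use of `lm()` in place of an edgeR fit are assumptions made purely for illustration.

```r
has.A <- rep(c("Y", "N"), each = 4)
has.B <- rep(rep(c("Y", "N"), each = 2), 2)

# Hypothetical, perfectly additive log-expression (invented values):
# control = 5, effect of A = +2, effect of B = +1.
y <- 5 + 2 * (has.A == "Y") + 1 * (has.B == "Y")

# With an intercept: the control average plus one effect per disease.
coef.int <- coef(lm(y ~ has.A + has.B))
print(coef.int)

# Without an intercept: has.AN and has.AY are group averages (for samples
# without B), while has.BY is still the additive effect of B.
coef.noint <- coef(lm(y ~ 0 + has.A + has.B))
print(coef.noint)
```

Under these assumed values, the intercept version returns 5, 2 and 1 for (Intercept), has.AY and has.BY, while the no-intercept version returns 5, 7 and 1 for has.AN, has.AY and has.BY. Subtracting has.BY from has.AY in the second case therefore mixes an average (7) with an effect (1).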

ADD REPLY • link 9.2 years ago Aaron Lun ★ 28k

Hi Aaron,

I have one more question regarding the design matrix above. I'm wondering whether the comparison:

design <- model.matrix(~ has.A + has.B)
con <- makeContrasts(has.AY - has.BY, levels=design)

is merely disease A vs. disease B, or is it [disease A - control] vs. [disease B - control]? If it is the former, how do I go about doing the latter comparison? Again, thank you for your time in answering the questions.

ADD REPLY • link 9.2 years ago jhj89 ▴ 10

It's the latter. Each has.XY coefficient represents the effect of disease X over control.
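Note that the control term cancels in this contrast, so under the additive model it gives the same number as comparing the fitted A-only and B-only averages directly. A minimal sketch with invented, perfectly additive values (control = 5, A adds 2, B adds 1), using `lm()` as a stand-in for the edgeR fit:

```r
has.A <- rep(c("Y", "N"), each = 4)
has.B <- rep(rep(c("Y", "N"), each = 2), 2)

# Hypothetical log-expression for one gene (invented values).
y <- 5 + 2 * (has.A == "Y") + 1 * (has.B == "Y")

b <- coef(lm(y ~ has.A + has.B))

# The contrast has.AY - has.BY: (A over control) minus (B over control).
contrast.value <- unname(b["has.AY"] - b["has.BY"])

# Fitted averages of the A-only and B-only groups, compared directly.
A.only <- unname(b["(Intercept)"] + b["has.AY"])
B.only <- unname(b["(Intercept)"] + b["has.BY"])
direct.difference <- A.only - B.only

print(c(contrast = contrast.value, direct = direct.difference))
```

Both quantities come out equal here because the intercept appears in each group average and drops out of the difference.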

ADD REPLY • link 9.2 years ago Aaron Lun ★ 28k