Well, you need to make up your mind as to what you want to do. If you want to compare between any two groups, then use a one-way layout where each disease status (control, disease A only, B only, or A + B) forms its own group. As an example, here's how I would do it if the target information were present in has.A
and has.B
:
has.A <- rep(c("Y", "N"), each=4)
has.B <- rep(rep(c("Y", "N"), each=2), 2)
groups <- factor(paste0("A", has.A, ".B", has.B))
design <- model.matrix(~0 + groups)
colnames(design) <- levels(groups)
You can then compare between any set of groups, e.g., between disease A and B using makeContrasts
:
con <- makeContrasts(AY.BN - AN.BY, levels=design)
If you want to use the A + B information in the comparison between A and B, then you need an additive model:
design <- model.matrix(~ has.A + has.B)
con <- makeContrasts(has.AY - has.BY, levels=design)
However, this assumes that the effects of A and B are additive in the group with both diseases. One can easily imagine that this will not be the case in reality, e.g., due to epistatic effects or other interactions between the disease states. If the assumption is violated, you will probably get a decrease in power because the dispersion estimate gets inflated by the poor fit. This may end up being more damaging than just leaving the A + B samples out of the comparison in the first place.
In short, I would go for the one-way layout. It's more flexible in terms of the comparisons you want to do, and also in terms of avoiding the additivity assumptions in the other model. That said, I hope the design matrix in your original post is just an example, and that you actually have replicates within each group, otherwise you'll end up with other problems.
Thank you for your answer Aaron.
Yes the above is just an example. And yes, your point about them not merely being additive is valid. I will keep that in mind.
Hi Aaron,
Might be a very dumb question but what happens if I have a design matrix such that:
design <- model.matrix(~0+has.A+has.B)
and make a contrast:
con <- makeContrasts(has.AY - has.BY, levels = design)?
The reason I'm asking is if we have a design matrix like (design <- model.matrix(~ has.A + has.B)), I'm not entirely sure what the Intercept term refers to; does it refer to the baseline - has.AN and has.BN. It's slightly confusing to me.
Thanks.
The intercept represents to the average expression of the control group. If you make the design matrix without an intercept, the first column
has.AN
represents to the average expression of the control; the second columnhas.AY
represents to the average represents the average expression of the A-only group; and the third columnhas.BY
represents the additive effect of B over the control/over the A-only group. I don't think the contrast you've specified above will make any sense, though, because you're asking for whether the expression in the A-only group is equal to the log-fold change of B.Hi Aaron,
I have one more question regarding the design matrix above. I'm wondering whether the comparison:
is merely diseaseA VS disease B or is [diseaseA - control] VS [diseaseB - control]. If it is the former, how do I go about doing the latter comparison? Again, thank you for your time in answering the questions.
It's the latter. Each
has.XY
coefficient represents the effect of disease X over control.