Hello, I am running a GLM analysis with edgeR on 64 biologically independent samples, with 3 factors: sex (M and F), age (P0, P7, P15, P30), and genotype (Control, KO). I have two separate questions: 1) I would like to look at the main effect of genotype over time, and 2) I would like to look at the interaction between sex and genotype. I started by pasting all the factors together to use as a single factor without an intercept. I am having trouble understanding what exactly each coefficient means and which contrasts can be used to answer my questions.
design<-model.matrix(~0 + grouping)
> colnames(design)
[1] "groupingControl0F." "groupingControl0M." "groupingControl15F." "groupingControl15M." "groupingControl30F." "groupingControl30M."
[7] "groupingControl7F." "groupingControl7M." "groupingKO0F." "groupingKO0M." "groupingKO15F." "groupingKO15M."
[13] "groupingKO30F." "groupingKO30M." "groupingKO7F." "groupingKO7M."
For example: if I contrast groupingControl0F. and groupingControl0M. does this mean I am holding genotype and age constant and looking at the main effect of sex?
Any help you can provide would be greatly appreciated!
Or I suppose you could do something less silly
Which is what you would get (plus a descriptive column header) if you used makeContrasts, which might be the easier way to go, assuming you have fewer than a bazillion contrasts to make.
Thanks so much for your quick response! Would I be able to use the coefficient terms directly with the makeContrasts() function, or must I construct a contrast matrix in order to test for specific interaction terms?
I am also concerned because it seems from the vignettes that these contrasts need a reference term. This would be totally fine for the effect of genotype, because controls would be the reference, but for time and for sex I am not sure choosing a reference would be appropriate. For example, for the effect of time, I don't want P0 to be used as the "baseline", as I am interested in the differences that happen at that time point as well.
Yes, you can use the coefficient terms directly. You could get the same thing as my second example by doing
It's just really boring (to me) if you have lots of contrasts to type all that out, rather than just generating the contrast matrix by hand.
You are misunderstanding the vignette. Or more correctly, the User's Guide. You don't have to have a reference, but the default for model.matrix is to construct a design matrix using treatment contrasts, where you define a baseline level, and all the other coefficients are contrasts between a given group and the baseline. In your case you will be much better off constructing the design matrix as you have (using ~ 0 + grouping), which tells R you don't want a baseline level.
Ah, I see. So if I were interested in the effect of time I wouldn't necessarily have to use P0 as the reference? I could contrast how P15 changes from P7, for example?
I suppose I am still a bit confused about how to look at the impact of time.
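The difference between the default treatment-contrast parameterization and the no-intercept one described above can be seen on a toy example. This is only a sketch in base R: the three-level factor with levels A, B and C is hypothetical and stands in for the real 16-level grouping factor.

```r
# Toy factor with three hypothetical groups, two samples each (not the real data)
grouping <- factor(rep(c("A", "B", "C"), each = 2))

# Default parameterization: treatment contrasts, the first level (A) is the baseline
design_ref <- model.matrix(~ grouping)
print(colnames(design_ref))
# "(Intercept)" "groupingB" "groupingC"
# groupingB and groupingC are differences from the baseline group A

# No-intercept parameterization: one coefficient per group, no baseline level
design_means <- model.matrix(~ 0 + grouping)
print(colnames(design_means))
# "groupingA" "groupingB" "groupingC"
# every sample belongs to exactly one column, so each row sums to 1
print(rowSums(design_means))
```

With the second form no group is privileged, so a comparison such as P15 versus P7 is simply the difference between two coefficients.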
Yes. That's what I meant when I said it's just simple algebra. Any comparison can be made, although it might be difficult to interpret. So you could hypothetically be interested in time-dependent changes between P7 and P15 that are different for males and females, which is, again, simple algebra:
(groupingControl15M - groupingControl7M) - (groupingControl15F - groupingControl7F)
or if you want the effect of KO and time within males, between P15 and P7,
(groupingKO15M - groupingKO7M) - (groupingControl15M - groupingControl7M)
or whatever. But this is starting to get beyond the scope of this support site, and into experimental questions or statistical design. For that you are going to have to figure out what you care about yourself, or, if you are getting hung up on the design aspects, you will need to either do the required research to know what to do or find someone local who can help you.
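As a rough sketch of how the two algebraic comparisons above translate into contrast vectors, the base R code below builds them by hand from the design column names shown in the original post. The helper make_contrast and the contrast names sex_by_time and ko_by_time_males are hypothetical, introduced only for illustration. The optional limma::makeContrasts call runs only if limma is installed.

```r
# Column names copied from the design matrix printed in the original post
coefs <- c(
  "groupingControl0F.", "groupingControl0M.", "groupingControl15F.", "groupingControl15M.",
  "groupingControl30F.", "groupingControl30M.", "groupingControl7F.", "groupingControl7M.",
  "groupingKO0F.", "groupingKO0M.", "groupingKO15F.", "groupingKO15M.",
  "groupingKO30F.", "groupingKO30M.", "groupingKO7F.", "groupingKO7M."
)

# Hypothetical helper: +1 for coefficients that are added, -1 for those subtracted
make_contrast <- function(plus, minus, coefs) {
  v <- setNames(numeric(length(coefs)), coefs)
  v[plus] <- v[plus] + 1
  v[minus] <- v[minus] - 1
  v
}

# (Control15M - Control7M) - (Control15F - Control7F)
sex_by_time <- make_contrast(
  plus  = c("groupingControl15M.", "groupingControl7F."),
  minus = c("groupingControl7M.", "groupingControl15F."),
  coefs = coefs
)

# (KO15M - KO7M) - (Control15M - Control7M)
ko_by_time_males <- make_contrast(
  plus  = c("groupingKO15M.", "groupingControl7M."),
  minus = c("groupingKO7M.", "groupingControl15M."),
  coefs = coefs
)

print(sex_by_time[sex_by_time != 0])
print(ko_by_time_males[ko_by_time_males != 0])

# Optional: the same contrast written as an expression for makeContrasts
if (requireNamespace("limma", quietly = TRUE)) {
  cm <- limma::makeContrasts(
    sex_by_time = (groupingControl15M. - groupingControl7M.) -
      (groupingControl15F. - groupingControl7F.),
    levels = coefs
  )
  print(cm[cm[, "sex_by_time"] != 0, , drop = FALSE])
}
```

Either form could then be supplied as the contrast argument to edgeR's glmLRT or glmQLFTest, assuming the model was fitted with the same design matrix so that the coefficient order matches.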