Taking your last comment, correcting your error and running your code:
set.seed(0)
T <- 24                                   # period of the oscillation
t <- rep(seq(3, 42, by = 3), 11)          # 14 time points for each of 11 subjects
c <- cos(2*pi*t/T)                        # a cosine time course basis function
s <- sin(2*pi*t/T)                        # the matching sine basis function
batch_effects <- factor(ceiling(runif(14*11)*11))   # random batch labels
subjects <- sprintf("P%02d", rep(1:11, each = 14))
design2 <- model.matrix(~ c:subjects + s:subjects + batch_effects)
... gives me the following for colnames(design2):
[1] "(Intercept)" "batch_effects2" "batch_effects3" "batch_effects4"
[5] "batch_effects5" "batch_effects6" "batch_effects7" "batch_effects8"
[9] "batch_effects9" "batch_effects10" "batch_effects11" "c:subjectsP01"
[13] "c:subjectsP02" "c:subjectsP03" "c:subjectsP04" "c:subjectsP05"
[17] "c:subjectsP06" "c:subjectsP07" "c:subjectsP08" "c:subjectsP09"
[21] "c:subjectsP10" "c:subjectsP11" "subjectsP01:s" "subjectsP02:s"
[25] "subjectsP03:s" "subjectsP04:s" "subjectsP05:s" "subjectsP06:s"
[29] "subjectsP07:s" "subjectsP08:s" "subjectsP09:s" "subjectsP10:s"
[33] "subjectsP11:s"
... so setting coef=2:23 would mostly be testing for the batch effect, rather than for the effect of time.
I will assume that you actually intended to drop all of the subject-time interaction terms in design2. In this model, you are allowing each subject to have its own behaviour with respect to time, as represented by the interaction terms for each subject. When you drop the interaction terms, your null hypothesis is that there is no time effect in any of your subjects.
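For what it's worth, with the column ordering shown above the subject-specific time-course terms can be picked out by name rather than by hard-coding indices. A sketch; the commented-out glmQLFit/glmQLFTest calls assume an edgeR DGEList y, which is not shown in this thread:

```r
set.seed(0)
T <- 24
t <- rep(seq(3, 42, by = 3), 11)
c <- cos(2*pi*t/T)
s <- sin(2*pi*t/T)
batch_effects <- factor(ceiling(runif(14*11)*11))
subjects <- sprintf("P%02d", rep(1:11, each = 14))
design2 <- model.matrix(~ c:subjects + s:subjects + batch_effects)

# Select the c:subjects and subjects:s interaction columns by name,
# skipping the intercept and batch columns:
time.coefs <- grep("^c:subjects|:s$", colnames(design2))
time.coefs   # 12:33 with the column ordering shown above

# fit <- glmQLFit(y, design2)                  # 'y' is your DGEList
# res <- glmQLFTest(fit, coef = time.coefs)    # drops all subject-time terms
```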
In contrast, with design1, you are assuming that all subjects have the same behaviour with respect to time. This means that when you set coef=2:3, you are testing whether the average effect of time across all patients is zero. This is clearly a different null hypothesis from that for design2, so it's not surprising that you get different numbers of DE genes.
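design1 itself is never shown in this thread; a form consistent with coef=2:3 testing the average time effect would be something like the following (the ~ c + s + batch_effects formula is my assumption):

```r
set.seed(0)
T <- 24
t <- rep(seq(3, 42, by = 3), 11)
c <- cos(2*pi*t/T)
s <- sin(2*pi*t/T)
batch_effects <- factor(ceiling(runif(14*11)*11))

# One shared cosine and sine term for all subjects, plus batch:
design1 <- model.matrix(~ c + s + batch_effects)
colnames(design1)[2:3]   # "c" "s" -- the coefficients tested by coef=2:3
```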
There is no guarantee that the (corrected) design2 contrast will yield more DE genes than your contrast with design1. Yes, there is greater scope for variability in the true time effect between subjects - but there is also greater scope for random variability, which needs to be properly handled by the test in order to avoid detecting spurious differences between subjects. You also lose many residual d.f. due to the complexity of the model, reducing the precision of the dispersion estimates and potentially reducing power.
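You can count the residual d.f. lost under each design directly (again assuming design1 has the form ~ c + s + batch_effects, which is not shown in the thread):

```r
set.seed(0)
T <- 24
t <- rep(seq(3, 42, by = 3), 11)
c <- cos(2*pi*t/T)
s <- sin(2*pi*t/T)
batch_effects <- factor(ceiling(runif(14*11)*11))
subjects <- sprintf("P%02d", rep(1:11, each = 14))

design1 <- model.matrix(~ c + s + batch_effects)                    # assumed form
design2 <- model.matrix(~ c:subjects + s:subjects + batch_effects)

length(t) - ncol(design1)   # 154 - 13 = 141 residual d.f.
length(t) - ncol(design2)   # 154 - 33 = 121 residual d.f., i.e. 20 fewer
```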
It seems like design2 is what you actually want to answer your scientific question. This model allows for differences in the time effect between subjects, which is what you're interested in - whereas design1 does not.
What is pattern? What is subject? What is batch_effect? What contrasts are you performing, and what code are you using to do them? Give us a little example like this:

Initially I only used
The number of significant genes under the two models are
I get N1 > N2, although I would have expected N2 > N1. Or maybe I am constructing these tests incorrectly?
I would like to know the inter-subject variability of these oscillating genes, and also which genes are oscillating in common across subjects.
Your example doesn't work. s and c are not the same length as subjects, and batch_effects is not defined. I'd like something that actually works, so I can see the output of model.matrix.

Sorry, I didn't realize you wanted a piece of code you could just run. Here goes ....
This is still not quite correct; batch_effects should be batch_effect.