There are several conceptually different questions you may want to ask:
1. For a given time point, and a given gene: Does the expression of the gene at this time point correlate with BMI?
2. For two time points and a given gene: Does the change in expression from the first to the second time point correlate with the initial BMI?
3. For two time points and a given gene: Does the change in expression from the first to the second time point correlate with the change in BMI?
For 1, you should fit the data for each time point separately. This is because DESeq2 will otherwise assume your 15 libraries to be measurements from 15 independent samples. In reality, the 3 expression measurements from a subject are correlated, and neglecting this fact will inflate the type-I error rate. Then use ~ X. Do not include Person, because if you fit a coefficient for each person, this will remove all differences between subjects, leaving nothing.
(If you want to use all data at once, you would need a so-called mixed-effects model, which DESeq2 does not support. The 'duplicateCorrelation' function of limma/voom can account for such repeated-measures correlations and might be an alternative here.)
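To make the point about Person concrete, here is a minimal sketch in plain Python (not DESeq2 code) that builds the two design matrices for a single time point and compares their ranks. The BMI values and the helper matrix_rank are made up for illustration only.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix via Gaussian elimination with exact fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below the current rank with a non-zero entry.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from all rows below the pivot row.
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# Hypothetical BMI values for the 5 subjects at one time point.
bmi = [21, 24, 27, 30, 33]
n = len(bmi)

# Design ~ X: intercept plus BMI (2 columns).
design_x = [[1, x] for x in bmi]

# Design ~ Person + X: intercept, indicators for persons 2..5, plus BMI (6 columns).
design_person_x = [
    [1] + [1 if j == i else 0 for j in range(1, n)] + [x]
    for i, x in enumerate(bmi)
]

rank_x = matrix_rank(design_x)
rank_person_x = matrix_rank(design_person_x)

print("~ X:          columns =", len(design_x[0]), "rank =", rank_x)
print("~ Person + X: columns =", len(design_person_x[0]), "rank =", rank_person_x)
```

With ~ X the matrix has full column rank, so the BMI coefficient can be estimated. With ~ Person + X there are more columns than the rank, because the person indicators already explain every between-subject difference, BMI included, so nothing is left for the BMI coefficient to pick up.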
For 2, it is best to include only the samples from the two relevant time points and use ~ Person + X:time. This will remove the base-level expression (i.e., expression at the first time point) and leave only the differences in expression between time points.
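The idea of removing each person's base level can be sketched in plain Python as follows. This is only an illustration of the within-subject differencing on the log scale, not what DESeq2 computes internally (DESeq2 fits a negative-binomial GLM on counts). All numbers and the helper ols_slope are hypothetical.

```python
from statistics import mean

# Hypothetical log2 expression of one gene for 5 subjects at two time points.
log_expr_t1 = [5.0, 6.2, 4.8, 7.1, 5.5]
log_expr_t2 = [5.4, 6.9, 5.6, 8.2, 6.9]
# Hypothetical BMI of the same 5 subjects at the first time point.
bmi_initial = [21.0, 24.0, 27.0, 30.0, 33.0]


def ols_slope(x, y):
    """Slope of the ordinary least-squares line of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


# Within-person change: subtracting removes each person's base level.
delta = [t2 - t1 for t1, t2 in zip(log_expr_t1, log_expr_t2)]
slope = ols_slope(bmi_initial, delta)

# Give every person an arbitrary personal offset at both time points:
# the within-person change, and hence the slope, stays the same.
offsets = [3.0, -1.5, 0.7, 2.2, -4.0]
shifted_t1 = [v + o for v, o in zip(log_expr_t1, offsets)]
shifted_t2 = [v + o for v, o in zip(log_expr_t2, offsets)]
shifted_delta = [t2 - t1 for t1, t2 in zip(shifted_t1, shifted_t2)]
shifted_slope = ols_slope(bmi_initial, shifted_delta)

print("slope of change vs. initial BMI:", round(slope, 4))
print("slope after per-person offsets: ", round(shifted_slope, 4))
```

Both slopes agree, which is the point: whatever a person's base-level expression is, it cancels out, and only the change between the two time points is related to BMI. For question 3 the same sketch would apply with the change in BMI in place of bmi_initial.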
For 3, do the same, but replace X with the change in BMI.
And don't be too surprised if you get nothing. 5 subjects sounds way too few to see anything for such a question.
This depends on what X is. Is it a property of the sample or of the person? What is it?
(Please be specific about the biology when asking such questions. Statistics is not as abstract as people think.)
Hi Simon,
Sorry about that :(
X is BMI (body mass index). This is a pregnancy study. We want to answer two things. a. Does the BMI at the beginning of the pregnancy have an influence on the expression of any gene? b. Do different "weight gain levels" (i.e. kilograms gained) during pregnancy have an influence on the expression of any gene?
Thanks and sorry again