Hi, I'm having trouble figuring out the appropriate design formula for my data. My sample table looks something like this:
Condition Sex MouseID NewMouseID
WT Female 1 1
WT Female 2 2
WT Female 3 3
WT Female 3 3
WT Male 4 1
WT Male 5 2
WT Male 6 3
WT Male 6 3
KO Female 7 1
KO Female 8 2
KO Female 9 3
KO Female 9 3
KO Male 10 1
KO Male 11 2
KO Male 12 3
KO Male 12 3
My problem is with multiple samples coming from the same mouse (note MouseIDs 3, 6, 9, and 12). If possible, I would rather not simply sum/average those samples, and would instead like to include mouse as a nested factor, because those duplicates came from different sites within the same mouse. I had previously averaged technical replicates (same site, sequenced 2+ times).
If I do:
design = ~ MouseID + Sex + Condition
I get an error. My understanding is that this doesn't work because it treats the MouseIDs as 12 different "batches", and it would be impossible to tell whether differences were due to the "batch" or the Condition.
Instead, my thought is to include a "NewMouseID" factor that sequentially numbers the mice within each "group" (i.e. each Condition/Sex combination). Is it possible to include this information in the design matrix? I want to control for differences between samples within the same mouse, as well as differences between male and female. However, I do want to note that there is no specific relation between MouseID #1 and MouseID #4, for example, despite them having the same NewMouseID.
Thanks in advance.
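For reference, here is a minimal sketch of what the NewMouseID idea looks like in base R. It rebuilds the sample table from the post and assumes all four columns are treated as factors; the formula ~ NewMouseID + Sex + Condition is just one possible way to write the proposal, not a recommended design. It only checks whether the matrix is full rank, and shows which mice end up sharing a NewMouseID level.

```r
# Rebuild the sample table from the post (16 samples from 12 mice).
df <- data.frame(
  Condition  = factor(rep(c("WT", "KO"), each = 8)),
  Sex        = factor(rep(rep(c("Female", "Male"), each = 4), times = 2)),
  MouseID    = factor(c(1, 2, 3, 3, 4, 5, 6, 6, 7, 8, 9, 9, 10, 11, 12, 12)),
  NewMouseID = factor(rep(c(1, 2, 3, 3), times = 4))
)

# One possible formula using NewMouseID in place of MouseID (an assumption).
mm_new <- model.matrix(~ NewMouseID + Sex + Condition, data = df)

ncol(mm_new)      # 5 coefficients
qr(mm_new)$rank   # 5, so this matrix is full rank

# Caveat: each NewMouseID level pools unrelated mice (e.g. mice 1, 4, 7, 10).
table(NewMouseID = df$NewMouseID, MouseID = df$MouseID)
```

So this design would run, but it estimates a shared effect for every mouse that happens to carry the same NewMouseID, which is exactly the relation between mouse 1 and mouse 4 that the post says does not exist.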
What is the error that you get?
If I run:
model.matrix(~ MouseID + Sex + Condition,data = df)
I get the following
My understanding is that the last column is the linear sum of MouseIDs 2 through 6
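The dependence can be checked directly. This is a small sketch in base R that rebuilds the table from the post, assuming all columns are factors with default (alphabetical) level ordering, so the last column comes out as ConditionWT. It compares the number of columns with the matrix rank and then verifies that the Sex and Condition columns can be written in terms of the mouse columns.

```r
# Rebuild the sample table from the post (16 samples from 12 mice).
df <- data.frame(
  Condition = factor(rep(c("WT", "KO"), each = 8)),
  Sex       = factor(rep(rep(c("Female", "Male"), each = 4), times = 2)),
  MouseID   = factor(c(1, 2, 3, 3, 4, 5, 6, 6, 7, 8, 9, 9, 10, 11, 12, 12))
)

mm <- model.matrix(~ MouseID + Sex + Condition, data = df)

ncol(mm)      # 14 columns: intercept, 11 mouse terms, SexMale, ConditionWT
qr(mm)$rank   # 12, so two columns are redundant

# SexMale is the sum of the columns for the male mice (4-6 and 10-12).
male_cols <- paste0("MouseID", c(4, 5, 6, 10, 11, 12))
all(mm[, "SexMale"] == rowSums(mm[, male_cols]))

# ConditionWT is the intercept minus the columns for the KO mice (7-12).
ko_cols <- paste0("MouseID", 7:12)
all(mm[, "ConditionWT"] == mm[, "(Intercept)"] - rowSums(mm[, ko_cols]))
```

Both checks return TRUE. The second one is essentially the dependence described above: the WT mice are 1 through 6, with mouse 1 absorbed into the intercept as the reference level.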
have you seen this post?
Just looked at it, thanks for the link.
My understanding is that their situation is different because they have cell lines with paired (t0 and t1) timepoints. I don't have pairing across conditions (e.g. same mouse given both treatments) or across sexes (e.g. male and female mouse of same "batch" or similar).
have you seen this post?