Hello,
I have a fairly complicated experimental design and I would like some help/feedback on building the model.matrix. The data come from an experiment with two groups of mice (young/old), whose cells were sorted into populations using two markers (sort1/sort2); each sort yields positive and negative cells. The problem is that the mice the cells come from are nested within both the sorts and the age groups, and the groups differ in size: some have 3, some 4 and some 5 mice. To explain a bit better, my sample table looks like this:
sample | sort | cell | age | mouse | mouse_nest |
sample1 | sort1 | positive | young | 1 | 1 |
sample2 | sort1 | negative | young | 1 | 1 |
sample3 | sort1 | positive | young | 2 | 2 |
sample4 | sort1 | negative | young | 2 | 2 |
sample5 | sort1 | positive | young | 3 | 3 |
sample6 | sort1 | negative | young | 3 | 3 |
sample7 | sort1 | positive | young | 4 | 4 |
sample8 | sort1 | negative | young | 4 | 4 |
sample9 | sort1 | positive | young | 5 | 5 |
sample10 | sort1 | negative | young | 5 | 5 |
sample11 | sort1 | positive | old | 6 | 1 |
sample12 | sort1 | negative | old | 6 | 1 |
sample13 | sort1 | positive | old | 7 | 2 |
sample14 | sort1 | negative | old | 7 | 2 |
sample15 | sort1 | positive | old | 8 | 3 |
sample16 | sort1 | negative | old | 8 | 3 |
sample17 | sort1 | positive | old | 9 | 4 |
sample18 | sort1 | negative | old | 9 | 4 |
sample19 | sort2 | positive | young | 10 | 1 |
sample20 | sort2 | negative | young | 10 | 1 |
sample21 | sort2 | positive | young | 11 | 2 |
sample22 | sort2 | negative | young | 11 | 2 |
sample23 | sort2 | positive | young | 12 | 3 |
sample24 | sort2 | negative | young | 12 | 3 |
sample25 | sort2 | positive | old | 13 | 1 |
sample26 | sort2 | negative | old | 13 | 1 |
sample27 | sort2 | positive | old | 14 | 2 |
sample28 | sort2 | negative | old | 14 | 2 |
sample29 | sort2 | positive | old | 15 | 3 |
sample30 | sort2 | negative | old | 15 | 3 |
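For anyone who wants to reproduce the layout, the table above can be rebuilt in R roughly as follows. This is only a sketch: the object name samples and the helper vectors are made up for illustration, and the choice of "young" and "negative" as reference levels is an assumption, not something stated in the post.

```r
# Mice per sort/age group, in the order they appear in the table above
n_mice <- c(5, 4, 3, 3)
sort_g <- c("sort1", "sort1", "sort2", "sort2")
age_g  <- c("young", "old", "young", "old")

# One entry per mouse; each mouse later contributes two rows
mouse_sort <- rep(sort_g, times = n_mice)
mouse_age  <- rep(age_g, times = n_mice)
mouse_nest <- sequence(n_mice)  # 1..5, 1..4, 1..3, 1..3

samples <- data.frame(
  sort       = factor(rep(mouse_sort, each = 2)),
  # Assumed reference level: "negative"
  cell       = factor(rep(c("positive", "negative"), times = sum(n_mice)),
                      levels = c("negative", "positive")),
  # Assumed reference level: "young"
  age        = factor(rep(mouse_age, each = 2), levels = c("young", "old")),
  mouse      = factor(rep(seq_len(sum(n_mice)), each = 2)),
  mouse_nest = factor(rep(mouse_nest, each = 2)),
  row.names  = paste0("sample", seq_len(2 * sum(n_mice)))
)

# Cross-tabulate to see the unbalanced group sizes (samples per sort/age)
group_sizes <- table(samples$sort, samples$age)
print(group_sizes)
```

The cross-tabulation makes the imbalance explicit: sort1 has more samples in the young group than in the old group, while sort2 is balanced.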
Initially I thought of splitting the data in two (sort1 and sort2 groups) and then using a nested design within that:
design <- model.matrix( ~ cell + age + age:cell + age:mouse_nest)
which works for sort2 but not for sort1, because the young and old groups contain different numbers of mice (5 vs 4 mice per group). As far as I understand, the way to resolve this is either to remove a pair of samples so that I have 4 mice in each group, or to remove the age:mouse_nest term. However, neither solution sounds great to me because a) I don't like removing samples and b) there seem to be differences between the mice. How do people go about choosing which is best? By looking at the dispersion estimates? Are there any other ways to resolve this?
Also, I would like to compare the positive cells of one marker (sort1) with the positive cells of the other marker (sort2), so I would like to put all the samples together, but then the problem with the nesting becomes even greater because of the differences in group sizes. Is the best way to just put these samples together (sort1+ve vs sort2+ve) for young and old and forget about the nesting altogether, using a design matrix like the one below:
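To make the sort1 problem concrete, here is a small base R sketch that rebuilds only the sort1 samples and checks the rank of the nested design. The data frame name sort1 and the factor level ordering are assumptions for illustration. The last step, dropping all-zero columns from the design matrix, is a workaround that is sometimes used for unbalanced nested designs; it is not one of the two options described in the post, so treat it as one possibility rather than a recommendation.

```r
# sort1 only: 5 young and 4 old mice, two samples (positive/negative) each
n_mice <- c(young = 5, old = 4)
sort1 <- data.frame(
  age        = factor(rep(rep(names(n_mice), times = n_mice), each = 2),
                      levels = c("young", "old")),
  mouse_nest = factor(rep(sequence(unname(n_mice)), each = 2)),
  cell       = factor(rep(c("positive", "negative"), times = sum(n_mice)),
                      levels = c("negative", "positive"))
)

design <- model.matrix(~ cell + age + age:cell + age:mouse_nest, data = sort1)

# Rank deficiency shows up as fewer independent columns than columns
n_cols      <- ncol(design)
design_rank <- qr(design)$rank
print(c(columns = n_cols, rank = design_rank))

# Columns that are zero for every sample: here, the coefficient for a
# fifth old mouse, which does not exist in the data
empty <- colnames(design)[colSums(design != 0) == 0]
print(empty)

# Possible workaround (an assumption, not from the post): drop empty columns
design_trimmed <- design[, colSums(design != 0) > 0, drop = FALSE]
print(c(columns = ncol(design_trimmed), rank = qr(design_trimmed)$rank))
```

With the empty column removed, the number of columns matches the rank, so all 18 sort1 samples and the per-mouse terms are kept.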
design <- model.matrix( ~ age + sort + age:sort)
(or the equivalent form of combining them into one factor and using contrasts)
Thank you in advance for all your help!
Best wishes,
Emma
Thank you for your answer Ryan! In my mind the cell factor is also nested within the sort because the positive cells are specific for the sort in question. There shouldn't be a great difference in the negative cells of the two sorts because they are just the remaining cells from either sort. Although in theory if you are taking out different things from the same pools of cells then you are left with different groups of cells, I wouldn't expect significant differences.
I think you are right in using limma for this analysis. Assuming that cell is also nested, something like this:
design <- model.matrix( ~ sort + sort:cell + age + age:cell + sort:age + sort:age:cell)
would look unreasonably complicated to interpret properly, so I guess the best way is to keep the main effect of cell even if not much comes out of it. Correct me if I am wrong.
Many thanks,
Emma
If you think that the negatives for both sorts should be equivalent, you can represent this by creating a single factor with 3 levels: "negative", "positive1", "positive2". By using this factor in place of sort and cell, you'll be comparing both positive groups to a common baseline consisting of all the negative samples from both sorts.
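A minimal sketch of how such a 3-level factor could be built from the existing sort and cell columns. The object names samples and celltype are made up for illustration, and only four example rows are shown.

```r
# Minimal stand-in for the sort and cell columns of the sample table
samples <- data.frame(
  sort = c("sort1", "sort1", "sort2", "sort2"),
  cell = c("positive", "negative", "positive", "negative")
)

# Negatives share one level; positives are tagged with the sort number
celltype <- ifelse(samples$cell == "negative",
                   "negative",
                   paste0("positive", sub("^sort", "", samples$sort)))

# Put "negative" first so it acts as the common baseline level
samples$celltype <- factor(celltype,
                           levels = c("negative", "positive1", "positive2"))
print(samples)
```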
Either way, my recommended way to construct a design matrix for any interaction model is still to combine all the interacting factors into a single "group" variable as demonstrated above and then use a design of ~0+group, giving you a coefficient for each unique combination of factor levels, and then to construct contrasts between your groups of interest.
With regard to your suggested design above, be aware that as long as you have sort:age:cell in the design, including or excluding any of the previous terms will only result in a different parametrization of the same design. So the design that you suggested is just a more complicated version of my suggested ~0+group. (This only applies to factor variables, though. I think the situation with numeric/continuous variables is a bit different.)
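The "same design, different parametrization" point can be checked numerically in base R. The sketch below uses a toy balanced layout (two replicates per sort/cell/age combination, which is an assumption and not the real sample table) and compares the column spaces of the two designs through their ranks. Ranks are compared rather than column counts, because the number of columns R emits depends on how it codes each term. The contrast at the end is just one example of a comparison between groups.

```r
# Toy layout: every sort/cell/age combination, two replicates each
samples <- expand.grid(
  sort = c("sort1", "sort2"),
  cell = c("negative", "positive"),
  age  = c("young", "old"),
  rep  = 1:2
)

# Single "group" factor combining all interacting factors
samples$group <- interaction(samples$sort, samples$cell, samples$age,
                             drop = TRUE)

design_group <- model.matrix(~ 0 + group, data = samples)
design_full  <- model.matrix(
  ~ sort + sort:cell + age + age:cell + sort:age + sort:age:cell,
  data = samples
)

# If both designs span the same space, stacking them adds no rank
rank_of <- function(m) qr(m)$rank
ranks <- c(group = rank_of(design_group),
           full  = rank_of(design_full),
           both  = rank_of(cbind(design_group, design_full)))
print(ranks)

# Example contrast on the group coefficients:
# sort2 positive vs sort1 positive, within young mice
contrast <- setNames(numeric(ncol(design_group)), colnames(design_group))
contrast["groupsort2.positive.young"] <- 1
contrast["groupsort1.positive.young"] <- -1
print(contrast)
```

All three ranks being equal is what "a different parametrization of the same design" means in practice: both matrices fit the same group means, and only the interpretation of the individual coefficients differs.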