Hello fellow DESeq2 users,
I am having trouble figuring out how to set up the design formula for an experiment with 3 factors with up to 3 levels within each. I have narrowed down the problem to the positive controls I want to include but I am unsure why they are giving me the 'Model Matrix not full rank' error. They do not appear to be a linear combination or a combination of other factor levels to me.
I am able to successfully set up the experiment without the positive controls and a design with interaction terms like so:
stage treatment surgery 1 18 C 11 2 18 C 11 3 18 C 11 4 18 C 11 5 18 C 11 6 18 R 11 7 18 R 11 8 18 R 11 9 18 R 11 10 18 R 11 11 18 C 12 12 18 C 12 13 18 C 12 14 18 C 12 15 18 C 12 16 18 R 12 17 18 R 12 18 18 R 12 19 18 R 12 20 18 R 12 21 30 C 11 22 30 C 11 23 30 C 11 24 30 C 11 25 30 C 11 26 30 R 11 27 30 R 11 28 30 R 11 29 30 R 11 30 30 R 11 31 30 C 12 32 30 C 12 33 30 C 12 34 30 C 12 35 30 C 12 36 30 R 12 37 30 R 12 38 30 R 12 39 30 R 12 40 30 R 12 dds <- DESeqDataSetFromHTSeqCount(sampleTable = sTable, directory = "", design = ~ stage + surgery + treatment + stage:surgery + stage:treatment + stage:surgery:treatment + surgery:treatment)
When I try and include the positive controls, 5 in stage 18 and 30 respectively with no treatment or surgery, I am getting the errors:
stage treatment surgery 1 18 C 11 2 18 C 11 3 18 C 11 4 18 C 11 5 18 C 11 6 18 R 11 7 18 R 11 8 18 R 11 9 18 R 11 10 18 R 11 11 18 C 12 12 18 C 12 13 18 C 12 14 18 C 12 15 18 C 12 16 18 R 12 17 18 R 12 18 18 R 12 19 18 R 12 20 18 R 12 21 18 NoTreat NoSurg 22 18 NoTreat NoSurg 23 18 NoTreat NoSurg 24 18 NoTreat NoSurg 25 18 NoTreat NoSurg 26 30 C 11 27 30 C 11 28 30 C 11 29 30 C 11 30 30 C 11 31 30 R 11 32 30 R 11 33 30 R 11 34 30 R 11 35 30 R 11 36 30 C 12 37 30 C 12 38 30 C 12 39 30 C 12 40 30 C 12 41 30 R 12 42 30 R 12 43 30 R 12 44 30 R 12 45 30 R 12 46 30 NoTreat NoSurg 47 30 NoTreat NoSurg 48 30 NoTreat NoSurg 49 30 NoTreat NoSurg 50 30 NoTreat NoSurg dds <- DESeqDataSetFromHTSeqCount(sampleTable = sTable, directory = "", design = ~ stage + surgery + treatment + stage:surgery + stage:treatment + stage:surgery:treatment + surgery:treatment) Error in checkFullRank(modelMatrix) : the model matrix is not full rank, so the model cannot be fit as specified. One or more variables or interaction terms in the design formula are linear combinations of the others and must be removed.
Hi Michael,
Thanks for shedding light on this. So in your simplified example, a solution to this would be to change the treatments of the control in one variable so it looks something like this:
1 2
C Y
C Z
W Y
W Z
X Y
X Z
Although this is obviously infeasible. As you pointed out I am able to run the model when removing the positive controls. Is there another way that I am missing where I can include these positive controls into the design? A separate factor? I want to make comparisons with these against the controls and also get the interactions. My knowledge in linear modeling and setting up designs is limited.
You can combine the two variables into one factor, with levels: control, WY, WZ, XY and XZ. Then you can have interactions of this with another variable. But you can't have them both in the design because you don't have samples where only one is the control. At some point in the future you can discuss the experimental design with a local statistician, who can explain this part.
Okay that works! I will update on how the experimental design turns out.
Seeing this recent post:
C: DESeq2- what to do when two conditions share controls?
It seems like we could use the solution that you proposed there, by coding the controls in the surgery column as "11" instead of "NoSurg" and then leaving the "NoTreat", level in the treatment column to distinguish between the two. However, after trying this with the design:
~stage + stage:treatment + stage:treatment:surgery
I still am getting the same "model matrix is not full rank" error. I'm sure this is for the same reasons that you identified in the original issue. Although this seems to me like a similar problem to the post linked to as the "NoTreat NoSurg" Samples can be seen as shared controls to both the "C", "R" and "11", "12", treatments at the respective stages.
Perhaps I am still not understanding:
"But you can't have them both in the design because you don't have samples where only one is the control."
Your thoughts on this would be appreciated. (Still looking for a statistician)
Thanks again,
-R
Unfortunately, I don't have sufficient time right now to go into why an alternate design produces an error. If you want to explore on your own, you can take a look at the matrices yourself with
model.matrix(design(dds), colData(dds))
and notice which columns are linear combinations of which other columns (which makes the set of equations not possible to solve for a unique set of coefficients).Re: that previous sentence, go back to this example I gave before above: "you never get to see what the effect of W is when variable 2 is control, or vice versa for variable 1". You need those samples to tease apart the independent effects of either, but you didn't generate them.
The best approach (the one I gave above) is to combine those two variables, because this generates a factor when the levels comprise all of the actual conditions you generated samples for. Then no problem with the rank of the model matrix.
Thanks so much for clearing this up!