I have data that has genetic line (line) and phenotype (pheno) as my factors and my data looks something like this:
sample_id | line | rep | pheno |
1 | 358 | 1 | LL |
3 | 358 | 2 | LL |
14 | 380 | 1 | LL |
15 | 380 | 2 | LL |
25 | 441 | 1 | HL |
26 | 441 | 2 | HL |
27 | 441 | 3 | HL |
37 | 486 | 1 | LL |
38 | 486 | 2 | LL |
39 | 486 | 3 | LL |
49 | 832 | 1 | HL |
50 | 832 | 2 | HL |
51 | 832 | 3 | HL |
61 | 913 | 1 | HL |
62 | 913 | 2 | HL |
63 | 913 | 3 | HL |
The model I want to run is to look at both phenotype and line nested within phenotype which I believe is expressed syntactically as:
~ pheno+pheno:line
And the model matrix is:
(Intercept) | phenoL0LL | phenoL0HL:lineL0380 | phenoL0LL:lineL0380 | phenoL0HL:lineL0441 | phenoL0LL:lineL0441 | phenoL0HL:lineL0486 | phenoL0LL:lineL0486 | phenoL0HL:lineL0832 | phenoL0LL:lineL0832 | phenoL0HL:lineL0913 | phenoL0LL:lineL0913 | |
droso_1_exp_count | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
droso_14_exp_count | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
droso_15_exp_count | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
droso_25_exp_count | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
droso_26_exp_count | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
droso_27_exp_count | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
droso_3_exp_count | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
droso_37_exp_count | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
droso_38_exp_count | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
droso_39_exp_count | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
droso_49_exp_count | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
droso_50_exp_count | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
droso_51_exp_count | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
droso_61_exp_count | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
droso_62_exp_count | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
droso_63_exp_count | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
However, I get this error when attempting to estimate dispersion:
Error in glmFit.default(sely, design, offset = seloffset, dispersion = 0.05, :
Design matrix not of full rank. The following coefficients not estimable:
phenoL0HL:lineL0380 phenoL0LL:lineL0441 phenoL0HL:lineL0486 phenoL0LL:lineL0832 phenoL0HL:lineL0913 phenoL0LL:lineL0913
Which i think is saying that since both phenotypes are not since within a line, there is not enough representation. But a line is either one phenotype or the other.
Any help would be appreciated.
I tried what you proposed but got this error. Any idea what the problem may be? The error is much shorter than previously with only one line (913) listed within the error
Error in glmFit.default(sely, design, offset = seloffset, dispersion = 0.05, :
Design matrix not of full rank. The following coefficients not estimable:
phenoL0HL:lineL0913
Here is the new model matrix:
As the error message says, the design matrix is not of full rank. You need to discard one of the coefficients corresponding to the HL lines. Let's say you discard
phenoHL:line441
(though any one is permissible). The remaining coefficients are:The correct interpretation of these terms requires some contemplation:
phenoLL
does not represent the main effect of LL vs HL (surprise!). Rather, it represents the log-fold change in expression of the LL 358 line over the HL 441 line.phenoLL:line380
represents the log-fold change of LL 380 over LL 358.phenoLL:line486
represents the log-fold change of LL 486 over LL 358.phenoHL:line832
represents the log-fold change of HL 832 over HL 441.phenoHL:line913
represents the log-fold change of HL 913 over HL 441.Figuring out the meaning of each coefficient is often an enjoyable puzzle; but if you don't want to deal with that, just using
~line
would give the same model but with coefficients that are much easier to interpret.Sorry, you're right. My suggestion to remove the all-zero columns was too simple, as it removes only 5 columns instead of 6. As Aaron has pointed out, you can still fix the design matrix yourself, but this is overly complicated and prone to error. I have edited my answer to correct it.
Another question: does renumbering them still keep them separate? In other words line 1 in LL and line 1 in HL are two totally different genetic lines