I have a balanced study design with two variables, genotype and location. My research question has the following form:
"Does genotype status (knockout vs. wildtype) affect gene expression in intestinal surface cells differently than it affects gene expression in intestinal crypt cells?"
For this, my proposed model is:
Count ~ Batch + Genotype + Location + Genotype:Location
and I am most interested in the genotype-by-location interaction effect. With dummy coding, the model matrix is obtained as follows:
model_matrix <- model.matrix(design_statement, as.data.frame(colData(dds_d11_GxL)))
model_matrix
However, dummy coding of this kind produces a dependence (a nonzero correlation) between the main-effect and interaction columns of the model matrix. This can be circumvented with effect coding or orthogonal coding.
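To make the difference concrete, here is a minimal base-R sketch. The `samples` table is a hypothetical stand-in for my real `colData` (2 batches x 2 genotypes x 2 locations, one sample per cell), and the factor level names are assumptions chosen to mirror my design.

```r
# Hypothetical balanced sample table; not the real colData
samples <- expand.grid(
  location = c("surfc", "crypt"),
  genotype = c("wt", "ko"),
  batch    = c("R1", "R2"),
  stringsAsFactors = FALSE
)
samples$location <- factor(samples$location, levels = c("surfc", "crypt"))
samples$genotype <- factor(samples$genotype, levels = c("wt", "ko"))
samples$batch    <- factor(samples$batch,    levels = c("R1", "R2"))

design_formula <- ~ batch + genotype + location + genotype:location

# Default treatment (dummy) coding: columns are 0/1 indicators
mm_dummy <- model.matrix(design_formula, samples)
cor_dummy <- cor(mm_dummy[, "genotypeko"],
                 mm_dummy[, "genotypeko:locationcrypt"])

# Sum-to-zero (effect) coding: columns are +1/-1
mm_effect <- model.matrix(
  design_formula, samples,
  contrasts.arg = list(batch    = "contr.sum",
                       genotype = "contr.sum",
                       location = "contr.sum")
)
cor_effect <- cor(mm_effect[, "genotype1"],
                  mm_effect[, "genotype1:location1"])

print(cor_dummy)   # nonzero: 1/sqrt(3) for a balanced 2x2 layout
print(cor_effect)  # 0 for a balanced layout
```

Note that `contr.sum` assigns +1 to the first factor level and -1 to the last, so the signs come out opposite to a matrix in which knockout and crypt are coded as +1; the columns are also named `genotype1`, `location1`, and so on, rather than after the level.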
When I searched the DESeq2 vignette, the manual, and the literature, though, I could not find anyone else who had done this. So I am writing to ask whether it would be OK to specify a model matrix of the following form instead:
| name | (Intercept) | batch:R2 | genotype:ko | location:crypt | genotypeko:locationcrypt |
|---|---|---|---|---|---|
| crypt_d11_wt_1 | 1 | -1 | -1 | 1 | -1 |
| surfc_d11_wt_1 | 1 | -1 | -1 | -1 | 1 |
| crypt_d11_ko_1 | 1 | -1 | 1 | 1 | 1 |
| surfc_d11_ko_1 | 1 | -1 | 1 | -1 | -1 |
| crypt_d11_wt_2 | 1 | 1 | -1 | 1 | -1 |
| surfc_d11_wt_2 | 1 | 1 | -1 | -1 | 1 |
| crypt_d11_ko_2 | 1 | 1 | 1 | 1 | 1 |
| surfc_d11_ko_2 | 1 | 1 | 1 | -1 | -1 |
Here, the correlation between the main-effect and interaction columns is 0, but I do not know whether this coding is unacceptable for DESeq2 for some reason. Thank you very much.
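For reference, the proposed matrix can be rebuilt and checked in base R as sketched below. The column names (`batch`, `genotype`, `location`, `genotype_x_location`) are hypothetical simplifications of the headers above. The commented-out DESeq2 call at the end is an untested assumption on my part: it relies on the documented option of passing a user-built model matrix through the `full` argument of `DESeq()` with `betaPrior = FALSE`, and on `dds` being a DESeqDataSet whose samples are in the same order as the matrix rows.

```r
# Sample names in the same order as the table above
sample_names <- c("crypt_d11_wt_1", "surfc_d11_wt_1",
                  "crypt_d11_ko_1", "surfc_d11_ko_1",
                  "crypt_d11_wt_2", "surfc_d11_wt_2",
                  "crypt_d11_ko_2", "surfc_d11_ko_2")

# +1/-1 codes: batch 2, knockout, and crypt are coded as +1
batch    <- rep(c(-1, 1), each = 4)
genotype <- rep(c(-1, -1, 1, 1), times = 2)
location <- rep(c(1, -1), times = 4)

# Hypothetical column names; the interaction is the product of the two codes
mm_custom <- cbind(
  "(Intercept)"       = 1,
  batch               = batch,
  genotype            = genotype,
  location            = location,
  genotype_x_location = genotype * location
)
rownames(mm_custom) <- sample_names

# Full column rank is needed for the coefficients to be estimable
matrix_rank <- qr(mm_custom)$rank

# Pairwise correlations among the non-intercept columns
cors <- cor(mm_custom[, -1])
max_offdiag_cor <- max(abs(cors[upper.tri(cors)]))

print(matrix_rank)
print(max_offdiag_cor)

# Assumption, not run here: fit with the custom matrix and inspect the
# coefficient names DESeq2 actually assigns before extracting results.
# dds <- DESeq(dds, full = mm_custom, betaPrior = FALSE)
# resultsNames(dds)
```

One thing to keep in mind when reading the output: with this +1/-1 coding, the difference of differences between the four cell means works out to four times the interaction coefficient, so the coefficient should be a quarter of the dummy-coded interaction term on the same log2 scale.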