Hello,
I have a question about the design of a DESeq2 experiment analysing RNA-seq data. I have 3 factors to take into account: condition, patient and cell type. I have 2 conditions: normal and leukaemia. Within each condition I have multiple patients (please note this is an unpaired design: I do not have data from the same individual in both the normal and leukaemia conditions). Within each patient there are 2 possible cell types: A and B. I am mostly interested in the overall condition effect, controlling for the differences across patients and cell types.
For example:
Condition | Patient | Cell Type |
---|---|---|
Normal | 1 | A |
Normal | 1 | B |
Normal | 2 | A |
Normal | 2 | B |
Leukaemia | 3 | A |
Leukaemia | 3 | B |
Leukaemia | 4 | A |
Leukaemia | 4 | B |
After reading the vignette, I tried creating a nested factor for the patient, giving the following:
Condition | Patient | Cell Type | Patient.nested |
---|---|---|---|
Normal | 1 | A | 1 |
Normal | 1 | B | 1 |
Normal | 2 | A | 2 |
Normal | 2 | B | 2 |
Leukaemia | 3 | A | 1 |
Leukaemia | 3 | B | 1 |
Leukaemia | 4 | A | 2 |
Leukaemia | 4 | B | 2 |
As described in the vignette, I also removed the levels missing from an interaction of factors. I then tried using the following design: ~condition + condition:patient.nested + condition:cell_type. However, I still encounter the 'model matrix not full rank' error.
I'd appreciate any help on the design of this and advice on whether I have used the correct approach. Thank you!
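For readers hitting the same error: with a nested design like the one above, a 'model matrix not full rank' error often comes from all-zero columns, which appear when the conditions have different numbers of patients. The following is a minimal base R sketch of how to check for that and drop the empty columns, as the vignette describes. The sample table is made up (3 normal and 2 leukaemia patients is an assumption, not the real data), and `coldata`, `mm_full` and the commented-out `dds` object are hypothetical names.

```r
# Hypothetical sample table: 3 normal patients, 2 leukaemia patients,
# 2 cell types per patient. This is NOT the original poster's data.
coldata <- data.frame(
  condition = factor(rep(c("normal", "leukaemia"), times = c(6, 4)),
                     levels = c("normal", "leukaemia")),
  patient   = factor(rep(1:5, each = 2)),
  cell_type = factor(rep(c("A", "B"), times = 5))
)

# Renumber patients within each condition (1, 2, ... per condition).
coldata$patient.nested <- factor(ave(
  as.integer(coldata$patient), coldata$condition,
  FUN = function(x) as.integer(factor(x))
))

mm <- model.matrix(~ condition + condition:patient.nested + condition:cell_type,
                   data = coldata)

# Columns for patient levels that do not exist in a condition are all zero.
zero_cols <- colnames(mm)[colSums(mm != 0) == 0]
print(zero_cols)
cat("rank:", qr(mm)$rank, "of", ncol(mm), "columns\n")

# Drop the empty columns and check the rank again.
mm_full <- mm[, !(colnames(mm) %in% zero_cols), drop = FALSE]
cat("rank after dropping:", qr(mm_full)$rank, "of", ncol(mm_full), "columns\n")

# With DESeq2 installed and a DESeqDataSet `dds` (not shown here), the
# cleaned matrix can be passed to the `full` argument:
# dds <- DESeq(dds, full = mm_full)
```

If the rank still falls short of the number of columns after the all-zero columns are removed, some other terms are linearly dependent and the design itself needs rethinking.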
Hi Michael, thank you for your response.
I have now managed to run this successfully by doing what you suggested. I then use the model matrix for the 'full' argument in the DESeq function. Can I just confirm that using this design formula is the correct approach to identify genes differentially expressed due to the overall condition effect? To identify these genes I use 'conditionleukaemia' as follows:
I have also used a simpler design of ~cell_type + condition, and I am unsure which of the two designs is more appropriate.
Thanks again for all your help!
Take a look at the DESeq2 section on interactions. It has a useful diagram of how to interpret the terms, which applies to what I write below:
"conditionleukaemic", is the main effect, which is the effect only for the reference level of cell type (A), and "conditionleukaemic.celltypeB" is the interaction term, which is the additional effect in cell type B, beyond the main effect. So the overall effect is an average of the effect in A and B, which you can achieve with a numeric contrast. Take a look at the order of coefficients in resultsNames(dds), then give a 1 to the main effect and a 0.5 to the interaction term, and 0's for all other terms. You can provide this numeric vector to the 'contrast' argument of results().