Hi all,
I'm performing a DE analysis on RNA-seq data using edgeR.
I have a number of patients, each with three samples: untreated (UT), treatment A and treatment B.
My samples are split across two batches. My metadata therefore looks something like this:
Treatment | Patient | Batch
UT        | 1       | 1
A         | 1       | 1
B         | 1       | 1
UT        | 2       | 1
A         | 2       | 1
B         | 2       | 1
UT        | 3       | 2
A         | 3       | 2
B         | 3       | 2
UT        | 4       | 2
A         | 4       | 2
B         | 4       | 2
I am only interested in the differences between A and UT, and between B and UT, so I want to block on both patient and batch. However, if I make a design matrix like this:
design.mat <- model.matrix(~0+treatment+batch+sample)
   groupUT groupA groupB Batch2 Patient1 Patient2 Patient3
1        1      0      0      0        0        0        0
2        0      1      0      0        0        0        0
3        0      0      1      0        0        0        0
4        1      0      0      0        0        0        1
5        0      1      0      0        0        0        1
6        0      0      1      0        0        0        1
7        0      0      1      1        0        1        0
8        0      1      0      1        0        1        0
9        1      0      0      1        0        1        0
10       0      0      1      1        1        0        0
11       0      1      0      1        1        0        0
12       1      0      0      1        1        0        0
This looks roughly like what I was expecting. However, when I run estimateGLMCommonDisp or fit a GLM, I get this error message:
"Error in glmFit.default(y, design = design, dispersion = dispersion, offset = offset, :
Design matrix not of full rank. The following coefficients not estimable:
Patient2"
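The rank problem behind this error can be reproduced in base R, without edgeR. This is a minimal sketch, assuming the metadata table above is typed in as a data frame. The names `meta`, `design_both` and `design_patient` are made up for illustration, and the column names will differ from the printed matrix above.

```r
# Metadata reconstructed from the table in the post
meta <- data.frame(
  Treatment = factor(rep(c("UT", "A", "B"), times = 4),
                     levels = c("UT", "A", "B")),
  Patient   = factor(rep(1:4, each = 3)),
  Batch     = factor(rep(1:2, each = 6))
)

# Design with both blocking factors, as attempted in the post
design_both <- model.matrix(~ 0 + Treatment + Batch + Patient, data = meta)
ncol(design_both)      # coefficients requested: 7
qr(design_both)$rank   # coefficients that can be estimated: 6

# Batch is nested in Patient: each patient appears in exactly one batch,
# so the batch column is a sum of patient columns
table(meta$Patient, meta$Batch)

# Dropping Batch leaves a full-rank design
design_patient <- model.matrix(~ 0 + Treatment + Patient, data = meta)
qr(design_patient)$rank == ncol(design_patient)
```

The cross-tabulation has one empty cell per patient, which is the nesting that makes one coefficient redundant.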
Is it possible to block for two factors at once with my experimental design, and if so, how do I format my design matrix?
I think my issue is that no patient is represented in both batches (so I wouldn't be able to batch-correct using ComBat with this design either). If that is the case, would blocking on "patient" alone be sufficient, since the between-patient differences would encompass the between-batch differences as well?
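One way the patient-only blocking could look in edgeR, offered as a sketch rather than a definitive answer. Because each patient sits in exactly one batch, the patient coefficients already absorb any batch effect, so the within-patient treatment comparisons do not need a separate batch term. The counts below are simulated placeholders so the code runs (they are not real data), and `y`, `fit`, `res_A` and `res_B` are illustrative names.

```r
library(edgeR)

meta <- data.frame(
  Treatment = factor(rep(c("UT", "A", "B"), times = 4),
                     levels = c("UT", "A", "B")),
  Patient   = factor(rep(1:4, each = 3))
)

# Placeholder counts: 1000 genes x 12 samples, simulated only so the sketch runs
set.seed(1)
counts <- matrix(rnbinom(1000 * 12, mu = 50, size = 10), nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000), paste0("s", 1:12)))

# Patient is the blocking factor; UT is the reference treatment level
design <- model.matrix(~ Patient + Treatment, data = meta)

y <- DGEList(counts = counts)
y <- calcNormFactors(y)
y <- estimateDisp(y, design)
fit <- glmQLFit(y, design)

# Each treatment coefficient is a within-patient log fold change versus UT
res_A <- glmQLFTest(fit, coef = "TreatmentA")  # A vs UT
res_B <- glmQLFTest(fit, coef = "TreatmentB")  # B vs UT
topTags(res_A)
```

With an intercept in the model, the `TreatmentA` and `TreatmentB` columns are already the two comparisons of interest, so no explicit contrast matrix is needed.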
Many thanks in advance
Dean
Thanks! Much appreciated.
Dean