Hi,
I am trying to create a Shiny app that lets my lab members access and analyse the RNA-seq data from the patient-derived tumour samples we have in our group. One of the options in the app is to perform differential gene expression analysis with DESeq2. The app lets the user define two groups (Group 1 and Group 2) by selecting two or more samples from our RNA-seq data. The app then automatically generates the DESeqDataSet and runs DESeq on the two groups to produce a results table.
Because the samples were collected and sequenced at different times over the past 5-6 years, I have included a batch variable in the design formula, as below:
library(DESeq2)
# counts: raw count matrix; metadata: one row per column of counts
dds <- DESeqDataSetFromMatrix(countData = counts,
                              colData = metadata,
                              design = ~ batch + group)
For batch I used the sequencing run. Here is an example of the metadata:
   batch sample group
1   CD17    X32     2
2   CD17    X32     2
3   CD17    X32     2
4   CD19    X33     2
5   CD19    X33     2
6   CD19    X33     2
7    CD7    X08     1
8    CD7    X08     1
9    CD7    X08     1
10   CD7    X11     1
11   CD7    X11     1
12   CD7    X11     1
However, when I compare samples that have been sequenced on different days (like above) I get this error:
Error in checkFullRank(modelMatrix) :
the model matrix is not full rank, so the model cannot be fit as specified.
One or more variables or interaction terms in the design formula are linear
combinations of the others and must be removed.
Is there any way to account for the batch effect while avoiding this error?
I know that the design of the experiment is not ideal: most samples were sequenced on a single day and do not appear in any earlier or later sequencing run, and this is probably what causes the error. However, because this is a lot of data (42 patient-derived samples in biological replicates, for a total of 152 sequenced samples), I was wondering whether there is any way to fix this issue without having to re-sequence all of them...
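In case it helps, the rank problem can be reproduced without DESeq2. This is only a minimal sketch in base R, assuming the 12-row example metadata above (rebuilt by hand here, so the object is illustrative rather than our real colData). It cross-tabulates batch against group and compares the number of columns in the model matrix with its rank:

```r
# Rebuild the 12-row example metadata from the post (illustrative only).
metadata <- data.frame(
  batch  = factor(rep(c("CD17", "CD19", "CD7"), times = c(3, 3, 6))),
  sample = rep(c("X32", "X33", "X08", "X11"), each = 3),
  group  = factor(rep(c(2, 1), each = 6))
)

# Cross-tabulate batch against group: each batch falls entirely within one group.
print(table(batch = metadata$batch, group = metadata$group))

# Build the model matrix that the design ~ batch + group implies.
mm <- model.matrix(~ batch + group, data = metadata)

# A full-rank design needs rank equal to the number of coefficients.
n_coef  <- ncol(mm)      # intercept + 2 batch columns + 1 group column
mm_rank <- qr(mm)$rank   # numerical rank of the model matrix
cat("coefficients:", n_coef, " rank:", mm_rank, "\n")
```

For this subset the matrix has 4 columns but rank 3, because the group column can be written as the intercept minus the CD7 batch column. As far as I can tell, that is the kind of linear combination the error message refers to.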
Yes, I did. Maybe I have misunderstood, but from the vignette it seems that there is no way around it...