Problem with batches and "Model matrix not full rank" during differential gene expression analysis with DESeq2

alallo

Hi,

I am trying to create a Shiny app that lets my lab members access and analyse the RNAseq data from the patient-derived tumour samples in our group. One of the options in the app is differential gene expression analysis with DESeq2. The app lets the user define two groups (Group 1 and Group 2) by selecting two or more samples from our RNAseq data. The app then automatically generates the DESeqDataSet and runs DESeq() on the two groups to produce a results table.
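One way the app could turn the two user selections into the colData passed to DESeqDataSetFromMatrix() is sketched below in base R. The helper name build_coldata() and the sample-sheet layout (one row per sequenced library, with batch and sample columns like the metadata shown further down) are hypothetical assumptions, not part of DESeq2 or of the original app.

```r
# Hypothetical helper for the app: turn the two user selections into the
# colData for DESeqDataSetFromMatrix(). 'sheet' is assumed to hold one row per
# sequenced library with at least 'batch' and 'sample' columns.
build_coldata <- function(sheet, group1_samples, group2_samples) {
  overlap <- intersect(group1_samples, group2_samples)
  if (length(overlap) > 0) {
    stop("samples selected in both groups: ", paste(overlap, collapse = ", "))
  }
  keep <- sheet$sample %in% c(group1_samples, group2_samples)
  coldata <- sheet[keep, , drop = FALSE]
  coldata$group <- factor(ifelse(coldata$sample %in% group1_samples, "1", "2"),
                          levels = c("1", "2"))
  # Drop factor levels (e.g. batches) that no selected library uses; unused
  # levels would otherwise become all-zero columns in the model matrix.
  droplevels(coldata)
}

# Illustrative sheet built from the metadata in the question
# (replicate libraries share a sample ID).
sheet <- data.frame(
  batch  = factor(rep(c("CD17", "CD19", "CD7", "CD7"), each = 3)),
  sample = rep(c("X32", "X33", "X08", "X11"), each = 3)
)
# Selecting one model per group: batch CD19 is not used and is dropped.
coldata <- build_coldata(sheet, group1_samples = "X08", group2_samples = "X32")
print(table(coldata$batch, coldata$group))
```

The count matrix would need to be subset to the same libraries (for example `counts[, rownames(coldata)]`, assuming the count columns are named after the sheet's row names) before building the DESeqDataSet.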

Because the samples were collected and sequenced at different times over the past 5-6 years, I have included a batch variable in the design formula, like below:

library(DESeq2)

dds <- DESeqDataSetFromMatrix(countData = counts,
                              colData = metadata,
                              design = ~ batch + group)

For batch, I used the sequencing run. Here is an example of the metadata:

   batch sample group
1   CD17    X32     2
2   CD17    X32     2
3   CD17    X32     2
4   CD19    X33     2
5   CD19    X33     2
6   CD19    X33     2
7    CD7    X08     1
8    CD7    X08     1
9    CD7    X08     1
10   CD7    X11     1
11   CD7    X11     1
12   CD7    X11     1

However, when I compare samples that have been sequenced on different days (like above) I get this error:

Error in checkFullRank(modelMatrix) : 
  the model matrix is not full rank, so the model cannot be fit as specified.
  One or more variables or interaction terms in the design formula are linear
  combinations of the others and must be removed.

Is there any way to account for the batch effect avoiding this error?

I know that the design of the experiment is not ideal: most samples were sequenced on a single day and do not appear in earlier or later sequencing runs, and this is probably what causes the error. However, because this is a lot of data (42 patient-derived samples in biological replicates, 152 sequenced samples in total), I was wondering if there is any way to fix this issue without having to re-sequence all of them...
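The reasoning above can be checked with a short base-R sketch (it does not need DESeq2) that rebuilds the example metadata and tests whether the design matrix for `~ batch + group` has full column rank. My assumption is that DESeq2's checkFullRank() performs a comparable QR-rank test; treat that as an assumption rather than a description of its internals.

```r
# Rebuild the example colData from the question (12 libraries, 4 models).
metadata <- data.frame(
  batch  = factor(rep(c("CD17", "CD19", "CD7"), times = c(3, 3, 6)),
                  levels = c("CD17", "CD19", "CD7")),
  sample = factor(rep(c("X32", "X33", "X08", "X11"), each = 3)),
  group  = factor(rep(c("2", "1"), each = 6), levels = c("1", "2"))
)

# Each batch contains only one group, so batch and group are confounded.
print(table(metadata$batch, metadata$group))

# Design matrix for ~ batch + group:
# columns (Intercept), batchCD19, batchCD7, group2.
mm <- model.matrix(~ batch + group, data = metadata)
mm_rank <- qr(mm)$rank
cat("rank", mm_rank, "of", ncol(mm), "columns\n")

# The rank deficit comes from an exact linear combination:
# group2 = (Intercept) - batchCD7, so the group effect cannot be separated
# from the batch effects.
stopifnot(all(mm[, "group2"] == mm[, "(Intercept)"] - mm[, "batchCD7"]))

# Without batch the design is estimable (rank equals number of columns).
mm_group <- model.matrix(~ group, data = metadata)
cat("~ group: rank", qr(mm_group)$rank, "of", ncol(mm_group), "columns\n")
```

In general, `~ batch + group` is full rank only when at least one batch contains libraries from both groups; if every batch holds a single group, the group column is always a combination of the batch columns, whatever the number of libraries.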


score 1 · Answer 1 · 2020-04-22

Michael Love

The error message actually refers you to the vignette section that discusses this topic. Have you read it?

Yes, I did. Maybe I have misunderstood, but from the vignette it seems that there is no way around it...

(reply by alallo)

score 0 · Answer 2 · 2020-04-22

swbarnes2

You want to compare samples that might have been prepped years apart? That doesn't sound wise.

If batch is confounded with sample type, any change that looks interesting might be entirely due to batch. The best thing to do is to drop batch from the design and warn users that anything they see is very, very suspect and may well be a batch-related artifact.
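In the Shiny app, that advice could be automated per selection: keep batch when the design is estimable and otherwise fall back to `~ group` with a warning the user can see. The sketch below is base R; the name choose_design() and the example data are hypothetical, and the rank test is the same QR check as in a plain `model.matrix()` comparison.

```r
# Hypothetical helper (not a DESeq2 function): keep batch in the design only
# if ~ batch + group is estimable for the selected libraries; otherwise fall
# back to ~ group and warn that batch and group are confounded.
choose_design <- function(coldata) {
  coldata <- droplevels(coldata)
  # A single batch cannot be a covariate at all (model.matrix would error).
  if (length(unique(coldata$batch)) < 2) {
    return(~ group)
  }
  full <- ~ batch + group
  mm <- model.matrix(full, data = coldata)
  if (qr(mm)$rank == ncol(mm)) {
    return(full)
  }
  warning("batch is confounded with group for this selection; fitting ~ group ",
          "only, so apparent differences may be batch artefacts")
  ~ group
}

# Confounded selection from the question: every batch holds a single group.
confounded <- data.frame(
  batch = factor(rep(c("CD17", "CD19", "CD7"), times = c(3, 3, 6))),
  group = factor(rep(c("2", "1"), each = 6))
)
# Illustrative balanced selection: each batch holds libraries from both groups.
balanced <- data.frame(
  batch = factor(rep(c("A", "B"), each = 4)),
  group = factor(rep(c("1", "2"), times = 4))
)
d_conf <- suppressWarnings(choose_design(confounded))
d_bal <- choose_design(balanced)
print(d_conf)
print(d_bal)
```

The result could be passed as `design = choose_design(coldata)` to DESeqDataSetFromMatrix(), with `droplevels(coldata)` as the colData so unused batch levels do not reappear. Falling back to `~ group` does not remove the batch effect; it only makes the model fit, so the warning should be shown to the user rather than swallowed.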

We started generating these patient-derived models in 2014. We sequenced the first ones that were generated, and as we generated more over time, they were sequenced later. I think this is a problem any lab that generates models has: you start by sequencing your first models and publish them, then a few years later you have a larger biobank of models and you sequence the new ones...

I may have to do as you suggested and remove batch from the design formula. I was just hoping this would not be necessary. I do wonder how people manage when they compare large datasets of patient samples collected and sequenced by different labs. Do they just assume that there will be a batch effect and accept it?

(reply by alallo)