Question

Model matrix not full rank

isabelle.dupanloup • 0

I am running DESeq2 to perform a differential expression analysis (DEA) between 2 groups of cell lines. Each group contains 3 cell lines, with 2 biological replicates per cell line. My colData is below:

sample       cell.line  replicate  group
BE2C.RNA1    BE2C       1          group1
BE2C.RNA2    BE2C       2          group1
CHP212.RNA1  CHP212     1          group1
CHP212.RNA2  CHP212     2          group1
NBLS.RNA1    NBLS       1          group1
NBLS.RNA2    NBLS       2          group1
KCNR.RNA1    KCNR       1          group2
KCNR.RNA2    KCNR       2          group2
LA155N.RNA1  LA155N     1          group2
LA155N.RNA2  LA155N     2          group2
SKNDZ.RNA1   SKNDZ      1          group2
SKNDZ.RNA2   SKNDZ      2          group2

When I run the following code, I get the "Model matrix not full rank" error message, which is understandable because each cell line belongs to only one group.

dds <- DESeqDataSetFromMatrix(countData = readcounts, colData = colData, design = ~ cell.line + group)
Error in checkFullRank(modelMatrix) : 
  the model matrix is not full rank, so the model cannot be fit as specified.
  One or more variables or interaction terms in the design formula are linear
  combinations of the others and must be removed.

  Please read the vignette section 'Model matrix not full rank':

  vignette('DESeq2')
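
The rank deficiency reported above can be reproduced in base R without DESeq2 or the count data. The following is a minimal sketch, assuming only the sample table shown in the post; the object names (`coldata`, `mm_full`, `mm_group`, `nesting`) are made up for illustration, and `qr()` is used here simply as one convenient way to compute a matrix rank, not as a description of what DESeq2 does internally.

```r
# Rebuild the sample table from the post (values copied from the colData above).
cell.line <- rep(c("BE2C", "CHP212", "NBLS", "KCNR", "LA155N", "SKNDZ"), each = 2)
coldata <- data.frame(
  cell.line = factor(cell.line, levels = unique(cell.line)),
  replicate = factor(rep(1:2, times = 6)),
  group     = factor(rep(c("group1", "group2"), each = 6))
)
rownames(coldata) <- paste0(cell.line, ".RNA", coldata$replicate)

# Cross-tabulate cell line against group: each row has a single non-zero
# cell, i.e. every cell line is nested inside exactly one group.
nesting <- table(coldata$cell.line, coldata$group)
print(nesting)

# Design matrices for the failing formula and for the group-only formula.
mm_full  <- model.matrix(~ cell.line + group, data = coldata)
mm_group <- model.matrix(~ group, data = coldata)

# A design matrix is full rank when its rank equals its number of columns.
rank_full  <- qr(mm_full)$rank
rank_group <- qr(mm_group)$rank
cat("~ cell.line + group:", ncol(mm_full), "columns, rank", rank_full, "\n")
cat("~ group:            ", ncol(mm_group), "columns, rank", rank_group, "\n")
```

With this table, `~ cell.line + group` yields 7 columns but rank 6, because the `groupgroup2` column equals the sum of the three group2 cell-line indicator columns. `~ group` yields 2 columns and rank 2, so it can be fit.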

But is there any way to account for the cell line effect when comparing the groups?

I therefore ran the analysis with only the group as a factor in the model, once using all replicates and once using only one replicate per cell line.

dds <- DESeqDataSetFromMatrix(countData = readcounts, colData = colData, design = ~ group)

I got many more significantly differentially expressed genes when using all replicates (3582 genes) than when using only one replicate per cell line (488 genes). I was expecting to get more genes, but it is about 7 times more. Any comments or feedback?

Many thanks in advance!

DESeq2 • 703 views

score 0 · Answer 1 · 2023-08-04

Cell line is indeed confounded with the group variable: each cell line occurs in only one group, so a cell line effect cannot be distinguished from a group effect. You can only compare cell lines with each other within a group. It is also not surprising to get more DEGs with a larger sample size, because adding samples increases your statistical power.
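
To illustrate the point about within-group comparisons, here is a small base R sketch, again assuming only the sample table from the question. The names `coldata`, `g1`, `mm_g1`, and `is_full_rank` are hypothetical. It shows that once the data are restricted to a single group, a design containing cell line alone is full rank, so cell lines can be contrasted with each other there.

```r
# Sample annotation copied from the question (cell line and group only).
cell.line <- rep(c("BE2C", "CHP212", "NBLS", "KCNR", "LA155N", "SKNDZ"), each = 2)
coldata <- data.frame(
  cell.line = cell.line,
  group     = rep(c("group1", "group2"), each = 6),
  stringsAsFactors = FALSE
)

# Keep group1 only; building the factor after subsetting means the
# group2 cell lines never become (empty) factor levels.
g1 <- coldata[coldata$group == "group1", ]
g1$cell.line <- factor(g1$cell.line)

# Within one group, cell line is no longer tied to any other term.
mm_g1 <- model.matrix(~ cell.line, data = g1)
rank_g1 <- qr(mm_g1)$rank
is_full_rank <- rank_g1 == ncol(mm_g1)
cat("group1, ~ cell.line:", ncol(mm_g1), "columns, rank", rank_g1, "\n")
```

In an actual DESeq2 run, the corresponding step would presumably be to subset the count matrix to the same six samples and pass `design = ~ cell.line` to `DESeqDataSetFromMatrix`. That answers a different question (differences between cell lines), not the group comparison asked about.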