I am running DESeq2 to perform a differential expression analysis (DEA) between two groups of cell lines. Each group contains 3 cell lines, with 2 biological replicates per cell line. My colData is below:
sample cell.line replicate group
BE2C.RNA1 BE2C 1 group1
BE2C.RNA2 BE2C 2 group1
CHP212.RNA1 CHP212 1 group1
CHP212.RNA2 CHP212 2 group1
NBLS.RNA1 NBLS 1 group1
NBLS.RNA2 NBLS 2 group1
KCNR.RNA1 KCNR 1 group2
KCNR.RNA2 KCNR 2 group2
LA155N.RNA1 LA155N 1 group2
LA155N.RNA2 LA155N 2 group2
SKNDZ.RNA1 SKNDZ 1 group2
SKNDZ.RNA2 SKNDZ 2 group2
When I run the following code, I get the "model matrix is not full rank" error, which is understandable because each cell line appears in only one group (cell line is nested within group).
dds <- DESeqDataSetFromMatrix(countData = readcounts, colData = colData, design = ~ cell.line + group)
Error in checkFullRank(modelMatrix) :
the model matrix is not full rank, so the model cannot be fit as specified.
One or more variables or interaction terms in the design formula are linear
combinations of the others and must be removed.
Please read the vignette section 'Model matrix not full rank':
vignette('DESeq2')
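To illustrate why this design cannot be fit, here is a minimal sketch in base R (no DESeq2 needed). It retypes the cell.line and group columns of the colData table above into a toy data frame, then compares the number of columns in the model matrix with its rank. The object name `mm` is just for illustration.

```r
# Toy copy of the cell.line and group columns from the colData table above
colData <- data.frame(
  cell.line = factor(rep(c("BE2C", "CHP212", "NBLS", "KCNR", "LA155N", "SKNDZ"),
                         each = 2)),
  group     = factor(rep(c("group1", "group2"), each = 6))
)

# Each cell line occurs in exactly one group (nested design)
table(colData$cell.line, colData$group)

# Build the model matrix for the design that triggers the error
mm <- model.matrix(~ cell.line + group, data = colData)

ncol(mm)     # 7 coefficients requested: intercept + 5 cell lines + 1 group
qr(mm)$rank  # 6: the group column is a sum of cell-line columns
```

Because the group2 column equals the sum of the KCNR, LA155N and SKNDZ columns, one coefficient is redundant, which is what `checkFullRank` reports.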
But is there any way to account for the cell line effect when comparing the groups?
I also ran the analysis with group as the only factor in the design, once using all replicates and once using only one replicate per cell line.
dds <- DESeqDataSetFromMatrix(countData = readcounts, colData = colData, design = ~ group)
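For reference, one way the one-replicate-per-cell-line subset could be built is sketched below. This is only an illustration under assumptions: the count matrix here is simulated toy data, its column names are assumed to match the sample names, and the names `colData.sub` and `readcounts.sub` are hypothetical.

```r
# Toy colData mirroring the table above
cell.lines <- c("BE2C", "CHP212", "NBLS", "KCNR", "LA155N", "SKNDZ")
colData <- data.frame(
  cell.line = rep(cell.lines, each = 2),
  replicate = rep(1:2, times = 6),
  group     = rep(c("group1", "group2"), each = 6)
)
rownames(colData) <- paste0(colData$cell.line, ".RNA", colData$replicate)

# Simulated counts (assumption): 5 genes x 12 samples, columns named by sample
set.seed(1)
readcounts <- matrix(rpois(5 * 12, lambda = 50), nrow = 5,
                     dimnames = list(paste0("gene", 1:5), rownames(colData)))

# Keep replicate 1 of every cell line and subset the counts to match
colData.sub    <- colData[colData$replicate == 1, ]
readcounts.sub <- readcounts[, rownames(colData.sub)]

# Sample order must agree between counts and colData
stopifnot(identical(colnames(readcounts.sub), rownames(colData.sub)))
dim(readcounts.sub)  # 5 genes, 6 samples

# With DESeq2 loaded, the subset would then be passed on as before:
# dds <- DESeqDataSetFromMatrix(countData = readcounts.sub,
#                               colData = colData.sub, design = ~ group)
```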
Many more genes are called significantly differentially expressed when using all replicates (3582 genes) than when using only one replicate per cell line (488 genes). I was expecting more genes with all replicates, but this is about 7 times more. Any comments or feedback?
Many thanks in advance!