Hi,
I am trying to model batch effect with DEseq2 but get "Error in checkFullRank(modelMatrix) : the model matrix is not full rank". My design formula is ~ BATCH + group. Basically, I want to find gene expressed differently between different groups. I don't see the linear dependent between BATCH and group and how to fix it. Any help would be appreciated. Thanks!
sampleName | BATCH | group |
1_S1 | A | WT_S1_CONTROL |
2_S2 | B | WT_S1_CONTROL |
3_S3 | C | WT_S1_CONTROL |
4_S4 | A | WT_S1_Treated |
5_S5 | B | WT_S1_Treated |
6_S6 | C | WT_S1_Treated |
7_S7 | A | WT_S2_CONTROL |
8_S8 | B | WT_S2_CONTROL |
9_S9 | C | WT_S2_CONTROL |
10_S10 | A | WT_S2_Treated |
11_S11 | B | WT_S2_Treated |
12_S12 | C | WT_S2_Treated |
13_S13 | D | WT_S3_CONTROL |
14_S14 | E | WT_S3_CONTROL |
15_S15 | F | WT_S3_CONTROL |
16_S16 | D | WT_S3_Treated |
17_S17 | E | WT_S3_Treated |
18_S18 | F | WT_S3_Treated |
KO1_S19 | G | KO_S1_CONTROL |
KO2_S20 | H | KO_S1_CONTROL |
KO3_S21 | I | KO_S1_CONTROL |
KO4_S22 | G | KO_S1_Treated |
KO5_S23 | H | KO_S1_Treated |
KO6_S24 | I | KO_S1_Treated |
KO7_S25 | G | KO_S2_CONTROL |
KO8_S26 | H | KO_S2_CONTROL |
KO9_S27 | I | KO_S2_CONTROL |
KO10_S28 | G | KO_S2_Treated |
KO11_S29 | H | KO_S2_Treated |
KO12_S30 | I | KO_S2_Treated |
KO13_S31 | J | KO_S3_CONTROL |
KO14_S32 | K | KO_S3_CONTROL |
KO15_S33 | L | KO_S3_CONTROL |
KO16_S34 | J | KO_S3_Treated |
KO17_S35 | K | KO_S3_Treated |
KO18_S36 | L | KO_S3_Treated |
Thank Mike! it's a really clever trick.
Hello Michael,
Could you explain how we run the SVA code provided in your link, given that when creating the
dds
object, we get the error "Model matrix not full rank"? The first linedat <- counts(dds, normalized = TRUE)
requiresdds
, but that is exactly what we cannot create.In my case, I have 4 replicates for a treatment in one batch, and 4 replicates for a control in another batch. I am getting the "model matrix not full rank" error, so I would like to know how to correct for batch effects. It sounds like I need to use sva somehow.
As you mentioned above - because the case and control are completely in separate batches, you suggest another tool to control for batch.
I am using ashr for shrinkage, but it is unclear how to incorporate their method. From reading sva, I should be able to use the code (somehow) provided in your link above (https://www.bioconductor.org/help/workflows/rnaseqGene/#batch).
Thank you in advance!
When you try to create the dataset, just use a design with
~condition
. You can't use~batch + condition
to build the dataset.Now, this is general advice just on how to avoid the error. But in your case batch and condition are perfectly confounded and I don't think there's any point using SVA or RUV. This is a very unfortunate design and should be avoided at all costs. One way to salvage it partially would be to use a method like CQN or EDASeq to try to reduce the batch effects by directly modeling GC and length dependence:
http://bioconductor.org/packages/release/bioc/html/cqn.html https://bioconductor.org/packages/release/bioc/html/EDASeq.html