Question

DESeq2 batch effect modeling and matrix not full rank

0

Entering edit mode

emmak ▴ 20

@emmak-14917

Last seen 7.0 years ago

Hi,

I am trying to model batch effect with DEseq2 but get "Error in checkFullRank(modelMatrix) : the model matrix is not full rank". My design formula is ~ BATCH + group. Basically, I want to find gene expressed differently between different groups. I don't see the linear dependent between BATCH and group and how to fix it. Any help would be appreciated. Thanks!

sampleName	BATCH	group
1_S1	A	WT_S1_CONTROL
2_S2	B	WT_S1_CONTROL
3_S3	C	WT_S1_CONTROL
4_S4	A	WT_S1_Treated
5_S5	B	WT_S1_Treated
6_S6	C	WT_S1_Treated
7_S7	A	WT_S2_CONTROL
8_S8	B	WT_S2_CONTROL
9_S9	C	WT_S2_CONTROL
10_S10	A	WT_S2_Treated
11_S11	B	WT_S2_Treated
12_S12	C	WT_S2_Treated
13_S13	D	WT_S3_CONTROL
14_S14	E	WT_S3_CONTROL
15_S15	F	WT_S3_CONTROL
16_S16	D	WT_S3_Treated
17_S17	E	WT_S3_Treated
18_S18	F	WT_S3_Treated
KO1_S19	G	KO_S1_CONTROL
KO2_S20	H	KO_S1_CONTROL
KO3_S21	I	KO_S1_CONTROL
KO4_S22	G	KO_S1_Treated
KO5_S23	H	KO_S1_Treated
KO6_S24	I	KO_S1_Treated
KO7_S25	G	KO_S2_CONTROL
KO8_S26	H	KO_S2_CONTROL
KO9_S27	I	KO_S2_CONTROL
KO10_S28	G	KO_S2_Treated
KO11_S29	H	KO_S2_Treated
KO12_S30	I	KO_S2_Treated
KO13_S31	J	KO_S3_CONTROL
KO14_S32	K	KO_S3_CONTROL
KO15_S33	L	KO_S3_CONTROL
KO16_S34	J	KO_S3_Treated
KO17_S35	K	KO_S3_Treated
KO18_S36	L	KO_S3_Treated

deseq2 rnaseq rna-seq • 3.2k views

ADD COMMENT • link updated 7.2 years ago by Michael Love 43k • written 7.2 years ago by emmak ▴ 20

score 3 · Accepted Answer · 2018-01-31

3

Entering edit mode

Michael Love 43k

@mikelove

Last seen 6 days ago

United States

Your batches are confounded with your groups. E.g. all the WT samples are in batches A-F and all the KO samples are in batches G-L. This makes it more difficult to control for batch using known batches. I'd suggest using SVs from svaseq for example:

https://www.bioconductor.org/help/workflows/rnaseqGene/#batch

ADD COMMENT • link 7.2 years ago Michael Love 43k

0

Entering edit mode

Thank Mike! it's a really clever trick.

ADD REPLY • link 7.2 years ago emmak ▴ 20

0

Entering edit mode

Hello Michael,

Could you explain how we run the SVA code provided in your link, given that when creating the dds object, we get the error "Model matrix not full rank"? The first line dat <- counts(dds, normalized = TRUE) requires dds, but that is exactly what we cannot create.

In my case, I have 4 replicates for a treatment in one batch, and 4 replicates for a control in another batch. I am getting the "model matrix not full rank" error, so I would like to know how to correct for batch effects. It sounds like I need to use sva somehow.

     label  condition   batch
  C8mA.1    control     1
  C10mA.1   case        2
  C8mA.2    control     1
  C10mA.2   case        2
  C8mA.3    control     1
  C10mA.3   case        2
  C8mA.4    control     1
  C10mA.4   case        2

As you mentioned above - because the case and control are completely in separate batches, you suggest another tool to control for batch.

I am using ashr for shrinkage, but it is unclear how to incorporate their method. From reading sva, I should be able to use the code (somehow) provided in your link above (https://www.bioconductor.org/help/workflows/rnaseqGene/#batch).

Thank you in advance!

ADD REPLY • link 5.9 years ago ashleymaeconard ▴ 10

0

Entering edit mode

When you try to create the dataset, just use a design with ~condition. You can't use ~batch + condition to build the dataset.

Now, this is general advice just on how to avoid the error. But in your case batch and condition are perfectly confounded and I don't think there's any point using SVA or RUV. This is a very unfortunate design and should be avoided at all costs. One way to salvage it partially would be to use a method like CQN or EDASeq to try to reduce the batch effects by directly modeling GC and length dependence:

http://bioconductor.org/packages/release/bioc/html/cqn.html https://bioconductor.org/packages/release/bioc/html/EDASeq.html

ADD REPLY • link 5.9 years ago Michael Love 43k