Question

DESeq2: Error when correcting batch effect

Rimma • 0

@rimma-21441

Hello, I'm struggling with batch correction for RNA-seq data in DESeq2. My colData looks like this (10 samples: 6 controls and 4 treated, in 2 batches):

samples   condition batch    
    100        PH7     1
    101        PH7     1
    103        PH7     1
    63         PH7     1
    64         ctr     1
    74         ctr     1
    75         ctr     1
    76         ctr     2
    88         ctr     2
    99         ctr     2

As far as I understood from this post, my problem is that some conditions belong to only one batch; for example, all "PH7" samples are in batch 1. I tried what was suggested in the post:

mm <- model.matrix(~ batch + condition, colData(dds))

I then looked for columns that are all zeros, but I don't have any: every column has a 1 in at least one row.

Is there a way to run this analysis?

deseq2 batch effect • 1.4k views

ADD COMMENT • link updated 5.7 years ago by Michael Love 43k • written 5.7 years ago by Rimma • 0

score 0 · Answer 1 · 2019-07-24

Michael Love 43k

@mikelove

You can just use ~batch + condition here. What is the error?

ADD COMMENT • link 5.7 years ago Michael Love 43k

I tried that, and it gives this error:

  Error in checkFullRank(modelMatrix) : 
  the model matrix is not full rank, so the model cannot be fit as specified.
  One or more variables or interaction terms in the design formula are linear
  combinations of the others and must be removed.

ADD REPLY • link 5.7 years ago Rimma • 0

I don't get that error when I run this design and this column data. Maybe check your code?

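library(DESeq2)
# Simulate 10 samples, then mimic the posted colData
dds <- makeExampleDESeqDataSet(m=10)
dds$batch <- factor(rep(1:2,c(7,3)))      # 7 samples in batch 1, 3 in batch 2
dds$condition <- factor(rep(2:1,c(4,6)))  # 4 treated (2), then 6 controls (1)
design(dds) <- ~ batch + condition
dds <- DESeq(dds)                         # runs without the full-rank error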

ADD REPLY • link 5.7 years ago Michael Love 43k

Thank you for the reply, Michael!

I simplified the colData a bit for the post. Does it make a difference if my actual colData looks like this? The main difference I see is that the third batch contains only a condition that does not appear in any other batch:

samples   condition batch
    100        PH7     1
    101        PH7     1
    103        PH7     1
    63         PH7     1
    64         ctr     1
    74         ctr     1
    75         ctr     1
    76         ctr     2
    88         ctr     2
    99         ctr     2
    11        hbls     3
    12        hbls     3
    13        hbls     3

Otherwise my code looks fine to me, but I will recheck it.

ADD REPLY • link 5.7 years ago Rimma • 0

Yes it makes a difference. This is why it's good to try to describe your actual data, so we don't go back and forth while talking about different datasets.

In your actual dataset, you can't control for batch effects because your batch 3 is confounded with your condition there. This means that your results cannot be trusted entirely, regardless of what statistical method you use, because you can't tell batch 3 apart from that condition.
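
The confounding can be seen directly in the model matrix. The following is a minimal base R sketch, not code from the thread: the `coldata` data frame is rebuilt by hand from the table posted above, and the factor level order (with `ctr` as reference) is an assumption. It shows that no column is all zeros, yet the `batch3` and `conditionhbls` columns are identical, so the matrix has fewer independent columns than coefficients.

```r
# colData rebuilt by hand from the 13-sample table in the thread
# (level order with "ctr" as reference is an assumption)
coldata <- data.frame(
  condition = factor(rep(c("PH7", "ctr", "hbls"), c(4, 6, 3)),
                     levels = c("ctr", "PH7", "hbls")),
  batch     = factor(rep(1:3, c(7, 3, 3)))
)

mm <- model.matrix(~ batch + condition, coldata)

# No column is all zeros, so that check alone does not reveal the problem
any(colSums(mm != 0) == 0)

# Number of coefficients versus number of linearly independent columns
ncol(mm)      # 5
qr(mm)$rank   # 4

# The culprit: batch 3 and condition hbls mark exactly the same samples
all(mm[, "batch3"] == mm[, "conditionhbls"])

# Cross-tabulation makes the design easy to inspect
table(batch = coldata$batch, condition = coldata$condition)

# The simplified 10-sample colData from the first post is full rank
simple <- droplevels(coldata[1:10, ])
mm_simple <- model.matrix(~ batch + condition, simple)
qr(mm_simple)$rank == ncol(mm_simple)
```

Comparing `qr(mm)$rank` with `ncol(mm)` is a more general check than looking for all-zero columns, because it also catches columns that are linear combinations of each other.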

While this doesn't solve that particular problem, my preferred approach to deal with the two batches within control at this point would be to use SVA to capture heterogeneity that is orthogonal to the condition. We have example code in the workflow on how to do this.
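
A rough sketch of what the SVA approach could look like follows. It is not the workflow's code: it uses a simulated dataset from `makeExampleDESeqDataSet` in place of the real counts, and the choice of `n.sv = 2` and the mean-count filter are assumptions made for illustration. It relies on `svaseq()` from the sva package, with a full model containing the condition and a null model containing only the intercept.

```r
library(DESeq2)
library(sva)

# Simulated stand-in for the real data: 13 samples, 3 conditions (assumption)
dds <- makeExampleDESeqDataSet(n = 1000, m = 13)
dds$condition <- factor(rep(c("PH7", "ctr", "hbls"), c(4, 6, 3)),
                        levels = c("ctr", "PH7", "hbls"))

# svaseq works on normalized counts; drop genes with very low counts
dds <- estimateSizeFactors(dds)
dat <- counts(dds, normalized = TRUE)
dat <- dat[rowMeans(dat) > 1, ]

# Full model includes the condition; null model is intercept only
mod  <- model.matrix(~ condition, colData(dds))
mod0 <- model.matrix(~ 1, colData(dds))

# Estimate surrogate variables (n.sv = 2 is an arbitrary choice here)
svseq <- svaseq(dat, mod, mod0, n.sv = 2)

# Add the surrogate variables to the design in place of the batch term
dds$SV1 <- svseq$sv[, 1]
dds$SV2 <- svseq$sv[, 2]
design(dds) <- ~ SV1 + SV2 + condition
dds <- DESeq(dds)
```

The surrogate variables replace the explicit `batch` term, so the design stays full rank. As noted above, this does not remove the confounding between batch 3 and `hbls`.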

ADD REPLY • link 5.7 years ago Michael Love 43k

Sorry for this.

Yes, I understand the problem now...

Thank you for the clarification :)

ADD REPLY • link 5.7 years ago Rimma • 0