design

When I used DESeq2, I ran dds <- DESeqDataSetFromMatrix(countData = countdata, colData = coldata, design = ~ condition + batch) to remove the batch effect. However, I got the error "the model matrix is not full rank". I have read many threads about removing batch effects and about model matrices that are not full rank, as well as the vignette, but I still don't understand how to make it work. Could you please walk me through it? Thank you.
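To see concretely what "not full rank" means, you can build the design matrix yourself with base R's model.matrix() and compare its rank (via qr()) with its number of columns, which is roughly the check DESeq2 makes before fitting. The sample sheet below is hypothetical, since the actual coldata was not posted: it assumes conditions A and B were run only in batch b1 and conditions C and D only in batch b2, i.e. condition nested within batch.

```r
# Hypothetical sample sheet: A/B only in batch b1, C/D only in batch b2
coldata <- data.frame(
  condition = factor(rep(c("A", "B", "C", "D"), each = 2)),
  batch     = factor(rep(c("b1", "b2"), each = 4))
)

# Same formula that was passed to DESeqDataSetFromMatrix()
mm <- model.matrix(~ condition + batch, data = coldata)

# A full-rank design has rank equal to its number of columns
mm_rank <- qr(mm)$rank
cat("columns:", ncol(mm), " rank:", mm_rank, "\n")

# The batch column is exactly conditionC + conditionD, so it adds no
# information that the condition columns do not already carry
batch_is_redundant <- all(mm[, "batchb2"] == mm[, "conditionC"] + mm[, "conditionD"])
batch_is_redundant
```

Here the matrix has 5 columns but rank 4: the batch column can be written as a sum of condition columns, so the model cannot assign any difference to batch rather than to condition.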

Asked by luongthang1908

Answer 1 · 2020-03-13

James W. MacDonald

See the DESeq2 vignette section on model matrices that are not full rank.

Could you please give me a specific example of the nesting that I can add to the design? I do not understand the principle behind making the matrix full rank. I also followed this thread, https://support.bioconductor.org/p/62357/#62368, but I still don't get it.

Reply by luongthang1908

Hi James, I read the vignette, but for my case the instruction is: "In both of these cases above, the batch effect cannot be fit and must be removed from the model formula. There is just no way to tell apart the condition effects and the batch effects. The options are either to assume there is no batch effect (which we know is highly unlikely given the literature on batch effects in sequencing datasets) or to repeat the experiment and properly balance the conditions across batches."

If you can help, please explain a bit about my specific design. Thank you.

Reply by luongthang1908

If you have conditions that are nested within batches, then you have a problem. In that situation each condition is confounded with a batch, and there isn't any way to fix that after the fact. You can assume that there isn't a batch effect, but that is a pretty strong assumption, given that you have no evidence that it's a reasonable thing to think.

Think of it this way. Assume you have two strains of corn and you want to know which one has a better yield. If you plant one strain in a really loamy, fertile field and the other in a rocky wasteland of a field, do you think you will be able to say whether the differences are due to one strain being better than the other, or to one field being better? It's probably some combination of the two, and there isn't a way to tell them apart.

If you get equal yields, is it because the strain in the bad field is better? Or is it just better suited to infertile soil? If you are smart, you plant BOTH strains in each field. Then you can say which strain is better, and you can even say which is better for poor fields and which is better for good fields. But if you don't do that, you can't really say much at all without making unwarranted assumptions.
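The corn example maps directly onto the rank problem. The sketch below uses two hypothetical four-plot layouts and a hypothetical helper, is_full_rank(), that builds the design ~ strain + field with base R's model.matrix() and compares its rank with its column count. In the confounded layout the field column is identical to the strain column; planting both strains in each field breaks that identity.

```r
# Hypothetical layouts with four plots each.
# Confounded: strain X only in the fertile field, strain Y only in the rocky one.
confounded <- data.frame(
  strain = factor(c("X", "X", "Y", "Y")),
  field  = factor(c("fertile", "fertile", "rocky", "rocky"))
)
# Balanced: both strains planted in each field.
balanced <- data.frame(
  strain = factor(c("X", "Y", "X", "Y")),
  field  = factor(c("fertile", "fertile", "rocky", "rocky"))
)

# Hypothetical helper: is the design ~ strain + field full rank for a layout?
is_full_rank <- function(layout) {
  mm <- model.matrix(~ strain + field, data = layout)
  qr(mm)$rank == ncol(mm)
}

is_full_rank(confounded)  # FALSE: fieldrocky duplicates strainY
is_full_rank(balanced)    # TRUE: strain and field effects can be separated
```

With replicate plots in each strain/field cell, the balanced layout would also support ~ strain * field, which is what lets you ask which strain does better in poor fields versus good ones.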

Reply by James W. MacDonald

Oh, I missed the link. Yeah, that's a completely borked design. You can only make comparisons between conditions within each batch.
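One way to make "only comparisons within each batch" concrete is to list every pair of conditions and flag whether the two share a batch. The layout below is again hypothetical (A/B only in b1, C/D only in b2), and batches_of, pairs and within_batch are illustrative names, not part of any package.

```r
# Hypothetical nested layout: A/B only in batch b1, C/D only in batch b2
coldata <- data.frame(
  condition = factor(rep(c("A", "B", "C", "D"), each = 2)),
  batch     = factor(rep(c("b1", "b2"), each = 4))
)

# Which batch(es) each condition was run in
batches_of <- split(as.character(coldata$batch), coldata$condition)

# Every pair of conditions, flagged by whether they share at least one batch
pairs <- t(combn(levels(coldata$condition), 2))
within_batch <- apply(pairs, 1, function(p) {
  length(intersect(batches_of[[p[1]]], batches_of[[p[2]]])) > 0
})
data.frame(cond1 = pairs[, 1], cond2 = pairs[, 2], within_batch)
```

Only A vs B and C vs D come out as within-batch here; every cross-batch pair mixes condition and batch effects. In DESeq2 terms, that suggests dropping batch (design ~ condition, since batch is aliased with condition anyway) and requesting only within-batch contrasts, such as results(dds, contrast = c("condition", "B", "A")).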