Question

DESeq2 design matrix: covariates + nested replicates

mdsutcliffe5 • 0

@mdsutcliffe5-22671

Hi, I'm having trouble figuring out the appropriate design for my data matrix. My data look something like this:

Condition   Sex     MouseID NewMouseID
WT          Female  1       1
WT          Female  2       2
WT          Female  3       3
WT          Female  3       3
WT          Male    4       1
WT          Male    5       2
WT          Male    6       3
WT          Male    6       3
KO          Female  7       1
KO          Female  8       2
KO          Female  9       3
KO          Female  9       3
KO          Male    10      1
KO          Male    11      2
KO          Male    12      3
KO          Male    12      3

My problem is with multiple samples coming from the same mice (note MouseIDs 3, 6, 9, and 12). If possible, I would like to not simply sum/average those samples and instead include them as a nested factor. The reason for this is those duplicates were from different sites within the same mouse. I had previously averaged technical replicates (same site, sequenced 2+ times).

If I do:

design = ~ MouseID + Sex + Condition

I get an error. My understanding is that this doesn't work because it treats the MouseIDs as 12 different "batches", so it would be impossible to tell whether differences were due to the "batch" or to the Condition.
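The confounding described above can be checked in base R without running DESeq2. This is a minimal sketch, assuming the sample table from the question is rebuilt by hand as a data frame called df (the name and the factor level order are assumptions made for illustration); it compares the number of columns in the model matrix with its rank.

```r
# Sample table from the question: 16 samples from 12 mice.
# The level order c("WT", "KO") is an assumption for illustration.
df <- data.frame(
  Condition = factor(rep(c("WT", "KO"), each = 8), levels = c("WT", "KO")),
  Sex       = factor(rep(rep(c("Female", "Male"), each = 4), times = 2)),
  MouseID   = factor(c(1, 2, 3, 3, 4, 5, 6, 6, 7, 8, 9, 9, 10, 11, 12, 12))
)

# Same formula as the design that triggers the error.
mm <- model.matrix(~ MouseID + Sex + Condition, data = df)

n_cols  <- ncol(mm)      # coefficients the formula asks for
mm_rank <- qr(mm)$rank   # coefficients that are linearly independent

# TRUE means the matrix is not full rank.
n_cols > mm_rank
```

Because every mouse belongs to exactly one Sex and one Condition, the Sex and Condition columns add no information beyond the twelve mouse levels, so the rank stays below the number of columns.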

Instead, my thought is to include a "NewMouseID" factor that sequentially numbers the mice within each "group" (i.e. each Condition/Sex combination). Is it possible to include this information in the design matrix? I want to control for differences between samples within the same mouse, as well as differences between male and female. However, I do want to note that there is no specific relation between MouseID #1 and MouseID #4, for example, despite them having the same NewMouseID.

Thanks in advance.

deseq2 • 1.5k views

ADD COMMENT • link updated 4.9 years ago by Michael Love 43k • written 4.9 years ago by mdsutcliffe5 • 0

what is the error that you get?

ADD REPLY • link 4.9 years ago sebastian.lobentanzer ▴ 50

 Error in checkFullRank(modelMatrix) : 
  the model matrix is not full rank, so the model cannot be fit as specified.
  One or more variables or interaction terms in the design formula are linear
  combinations of the others and must be removed.

  Please read the vignette section 'Model matrix not full rank':

  vignette('DESeq2')

If I run model.matrix(~ MouseID + Sex + Condition, data = df), I get the following:

[screenshot of the model.matrix output; image not preserved]

My understanding is that the last column is the linear sum of MouseIDs 2 through 6

ADD REPLY • link 4.9 years ago mdsutcliffe5 • 0

have you seen this post?

ADD REPLY • link 4.9 years ago sebastian.lobentanzer ▴ 50

Just looked at it, thanks for the link.

My understanding is that their situation is different because they have cell lines with paired (t0 and t1) timepoints. I don't have pairing across conditions (e.g. same mouse given both treatments) or across sexes (e.g. male and female mouse of same "batch" or similar).

ADD REPLY • link 4.9 years ago mdsutcliffe5 • 0

score 1 · Answer 1 · 2020-01-08

Michael Love 43k

@mikelove

You don't need to control for sex because controlling for mouse is more fine-grained. When you give each mouse a baseline, this takes care of any differences across sex because the mice are nested within sex.
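One way to see why mouse is "more fine-grained" than sex is to check that the Sex column is already an exact linear combination of the per-mouse indicator columns. The following is a minimal sketch in base R, assuming the sample table from the question is rebuilt as a data frame named df; the names df, mouse, and sex_male are illustrative and not from the thread.

```r
# Sample table from the question: 16 samples from 12 mice.
df <- data.frame(
  Sex     = factor(rep(rep(c("Female", "Male"), each = 4), times = 2)),
  MouseID = factor(c(1, 2, 3, 3, 4, 5, 6, 6, 7, 8, 9, 9, 10, 11, 12, 12))
)

# One indicator column per mouse, i.e. one baseline per mouse.
mouse <- model.matrix(~ 0 + MouseID, data = df)

# Numeric indicator for male samples.
sex_male <- as.numeric(df$Sex == "Male")

# Appending the Sex column does not raise the rank ...
rank_mouse     <- qr(mouse)$rank
rank_mouse_sex <- qr(cbind(mouse, sex_male))$rank
rank_mouse == rank_mouse_sex

# ... because the mouse indicators reproduce it exactly (residuals ~ 0).
fit <- lm.fit(mouse, sex_male)
max(abs(fit$residuals))
```

Since each mouse has a single sex, any sex difference is absorbed into the per-mouse baselines, which is why a separate Sex term cannot be estimated alongside MouseID.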

ADD COMMENT • link 4.9 years ago Michael Love 43k

Hi Michael,

Thanks for the response.

I think I understand your first sentence about controlling for mouse being more appropriate, but not the bit about giving each mouse a "baseline". I'm not sure how this translates into a design matrix.

If I'm interested in the WT vs. KO comparison, would I simply enumerate the mouse IDs, starting over within each condition, i.e. MouseID = c(1,2,3,3,4,5,6,6,1,2,3,3,4,5,6,6), and then use design = ~ MouseID + Condition, despite the fact that there is no "connection" between the two mice labeled "1", and so on?
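Whether this re-enumerated design would at least get past the full-rank check can be tested in base R. This is a minimal sketch, assuming the sample table is rebuilt as a data frame named df with the MouseID vector proposed above; it only checks rank and does not settle whether the design is statistically appropriate.

```r
# Proposed coding: mice numbered 1 to 6 within each condition.
# The level order c("WT", "KO") is an assumption for illustration.
df <- data.frame(
  Condition = factor(rep(c("WT", "KO"), each = 8), levels = c("WT", "KO")),
  MouseID   = factor(c(1, 2, 3, 3, 4, 5, 6, 6, 1, 2, 3, 3, 4, 5, 6, 6))
)

mm <- model.matrix(~ MouseID + Condition, data = df)

# TRUE means every column is linearly independent (full rank).
qr(mm)$rank == ncol(mm)
```

Passing this check only means the "model matrix is not full rank" error would not be raised. With this coding, the MouseID term gives WT mouse "1" and KO mouse "1" a shared coefficient, so the model treats unrelated mice as if they were paired, which is exactly the concern raised in the question.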