Question

DESeq2 contrast statement syntax and design statement question

grashow ▴ 10

Hi,

Thank you for answering our previous questions. We've made progress addressing our shared controls and "model matrix not full rank" issues. As a reminder, we had 8 chemicals (plus control) tested at 4 concentrations both with and without estrogen (E2). We had three biological replicates and three technical replicates per plate. We assigned samples labeled "control" to each chemical and assigned a concentration of 0 uM.

We are mostly interested in the three-way interaction between chemical, concentration, and E2. We've explored two ways of modeling this.

1) Michael Love suggested the following:

design= ~ bio_rep + E2 + E2:Conc_uM + E2:new_chem:Conc_uM

2) We have also used the paste0 function to create a mega-variable that combines chemical, concentration, and E2.

design = ~ bio_rep + mega_variable

We are getting very different results with these two approaches. Can you articulate what the differences might be?

As a second question, we've also seen two different syntax styles for contrast statements:

1) Tam_results_1 <- results(ddsColl, contrast= list(c("mega_varTam_0.1uM_E2_0","mega_varcontrol_0uM_E2_0")),alpha=0.05)

2) Tam_results_2 <- results(ddsColl, contrast= (c("mega_var","Tam_0.1uM_E2_0", "control_0uM_E2_0")),alpha=0.05)

These two yield different results. When would each be appropriate to use?

Thank you in advance,

Rachel

deseq2 interactions contrast • 2.0k views

updated 7.7 years ago by Michael Love 43k • written 7.7 years ago by grashow ▴ 10

score 0 · Answer 1 · 2017-08-15

Just to link for my own tracking back to previous post

DESeq2- what to do when two conditions share controls?

Re: very different results, this makes sense: treating concentration as a numeric variable (as before) and pasting it into a string so that each concentration becomes a categorical level are very different modeling choices.
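To make the difference concrete, here is a minimal base-R sketch that builds both design matrices for a toy sample table. The column names mirror the ones in the question, but the table itself is an assumption for illustration only (two made-up chemicals, no shared 0 uM controls), so the exact columns will not match the real experiment.

```r
# Toy sample table: column names follow the thread, values are invented.
samples <- expand.grid(
  bio_rep  = factor(1:3),
  E2       = factor(c(0, 1)),
  new_chem = factor(c("Tam", "BPA")),
  Conc_uM  = c(0.1, 1, 10, 100)
)

# Approach 1: concentration stays numeric, so the model fits slopes
# (log fold change per uM) rather than one effect per dose.
X_numeric <- model.matrix(
  ~ bio_rep + E2 + E2:Conc_uM + E2:new_chem:Conc_uM,
  data = samples
)

# Approach 2: every chemical/concentration/E2 combination becomes its own
# factor level, so each combination gets a separate coefficient.
samples$mega_variable <- factor(
  paste0(samples$new_chem, "_", samples$Conc_uM, "uM_E2_", samples$E2)
)
X_categorical <- model.matrix(~ bio_rep + mega_variable, data = samples)

# Compare how many coefficients each design estimates and what they are called.
ncol(X_numeric)
ncol(X_categorical)
colnames(X_numeric)
head(colnames(X_categorical))
```

The numeric design assumes the effect changes linearly with concentration, while the categorical design makes no assumption about the dose-response shape and estimates many more coefficients. Results from the two fits answer different questions, so they need not agree.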

In my last reply, I recommended you work with a local statistician, as you have a very complex experimental setup and I think it's not trivial to translate the tests that you want from English into R formulas and contrasts. With such a design, statistical modeling is an iterative process: you go back and forth between fitting a model and checking its assumptions (how to encode the numeric variable of concentration, how to deal with shared controls). I'm going to reiterate that I recommend you partner with someone with a background in using R's linear model formulas.

Re: the two syntaxes, see the help page for ?results:

contrast: this argument specifies what comparison to extract from the
          ‘object’ to build a results table. one of either:

            • a character vector with exactly three elements: the name
              of a factor in the design formula, the name of the
              numerator level for the fold change, and the name of the
              denominator level for the fold change (simplest case)

            • a list of 2 character vectors: the names of the fold
              changes for the numerator, and the names of the fold
              changes for the denominator. these names should be
              elements of ‘resultsNames(object)’. if the list is length
              1, a second element is added which is the empty character
              vector, ‘character()’. (more general case, can be to
              combine interaction terms and main effects)

You are using the contrast correctly in (2), but for (1) above, you are only giving one character vector:

list( c("mega_varTam_0.1uM_E2_0", "mega_varcontrol_0uM_E2_0") )

This adds the two coefficients together, rather than taking their difference. If you want to contrast these two levels, you need to provide them as separate elements of the list:

list( "mega_varTam_0.1uM_E2_0", "mega_varcontrol_0uM_E2_0" )

They look similar, but one is adding together coefficients while the other is calculating their difference.
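The arithmetic behind the two list forms can be sketched in base R. `contrast_weights` below is a hypothetical helper written for this illustration, not a DESeq2 function, and the coefficient values are made up. It follows the behaviour described in the help text quoted above: names in the first list element get weight +1, names in the second get weight -1 (these match the default `listValues = c(1, -1)` of `results()`).

```r
# Hypothetical helper (not part of DESeq2): convert a list-style contrast
# into one weight per coefficient.
contrast_weights <- function(coef_names, contrast) {
  # A length-1 list gets an empty denominator, as the help page describes.
  if (length(contrast) == 1) contrast <- c(contrast, list(character()))
  stopifnot(all(unlist(contrast) %in% coef_names))
  w <- setNames(numeric(length(coef_names)), coef_names)
  w[contrast[[1]]] <- 1   # numerator coefficients
  w[contrast[[2]]] <- -1  # denominator coefficients
  w
}

# Made-up log2 fold change coefficients, named as in the question.
beta <- c(
  Intercept                = 5.0,
  mega_varTam_0.1uM_E2_0   = 1.5,
  mega_varcontrol_0uM_E2_0 = 0.5
)

# One character vector inside the list: both coefficients are numerators.
w_sum <- contrast_weights(
  names(beta),
  list(c("mega_varTam_0.1uM_E2_0", "mega_varcontrol_0uM_E2_0"))
)

# Two list elements: the first is the numerator, the second the denominator.
w_diff <- contrast_weights(
  names(beta),
  list("mega_varTam_0.1uM_E2_0", "mega_varcontrol_0uM_E2_0")
)

sum(w_sum * beta)   # 1.5 + 0.5 = 2
sum(w_diff * beta)  # 1.5 - 0.5 = 1
```

With a real fitted object, the names passed in the list must come from `resultsNames(object)`, which may differ from the names used in this sketch.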