DESeq2 multiple interaction terms 3-factor design
1
5
Entering edit mode
@stephen-turner-4916
Last seen 6.3 years ago
United States

I have a 3-factor experiment, each factor having two levels:

  • gt: Genotype: XX vs XY
  • sex: Sex: M vs F
  • tmt: Treatment: Letrizole (let) vs Vehicle (veh)

I'm interested in the following contrasts:

  1. Average main effects of genotype, sex, and treatment across all other levels of other factors
  2. Interaction effect of gt:tmt across all levels of sex
  3. Interaction effect of sex:tmt across all levels of gt

Then similarly, getting the same information slicing across one or the other level of gt and sex. That is, 

  1. Avg main effects of gt and tmt separately in each level of sex (m and f)
  2. Avg main effects of sex and tmt separately in each level of gt (xx and xy)
  3. Interaction of gt:tmt separately in each level of sex (m and f)
  4. Interaction of sex:tmt separately in each level of gt (xx and xy)

Here's what I'm doing.

I have a design matrix that looks like this, and after running the DESeq pipeline, I have 6 contrasts in my results set.

> design(dds)
~gt + tmt + sex + gt:tmt + sex:tmt
> matrix(resultsNames(dds))
     [,1]           
[1,] "Intercept"    
[2,] "gt_xy_vs_xx"  
[3,] "tmt_let_vs_veh"
[4,] "sex_m_vs_f"   
[5,] "gtxy.tmtlet"  
[6,] "tmtlet.sexm"   

And here's what I'm doing to try to get out the main effects. Does this look right?

# average main effects across all levels: get the main effect and half of both contrasts
# for gt xy vs xx
results(dds, contrast=c(0, 1, 0, 0, .5, .5)) 
# for tmt let vs veh
results(dds, contrast=c(0, 0, 1, 0, .5, .5))
# for sex m vs f
results(dds, contrast=c(0, 0, 0, 1, .5, .5))

 

 

However, I'm confused about interaction terms. If I want the average gt:tmt interaction across all levels of sex, do I include "half" the interaction of sex:tmt also so I capture this across all levels of sex? Or not? which of these is correct? 

results(dds, contrast=c(0, 0, 0, 0, 1, .5)) 
results(dds, contrast=c(0, 0, 0, 0, 1, 0)) 

 

 

And for the second set of results, getting the same information slicing across one or the other level of gt and sex, I'm guessing it'll look something like this?

# main effect of gt xy vs xx in females average across treatment:
# add half the gt:tmt interaction?
results(dds, contrast=c(0, 1, 0, 0, .5, 0))
# main effect of gt xy vs xx in males average across treatment:
# add half the gt:tmt interaction, and all of the tmt:sex(m) interaction?
results(dds, contrast=c(0, 1, 0, 0, .5, 0))
results(dds, contrast=c(0, 1, 0, .5, 1))

 

I think if I can get these few correct I can handle the rest.

Thanks for any insight!

deseq2 deseq interactions multiple factor design • 7.4k views
ADD COMMENT
0
Entering edit mode

Also, an older version of the DESeq2 manual has a description of using numeric contrasts, which seems to have completely disappeared from the newer version of the manual. See page 25 here:

https://www.bioconductor.org/packages/3.1/bioc/vignettes/DESeq2/inst/doc/DESeq2.pdf


For standard model matrices, to obtain the “average condition effect” for both sets, we can use a numeric contrast, assigning a 1 to the main effect conditionB and a 0.5 to the interaction effect setY.conditionB, as they appear in resultsNames(dds). This can be understood as taking us halfway from the condition effect in set X to the condition effect in set Y:

resultsNames(dds)

results(dds, contrast=c(0,0,1,0.5))

 

ADD REPLY
7
Entering edit mode
@mikelove
Last seen 3 hours ago
United States

Yes, I trimmed down the interaction section, because I think it was making people more confused than less. I've now made the vignette section more descriptive and added a picture, and then built up the examples in ?results, although not showing that using numeric contrast you can add half of an interaction, in order to get a sense of an "average" of the effects in both groups. I'll try to add it back in somewhere in this devel cycle.

For your first question, for average genotype effect across the treatments, you would not include a .5 for the "tmtlet.sexm" term. It would just be:

results(dds, contrast=c(0, 1, 0, 0, .5, 0)) 

and similar logic for the others. The reason is that, you do not have a second order interaction here of genotype-treatment-sex, so you assume the genotype-treatment interaction is the same across sex (which is perfectly reasonable, and I think it's always good to try to use only as many terms as you need). So you just need to have the main effect and half of the interaction of genotype with treatment. 

For your second question, this same reasoning applies. You would not add .5 for the "tmtlet.sexm" term but just use:

results(dds, contrast=c(0, 0, 0, 0, 1, 0)) 

or just

results(dds, name="gtxy.tmtlet")

For your third question, the same reasoning tells you that this "average" genotype effect, averaging over treatments, is the same for females and males with this model.

ADD COMMENT
0
Entering edit mode

Also, the fractional terms (0.5) are appropriate for balanced group sizes. If the frequencies of the covariate levels are unbalanced in your population of interest, you would need to adjust the fractional term accordingly. 

ADD REPLY
1
Entering edit mode

Here is a little script which might help explain this difference between whether we adjust for sample size differences or not:

# we are interested in the "z" effect           
# in two groups, x1 and x2                      
n <- c(20,20,20,20)                             
mu <- rep(c(10,30,20,50), n)                    
y <- rnorm(sum(n), mu, 0.01)                    
# plot(y)                                       
x <- factor(rep(c("1","2"),c(n[1] + n[2], n[3] + n[4])))     
z <- factor(rep(c("a","b","a","b"),n))          
cf <- coef(lm(y ~ x*z))                         
                                                
# average z effect across groups 
# ignoring sample size, we expect (20 + 30) / 2 = 25       
(ave.v <- cf[3] + 0.5 * cf[4])                  
                                                
# average z effect across *samples* should 
# use sample size to inform estimate  
(ave.v.samples <- cf[3] + (n[3] + n[4])/sum(n) * cf[4])
ADD REPLY
0
Entering edit mode

Mike and Stephen, I have two questions on this post (hope you would look at my comment on this very old post)

Mike,

I have similar experimental design and sorry I'm still confused about your explanation to the third question. Did you mean that we would have to use a different design formula to find average main effects of gt and tmt separately in each level of sex? If so, what that design might be?

Stephen,

Can I ask what was the contrast you used for your 3 question in 2nd set i.e., finding the interaction of gt:tmt separately in each level of sex?

Many many thanks !!

 

ADD REPLY
0
Entering edit mode

hi Alan,

I'll answer on your new post today

ADD REPLY

Login before adding your answer.

Traffic: 994 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6