Question

DESeq2: Is it necessary to include all terms and interactions in LRT tests?

chimeric • 0

Hi there, I am wondering if you could comment on whether I am setting up and interpreting the LRT test correctly. I am curious whether it's necessary to include all terms and interactions, or whether I can include only the terms I'm interested in.

For example, I have a time course experiment where I have 2 tissue types and 2 treatments sampled over time.

~ time + tissue_type + treatment

I would like to find genes whose expression changes significantly due to the treatment, regardless of tissue type or time point.

To find these, can I simply reduce to the following:

~ time + tissue_type

On the other hand, if I wanted to identify the genes that respond differently to the treatment depending on tissue type, can I just add the interaction term of interest to the full model, and then drop it in the reduced model, e.g.

Full: ~ time + tissue_type + treatment + tissue_type:treatment
Reduced: ~ time + tissue_type + treatment

Thank you for your advice!
Erin

deseq2 lrt • 1.6k views

updated 7.4 years ago by Gavin Kelly ▴ 690 • written 7.4 years ago by chimeric • 0

score 2 · Answer 1 · 2017-09-14

It looks to me like you've got the correct model specification. Your reduced model should include all the nuisance terms that you want to normalise out of your data; the full model should include those, plus the (usually one) term, either main or interaction, that you hypothesize might be having an effect. So you've correctly answered the two questions you pose with the two different sets of full+reduced models.
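As a minimal sketch of those two comparisons (not the poster's actual data): it assumes the DESeq2 package is installed, uses `makeExampleDESeqDataSet()` to simulate counts, and invents a balanced layout of 3 time points, 2 tissues and 2 treatments in duplicate. The level names and object names are placeholders.

```r
library(DESeq2)

set.seed(1)
# Simulated counts: 500 genes x 24 samples (stand-in for a real count matrix)
dds <- makeExampleDESeqDataSet(n = 500, m = 24)

# Hypothetical balanced layout: 3 time points x 2 tissues x 2 treatments x 2 replicates
layout <- expand.grid(
  time        = c("t0", "t1", "t2"),
  tissue_type = c("A", "B"),
  treatment   = c("control", "treated"),
  replicate   = 1:2
)
dds$time        <- factor(layout$time)
dds$tissue_type <- factor(layout$tissue_type)
dds$treatment   <- factor(layout$treatment)

# Question 1: treatment effect, with time and tissue as nuisance terms
design(dds) <- ~ time + tissue_type + treatment
dds_trt <- DESeq(dds, test = "LRT", reduced = ~ time + tissue_type)
res_treatment <- results(dds_trt)

# Question 2: does the treatment effect differ between tissues?
design(dds) <- ~ time + tissue_type + treatment + tissue_type:treatment
dds_int <- DESeq(dds, test = "LRT", reduced = ~ time + tissue_type + treatment)
res_interaction <- results(dds_int)

# The LRT p-values are in the pvalue / padj columns of each results table
summary(res_treatment)
summary(res_interaction)
```

In each case the p-value refers to the term(s) present in the full design but absent from `reduced`; the log2 fold change column reported alongside an LRT is for a single coefficient and is not what the test statistic is based on.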

As both tissue and treatment are limited to two levels here, you should get equivalent (though not necessarily numerically identical) results if you instead fit only the full model and do a Wald test on the final coefficient, as that coefficient gives the effect size of the treatment, or the difference in treatment effect between tissues, respectively. This wouldn't hold if the factor had more than two levels, because the term would then correspond to more than one coefficient. To clarify, you'd supply just one model (the one you identify as 'full') for each question, and have no need of the 'reduced' model. Which approach you use is personal preference; some people find one approach easier to understand.
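A rough sketch of the Wald alternative for the interaction question, under the same assumptions as any toy example: DESeq2 is installed, the counts are simulated with `makeExampleDESeqDataSet()`, and the sample layout and level names are made up for illustration.

```r
library(DESeq2)

set.seed(1)
# Simulated counts: 500 genes x 24 samples (stand-in for a real count matrix)
dds <- makeExampleDESeqDataSet(n = 500, m = 24)

# Hypothetical balanced layout: 3 time points x 2 tissues x 2 treatments x 2 replicates
layout <- expand.grid(
  time        = c("t0", "t1", "t2"),
  tissue_type = c("A", "B"),
  treatment   = c("control", "treated"),
  replicate   = 1:2
)
dds$time        <- factor(layout$time)
dds$tissue_type <- factor(layout$tissue_type)
dds$treatment   <- factor(layout$treatment)

# Only the 'full' design is supplied; DESeq() uses the Wald test by default
design(dds) <- ~ time + tissue_type + treatment + tissue_type:treatment
dds <- DESeq(dds)

# List the fitted coefficients and test the last one (the interaction term)
coef_names <- resultsNames(dds)
last_coef  <- tail(coef_names, 1)
res_wald   <- results(dds, name = last_coef)

print(coef_names)
summary(res_wald)
```

Picking the coefficient with `tail(resultsNames(dds), 1)` relies on the term of interest being written last in the design; checking the printed names before trusting that is worthwhile.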

The mandatory advice when I see 'time' used as a term in a model: you may want to check whether 'time' is a factor or a number in your data, as the interpretation of the results differs between the two (removing individual timepoint-specific effects, or removing a linear trend of log-expression on time, respectively).
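To illustrate the difference, here is a small base R sketch comparing the design matrices produced by the two encodings. The sampling times are hypothetical and chosen only for the example.

```r
# Hypothetical sampling times (hours), two samples per time point
hours <- c(0, 2, 4, 8, 0, 2, 4, 8)

# Numeric time: an intercept plus a single slope (linear trend on time)
X_numeric <- model.matrix(~ hours)

# Factor time: an intercept plus one indicator per non-reference time point
X_factor <- model.matrix(~ factor(hours))

ncol(X_numeric)  # 2
ncol(X_factor)   # 4

# Check how a variable is stored before putting it in a design
is.numeric(hours)          # TRUE
is.factor(factor(hours))   # TRUE
```

With a real dataset, the same check would be run on the time column of the sample table, and `factor()` applied to it if per-timepoint effects are what you intend to remove.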