Question

Confusion about DESeq2 wording in the vignette (additive model with main effects only vs. interaction)

Alexandre ▴ 10

Hi there,

My question is rather theoretical and concerns the wording used to describe a multi-factor design in the vignette of the DESeq2 package. For example, in the following design:

~genotype + condition

The vignette mentions that the condition effect represents the overall effect controlling for differences due to genotype.

However, when I read books about multi-factor designs and interactions, including Data Analysis for the Life Sciences, co-written by Michael, one of the authors of DESeq2, I find the following:

X<-model.matrix(~type+leg, data=spider)
colnames(X)
"(Intercept)" "typepush"    "legL2"       "legL3"       "legL4"

So here they describe a model with one factor, type, with two levels (push and pull), and another factor, leg, with four levels (L1, L2, L3 and L4). They then make the following statement about the model ~type+leg:

" In the previous linear model, we assumed that the push vs. pull effect was the same for all of the legpairs"

So if the push vs. pull effect is assumed to be the same for all leg pairs, how can an additive model control for differences in the first term?

For example:

~genotype + condition

Here the condition effect controls for differences due to genotype, but ~type+leg assumes that the difference between the type levels is the same for all leg levels.

So, generalizing, what can we say about this model?

~ factor1 + factor2

Does the factor2 effect control for differences in factor1, or are the differences between the levels of factor1 assumed to be the same for all levels of factor2?

Thanks.

DESeq2 • 1.1k views

ADD COMMENT • link updated 3.8 years ago by James W. MacDonald 68k • written 3.8 years ago by Alexandre ▴ 10

score 0 · Answer 1 · 2021-07-13

The assumption for the model defined by the following design matrix

X<-model.matrix(~type+leg, data=spider)

is that the difference between push and pull is the same for all the leg pairs. In other words, push - pull for leg 1 is approximately the same as push - pull for leg 2. That is not to say that the average level for leg 1 is the same as for leg 2! Consider a scenario where, on average, the levels of both push and pull for leg 2 are higher than for leg 1 by the same amount (on a log scale, as used by DESeq2, that corresponds to the same percentage, say 50% higher). Since both push and pull are shifted equally in leg 2 vs leg 1, the difference between push and pull is comparable for both legs. It's just that there is a 'leg' effect where leg 2 has a higher overall level.
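The scenario above can be sketched with a small noise-free example. The data frame `toy` and its values are made up for illustration (it is not the spider dataset): leg L2 is 0.5 higher than L1 for both types, and push - pull is 1 in both legs.

```r
# Hypothetical toy data: 2 types x 2 leg pairs, one value per cell, no noise
toy <- expand.grid(type = c("pull", "push"), leg = c("L1", "L2"))
toy$type <- factor(toy$type, levels = c("pull", "push"))
toy$leg  <- factor(toy$leg,  levels = c("L1", "L2"))

# Rows are ordered (pull, L1), (push, L1), (pull, L2), (push, L2).
# L2 is shifted up by 0.5 for both types; push - pull is 1 in each leg.
toy$y <- c(1.0, 2.0, 1.5, 2.5)

# Additive model: one type effect shared by all legs, one leg shift
fit <- lm(y ~ type + leg, data = toy)
coef(fit)
```

Here `typepush` recovers the common push - pull difference and `legL2` recovers the overall shift of leg L2, so a leg effect and a constant type effect coexist without contradiction.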

If leg 2 is overall higher than leg 1, and the difference between push and pull is also larger for leg 2, then the above model is no longer appropriate: there is an interaction between leg and type, and simply assuming that the difference between push and pull is comparable regardless of the leg pair is no longer valid.
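A minimal sketch of that situation, again with made-up noise-free values (the names `toy2`, `fit_add` and `fit_int` are illustrative): push - pull is 1 in leg L1 but 2 in leg L2, so the additive model can only report a compromise, while the interaction model separates the two.

```r
# Hypothetical toy data where the push - pull difference depends on the leg
toy2 <- expand.grid(type = c("pull", "push"), leg = c("L1", "L2"))
toy2$type <- factor(toy2$type, levels = c("pull", "push"))
toy2$leg  <- factor(toy2$leg,  levels = c("L1", "L2"))

# Rows: (pull, L1), (push, L1), (pull, L2), (push, L2)
# push - pull is 1 in L1 and 2 in L2
toy2$y <- c(1.0, 2.0, 1.5, 3.5)

# Additive model: forced to use a single type effect for both legs
fit_add <- lm(y ~ type + leg, data = toy2)

# Interaction model: the type effect is allowed to differ between legs
fit_int <- lm(y ~ type * leg, data = toy2)

coef(fit_add)["typepush"]        # a single averaged effect
coef(fit_int)["typepush"]        # push - pull in the reference leg L1
coef(fit_int)["typepush:legL2"]  # extra push - pull difference in L2
```

In the additive fit the residuals are no longer zero, which is the symptom of forcing one push - pull difference onto legs that do not share it.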

As for your ~factor1 + factor2 model, you stated it incorrectly. The factor2 effect doesn't control for differences in factor1. Nobody says that. The factor1 coefficients represent changes between the levels of factor1, after controlling for differences across the levels of factor2. And by fitting that model we assume that the changes in factor1 are consistent across the different levels of factor2.
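What "after controlling for" means in practice can be sketched with an unbalanced, noise-free example. All values and the names `d`, `naive` and `adjusted` are assumptions made for illustration: the true factor1 effect is 1, the true factor2 effect is 10, and level B of factor1 is measured mostly at level Y of factor2.

```r
# Hypothetical unbalanced design: A is mostly observed with X, B mostly with Y
d <- data.frame(
  factor1 = factor(rep(c("A", "A", "B", "B"), times = c(3, 1, 1, 3))),
  factor2 = factor(rep(c("X", "Y", "X", "Y"), times = c(3, 1, 1, 3)))
)

# True effects: B is 1 higher than A, Y is 10 higher than X, no interaction
d$y <- 1 * (d$factor1 == "B") + 10 * (d$factor2 == "Y")

# Ignoring factor2 mixes its effect into the factor1 estimate
naive <- coef(lm(y ~ factor1, data = d))["factor1B"]

# The additive model estimates the factor1 effect with factor2 accounted for
adjusted <- coef(lm(y ~ factor1 + factor2, data = d))["factor1B"]

c(naive = unname(naive), adjusted = unname(adjusted))
```

The naive estimate is inflated because most B samples also carry the large Y shift; the additive fit removes that shift and returns the effect that is assumed to be common to both levels of factor2.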