I am performing a differential expression analysis (it happens to be on ATAC counts, but that shouldn't matter?) using DESeq2.
My experimental design has several experimental variables, but as I am trying to get to the bottom of the design, I will concentrate on two: disease vs non-disease, and subtype. Subtype information exists only for the disease samples, not for the non-disease samples. There are 27 normals and 5 disease - and the disease have three of one subtype and one of the other.
I could think of two ways to identify disease-relevant (i.e. subtype A or B) genes. Instead of showing 33 samples, I'll just show a minimal example of the same thing.
First, averaging over the two subtypes:
design = ~ 0 + disease_and_subtype
  subtypeA subtypeB subtypenormal
1        1        0             0
2        1        0             0
3        0        1             0
4        0        1             0
5        0        0             1
6        0        0             1
and testing the contrast contrast = list(c("subtypenormal"), c("subtypeA", "subtypeB")), listValues=c(1,-1/2)
The second alternative is to nest subtype within disease (and remove the empty matrix columns):
design = ~disease + disease:subtype
  (Intercept) diseaseTRUE diseaseTRUE:subtypeB
1           1           1                    0
2           1           1                    0
3           1           1                    1
4           1           1                    1
5           1           0                    0
6           1           0                    0
and testing the coefficient diseaseTRUE.
To my mind these are equivalent. But the first method gives 25,000 significant regions, while the second gives 11.
Clearly I am misunderstanding something about these designs, and I'd be grateful if someone could point out what. I guess the advice might be just to forget the subtype and test the disease state irrespective of it, but I'd still like to understand what is going on.
I guess what I'm thinking is that there will be some effects that are subtype-specific, and some that are general to the disease, and we want to isolate the disease-general effects. By accounting for the subtype effect, I thought we might reduce an unwanted source of variance. I think testing against ~1 would also find things where either subtype A or subtype B differed from normal or from each other, so you'd get the subtype-specific effects rather than the disease-general ones.
Sounds like you want the first design then.
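One way to see why the two designs are not equivalent is to write each tested quantity in terms of the group means on the log scale. This is only a sketch, and it assumes R's default treatment coding with subtype A as the reference level (which the column name diseaseTRUE:subtypeB suggests). The symbols \mu_N, \mu_A and \mu_B are my own notation for the log-scale means of the normal, subtype A and subtype B groups.

```latex
\begin{aligned}
\text{Design 1 (no intercept):}\quad
  & \beta_{\text{subtypenormal}} = \mu_N,\quad
    \beta_{\text{subtypeA}} = \mu_A,\quad
    \beta_{\text{subtypeB}} = \mu_B \\
\text{tested contrast:}\quad
  & \beta_{\text{subtypenormal}} - \tfrac{1}{2}\left(\beta_{\text{subtypeA}} + \beta_{\text{subtypeB}}\right)
    = \mu_N - \tfrac{1}{2}\left(\mu_A + \mu_B\right) \\[6pt]
\text{Design 2 (nested):}\quad
  & \beta_{\text{(Intercept)}} = \mu_N,\quad
    \beta_{\text{diseaseTRUE}} = \mu_A - \mu_N,\quad
    \beta_{\text{diseaseTRUE:subtypeB}} = \mu_B - \mu_A \\
\text{tested coefficient:}\quad
  & \beta_{\text{diseaseTRUE}} = \mu_A - \mu_N \\[6pt]
\text{subtype average in Design 2:}\quad
  & \beta_{\text{diseaseTRUE}} + \tfrac{1}{2}\,\beta_{\text{diseaseTRUE:subtypeB}}
    = \tfrac{1}{2}\left(\mu_A + \mu_B\right) - \mu_N
\end{aligned}
```

Under those assumptions, diseaseTRUE in the nested design compares subtype A alone with normal, whereas the first contrast compares the average of the two subtypes with normal (with the opposite sign, which does not change which regions are significant). The combination on the last line would be the nested-design counterpart of the first contrast. This shows the two tests ask different questions; it does not by itself say how large the difference in the number of significant regions should be.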