Hello,
I've read pages (23-25) of the DESeq2 manual many, many times yet I still need help interpreting the language. My PI read it as well and admitted "it's not his language" so I could really use some feedback right now.
My model design looks like this:
4 time points in a month "lunar" (3Q, NM, 1Q, FM)
3 individual coral colonies (A, B, C)
2 sampling times in a day "dayTime" (day, night)
My design formula is: ~ dayTime+lunar+dayTime:lunar
At first I wasn’t really sure how to do contrasts correctly. I wanted to compare noon to night samples and the only way I figured out at the start was with:
resDN=results(dds,contrast=c("dayTime", "day", "night"))
Of course, because my design lunar factor has more than two levels the above does “an expanded model matrix” so the above contrast compares day over night across all lunar sets (3Q, NM, 1Q, FM). I also wanted to delve deeper to see what happens at each phase of the moon.
Therefore, I subset the data into 4 smaller datasets (one for each lunar period) and compared noon to night samples at each lunar period with the above contrast. Yet I knew that subsetting wouldn't typically produce the same results...and the reason is important. Tools like DESeq2 work by looking at all of the samples at once, which allows them to get a nice measurement of things like dispersion - by subsetting early on, you limit the amount of information available to make these estimates.
The UofC just recently hired its first Departmental Chair of Bioinformatics so I went to talk to him about my issue. He said he was only familiar with EdgeR, that by reading it (briefly, mind you) he couldn’t really tell and that the best way for me to find the appropriate contrast was empirically - trying a variety of contrasts and looking at the results.
I finally, came up with two contrasts that produce similar results to my subset analysis so my PI trusts the results. The only problem is that the results are slightly different for each of the following contrasts so I need help understanding what is different about the following so I could explain to my PI and committee members:
res1Qday=results(dds, contrast=list("dayTimeday","lunar1Q"))
resDN1Q=results(dds, contrast=list("dayTimenight", "lunar1Q"))
res1Qday had 22 DEGs and resDN1Q had 18 DEGs. Both have genes we expected to be different at different times of day (Cryptochromes, bZIP’s, etc.) but one Cryptochrome shows up in DN1Q and doesn’t on 1Qday. Conversely, a red fluorescent protein and bZIP show up in 1Qday but not DN1Q.