I have been working with DESeq2 for the past couple of months to analyze my data. I have read the vignette many times, found other workshops, and read message boards, but I still second-guess my decisions and the options I have chosen.
Basically, my design is that I have multiple clam lines, let's say 3 (A, B, C), and two salinities I am comparing (35 ppt vs 15 ppt). The 35 ppt salinity is my control, but I do not have a control or reference clam line.
The questions I am asking are: 1) How does the hard clam respond to low salinity (15 vs 35 ppt)? That is, regardless of clam line, how does this species respond to 15 ppt? 2) Do different clam lines respond differently to low salinity, that is, which genes are differentially expressed between (A & B), (B & C), and (C & A) at 15 ppt?
I have come up with many different ways to approach these questions, but which approach is best or the right one?
It has been suggested to me that I should flip these questions and ask question 2 first, then question 1.
I have struggled with whether I need an interaction term or just groups. Do I just put salinity in the model and leave clam line out, and vice versa, to answer these different questions? When I put multiple variables or an interaction term in, the coefficients start to get very confusing, especially since I don't have a reference clam line, but DESeq2 makes one of my clam lines the reference.
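To make the coefficient confusion concrete, here is a minimal sketch (plain Python, not DESeq2 itself) of standard treatment coding for a design with clam line, salinity, and their interaction. It assumes line A ends up as the reference only because it sorts first, and that 35 ppt is set as the salinity reference. The coefficient names are hypothetical labels written in the style DESeq2 tends to report, not names copied from a real run.

```python
# Sketch of treatment (dummy) coding for: line + salinity + line:salinity.
# Assumptions: line "A" is the reference (alphabetical default), "35" is the control.
LINES = ["A", "B", "C"]
SALINITIES = ["35", "15"]

# Hypothetical coefficient labels, one per design-matrix column.
COEF_NAMES = [
    "Intercept",           # line A at 35 ppt
    "line_B_vs_A",         # B vs A, at 35 ppt only
    "line_C_vs_A",         # C vs A, at 35 ppt only
    "salinity_15_vs_35",   # 15 vs 35 ppt, in line A only
    "lineB.salinity15",    # extra 15 ppt effect in B relative to A
    "lineC.salinity15",    # extra 15 ppt effect in C relative to A
]


def design_row(line, salinity):
    """Return the 0/1 design-matrix row for one line x salinity group."""
    is_b = int(line == "B")
    is_c = int(line == "C")
    is_15 = int(salinity == "15")
    return [1, is_b, is_c, is_15, is_b * is_15, is_c * is_15]


def active_coefficients(line, salinity):
    """Names of the coefficients that sum to this group's modeled log2 mean."""
    row = design_row(line, salinity)
    return [name for name, x in zip(COEF_NAMES, row) if x]


for line in LINES:
    for salinity in SALINITIES:
        terms = " + ".join(active_coefficients(line, salinity))
        print(f"{line}_{salinity}: {terms}")
```

Read this way, the main salinity coefficient describes line A only, and the main line coefficients describe 35 ppt only, which is why having one line silently chosen as the reference changes what each coefficient means.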
Then there is the decision of which shrinkage estimator to use. I have decided that apeglm works best with my data, because ashr leaves dispersion outliers among my significant genes. However, apeglm cannot be used with a contrast statement, only with a single coefficient.
Do I need to run multiple models or can I use one? What is ethical? I am defending my thesis in January and am in the process of creating my results. But I am terrified of doing something wrong and finding out at the very end, when I go to defend or publish, that all my results are incorrect.
If you have any guidance or suggestions, that would be great. Please don't just point me to a link or the vignette, because I have most likely read it and feel like everyone has a different solution to similar problems.
My advisors have limited experience with RNA-Seq and neither has used DESeq2, so I have been figuring this all out on my own.
I appreciate your time reading this and responding.
Cheers,
Leslie
Thank you @swbarnes2. I very much agree with your comment: "Its important to make sure that the answer DESeq2 is giving you matches the question you intended to ask".
I think using an interaction term is good and does allow me to answer my questions (at least question 2). The issue I run into is that, of the shrinkage estimators I have compared, only ashr allows me to use contrast or combine multiple coefficients. However, with ashr I still get LFCs in the 20s, which seems outrageous. apeglm takes the same data and gives me an LFC of 8 or 10 at the largest.
I have found a very nice page that explains interactions and how to use different coefficients to answer different questions, but it strictly uses results() and not lfcShrink(). https://rstudio-pubs-static.s3.amazonaws.com/329027_593046fb6d7a427da6b2c538caf601e1.html
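As a way of checking which coefficients each of my questions would need, here is a small sketch (plain Python again, not DESeq2 code) that builds numeric contrast vectors by subtracting averaged design-matrix rows. It assumes the same six-column interaction design as above, with line A and 35 ppt as reference levels. The helper names (design_row, mean_row, contrast) are hypothetical, and the column order is an assumption that would need to be checked against the coefficient order the fitted model actually reports.

```python
from fractions import Fraction

LINES = ["A", "B", "C"]

# Assumed column order: Intercept, B vs A, C vs A, 15 vs 35, B:15, C:15.


def design_row(line, salinity):
    """0/1 design row for one group under treatment coding (A and 35 as references)."""
    is_b, is_c = int(line == "B"), int(line == "C")
    is_15 = int(salinity == "15")
    return [1, is_b, is_c, is_15, is_b * is_15, is_c * is_15]


def mean_row(groups):
    """Column-wise average of the design rows for a list of (line, salinity) groups."""
    rows = [design_row(*g) for g in groups]
    return [Fraction(sum(col), len(rows)) for col in zip(*rows)]


def contrast(groups_1, groups_2):
    """Numeric contrast: average of groups_1 minus average of groups_2."""
    return [a - b for a, b in zip(mean_row(groups_1), mean_row(groups_2))]


# Question 1: 15 vs 35 ppt, averaged over all three lines.
avg_salinity = contrast([(l, "15") for l in LINES], [(l, "35") for l in LINES])

# Question 2, read as "B vs C at 15 ppt".
b_vs_c_at_15 = contrast([("B", "15")], [("C", "15")])

# Question 2, read as "does the salinity response differ between B and C".
b_response = contrast([("B", "15")], [("B", "35")])
c_response = contrast([("C", "15")], [("C", "35")])
response_b_vs_c = [x - y for x, y in zip(b_response, c_response)]

for label, vec in [
    ("15 vs 35, averaged over lines", avg_salinity),
    ("B vs C at 15 ppt", b_vs_c_at_15),
    ("B response vs C response", response_b_vs_c),
]:
    print(label, [str(v) for v in vec])
```

None of these three comparisons corresponds to a single column of this design, so each would need a contrast over several coefficients rather than one coef, which is exactly where the apeglm restriction gets in my way.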