Hello,
Would love to get the community's take on this question. The results between these methods are similar, but different enough to warrant some discussion.
We have a data set that consists of a bunch of subjects treated with different drugs at different doses.
  Diet Drug Dose
1 D    A    0.1
2 D    V    0
3 D    A    0.03
4 D    A    0.3
5 D    B    0.1
...
For example, we want to compare expression between the highest dose (0.3) of drug A and the control (V). We see two ways of doing this.
1 - Taking a subset of the data including only A-0.3 samples and V-0 samples and looking for the effect of drug (~ Drug)
2 - Using all of the data, creating a new DrugDose factor (e.g., A_0.1, A_0.3, etc.) and looking for the effect of that (~ DrugDose).
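To make the two setups concrete, here is a rough sketch of how the sample table would be prepared for each. It is written in Python with pandas purely for illustration and is not the actual analysis code. The `samples` table only mirrors the five rows shown above, and `subset` is a name made up for this example.

```python
import pandas as pd

# Toy sample table mirroring the five rows shown in the post (the real table is longer).
# Dose is kept as text so that labels come out as "A_0.3" and "V_0".
samples = pd.DataFrame({
    "Diet": ["D", "D", "D", "D", "D"],
    "Drug": ["A", "V", "A", "A", "B"],
    "Dose": ["0.1", "0", "0.03", "0.3", "0.1"],
})

# Option 2: one combined factor over all samples, to be used as ~ DrugDose.
samples["DrugDose"] = samples["Drug"] + "_" + samples["Dose"]

# Option 1: keep only the two groups being compared, to be used as ~ Drug.
subset = samples[samples["DrugDose"].isin(["A_0.3", "V_0"])]

print(samples)
print(subset)
```

With option 1 this subsetting step is repeated for each dose; with option 2 the full table is used once.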
In the first case, to get every dose-versus-control comparison, we need to run 3 separate expression analyses. In the second case, we only need to fit one model, and then use contrasts to test for expression differences.
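As a rough illustration of what "one fit plus contrasts" means, the sketch below fits an ordinary linear model to simulated log-expression values for a single gene and extracts the A_0.3 versus V_0 comparison with a contrast vector. Everything here is an assumption for illustration: the numbers are simulated, the group layout (three replicates in each of four groups) is made up, and a plain least-squares model stands in for whatever count model the real analysis uses.

```python
import numpy as np

# Hypothetical layout: 4 groups x 3 replicates, simulated values for one gene.
rng = np.random.default_rng(0)
groups = ["V_0", "A_0.03", "A_0.1", "A_0.3"]
n_rep = 3
codes = np.repeat(np.arange(len(groups)), n_rep)   # group index per sample
true_means = np.array([5.0, 5.2, 5.5, 6.0])        # made-up group means
y = true_means[codes] + rng.normal(0.0, 0.3, size=codes.size)

# Design with one indicator column per group (the ~ 0 + DrugDose form).
X = np.eye(len(groups))[codes]
beta, *_ = np.linalg.lstsq(X, y, rcond=None)       # fitted group means

# Contrast: A_0.3 minus V_0, taken from the single full fit.
contrast = np.array([-1.0, 0.0, 0.0, 1.0])
estimate = contrast @ beta

# The same difference computed from the two-group subset alone.
subset_diff = y[codes == 3].mean() - y[codes == 0].mean()

# The full fit pools residual variance over all groups (12 - 4 = 8 df),
# whereas the two-group subset would only have 6 - 2 = 4 df.
resid = y - X @ beta
df_full = y.size - len(groups)
pooled_var = resid @ resid / df_full
se_full = np.sqrt(pooled_var * (contrast @ np.linalg.inv(X.T @ X) @ contrast))

print(estimate, subset_diff, se_full)
```

In this simplified setting the estimated difference is identical either way; what changes is the variance estimate behind the test, which uses every group in the full fit and only two groups in the subset.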
It seems more logical to me to use option #2, but previous analyses chose #1. We have the chance to change this going forward, if it's warranted. Option #2 appears to result in fewer DE genes, so I'm wondering if we are stacking the deck unfairly when using #1 and inflating false positives.
Thanks for taking a look. Appreciate any opinions.
Ahh, thanks Michael. I completely missed this section of the vignette.