The models are effectively the same. The only difference lies in how you set up the contrasts. With the first model, the coefficients represent the average log-expression in each group. Thus, you'll have to use makeContrasts to set up comparisons between groups, and supply the result as the contrast argument in glmLRT. With the second model, the coefficients represent log-fold changes of particular groups over the group chosen as the intercept. As such, you can drop them directly with the coef argument if you want to compare to the intercept group. (Of course, if you want to compare two non-intercept groups, then you'll still have to use makeContrasts.)
Provided you perform the same DE comparison, both parametrizations should give you identical results. I find the first model (i.e., the intercept-free approach) a bit easier to interpret in general, but that's my personal taste.
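To make the equivalence concrete, here is a minimal sketch on simulated counts. The data, the group names A/B/C and the object names are all made up for illustration; they are not taken from the original poster's code. The dispersions are estimated once and reused so that only the parametrization differs between the two fits.

```r
library(edgeR)

set.seed(1)
# Hypothetical data: 200 genes, 3 groups (A, B, C) with 3 replicates each.
group <- factor(rep(c("A", "B", "C"), each = 3))
counts <- matrix(rnbinom(200 * 9, mu = 50, size = 10), nrow = 200)
rownames(counts) <- paste0("gene", seq_len(nrow(counts)))
y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y)

# Parametrization 1: no intercept, one coefficient per group.
design1 <- model.matrix(~ 0 + group)
colnames(design1) <- levels(group)
# Parametrization 2: intercept is group A, other coefficients are log-fold changes over A.
design2 <- model.matrix(~ group)

# Estimate dispersions once and reuse them for both fits.
y <- estimateDisp(y, design1)

# B vs A with the intercept-free design needs an explicit contrast.
fit1 <- glmFit(y, design1)
con <- makeContrasts(BvsA = B - A, levels = design1)
res1 <- glmLRT(fit1, contrast = con)

# B vs A with the intercept design is just the "groupB" coefficient.
fit2 <- glmFit(y, design2)
res2 <- glmLRT(fit2, coef = "groupB")

# The two sets of results should agree up to numerical tolerance.
all.equal(res1$table$logFC, res2$table$logFC, tolerance = 1e-4)
all.equal(res1$table$LR, res2$table$LR, tolerance = 1e-4)
```

Comparing B against C in the second design would still need makeContrasts, with the contrast defined on the groupB and groupC coefficients.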
The code looks fine. You could replace steps 8-10 with a single estimateDisp call, as that's the newer function. As for the difference between glmFit and glmQLFit - the latter estimates a quasi-likelihood dispersion, which allows downstream tests to better account for uncertainty in dispersion estimation. The standard glmFit + glmLRT pipeline treats the estimated dispersions as true values, which isn't totally accurate.
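As a rough sketch of what that looks like in practice (again on simulated counts with made-up names, and assuming that steps 8-10 were the separate common, trended and tagwise dispersion calls, which I can't confirm from the post), a single estimateDisp call fills in all three dispersion estimates, after which either pipeline can be run on the same object:

```r
library(edgeR)

set.seed(2)
# Hypothetical data: 500 genes, 3 groups with 3 replicates each.
group <- factor(rep(c("A", "B", "C"), each = 3))
counts <- matrix(rnbinom(500 * 9, mu = 100, size = 5), nrow = 500)
rownames(counts) <- paste0("gene", seq_len(nrow(counts)))
y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y)

design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# One call estimates the common, trended and tagwise dispersions.
y <- estimateDisp(y, design)

con <- makeContrasts(BvsA = B - A, levels = design)

# Likelihood ratio test pipeline: treats the estimated dispersions as known.
fit_lrt <- glmFit(y, design)
res_lrt <- glmLRT(fit_lrt, contrast = con)

# Quasi-likelihood pipeline: adds a gene-wise QL dispersion and uses an F-test.
fit_ql <- glmQLFit(y, design)
res_ql <- glmQLFTest(fit_ql, contrast = con)

topTags(res_ql)
```

The two result tables have the same layout apart from the test statistic column (LR for glmLRT, F for glmQLFTest).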
You can use either pipeline. I prefer to use glmQLFit + glmQLFTest, as it provides more accurate type I error control. As to your other question, I would suggest using glmTreat rather than filtering on the log-fold change.
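For the FDR and log-fold change question, something along these lines might work. This is only a sketch on simulated data: the 1.5-fold threshold, the object names and the spiked-in DE genes are illustrative assumptions, so pick a threshold that makes sense for your own experiment. glmTreat tests whether the log-fold change is significantly greater than the threshold in absolute value, rather than applying a cut-off after the test.

```r
library(edgeR)

set.seed(3)
# Hypothetical data: 500 genes, 3 groups with 3 replicates each.
# The first 50 genes are simulated as 8-fold up in group B.
group <- factor(rep(c("A", "B", "C"), each = 3))
mu <- matrix(100, nrow = 500, ncol = 9)
mu[1:50, group == "B"] <- 800
counts <- matrix(rnbinom(length(mu), mu = mu, size = 10), nrow = nrow(mu))
rownames(counts) <- paste0("gene", seq_len(nrow(counts)))

y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y)
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)
y <- estimateDisp(y, design)

fit <- glmQLFit(y, design)
con <- makeContrasts(BvsA = B - A, levels = design)

# Test against a fold-change threshold of 1.5 (illustrative choice).
tr <- glmTreat(fit, contrast = con, lfc = log2(1.5))

# Count up- and down-regulated genes at a 5% FDR.
summary(decideTests(tr, p.value = 0.05))

# Extract the full table and keep the genes with FDR at or below 5%.
tab <- topTags(tr, n = Inf)$table
de <- tab[tab$FDR <= 0.05, ]
head(de)
```

The FDR column here comes from the default Benjamini-Hochberg adjustment in topTags.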
Thanks Aaron. I checked both designs, and they yield similar output.
Secondly, I have seen that the example case studies in the edgeR manual use glmQLFit().
I am just wondering whether I should use glmFit() or glmQLFit(). How do I decide?
I have data with three biological replicates for each of the different groups.
I am using the following code to identify DEGs in three different groups. Please comment on the code.
Thanks
Thanks for your comments on the code.
Do you mean I should use glmQLFit() instead of glmFit + glmLRT? I am not quite sure; please guide me.
In addition, I would like to subset the DE results using a 5% FDR and a log-fold change cut-off. Could you please suggest some code?
Thanks for your help.