I understand the idea of using the negative binomial distribution to test whether a covariate is differentially expressed/abundant or not.
I wonder whether the same argument is valid when the analysis is not performing any test but is, for example, regressing these genes on case/control status. In such a regression, can one continue to use relative abundance, or should one still use, say, the variance-stabilizing transformation ...
"when the analysis is not performing any test but is, for example, regressing these genes on case/control status. In such a regression, can one continue to use relative abundance, or should one still use, say, the variance-stabilizing transformation ..."
Sorry, this is not clear enough for me to give an answer. The GLM is in fact very similar to a regression of the expected value of the normalized counts, on the log scale, on the case/control status. Can you restate the question more specifically, including what you are aiming to do?
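As a rough illustration of that similarity, here is a minimal Python sketch. It is not DESeq2's fitting procedure: it uses ordinary least squares on log2(normalized count + 1) for a single simulated gene, and the size factors, dispersion, group sizes and pseudocount of 1 are all assumptions made up for the example. With a 0/1 case/control indicator, the fitted slope is simply the difference in mean log-scale normalized counts between the two groups, which plays the role of a log2 fold change.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: one gene, 6 controls (0) and 6 cases (1).
condition = np.array([0] * 6 + [1] * 6)

# Assumed per-sample size factors (in practice these would be estimated).
size_factors = rng.uniform(0.5, 2.0, size=condition.size)

# Simulate negative binomial counts with a 2-fold difference in the mean
# and an assumed dispersion of 0.1.
true_mean = np.where(condition == 1, 200.0, 100.0)
dispersion = 0.1
n = 1.0 / dispersion
mu = true_mean * size_factors
counts = rng.negative_binomial(n, n / (n + mu))

# Normalize by size factor, then move to the log scale (pseudocount of 1).
normalized = counts / size_factors
y = np.log2(normalized + 1.0)

# Ordinary least squares of the log-scale values on case/control status.
X = np.column_stack([np.ones(condition.size), condition])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = beta

print(f"intercept (mean log2 level in controls): {intercept:.3f}")
print(f"slope (case vs control, log2 scale):     {slope:.3f}")
```

One difference worth keeping in mind: this sketch averages the logs of the normalized counts, whereas a GLM with a log link models the log of the expected count, so the two estimates are similar but not identical.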
I am going to use a predictive model to classify cancer vs. non-cancer. You can think of it as a logistic regression; eventually, my model returns a coefficient for every covariate (gene). Then, when new data come in, I can assign the new data points to either class based on those coefficients.
Typically, in this type of regression analysis, we standardize/rescale via "(x - mean(x))/sd(x)". I wonder whether one should use DESeq2-normalized data and skip "(x - mean(x))/sd(x)", or the other way around?
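To make the setup concrete, here is a minimal Python sketch of the workflow I have in mind. Everything in it is hypothetical: the data are simulated, the helper names (`fit_scaler`, `apply_scaler`, `fit_logistic`, `predict_proba`) are made up for illustration, and the logistic regression is a plain gradient-descent fit with a small assumed L2 penalty. The input matrix `X_train` (samples by genes) stands for whichever expression values end up being used, which is exactly the open question.

```python
import numpy as np

def fit_scaler(X):
    """Per-gene mean and sd, estimated on the training samples only."""
    mean = X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0  # avoid division by zero for constant genes
    return mean, sd

def apply_scaler(X, mean, sd):
    """(x - mean(x)) / sd(x), using the training mean and sd."""
    return (X - mean) / sd

def fit_logistic(X, y, lr=0.1, n_iter=2000, l2=0.01):
    """Logistic regression by gradient descent (intercept not penalized)."""
    Xb = np.column_stack([np.ones(X.shape[0]), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        grad = Xb.T @ (p - y) / len(y)
        grad[1:] += l2 * w[1:]
        w -= lr * grad
    return w

def predict_proba(X, w):
    """Predicted probability of the 'case' class."""
    Xb = np.column_stack([np.ones(X.shape[0]), X])
    return 1.0 / (1.0 + np.exp(-Xb @ w))

# Simulated training data: 40 samples x 3 genes, gene 0 higher in cases.
rng = np.random.default_rng(1)
y_train = np.array([0] * 20 + [1] * 20)
X_train = rng.normal(5.0, 1.0, size=(40, 3))
X_train[y_train == 1, 0] += 2.0

mean, sd = fit_scaler(X_train)
Z_train = apply_scaler(X_train, mean, sd)
w = fit_logistic(Z_train, y_train)

# New samples are rescaled with the training mean/sd, then classified.
X_new = rng.normal(5.0, 1.0, size=(5, 3))
Z_new = apply_scaler(X_new, mean, sd)
p_new = predict_proba(Z_new, w)
print("coefficients (intercept first):", np.round(w, 3))
print("predicted P(case) for new samples:", np.round(p_new, 3))
```

Note that the rescaling here works per gene, across samples, and the new samples reuse the mean and sd estimated from the training data.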
I think you will need to be more specific about what you mean by "regression analysis".
Regressing genes on the outcome (case/control).