Question

DESeq2 normalized data


akp • 0

@akp-8846

Last seen 8.6 years ago

I understand the idea of using the negative binomial distribution to test whether a covariate is differentially expressed/abundant or not.

I wonder whether the same argument holds when the analysis does not perform a test but, for example, regresses these genes on case/control status. In such a regression, can one continue to use relative abundance, or should one still use, say, the variance stabilizer ...

regression deseq2 counts • 2.1k views

ADD COMMENT • link updated 9.6 years ago by Michael Love 43k • written 9.6 years ago by akp • 0


I think you will need to be more specific about what you mean by "regression analysis".

ADD REPLY • link 9.6 years ago Ryan C. Thompson ★ 7.9k


Regressing genes on the outcome (case/control).

ADD REPLY • link 9.6 years ago akp • 0

score 1 · Accepted Answer · 2015-09-22


Michael Love 43k

@mikelove

Last seen 1 day ago

United States

"when the analysis does not perform a test but, for example, regresses these genes on case/control status. In such a regression, can one continue to use relative abundance, or should one still use, say, the variance stabilizer ..."

Sorry, this is not clear enough for me to give an answer. The GLM is in fact very similar to a regression of the expected value for the normalized counts on the log scale over the case/control status. Can you restate the question in a more specific way as to your aims?
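To make the comparison with regression concrete, the DESeq2 model for gene i and sample j can be written as below. The notation follows the DESeq2 paper and is not used elsewhere in this thread: K is the raw count, s the size factor, alpha the gene-wise dispersion, and x_j is assumed to be 1 for cases and 0 for controls.

```latex
K_{ij} \sim \mathrm{NB}(\mu_{ij}, \alpha_i), \qquad
\mu_{ij} = s_j \, q_{ij}, \qquad
\log_2 q_{ij} = \beta_{i0} + \beta_{i1} x_j
```

Under this coding, \beta_{i1} is the log2 fold change between cases and controls for gene i, which is why the fit resembles a regression of log-scale normalized expression on case/control status.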

ADD COMMENT • link 9.6 years ago Michael Love 43k


I am going to use a predictive model to classify cancer vs. non-cancer. You can think of it as a logistic regression: eventually, my model returns a coefficient for every covariate (gene). Then, when new data arrive, I can assign the new data points to either class based on those coefficients.

Typically, in this type of regression analysis, we standardize/rescale via "(x - mean(x))/sd(x)". I wonder whether one should use DESeq2-normalized data and skip "(x - mean(x))/sd(x)", or the other way around?

ADD REPLY • link 9.6 years ago akp • 0


I would recommend variance stabilizing using VST or rlog and not dividing out the row (gene) standard deviation*

* see this explanation: A: Biclustering Normalizing by Row in Heatmap of DESeq2

With the variance-stabilized data, you can then run any kind of machine learning or prediction algorithm you like.
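A rough sketch of that preprocessing in plain Python, under loudly stated assumptions: the function names and toy counts are hypothetical, and `log2(x + 1)` on size-factor-normalized counts is only a simple stand-in for DESeq2's VST or rlog (which are R functions and are not reimplemented here). The sketch centers each gene but deliberately does not divide by the gene's standard deviation.

```python
import math
import statistics

def size_factors(counts):
    """Median-of-ratios size factors; counts[g][s] is gene g in sample s."""
    n_samples = len(counts[0])
    # Keep genes with all-positive counts so the log geometric mean is defined.
    usable = [row for row in counts if all(c > 0 for c in row)]
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for s in range(n_samples):
        log_ratios = [math.log(row[s]) - lgm
                      for row, lgm in zip(usable, log_geo_means)]
        factors.append(math.exp(statistics.median(log_ratios)))
    return factors

def log_transform(counts, factors, pseudocount=1.0):
    """Assumed stand-in for VST/rlog: log2 of normalized counts plus a pseudocount."""
    return [[math.log2(c / f + pseudocount) for c, f in zip(row, factors)]
            for row in counts]

def center_rows(matrix):
    """Subtract each gene's mean; no division by the gene's standard deviation."""
    return [[v - statistics.mean(row) for v in row] for row in matrix]

# Hypothetical toy data: 3 genes x 2 samples, sample 2 sequenced twice as deeply.
counts = [[10, 20], [100, 200], [5, 10]]
factors = size_factors(counts)
transformed = log_transform(counts, factors)
centered = center_rows(transformed)
# Classifiers usually expect samples as rows, so transpose to samples x genes.
features = [list(col) for col in zip(*centered)]
```

Skipping the division by sd(x) keeps low-variance genes, which are mostly noise after stabilization, from being inflated to the same scale as genes with real differences between samples.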

ADD REPLY • link 9.6 years ago Michael Love 43k