Dear Bioconductor users,
I am working with RNA-seq data (raw counts) and I want to fit a regularized Cox regression model using the glmnet package. First, I applied the VST transformation, which makes RNA-seq data homoscedastic. Next, do I have to set the glmnet argument standardize = TRUE (so that all variables are scaled to unit variance before the model sequence is fitted) and then use the resulting unstandardized coefficients to rank the selected features (genes)? Or is the default standardization not appropriate in my case?
Thank you in advance for your time!
Sincerely,
Panagiotis Mokos
Dear Love,
Thank you very much for your useful information!!
Could you please explain more about this variance-based gene filtering, or send me a link to the above-mentioned vignette?
Also, in your opinion, is it better to standardize the VST-filtered data to unit variance beforehand and then pass them to glmnet with standardize = FALSE? In other words, do you believe that the final coefficient sizes (which will be used to rank the selected features) should reflect the differences in gene variances?
Thank you for your time !!!
Sincerely,
Panagiotis
The DESeq2 vignette is available by typing into R:
vignette("DESeq2")
You should definitely read this over, particularly the part about transformations. It's the detailed user guide for the software, which has grown over 7 years of DESeq1/2.
(All Bioconductor software is required to have a detailed software vignette.)
I don't really have any extra opinion on the downstream usage beyond my suggestion above. If this seems too confusing or difficult, you could just filter out low-count genes based on some heuristic you define and then use glmnet on log counts.
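A minimal sketch of that fallback (heuristic filter, then glmnet on log counts), for illustration only. The simulated counts and survival times, the filter thresholds (at least 10 counts in at least 25% of samples), and all object names are assumptions and not part of the thread. The glmnet step only runs if the glmnet package is installed. Because glmnet returns coefficients on the original scale of the input even when standardize = TRUE, the sketch multiplies each coefficient by the gene's standard deviation before ranking, which is one possible way to compare genes with different variances.

```r
# Simulated stand-in data (hypothetical): 60 samples x 200 genes of raw counts.
set.seed(1)
n_samples <- 60
n_genes <- 200
gene_means <- rexp(n_genes, rate = 1 / 50)
counts <- matrix(
  rnbinom(n_samples * n_genes, mu = rep(gene_means, each = n_samples), size = 2),
  nrow = n_samples, ncol = n_genes,
  dimnames = list(NULL, paste0("gene", seq_len(n_genes)))
)

# Heuristic filter (thresholds are assumptions): keep genes with at least
# min_count reads in at least min_fraction of the samples.
min_count <- 10
min_fraction <- 0.25
keep <- colMeans(counts >= min_count) >= min_fraction

# Log-transform the retained genes; the pseudocount of 1 avoids log2(0).
log_counts <- log2(counts[, keep, drop = FALSE] + 1)

# Simulated survival outcome in the two-column (time, status) form
# that glmnet accepts for family = "cox".
y <- cbind(time = rexp(n_samples, rate = 0.1),
           status = rbinom(n_samples, 1, 0.7))

if (requireNamespace("glmnet", quietly = TRUE)) {
  # Cross-validated lasso Cox model; standardize = TRUE is the glmnet default.
  cv_fit <- glmnet::cv.glmnet(log_counts, y, family = "cox",
                              standardize = TRUE, nfolds = 5)
  beta <- as.matrix(coef(cv_fit, s = "lambda.min"))[, 1]

  # Coefficients are reported per unit of log2 count; rescale to per-SD
  # effects before ranking the selected (non-zero) genes.
  beta_per_sd <- beta * apply(log_counts, 2, sd)
  ranked <- sort(abs(beta_per_sd[beta != 0]), decreasing = TRUE)
  print(head(ranked))
}
```

With random data like this, few or no genes may be selected; on real data, both the thresholds and the choice between ranking by raw or per-SD coefficients would need to be decided for the analysis at hand.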
Dear Love,
Thank you very much for your response!!
Panagiotis