If you want to model each gene separately, you'll probably need to supply each row of v$E as the y argument in glmnet, along with the corresponding row of v$weights as weights (assuming v is the output of voom). I'm not completely sure that the precision weights from voom are applicable as the "observation weights" mentioned in ?glmnet; it seems that there's weighted least squares in there somewhere, so it's probably fine. Anyway, this is the standard analysis approach where each gene is considered on its own merits (excepting situations involving empirical Bayes, or when learning latent variables). Of course, summarizing the output across thousands of genes is its own challenge; you should know what statistics are of interest a priori (e.g., the percentage of genes where a particular variable is non-zero, or the ideal lambda from cross-validation).
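For what it's worth, a minimal sketch of the per-gene approach (purely illustrative; X is a placeholder for your samples-by-predictors matrix, and v is the voom output):

library(glmnet)

# Fit an elastic net for each gene, using that gene's voom precision weights
# as the observation weights. 'X' and 'v' are placeholders, as described above.
fits <- vector("list", nrow(v$E))
for (g in seq_len(nrow(v$E))) {
    fits[[g]] <- cv.glmnet(x = X, y = v$E[g, ], weights = v$weights[g, ],
                           family = "gaussian", alpha = 0.5)
}

# Summaries across genes, e.g. the cross-validated lambda per gene, or the
# proportion of genes in which the first predictor is retained.
best.lambda <- vapply(fits, function(f) f$lambda.min, numeric(1))
kept.first <- vapply(fits, function(f) {
    as.numeric(coef(f, s = "lambda.min"))[2] != 0  # element [1] is the intercept
}, logical(1))
mean(kept.first)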
The other option is to model all genes together, presumably using family="mgaussian" where each gene is a separate response. In this setting, it doesn't seem possible to supply a weight matrix to weights, so if you want to do this you'll just have to make do without weighting. Note that coercing v$E and v$weights to vectors and stuffing them into glmnet will almost definitely not do what you want: you'd either be assuming that each variable affects all genes in the same manner (which is very unlikely), or you'd have to include gene-specific variables in the design matrix (in which case you might as well just fit each gene separately).
Of course, the bigger question is why you want to do this in the first place. Are you trying to use elastic nets to learn which variables should be put into a model for a DE analysis? This is generally considered to be a Bad idea; automatic model selection methods will happily learn models that provide great predictive power but whose variables cannot be easily interpreted in the context of the questions of interest.
P.S. Split the limma and voom tags in your post, otherwise the maintainers won't be notified.
P.P.S. Respond to answers using the "add comment" button, don't add a new answer.
Dear Aaron Lun,
Thank you very much for your quick and informative reply!
My main problem is how to perform regularized Cox regression modeling using RNA-seq data, as the log-transformed counts do not fulfill the statistical assumptions required for linear modeling (i.e., equal variance). Do you believe that a variance-stabilizing transformation of the RNA-seq data using the varianceStabilizingTransformation function of the DESeq2 package is sufficient for further analysis in the glmnet package?
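In other words, I am thinking of something along these lines (dds, surv.time and surv.event are placeholders for my own objects):

library(DESeq2)
library(glmnet)
library(survival)

# 'dds' is my DESeqDataSet; 'surv.time'/'surv.event' hold the survival data.
vst.mat <- assay(varianceStabilizingTransformation(dds))
x <- t(vst.mat)                     # samples in rows, genes in columns
y <- Surv(surv.time, surv.event)

cvfit <- cv.glmnet(x, y, family = "cox", alpha = 0.5)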
I would appreciate hearing your opinion!!
Thank you very much for your time in advance!!
Sincerely,
Panagiotis Mokos
I'm not going to say too much, because we're quickly getting out of the scope of the support site, but it seems like you're trying to use variable selection to identify genes that contribute to the hazard. You could make life easier for yourself by instead using the time-to-event (death, presumably?) as a covariate in the DE analysis. This will identify genes that are associated with time and can be done with standard pipelines.
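As a rough sketch of what I mean (counts and surv.time are hypothetical names for your count matrix and time-to-event vector):

library(edgeR)
library(limma)

# Standard limma-voom pipeline with the time-to-event as a covariate.
dge <- DGEList(counts = counts)
dge <- calcNormFactors(dge)

design <- model.matrix(~ surv.time)
v <- voom(dge, design)
fit <- eBayes(lmFit(v, design))
topTable(fit, coef = "surv.time")   # genes associated with the time-to-event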
Dear Aaron,
Thank you for your response!
The main purpose of my analysis is to perform unbiased prediction and feature selection in high-dimensional survival regression by applying bootstrap-enhanced elastic net Cox regression modeling. However, I am using RNA-seq data for this analysis. Do you believe that the varianceStabilizingTransformation or the voom function would transform my data more appropriately for this purpose, or could you recommend a better transformation option?
Thank you very much for your time in advance!!
Sincerely,
Panagiotis Mokos