Hi, when I look through articles for methods to identify predictors of overall survival, the most common approach is a glmnet Cox model with a Lasso penalty and cross-validation. For the cross-validation, people use either 10-fold cross-validation repeated 1000 times or LOOCV.
I read about the glmnet method and searched forums for code, and I found two different approaches for the 10-fold cross-validation: the first runs cv.glmnet repeatedly and then fits a final glmnet model at the resulting lambda.min; the second (from an article) records which covariates are active in each of 1000 cv.glmnet runs and takes the most frequent set as the final model. I also understood that setting the nfolds argument of cv.glmnet to the number of rows (samples) of the response matrix y performs LOOCV, and that LOOCV needs no repetitions because there is no random fold assignment. If I have misunderstood any of this, please correct me.
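For example, my understanding is that the run-to-run variation comes only from the random fold assignment inside cv.glmnet, so fixing the folds should make a single call reproducible. A minimal sketch of what I mean (my_folds is just my own name; I am using the list form of CoxExample from recent glmnet versions):
library(glmnet)
data(CoxExample)                        # recent glmnet versions return a list
x <- CoxExample$x[1:500, ]              # (older versions load x and y directly, as in my code below)
y <- CoxExample$y[1:500, ]
set.seed(1)
my_folds <- sample(rep(1:10, length.out = nrow(y)))   # one fixed 10-fold assignment
cv1 <- cv.glmnet(x, y, family = "cox", alpha = 1, foldid = my_folds)
cv2 <- cv.glmnet(x, y, family = "cox", alpha = 1, foldid = my_folds)
identical(cv1$lambda.min, cv2$lambda.min)             # TRUE: same folds, same lambda.min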
I tried the code below and got similar but not identical results, so I wonder which approach is more trustworthy in this case. Note: Approach 3 (LOOCV) actually ran faster than the others, even though the usual disadvantage of LOOCV is that it is time-consuming. Is that because of the 1000 repetitions in the other approaches?
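To check the timing point, I compared one 10-fold run against one LOOCV run with system.time (a rough sketch, continuing with the x and y from above; numbers will of course vary by machine). My reasoning is that Approaches 1 and 2 repeat the 10-fold call 1000 times, so their total cost is roughly 1000 times the first line below, while Approach 3 is a single, heavier call:
t_10fold <- system.time(cv.glmnet(x, y, family = "cox", nfolds = 10, alpha = 1))
t_loocv  <- system.time(cv.glmnet(x, y, family = "cox", nfolds = nrow(y), alpha = 1, grouped = TRUE))
rbind(ten_fold_once = t_10fold, loocv_once = t_loocv)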
Thanks in advance.
library(glmnet)
library(dplyr)
data(CoxExample)   # note: recent glmnet versions return a list, so use x <- CoxExample$x; y <- CoxExample$y
x <- x[1:500, ]
y <- y[1:500, ]
## Approach 1: 10-fold cross-validation repeated 1000 times, then a final glmnet fit at the best averaged lambda
lambdas <- NULL
for (i in 1:1000) {
  fit <- cv.glmnet(x, y, family = "cox", nfolds = 10, alpha = 1, grouped = TRUE)
  errors <- data.frame(fit$lambda, fit$cvm)
  lambdas <- rbind(lambdas, errors)
}
# take the mean cvm for each lambda over the 1000 repetitions
lambdas <- aggregate(lambdas[, 2], list(lambdas$fit.lambda), mean)
# select the lambda with the smallest mean cvm
bestindex <- which(lambdas[2] == min(lambdas[2]))
bestlambda <- lambdas[bestindex, 1]
# and now run glmnet once more with that lambda
fit <- glmnet(x, y, family = "cox", alpha = 1, lambda = bestlambda)
coef.min <- coef(fit)          # the fit has a single lambda, so no s argument is needed
active.min <- which(coef.min != 0)
active.min
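# (Sketch) A quick visual check of Approach 1: the averaged CV curve and the chosen lambda.
# The column names Group.1 / x come from aggregate()'s default output above.
plot(log(lambdas$Group.1), lambdas$x, type = "l",
     xlab = "log(lambda)", ylab = "mean partial-likelihood deviance over 1000 runs")
abline(v = log(bestlambda), lty = 2)   # the lambda selected above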
## Approach 2: 10-fold cross-validation repeated 1000 times, then the frequency of each set of active covariates
coefs <- NULL
active.mins <- list()
for (i in 1:1000) {
  cvfit <- cv.glmnet(x, y, family = "cox", nfolds = 10, alpha = 1, grouped = TRUE)
  coef.min <- coef(cvfit, s = "lambda.min")   # cvfit is a cv.glmnet object, so s = "lambda.min" is valid here
  coefs <- cbind(coefs, coef.min)
  active.min <- which(coef.min != 0)
  active.mins <- c(active.mins, list(active.min))
}
# how often each distinct set of active covariates appears across the 1000 runs
table(unlist(lapply(active.mins, paste, collapse = " ")))
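# (Sketch) The loop above already stacks the 1000 coefficient vectors into 'coefs',
# so the per-covariate selection frequency can be read off directly: in how many of
# the 1000 lambda.min models does each covariate have a non-zero coefficient?
selection.freq <- rowSums(coefs != 0)
sort(selection.freq, decreasing = TRUE)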
## Approach 3: LOOCV via nfolds = nrow(y), no repetitions
cvfit <- cv.glmnet(x, y, family = "cox", nfolds = nrow(y), alpha = 1, grouped = TRUE)
cv.coef.min <- coef(cvfit, s = "lambda.min")
cv.active.min <- which(cv.coef.min != 0)
cv.active.min
# refit glmnet at the selected lambda.min and extract the active set
fit <- glmnet(x, y, family = "cox", alpha = 1, lambda = cvfit$lambda.min)
coef.min <- coef(fit)          # again a single-lambda fit, so no s argument is needed
active.min <- which(coef.min != 0)
active.min
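As a minimal check of my claim that LOOCV needs no repetitions (cvfit2 is just my own name): with nfolds = nrow(y) every observation is its own fold, so there is no random fold assignment and a second run should give the same lambda.min.
cvfit2 <- cv.glmnet(x, y, family = "cox", nfolds = nrow(y), alpha = 1, grouped = TRUE)
identical(cvfit$lambda.min, cvfit2$lambda.min)   # expected TRUE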
Results
Approach 1:
[1] 1 2 3 4 5 6 7 8 9 10 13 17 18 24 27 29
Approach 2:
[1] 1 2 3 4 5 6 7 8 9 10 13 17 18 24 27 29 (this was the most frequent set, appearing in 966/1000 runs)
Approach 3:
[1] 1 2 3 4 5 6 7 8 9 10 13 14 17 18 24 27 29