Dear all,
I have 4000 (continuous) predictor variables in a set of 150 patients. First, I want to identify the variables which are associated with survival. I therefore use the multiple testing procedures function MTP (http://svitsrv25.epfl.ch/R-doc/library/multtest/html/MTP.html) with the t-statistic for tests of regression coefficients in Cox proportional hazards survival models to identify significant predictors. This analysis identifies 60 variables which are significantly associated with survival. I then perform unsupervised k-medoids clustering on them with the ConsensusClusterPlus package (https://www.bioconductor.org/packages/release/bioc/html/ConsensusClusterPlus.html), which identifies 3 clusters as the optimal solution based on the CDF curve and progression graph:
consClust = ConsensusClusterPlus(exprs(exampleSet), maxK=10, reps=1000, pItem=0.8, pFeature=1, title="example", distance="manhattan", clusterAlg="pam", verbose=FALSE, writeTable=TRUE)
consClustList = matrix(c(consClust[[3]][["consensusClass"]]), ncol=1)
This works fine, and consClustList tells me which of the three clusters each of the 150 patients belongs to.
Let's assume that I have another set of 50 patients (the validation set) and I want to predict which of the three clusters identified in the training set (n=150) each of these patients belongs to. How can I achieve this?
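To make the question concrete, one possible approach would be a nearest-medoid assignment. This is only a sketch under assumptions, not something ConsensusClusterPlus provides. As far as I can tell, the returned object contains the consensus class labels but no medoids, so the sketch recomputes a medoid per cluster from the training data and assigns each new patient to the closest one by Manhattan distance. The simulated matrices `train` and `valid` and the helper names `find_medoid` and `assign_cluster` are hypothetical stand-ins. In the real analysis the labels would come from consClust[[3]][["consensusClass"]], and the validation data would need the same 60 variables on the same scale.

```r
# Hypothetical stand-in data: rows = 60 selected variables, columns = patients.
set.seed(1)
n_feat  <- 60
centers <- c(0, 3, 6)                      # assumed, well-separated cluster means
train_labels <- rep(1:3, each = 50)        # stands in for the consensusClass labels
train <- sapply(train_labels, function(k) rnorm(n_feat, mean = centers[k]))
valid_truth <- rep(1:3, length.out = 50)   # known only because the data are simulated
valid <- sapply(valid_truth, function(k) rnorm(n_feat, mean = centers[k]))

# Medoid of one cluster: the member with the smallest summed Manhattan
# distance to all other members of that cluster.
find_medoid <- function(x) {
  d <- as.matrix(dist(t(x), method = "manhattan"))
  x[, which.min(rowSums(d))]
}

cluster_ids <- sort(unique(train_labels))
medoids <- sapply(cluster_ids, function(k) {
  find_medoid(train[, train_labels == k, drop = FALSE])
})                                         # n_feat x 3 matrix, one column per cluster

# Assign one new patient (a vector of n_feat values) to the nearest medoid.
assign_cluster <- function(sample, medoids, ids) {
  ids[which.min(colSums(abs(medoids - sample)))]
}

pred <- apply(valid, 2, assign_cluster, medoids = medoids, ids = cluster_ids)
table(pred)
```

Because the consensus labels come from clustering the consensus matrix rather than from a single PAM run, the medoids computed here only approximate the original clusters. Training a supervised classifier on the 150 labelled patients would be an alternative worth comparing.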
Thanks in advance for your help!