Error on FindOptimalBinning function
@pavelgranalacant-23139

Hi,

I was trying to replicate the example code from the BHC package (https://bioconductor.org/packages/release/bioc/html/BHC.html) on the Breast Cancer Wisconsin (Diagnostic) dataset (https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)), after applying PCA, but I have run into problems.

From the example code I understood that, because my data is continuous, it has to be discretised first (as is done in the third example), so I replicated that part of the example:

BiocManager::install("BHC")
library(BHC)
library(RCurl)
library(factoextra)

breastCancer <- getURL('https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/wdbc.data')
names <- c('id_number', 'diagnosis', 'radius_mean', 
           'texture_mean', 'perimeter_mean', 'area_mean', 
           'smoothness_mean', 'compactness_mean', 
           'concavity_mean','concave_points_mean', 
           'symmetry_mean', 'fractal_dimension_mean',
           'radius_se', 'texture_se', 'perimeter_se', 
           'area_se', 'smoothness_se', 'compactness_se', 
           'concavity_se', 'concave_points_se', 
           'symmetry_se', 'fractal_dimension_se', 
           'radius_worst', 'texture_worst', 
           'perimeter_worst', 'area_worst', 
           'smoothness_worst', 'compactness_worst', 
           'concavity_worst', 'concave_points_worst', 
           'symmetry_worst', 'fractal_dimension_worst')
breastCancer <-
  read.table(textConnection(breastCancer),
             sep = ',',
             col.names = names)

breastCancer.predictors <- breastCancer[3:32]
breastCancer.prcomp <- prcomp(breastCancer.predictors, scale. = TRUE, center = TRUE)
breastCancer.PCA <- breastCancer.prcomp$x[, 1:7]

newData2 <- breastCancer.PCA
itemLabels2 <- breastCancer$diagnosis
percentiles  <- FindOptimalBinning(newData2, itemLabels2, transposeData=TRUE, verbose=TRUE)
discreteData <- DiscretiseData(t(newData2), percentiles=percentiles)
discreteData <- t(discreteData)
hc3          <- bhc(discreteData, itemLabels2, verbose=TRUE)
plot(hc3, axes=FALSE)
WriteOutClusterLabels(hc3, verbose=TRUE)

However, although I get two clusters, the first one contains only a single item and the second one contains all the rest, which is far from the result I expected. Am I doing something wrong?
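
One check that might help narrow this down is to look at how the items spread across the bins after discretisation. The sketch below is base R only and runs on simulated scores, so it can be tried without the dataset. The names `scores` and `discretise_column`, and the fixed tercile cut points, are assumptions made for illustration; this is not what FindOptimalBinning or DiscretiseData do internally.

```r
# Hypothetical stand-in for the PCA scores: 100 items x 3 components.
set.seed(1)
scores <- matrix(rnorm(300), nrow = 100, ncol = 3)

# Assumed cut points (terciles per column), NOT BHC's optimised percentiles.
discretise_column <- function(x, probs = c(1/3, 2/3)) {
  breaks <- quantile(x, probs = probs, names = FALSE)
  findInterval(x, breaks)  # bin index 0, 1 or 2
}

# Discretise each component (column) separately; items stay in rows.
discrete <- apply(scores, 2, discretise_column)

# Count how many items fall into each bin, one column per component.
bin_counts <- apply(discrete, 2, function(col) table(factor(col, levels = 0:2)))
print(bin_counts)
```

Running the same kind of count on the real `discreteData` would show whether almost every item ends up in the same bin for most components, which could be one reason for a very unbalanced split.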

Thanks in advance.
