semi-supervised clustering
Tim Smith
An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20071025/512506a9/attachment.pl
Friedrich Leisch
>>>>> On Thu, 25 Oct 2007 09:12:32 -0700 (PDT),
>>>>> Tim Smith (TS) wrote:

> Hi,
> Is there any package that implements semi-supervised clustering
> through 'must-link' and 'cannot-link' constraints?

Package flexclust on CRAN can do constrained clustering. The feature is not well documented in the current release version, but

    myfam <- kccaFamily("kmeans", groupFun = "minSumClusters")
    clres <- kcca(x, k, myfam, group = mygroups)

will assign all points that belong to one group to the same cluster using kmeans (flexclust can also use distances other than Euclidean).

groupFun = "minSumClusters" assigns the whole group to the cluster whose center has the minimal average distance to all group members.

groupFun = "majorityClusters" assigns all group members to the cluster that the majority of them belongs to.

groupFun = "differentClusters" implements a 'cannot-link' constraint: the members of a group are put into different clusters, so obviously no group may contain more members than there are clusters.

Some details on the algorithms used can be found in
http://www.ci.tuwien.ac.at/papers/Leisch+Gruen-2006.pdf

Hope this helps,
Fritz
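To make the three group rules concrete, here is a rough base-R sketch of the assignment step alone, for one group and a fixed matrix of cluster centers. It assumes Euclidean distance, and the function names (dist_to_centers, assign_min_sum, assign_majority, assign_different) are hypothetical illustrations, not flexclust internals. In particular, the greedy "closest free member/cluster pair" matching used for the cannot-link case is an assumption and may well differ from what flexclust actually does.

```r
## Hypothetical sketch of the group assignment step (NOT flexclust code).
## x: rows of one group; centers: k x p matrix of current cluster centers.

## Euclidean distances: one row per group member, one column per center
dist_to_centers <- function(x, centers) {
  d <- matrix(0, nrow(x), nrow(centers))
  for (j in seq_len(nrow(centers))) {
    d[, j] <- sqrt(colSums((t(x) - centers[j, ])^2))
  }
  d
}

## "minSumClusters"-style rule: the whole group goes to the center with
## minimal average distance to all group members
assign_min_sum <- function(d) {
  rep(which.min(colMeans(d)), nrow(d))
}

## "majorityClusters"-style rule: every member goes to the cluster most
## members would pick on their own (ties go to the lowest cluster index)
assign_majority <- function(d) {
  own <- apply(d, 1, which.min)
  votes <- tabulate(own, nbins = ncol(d))
  rep(which.max(votes), nrow(d))
}

## "differentClusters"-style rule (cannot-link), ASSUMED greedy matching:
## repeatedly take the closest (unplaced member, unused cluster) pair
assign_different <- function(d) {
  stopifnot(nrow(d) <= ncol(d))   # group cannot exceed the number of clusters
  cl <- integer(nrow(d))
  free <- rep(TRUE, ncol(d))
  for (step in seq_len(nrow(d))) {
    dd <- d
    dd[cl > 0, ] <- Inf           # members already placed
    dd[, !free] <- Inf            # clusters already used by this group
    best <- which(dd == min(dd), arr.ind = TRUE)[1, ]
    cl[best[1]] <- best[2]
    free[best[2]] <- FALSE
  }
  cl
}

## Usage: three group members, two centers
g <- rbind(c(0, 0.5), c(0.5, 0), c(5, 4))
centers <- rbind(c(0, 0), c(5, 5))
d <- dist_to_centers(g, centers)
assign_min_sum(d)               # 1 1 1
assign_majority(d)              # 1 1 1
assign_different(d[c(1, 3), ])  # 1 2
```

The third member sits next to center 2, but both must-link rules still pull it into cluster 1 with the rest of its group, while the cannot-link rule forces the two selected members apart.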
An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20071026/2c75f5d5/attachment.pl
>>>>> On Fri, 26 Oct 2007 07:19:45 -0700 (PDT),
>>>>> Tim Smith (TS) wrote:

> Hi Friedrich,
> Thanks for the response!
> I have tried the following:
> ---------------------------------------------------------
> nums <- sample(1:300,70)
> x <- matrix(nums,10,7)
> mygroups <- c(1,3,4) # i.e. I would like these 3 rows in 'x' to
> cluster together

The group vector needs to be a factor or an integer vector with one entry per row of x; otherwise it will be recycled, as in many other R functions. See below for a working example. In the usual setting the grouping would be a factor variable in your data frame.

> myfam <- kccaFamily("kmeans", groupFun = "minSumClusters")
> clres <- kcca(x, k=3, myfam, group=mygroups)
> --------------------------------------------------------
> I get the following result:
>> clres
> kcca object of family 'kmeans'
> call:
> kcca(x = x, k = 3, family = myfam, group = mygroups)
> cluster sizes:
> 1 2
> 3 7
> I have two questions:
> i) How do I get the details of the clusters (i.e. which points/rows
> are in which cluster)?

cluster(clres)

> ii) If k=3, then shouldn't there be 3 clusters?

If a cluster becomes empty during the iterations, it is removed, so you can end up with fewer clusters than you asked for. For grouped clustering this happens more often than for regular kmeans because of the reassignment of group members.

A working example:

    set.seed(12)
    ## same as above
    nums <- sample(1:300,70)
    x <- matrix(nums,10,7)
    ## Rows 1, 3 and 4 are in group 1, all other groups contain
    ## only one observation
    mygroups <- c(1,2,1,1,3,4,5,6,7,8)
    myfam <- kccaFamily("kmeans", groupFun = "minSumClusters")
    clres <- kcca(x, k=3, myfam, group=mygroups)

    R> clres
    kcca object of family 'kmeans'

    call:
    kcca(x = x, k = 3, family = myfam, group = mygroups)

    cluster sizes:

    1 2 3
    3 5 2

    R> table(cluster(clres),mygroups)
       mygroups
        1 2 3 4 5 6 7 8
      1 3 0 0 0 0 0 0 0
      2 0 1 0 1 1 1 1 0
      3 0 0 1 0 0 0 0 1

and all members of group 1 end up in the same cluster (here cluster 1, but in general it need not be cluster 1).

hth,
fritz

--
-----------------------------------------------------------------------
Prof. Dr. Friedrich Leisch
Institut für Statistik                     Tel: (+49 89) 2180 3165
Ludwig-Maximilians-Universität             Fax: (+49 89) 2180 5308
Ludwigstraße 33
D-80539 München                            http://www.stat.uni-muenchen.de/~leisch
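Since the group argument needs one entry per row of x, a small helper can turn pairwise must-link constraints into such a vector, and another can check afterwards that every group landed in a single cluster. This is a base-R sketch only: must_link_groups and respects_must_link are hypothetical helpers, not flexclust functions. It assumes must-link is transitive (rows 1-3 and 3-4 linked means 1, 3 and 4 share a group), merges linked rows with a simple union-find, and numbers groups in order of first appearance. The example cluster labels are read off the table in the reply above.

```r
## Hypothetical helper (NOT part of flexclust): full-length group vector
## from must-link pairs, treating must-link as transitive (union-find).
must_link_groups <- function(n, pairs) {
  parent <- seq_len(n)
  find <- function(i) {               # follow parents up to the set's root
    while (parent[i] != i) i <- parent[i]
    i
  }
  for (r in seq_len(nrow(pairs))) {
    a <- find(pairs[r, 1])
    b <- find(pairs[r, 2])
    if (a != b) parent[b] <- a        # merge the two sets
  }
  roots <- sapply(seq_len(n), find)
  match(roots, unique(roots))         # groups numbered by first row
}

## Hypothetical check: TRUE if every group sits in exactly one cluster
respects_must_link <- function(cl, groups) {
  all(tapply(cl, groups, function(z) length(unique(z)) == 1L))
}

## Rows 1, 3 and 4 must be linked, as in the question
mygroups <- must_link_groups(10, rbind(c(1, 3), c(3, 4)))
mygroups                  # 1 2 1 1 3 4 5 6 7 8, as in the working example

## By contrast, R's usual recycling stretches c(1,3,4) to the wrong grouping
rep_len(c(1, 3, 4), 10)   # 1 3 4 1 3 4 1 3 4 1

## Cluster labels read off the table(cluster(clres), mygroups) output above;
## with flexclust loaded one would pass cluster(clres) instead
cl <- c(1, 2, 1, 1, 3, 2, 2, 2, 2, 3)
respects_must_link(cl, mygroups)   # TRUE
```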
