problem with impute.knn in the impute package
Marcus ▴ 150
@marcus-410
Last seen 10.2 years ago
Hello.

You have to remove the random seed using:

if(exists(".Random.seed")) rm(.Random.seed)

before you run the impute.knn function if you are using a Windows machine.

Regards
Marcus

Marcus Gry Björklund
Royal Institute of Technology
AlbaNova University Center
Department of Molecular Biotechnology
106 91 Stockholm, Sweden
Phone (office): +46 8 553 783 44
Fax: + 46 8 553 784 81
Visiting address: Roslagstullsbacken 21, Floor 3
Delivery address: Roslagsvägen 30B
Web: http://www.biotech.kth.se/molbio/microarray/index.html

-----Original Message-----
From: bioconductor-bounces@stat.math.ethz.ch [mailto:bioconductor-bounces@stat.math.ethz.ch] On Behalf Of He, Yiwen (NIH/CIT)
Sent: Friday, April 29, 2005 19:22
To: 'bioconductor@stat.math.ethz.ch'
Cc: Powell, John (NIH/CIT); Asaki, Esther (NIH/CIT)
Subject: [BioC] problem with impute.knn in the impute package

Hi,

I have R version 2.0.1 and bioconductor 1.5 on both PC and Unix. I was trying to use the impute.knn function of the impute package on a dataset of 7332 genes and 3 arrays:

> library(impute)
> dim(dd)
[1] 7332 3
> is.matrix(dd)
[1] TRUE
> dd.imputed <- impute.knn(dd)

When run on PC (windows XP), the R program crashes after a few seconds. When run on a unix box, I can see such output:

Cluster size 7332 broken into 5667 1665
Cluster size 5667 broken into 4141 1526
Cluster size 4141 broken into 1796 2345
Cluster size 1796 broken into 840 956
Done cluster 840
Done cluster 956
Done cluster 1796

And R session was closed. So the clustering was started but aborted somewhere in the middle.

I searched the archive and found another report of such problem, for a dataset of 30000 x 2, but with no answers.

I have some interesting findings playing around with the parameters and data size:

1).
> impute.knn(dd, k=3)
works, but for k bigger than 3, R crashes as described.

2).
> dd2 <- cbind(dd,dd)
> dim(dd2)
[1] 7332 6
> impute.knn(dd2, k=8)
works, but for k bigger than 8, R crashes.

3).
> dd3 <- cbind(dd, dd, dd)
> dim(dd3)
[1] 7332 9
> impute.knn(dd3)
works. (k defaults to 10)
> impute.knn(dd3, k=17)
R crashes.

I also played around with other parameters but they didn't help. My conclusion is that the number of neighbors (k) is critical here. However, it's not straightforward how to set it based on data size.

Can anybody help, or at least point me to the maintainer of the impute package?

Thanks,
Yiwen

Yiwen He
Contractor
Center for Information Technology
National Institute of Health

_______________________________________________
Bioconductor mailing list
Bioconductor@stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor
Clustering impute • 2.0k views
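For anyone who wants to retry the k-versus-columns experiments from the original report, here is a rough sketch. The poster's `dd` is not available, so the matrix below is random stand-in data with the same 7332 x 3 shape. The seed, the number of missing cells, and the helper name `try_knn` are all assumptions made up for illustration. Note that the reported failure took the whole R session down, which R-level error handling cannot intercept, so run each call in a session you can afford to lose.

```r
# Sketch only: synthetic stand-in for the poster's data, which is not available.
set.seed(42)                               # arbitrary seed so the fake data is repeatable
n_genes  <- 7332
n_arrays <- 3
dd <- matrix(rnorm(n_genes * n_arrays), nrow = n_genes, ncol = n_arrays)
dd[sample(length(dd), 500)] <- NA          # 500 missing cells, an arbitrary choice

# Same widening steps as in the report.
dd2 <- cbind(dd, dd)                       # 7332 x 6
dd3 <- cbind(dd, dd, dd)                   # 7332 x 9

# try_knn() is a hypothetical helper, not part of the impute package.
# It prints the shape and k before calling impute.knn(), so the last line
# printed identifies the call that was running if the session dies.
try_knn <- function(x, k) {
  if (!requireNamespace("impute", quietly = TRUE)) {
    message("impute is not installed; skipping")
    return(invisible(NULL))
  }
  cat("ncol =", ncol(x), " k =", k, "\n")
  impute::impute.knn(x, k = k)
}
```

A call such as `res <- try_knn(dd, k = 3)` would correspond to case 1) in the report, and `try_knn(dd2, k = 8)` to case 2). Whether the crash reproduces will depend on the R and impute versions in use.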
@he-yiwen-nihcit-1177
Last seen 10.2 years ago
Thank you Marcus. I'm glad to know that I'm not the only one using that library.

However, I tested:

> exists(".Random.seed")
[1] FALSE

So the .Random.seed was never there. To think about it, since I'm using all the default setting when calling impute.knn(myData), the default seed is set to be 362436069 and .Random.seed is not even involved.

Any other suggestions?

Thanks,
Yiwen

-----Original Message-----
From: marcus [mailto:marcusb@biotech.kth.se]
Sent: Monday, May 02, 2005 2:39 AM
To: He, Yiwen (NIH/CIT); bioconductor@stat.math.ethz.ch
Subject: RE: [BioC] problem with impute.knn in the impute package

Hello.

You have to remove the random seed using:

if(exists(".Random.seed")) rm(.Random.seed)

before you run the impute.knn function if you are using a Windows machine.

Regards
Marcus
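On the `exists(".Random.seed")` check in this reply: a result of FALSE is what base R gives in a fresh session, because the seed object is only created in the global environment the first time a random number function is used. The sketch below illustrates that, along with a slightly more explicit form of the suggested removal. It uses base R only, and `seed_present` and `clear_seed` are hypothetical helper names, not functions from R or from the impute package.

```r
# Base R only; seed_present() and clear_seed() are hypothetical helper names.

# TRUE if a seed object currently exists in the global environment.
seed_present <- function() {
  exists(".Random.seed", envir = globalenv(), inherits = FALSE)
}

# Remove the seed from the global environment if it is there.
clear_seed <- function() {
  if (seed_present()) rm(list = ".Random.seed", envir = globalenv())
  invisible(!seed_present())
}

cat("before any RNG call:", seed_present(), "\n")  # FALSE in a fresh session
x <- runif(1)                                      # first RNG use creates the seed
cat("after runif():      ", seed_present(), "\n")
clear_seed()
cat("after clear_seed(): ", seed_present(), "\n")
```

Naming `globalenv()` explicitly keeps the removal working when it is wrapped in a function, where a bare `rm(.Random.seed)` would look in the function's own environment instead.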