HOPACH error: "negative length vectors are not allowed" in distancematrix(). Matrix too large?
ejliaw • 0
@ejliaw-7382
Last seen 6.9 years ago
United States

Greetings,

I was attempting to use HOPACH to cluster the rows of a 610758 x 9 matrix of floating-point values, but the distancematrix function gave the following error:

Error in .Call("R_disscosangle", as.vector(X), as.numeric(dim(X)[1]),  : 
  negative length vectors are not allowed

I can recreate the error as follows (session info included):

> library("hopach")
> test <- matrix(runif(1000*10), 1000, 10)
> my.dist <- distancematrix(test, "cosangle") # works
> dim(my.dist)
[1] 1000 1000
> test <- matrix(runif(610758*9), 610758, 9)
> my.dist <- distancematrix(test, "cosangle") # error message shows up immediately
Error in .Call("R_disscosangle", as.vector(X), as.numeric(dim(X)[1]),  : 
  negative length vectors are not allowed

> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] hopach_2.26.0       Biobase_2.26.0      BiocGenerics_0.12.1
[4] cluster_2.0.1      

> test <- matrix(runif(100000*10), 100000, 10)
> my.dist <- distancematrix(test, "cosangle") # after a while we get a segfault

 *** caught segfault ***
address 0x7f60e326b000, cause 'invalid permissions'

Traceback:
 1: .Call("R_disscosangle", as.vector(X), as.numeric(dim(X)[1]),     as.numeric(dim(X)[2]), as.logical(na.rm))
 2: disscosangle(X, na.rm)
 3: distancematrix(test, "cosangle")

Am I running out of memory? (https://www.google.com/webhp?q=negative+length+vectors+are+not+allowed+r)

Cheers,

Eric

kpollard ▴ 110
@kpollard-7578
Last seen 9.5 years ago
United States

Hi Eric - It indeed looks like you've hit the memory limit. The object my.dist that you are attempting to create is a vector of length 610758*610757/2, which is about 1.9 x 10^11 elements.
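
To put a number on that, here is a quick back-of-the-envelope check in R. It assumes each distance is stored as an 8-byte double, which is an assumption and not something stated above; the variable names are only for illustration.

```r
n <- 610758                        # rows in the matrix from the question
n_pairs <- n * (n - 1) / 2         # number of pairwise distances (double arithmetic, no integer overflow)
exceeds_int <- n_pairs > .Machine$integer.max   # is this more than 2^31 - 1 elements?
tb <- n_pairs * 8 / 1e12           # assumed 8 bytes per distance, expressed in terabytes
print(c(n_pairs = n_pairs, tb = tb))
```

Under that assumption the full distance vector would need roughly 1.5 TB, which is far more than a typical machine has in RAM.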

Best,

Katie


Hi Katie,

Would you have any suggestions for this situation? If the Internet is correct in saying that R (even 3.1.2) cannot allocate vectors longer than 2^31 - 1, is there a package somewhere that has circumvented this?

Thanks


R can allocate larger vectors (try integer(2^31), for instance, if your computer has enough memory!) but packages with C code have to be written to work with large vectors; packages that were developed before R supported large vectors (like hopach) are not likely to support these.
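
One way to see why code that was not written for long vectors might fail in two different ways is to emulate a length computed in a 32-bit signed integer. This is only an illustration: it assumes, without the hopach source having been checked, that the number of pairs is derived from a product like n * (n - 1) held in a C int, and that overflow wraps around (in C, signed overflow is formally undefined behaviour). wrap32 is a hypothetical helper, not part of R or hopach.

```r
# Hypothetical helper: emulate two's-complement wraparound of a 32-bit signed int.
wrap32 <- function(x) {
  r <- x %% 2^32                   # keep the low 32 bits
  if (r >= 2^31) r - 2^32 else r   # reinterpret the top bit as a sign bit
}

big   <- wrap32(610758 * 610757)   # wraps to a negative value
small <- wrap32(100000 * 99999)    # stays positive, but is far below the true 9999900000
print(c(big = big, small = small))
```

Under that assumption, the first case would request a vector of negative length, which matches the error message, while the second would allocate a vector that is too short and then write past its end, which would be consistent with the segfault. This is a guess at the mechanism, not a diagnosis.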

It seems like the reasonable statistical thing to do is to pre-process your data in some way to reduce its volume, e.g., by filtering on variability or by k-means clustering followed by use of the centroids (but these are naive suggestions; maybe Katie can provide something more substantive).
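
As a rough sketch of the first suggestion, filtering on variability could look like the following. The toy matrix, the use of row variance as the variability measure, and the 90% cutoff are all assumptions made for illustration; the right measure and threshold depend on the data.

```r
set.seed(1)                                   # reproducible toy data
x <- matrix(runif(10000 * 9), 10000, 9)       # small stand-in for the real 610758 x 9 matrix

row_var <- apply(x, 1, var)                   # per-row variance as a simple variability measure
keep <- row_var >= quantile(row_var, 0.9)     # assumed cutoff: keep the most variable 10% of rows
x_small <- x[keep, , drop = FALSE]            # reduced matrix to pass on to clustering
print(dim(x_small))
```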


Thanks Martin,

Indeed, I've found that any function requiring a distance matrix, like the built-in hclust(), cannot handle that many rows.

To see any pattern, I've been using kmeans with a large k (e.g. 80), then using hclust on the 80 cluster centroids, and finding an optimal ordering for the tree of the centroids (with the 'cba' package) to reorder my original matrix. Is this what you meant by using k-means clustering to pre-process data?


Yes, that sounds approximately like what I was thinking.
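
For reference, the k-means-then-hclust workflow described in this thread can be sketched roughly as follows. This is a minimal sketch on toy data, not the poster's actual code: it uses Euclidean distance and average linkage as assumptions, and it takes the default leaf order from hclust() instead of the optimal ordering from the 'cba' package mentioned above.

```r
set.seed(1)                                   # reproducible toy data
x <- matrix(runif(2000 * 9), 2000, 9)         # small stand-in for the real 610758 x 9 matrix
k <- 80                                       # number of k-means clusters, as in the post

km <- kmeans(x, centers = k, iter.max = 50)   # step 1: compress the rows into k centroids
hc <- hclust(dist(km$centers), method = "average")  # step 2: cluster only the k centroids

# Step 3: position of each k-means cluster in the dendrogram's leaf order.
cluster_rank <- match(seq_len(k), hc$order)

# Step 4: reorder the original rows so that rows from adjacent centroids sit together.
row_order <- order(cluster_rank[km$cluster])
x_reordered <- x[row_order, , drop = FALSE]
```

The distance matrix here has only k * (k - 1) / 2 entries, so its size no longer depends on the number of rows in the original matrix.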
