I am getting a segfault when trying to write a matrix of more than about 20 million character values, each about 30 characters long. The following demo reproduces the error on my machine (edited to remove extraneous lines I had inadvertently pasted, as Wolfgang Huber pointed out below).
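```r
library(rhdf5)

hdfError <- function(nrow=20, ncol=1500000) {
  file <- "test.hdf"
  # 20 x 1,500,000 = 30 million copies of the same string
  test <- matrix("Some text for my matrix", nrow=nrow, ncol=ncol)
  h5createFile(file)
  # strings with a maximum length of 100 (size=100), uncompressed (level=0),
  # stored in chunks of at most 10000 x 10000
  h5createDataset(file, "matrix", c(nrow, ncol), size=100, level=0,
                  chunk=c(min(nrow, 10000), min(ncol, 10000)),
                  storage.mode = "character")
  # this is the call that segfaults
  h5write(test, file, "matrix")
}

hdfError()
```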
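For scale, here is a quick size check on the demo's defaults. It rests on my own assumption, not confirmed from the rhdf5 sources, that the whole matrix is packed into a single fixed-length buffer of `size` bytes per element before `H5Dwrite` is called:

```r
# Rough size of the write buffer for hdfError()'s default arguments.
# Assumption: every element is stored as a fixed-length string of `size` bytes.
nrow <- 20
ncol <- 1500000
size <- 100                          # bytes per element, as passed to h5createDataset()
n_elements <- nrow * ncol            # 3e7 strings
buffer_bytes <- n_elements * size    # 3e9 bytes, roughly 2.8 GiB
limit <- .Machine$integer.max        # 2^31 - 1 = 2147483647
buffer_bytes > limit                 # TRUE: the buffer exceeds the signed 32-bit range
floor(limit / size)                  # largest element count that stays under the limit
```

At `size=100` the cut-off works out to about 21.5 million elements, which is in the same range as the roughly 20 million values where I start seeing the crash. That makes me suspect a signed 32-bit overflow somewhere in the write path, although I have not confirmed it.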
After reading other posts about segfaults with this package, I tried:

```
ulimit -s unlimited
```

But that did not resolve the issue.
I am running on AWS on a c4.8xlarge instance, which has 60 GB of RAM. Here is my sessionInfo():
```
> sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.2 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] rhdf5_2.12.0

loaded via a namespace (and not attached):
[1] zlibbioc_1.14.0
```
(Other posts mention version 2.7 of rhdf5, but I cannot find any version later than 2.12 available for download.)
EDIT: I also tried installing from source, as well as installing the devel version (2.13) of rhdf5, but the problem remains.
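One idea I have not yet tested at full scale: write the matrix in blocks of columns through the `index` argument that `h5write()` passes on to `h5writeDataset()`, so that each underlying `H5Dwrite` call only sees a slice of the data. `writeInColumnBlocks()` and `hdfChunked()` are hypothetical helper names of my own, and the 10,000-column block size is an arbitrary choice that simply matches the chunk size in the demo. This is a sketch, not a confirmed fix:

```r
library(rhdf5)

# Hypothetical helper (not part of rhdf5): write a character matrix into an
# existing dataset one block of columns at a time, so that no single
# h5write()/H5Dwrite call has to handle the whole matrix.
writeInColumnBlocks <- function(mat, file, name, block_cols = 10000) {
  n <- ncol(mat)
  for (s in seq(1, n, by = block_cols)) {
    cols <- s:min(s + block_cols - 1, n)
    # index = list(rows, cols); NULL selects every row
    h5write(mat[, cols, drop = FALSE], file, name, index = list(NULL, cols))
  }
  invisible(NULL)
}

# Same setup as hdfError(), but writing through the helper above.
hdfChunked <- function(nrow = 20, ncol = 1500000, file = "test.hdf") {
  test <- matrix("Some text for my matrix", nrow = nrow, ncol = ncol)
  h5createFile(file)
  h5createDataset(file, "matrix", c(nrow, ncol), size = 100, level = 0,
                  chunk = c(min(nrow, 10000), min(ncol, 10000)),
                  storage.mode = "character")
  writeInColumnBlocks(test, file, "matrix")
}
```

With the same defaults as `hdfError()`, `hdfChunked()` would issue 150 writes of 20 x 10,000 strings each instead of one write of the full 20 x 1,500,000 matrix.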
Here is the traceback from the segfault:
```
 *** caught segfault ***
address 0x2ae66587c07c, cause 'memory not mapped'

Traceback:
 1: .Call("_H5Dwrite", h5dataset@ID, buf, sidFile, sidMem, PACKAGE = "rhdf5")
 2: H5Dwrite(h5dataset, obj, h5spaceMem = h5spaceMem, h5spaceFile = h5spaceFile)
 3: doTryCatch(return(expr), name, parentenv, handler)
 4: tryCatchOne(expr, names, parentenv, handlers[[1L]])
 5: tryCatchList(expr, classes, parentenv, handlers)
 6: tryCatch(expr, error = function(e) {
        call <- conditionCall(e)
        if (!is.null(call)) {
            if (identical(call[[1L]], quote(doTryCatch)))
                call <- sys.call(-4L)
            dcall <- deparse(call)[1L]
            prefix <- paste("Error in", dcall, ": ")
            LONG <- 75L
            msg <- conditionMessage(e)
            sm <- strsplit(msg, "\n")[[1L]]
            w <- 14L + nchar(dcall, type = "w") + nchar(sm[1L], type = "w")
            if (is.na(w))
                w <- 14L + nchar(dcall, type = "b") + nchar(sm[1L], type = "b")
            if (w > LONG)
                prefix <- paste0(prefix, "\n ")
        }
        else prefix <- "Error : "
        msg <- paste0(prefix, conditionMessage(e), "\n")
        .Internal(seterrmessage(msg[1L]))
        if (!silent && identical(getOption("show.error.messages"), TRUE)) {
            cat(msg, file = stderr())
            .Internal(printDeferredWarnings())
        }
        invisible(structure(msg, class = "try-error", condition = e))
    })
 7: try({
        res <- H5Dwrite(h5dataset, obj, h5spaceMem = h5spaceMem, h5spaceFile = h5spaceFile)
    })
 8: h5writeDataset.array(...)
 9: h5writeDataset.matrix(obj, loc$H5Identifier, name, ...)
10: h5writeDataset(obj, loc$H5Identifier, name, ...)
11: h5write.default(test, file, "matrix")
12: h5write(test, file, "matrix")
13: hdfError()
```
Here is the output of `top` for the R process at the time of the segfault (the memory use does not look too onerous):
```
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:  30871068 total,  3958112 used, 26912956 free,    61888 buffers
KiB Swap:        0 total,        0 used,        0 free.  1157648 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
46688 root      20   0 3270896 2.258g   5416 S   0.0  7.7   0:06.07 R
```
(My goal is to create an HDF5 version of the instance metadata file from the LINCS project, to accompany the HDF5 file that contains the gene expression data. That data file is much larger than the metadata file I am trying to create: it contains a roughly 25,000 x 1,200,000 data matrix. I am not sure how it was generated, possibly with MATLAB, possibly with the no-longer-maintained hdf5 package from CRAN. So I would love to be able to work with data sets of this magnitude using the rhdf5 package from Bioconductor.)
Any thoughts on getting this to work? Thank you!
-Eric