rhdf5 write/read inconsistency
Guest User (@guest-user-4897)
I have a matrix that I write with rhdf5, but when I read it back I get seemingly random values that differ from what I wrote. The example below demonstrates the effect, which seems to be related somehow to having small chunks. It writes a 7 x 3 matrix into a dataset chunked in blocks of 4 rows, then reads it back 10 times, printing the sum each time. The sum usually differs from read to read, and it is never the correct value (231).

```r
library(rhdf5)

go <- function(numRow = blocksize,
               chunksize = 4,
               numCol = 3,
               dims = c(numRow, numCol),
               start = 1,
               blocksize = 7) {
  str(list(numRow = numRow, numCol = numCol,
           start = start,
           chunksize = chunksize,
           blocksize = blocksize))

  mtx <- matrix(1:(blocksize*numCol), ncol = numCol)
  cat("sum(matrix)=", sum(mtx), "\n")

  # Start from a fresh file
  file.exists("x.hdf5") && unlink("x.hdf5")
  h5createFile("x.hdf5")
  # Uncompressed uint32 dataset; each chunk spans 'chunksize' rows and all columns
  h5createDataset(file="x.hdf5",
                  dataset = "x",
                  dims = dims,
                  H5type = "H5T_NATIVE_UINT32",
                  level = 0,
                  chunk= c(chunksize,numCol))

  # Write the whole matrix as one block starting at row 'start'
  h5write(mtx, "x.hdf5", name = "x",
          start = c(start, 1),
          stride = c(1,1),
          block = c(blocksize, numCol),
          count= c(1,1))

  # Read the same block back 10 times and print its sum
  {
    for(i in 1:10)
      print(sum(h5read("x.hdf5", "/x",
                       start = c(start, 1),
                       stride = c(1,1),
                       block = c(blocksize, numCol),
                       count= c(1,1))))
  }
}
```

And the transcript:

```
> go()
List of 5
 $ numRow   : num 7
 $ numCol   : num 3
 $ start    : num 1
 $ chunksize: num 4
 $ blocksize: num 7
sum(matrix)= 231
[1] 209
[1] 47358985
[1] 234
[1] 42963065
[1] 46236113
[1] 48574193
[1] 11738297
[1] 11738297
[1] 11738297
[1] 193
```

Output of sessionInfo():

```
R version 3.0.1 (2013-05-16)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] rhdf5_2.5.7

loaded via a namespace (and not attached):
[1] zlibbioc_1.6.0
```
Julian Gehring (@julian-gehring-5818)
Hi Brad,

I recall that there was a bug in some versions of the HDF5 driver with similar consequences. Does the behavior still occur if you define the chunks to be along the second dimension of your matrix?

Best wishes
Julian

On 11/06/2013 03:55 PM, Brad Friedman [guest] wrote:
> [...]
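One reading of this suggestion is to let each chunk span all rows and split the dataset only along the second (column) dimension, i.e. `chunk = c(numRow, 1)` instead of `chunk = c(chunksize, numCol)`. The sketch below assumes rhdf5 is installed; `roundtrip_col_chunks()` is a hypothetical helper, not an rhdf5 function. It keeps the same `start`/`stride`/`block`/`count` arguments as the original example, uses a temporary file instead of `x.hdf5`, and compares the whole matrix with `identical()` rather than only its sum.

```r
# Sketch, assuming rhdf5 is installed. roundtrip_col_chunks() is a
# hypothetical helper for illustration, not part of rhdf5.
roundtrip_col_chunks <- function(numRow = 7, numCol = 3, colChunk = 1) {
  file <- tempfile(fileext = ".h5")
  on.exit(unlink(file))                      # remove the temporary file afterwards
  mtx <- matrix(1:(numRow * numCol), ncol = numCol)

  rhdf5::h5createFile(file)
  # Each chunk spans all rows and 'colChunk' columns, so the dataset is
  # chunked along the second dimension only.
  rhdf5::h5createDataset(file = file, dataset = "x",
                         dims = c(numRow, numCol),
                         H5type = "H5T_NATIVE_UINT32",
                         level = 0,
                         chunk = c(numRow, colChunk))

  # Same hyperslab arguments as the original example, selecting the
  # whole matrix as a single block.
  rhdf5::h5write(mtx, file, name = "x",
                 start = c(1, 1), stride = c(1, 1),
                 block = c(numRow, numCol), count = c(1, 1))
  back <- rhdf5::h5read(file, "x",
                        start = c(1, 1), stride = c(1, 1),
                        block = c(numRow, numCol), count = c(1, 1))

  # Compare dimensions and every element, not just the sum.
  identical(dim(back), dim(mtx)) &&
    identical(as.integer(back), as.integer(mtx))
}
```

Calling `roundtrip_col_chunks()` repeatedly (or with `colChunk = 2`) and checking that it returns `TRUE` each time is a stricter test than printing sums, since two different matrices can share the same sum.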
Dear Brad and Julian!

There was a bug in reading/writing data when a chunk or block size was defined. The bug is fixed in rhdf5 version 2.7.3, which will appear on the website with the next build.

Best,
Bernd

On 07.11.2013, at 12:07, Julian Gehring <julian.gehring at embl.de> wrote:
> [...]
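Since the fix is reported for rhdf5 2.7.3 (the original report used 2.5.7), a quick way to tell whether an installation predates it is to compare version numbers. This is a minimal sketch using base R's `utils::packageVersion()` and `utils::compareVersion()`; `rhdf5_has_chunk_fix()` is a hypothetical name, and the check assumes the fix carries forward into every later version.

```r
# Hypothetical helper: TRUE if the given rhdf5 version is 2.7.3 or later,
# the version in which the chunk/block read/write bug was reported fixed.
# The default argument is only evaluated (and rhdf5 only looked up) when
# no version is passed explicitly.
rhdf5_has_chunk_fix <- function(version = utils::packageVersion("rhdf5")) {
  utils::compareVersion(as.character(version), "2.7.3") >= 0
}

# Usage (requires rhdf5 to be installed for the default argument):
# if (!rhdf5_has_chunk_fix()) warning("rhdf5 is older than 2.7.3; consider updating")
```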