I am trying to use rhdf5 to read data from MAT files (generated in MATLAB 2013a using v7.3 of the MAT file format) in R. For the most part it works great, but I'm getting some strange behavior trying to load strings. In MATLAB, I create a 596x55 char array, x. I then create a second variable, y = x(1:end-1,:), which just contains the first 595 rows of x. I then save both to a file:
save z:/temp/test73.mat -v7.3 x y
I can load the file in MATLAB without any issues.
>> load('z:/temp/test73.mat', 'x')
>> x(1,:)
ans =
GRUPO AEROPORT DEL PACIFIC-B
>> load('z:/temp/test73.mat', 'y')
>> y(1,:)
ans =
GRUPO AEROPORT DEL PACIFIC-B
In R, I am able to load y from the file without any issues:
> values <- h5read("z:/temp/test73.mat", "/y")
> values[1,]
 [1] 71 82 85 80 79 32 65 69 82 79 80 79 82 84 32 68 69 76 32 80 65 67 73 70 73 67 45 66 32 32 32 32 32 32 32 32
[37] 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32
> nc <- ncol(values)
> mode(values) <- "raw"
> values <- rawToChar(t(values))
> values <- substring(values, seq(1, nchar(values)-1, nc), seq(nc, nchar(values), nc))
> values <- gsub(" *$", "", values)
> values[1]
[1] "GRUPO AEROPORT DEL PACIFIC-B"
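The conversion steps above can also be wrapped in a small helper. This is only a minimal sketch, assuming each row of the matrix holds one space-padded string stored as single-byte codes; `codes_to_strings` is a hypothetical name, and the `demo` matrix merely stands in for the result of h5read().

```r
# Hypothetical helper: turn an integer matrix of character codes
# (one string per row) into a character vector.
codes_to_strings <- function(codes) {
  stopifnot(is.matrix(codes))
  # rawToChar() works on single bytes, and a 0 would be an embedded nul,
  # so refuse anything outside 1..255 instead of silently coercing it.
  if (any(codes < 1 | codes > 255)) {
    stop("codes outside 1..255 cannot be converted via raw")
  }
  strings <- apply(codes, 1, function(row) rawToChar(as.raw(row)))
  sub(" +$", "", strings)  # strip the trailing space padding
}

# Stand-in for the matrix returned by h5read(): two padded rows of codes
demo <- rbind(utf8ToInt("GRUPO   "), utf8ToInt("AB      "))
print(codes_to_strings(demo))
```

Failing early on out-of-range codes is a design choice of this sketch: it surfaces the problem before as.raw() replaces those values with 0.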
However, I am unable to load x:
> values <- h5read("z:/temp/test73.mat", "/x")
> values[1,]
 [1] 71 85 79 66 82 78 79 69 32 79 71 82 73 70 32 82 32 32 32 69 48 32 79 68 32 32 32 32 32 32 32 32 32 32 32 32
[37] 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32
> nc <- ncol(values)
> mode(values) <- "raw"
Warning message:
In eval(expr, envir, enclos) :
  out-of-range values treated as 0 in coercion to raw
> values <- rawToChar(t(values))
Error in rawToChar(t(values)) :
  embedded nul in string: 'GUOBRNOE OGRIF R E0 OD GUOOANOHBO HC A ROPN U R GUOUOIO EFT RC SL HT GUO UAOEIADCC WM L N D GUO INONROI AAA B CLHST- GUR INO(YCACT C TDC O GUR LNOOTOIOI-N AONNA G GUTULNE C RRAA THCNNC Y O GUYOLOUOSCITLSCIG PP E GRY LOAO CEEISECOPIR B GRC GOHOACOAR OACLD- R N 1 GRK MOHOACIAE O CR I C A R GRT ARIORATSI TN L R GR HNIORAUS / LRC C GRANWRIORAINUA ALIN L R U GRA.ROIOU HUSA SE SC KT U GRR EXIORNA N R LD S C E GRR0LRI QH P R -L
> values <- substring(values, seq(1, nchar(values)-1, nc), seq(nc, nchar(values), nc))
Error in seq.default(1, nchar(values) - 1, nc) :
  'to' must be of length 1
The output of sessionInfo() is:
> sessionInfo()
R version 3.0.3 (2014-03-06)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] rhdf5_2.6.0

loaded via a namespace (and not attached):
[1] tools_3.0.3 zlibbioc_1.8.0
I would appreciate any help in solving this.
Thanks.
- Elliot
I'm wondering if there are encoding issues.
The raw type, as one might predict, expects everything to be in byte range, i.e., 0 <= x <= 255. Try this in an R session:
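A minimal sketch of that behaviour, using illustrative inputs that are not taken from the file in question: values outside the byte range become 0 when coerced to raw, and a 0 byte in the middle of a vector is exactly what rawToChar() rejects as an embedded nul.

```r
# Illustrative inputs: two values inside the byte range, two outside it
codes <- c(71, 255, 256, -1)

# Without suppressWarnings(), R warns:
#   "out-of-range values treated as 0 in coercion to raw"
bytes <- suppressWarnings(as.raw(codes))
print(bytes)  # [1] 47 ff 00 00

# Put a zero byte in the middle (47 00 ff) and rawToChar() fails;
# tryCatch() captures the error message instead of stopping the script.
result <- tryCatch(
  rawToChar(bytes[c(1, 3, 2)]),
  error = function(e) conditionMessage(e)
)
print(result)
```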
The man page for rawToChar (?rawToChar) also warns about encoding issues and says that only trailing nuls are allowed. Does which(values < 0 | values > 255) return anything?
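To see where any out-of-range codes sit, the same test can be run with arr.ind = TRUE so that which() reports row and column positions. In this hedged sketch the small `values` matrix is made up for illustration (the 8364 entry is an invented out-of-range code) and only stands in for the matrix returned by h5read().

```r
# Stand-in for the h5read() result; 8364 is a made-up code above 255
values <- rbind(
  c(71L, 82L, 85L),
  c(65L, 8364L, 32L)
)

# Row/column positions of every value that does not fit in a single byte
bad <- which(values < 0 | values > 255, arr.ind = TRUE)
print(bad)

# The overall range is a quick first check before coercing to raw
print(range(values))
```

If `bad` comes back empty and the range lies within 0..255, the values themselves are not the problem, and the layout of the matrix would be the next thing to check.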