I would like to run the example code from the Reading HDF5 Files In The Cloud vignette of the rhdf5 Bioconductor package.
Executing the first example:
public_S3_url <- "https://rhdf5-public.s3.eu-central-1.amazonaws.com/h5ex_t_array.h5"
h5ls(file = public_S3_url, s3 = TRUE)
raises an error, because the Rhdf5lib package hasn't been compiled with support for S3:
Error in H5Pset_fapl_ros3(fapl, s3credentials) :
Rhdf5lib was not compiled with support for the S3 VFD
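To check from R whether an installation is affected, the error above can be trapped rather than read off the console. The following is a minimal sketch: is_s3_vfd_missing() and has_s3_support() are hypothetical helpers, not part of rhdf5, and the matched message text is simply the one quoted above, which other versions may word differently.

```r
# Sketch: probe whether the installed rhdf5/Rhdf5lib build can use the S3 VFD.
# The error text matched here is the one quoted in this thread (assumption:
# other rhdf5 versions may phrase it differently).
is_s3_vfd_missing <- function(msg) {
  grepl("not compiled with support for the S3 VFD", msg, fixed = TRUE)
}

# Hypothetical helper, not part of rhdf5.
has_s3_support <- function(url) {
  tryCatch({
    rhdf5::h5ls(file = url, s3 = TRUE)
    TRUE
  }, error = function(e) {
    if (is_s3_vfd_missing(conditionMessage(e))) {
      FALSE
    } else {
      stop(e)  # unrelated failure (network, wrong URL, ...): re-raise it
    }
  })
}
```

Calling has_s3_support(public_S3_url) with the URL from the example should then return FALSE on a build like the one described here, and TRUE once S3 support is compiled in.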
Does anybody have pointers on how to add support for S3 VFD?
This is what I found so far:
- I found the vignette that documents how the authors of the Rhdf5lib library created their HDF5 distribution. The details are beyond my understanding, unfortunately. The authors also state: "This is for record keeping only, users of the Rhdf5lib package are not expected to follow any of the steps detailed here."
- The HDF Group provides information on how to include the S3 VFD in HDF5, e.g. by adding arguments to the configure command. I tried to install the Rhdf5lib library from source and included those arguments via the configure.args argument, but they weren't recognized.
BiocManager::install('Rhdf5lib', type = "source",
configure.args = "-DHDF5_ENABLE_ROS3_VFD:BOOL=ON")
Any suggestions would be appreciated - I'd love to understand how to Read HDF5 Files In The Cloud!
Thank you, Thomas
sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] rhdf5_2.34.0
loaded via a namespace (and not attached):
[1] compiler_4.0.3 credentials_1.3.0 tools_4.0.3 curl_4.3
[5] rhdf5filters_1.2.0 jsonlite_1.7.2 openssl_1.4.3 sys_3.4
[9] Rhdf5lib_1.12.1 askpass_1.1
I think I figured it out - based on your pointers to openssl!
openssl
I installed openssl through homebrew, which also pulls in curl.
This by itself didn't fix the problem, i.e. the openssl headers were still not detected.
Symbolic links
I found this issue in an unrelated GitHub project, which recommends creating symbolic links:
Rhdf5lib installation
Now the source installation of Rhdf5lib sets the S3_VFD=--enable-ros3-vfd argument automatically, as expected:
The installation finishes successfully.
rhdf5 installation
I reinstalled the rhdf5 package as well (because my original installation still didn't pick up the S3 VFD).
And now I can read your example file!
Thanks a lot for your help, much appreciated! Thomas
Excellent, great that it seems to be working. On my GitHub builder I settled on adding the following two lines to $HOME/.R/Makevars:
Now that it's working, I should point out that in my limited testing with larger, real-data files, h5ls() is surprisingly slow, but h5read() with an index argument seems to work quite well if you already know the structure of the file. This performance is something I'm actively working on at the moment.
Also, do you want to drag your solution above to be an Answer rather than a Comment? Hopefully that'll help someone else looking for the info in the future.
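As a rough illustration of the h5read() with index pattern mentioned above, here is a minimal sketch. read_remote_slice() is a hypothetical wrapper, and the URL and dataset name in the commented call are placeholders rather than a real file; only the h5read() arguments file, name, index and s3 come from rhdf5 itself.

```r
# Hypothetical wrapper (not part of rhdf5): read only a slice of a remote
# dataset instead of listing or downloading the whole file.
read_remote_slice <- function(url, dataset, index) {
  # 'index' is a list with one element per dataset dimension;
  # a NULL element selects that dimension in full.
  rhdf5::h5read(file = url, name = dataset, index = index, s3 = TRUE)
}

# Placeholder URL and dataset name; substitute those of your own file.
# x <- read_remote_slice("https://example-bucket.s3.amazonaws.com/data.h5",
#                        dataset = "/counts",
#                        index = list(1:100, NULL))
```

This only helps when the dataset path and its number of dimensions are known in advance, which is exactly the caveat stated above.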
Do you install the source package or the built binary? If you're not sure, does Rhdf5lib print hundreds of lines to the screen during installation? If so you're installing from source.
The real answer is that we need to make sure it finds libcurl & libopenssl system libraries during compilation, but you seem to have R packages built around those available, so I assume they're installed on the system.
I've not thought about whether the Mac binaries would ship with support, so it would be good to know how you are installing.
Thanks a lot for your super quick reply, Mike!
I have tried installing both the Mac binary and the source package, but got the "Rhdf5lib was not compiled with support for the S3 VFD" error either way.
I also tried to add configuration arguments to the call, but the configure command did not recognize them.
Is that what you needed to know?
Thanks, Thomas
P.S.: After you pointed out that this might be Mac specific, I tried a BioC docker container running Linux. I installed rhdf5 (from source) and was able to access the remote HDF5 file. So it seems you are right, the Mac version might be missing the S3 support.
Thanks for the info. You won't be able to enable this via configure.args. Rhdf5lib has its own configure file that wraps the HDF5 configuration. It's a simplified version to control many of the options. That makes it easier for me to support the majority of users, but also means a lot of the general HDF5 documentation isn't appropriate.
I took a look at the Bioconductor log from when the build system creates the Mac binary. Fairly close to the top you can see the message S3_VFD=--enable-ros3-vfd=no. That's the argument that will be passed to HDF5 during compilation. Rhdf5lib selects that based on the availability of libcurl and libopenssl, so the only way to make it change that to a yes is to make sure those can be found.
You can see the results of the individual tests on the lines above e.g.
This suggests that openssl isn't installed, or at least can't be found. I don't know enough about the Bioconductor build machines to say which is the case, but it explains why the binary version doesn't have S3 support. I'm also not sure how portable it would be if the build system did have it, but a user did not.
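Since S3 support is selected based on whether libcurl and openssl can be found, it can help to check where their headers actually live before reinstalling. The sketch below is an assumption-laden aid, not what the Rhdf5lib configure script does: find_headers() is a hypothetical helper, the header names are the ones HDF5's ROS3 driver is generally understood to need, and the directories are just common locations on macOS and Linux.

```r
# Hypothetical helper: for each header, report the first directory containing it
# (NA if it is in none of the directories searched).
find_headers <- function(headers, include_dirs) {
  vapply(headers, function(h) {
    hits <- include_dirs[file.exists(file.path(include_dirs, h))]
    if (length(hits) > 0) hits[[1]] else NA_character_
  }, character(1))
}

# Assumed header names and candidate directories; adjust for your system.
headers <- c("curl/curl.h", "openssl/evp.h", "openssl/hmac.h", "openssl/sha.h")
include_dirs <- c("/usr/include",
                  "/usr/local/include",
                  "/usr/local/opt/openssl/include",
                  "/opt/homebrew/opt/openssl/include")
print(find_headers(headers, include_dirs))
```

An NA for the openssl headers in the default include directories, combined with a hit under a Homebrew prefix, would be consistent with the compiler not finding them without extra flags or symbolic links.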
I see the same on my GitHub Actions Mac builder. On there I explicitly install openssl with brew install openssl, so that clearly isn't sufficient to get this working. I'm not really a Mac user, but maybe this gives you some tips on what might be needed to get it working on your system. I'll try to get the version on GitHub working, and will report back if I find the right instructions.
Many thanks for your detailed explanation - and for pointing me in the right direction. I will report back if I can figure out how to make it work on my system. (And if you'd like a tester - beyond your GitHub setup - let me know!)
Thanks, Thomas
FWIW, I was able to get this to work by following the advice above, which was to install openssl and add symbolic links, add the flags to the ~/.R/Makevars file, and then install Rhdf5lib and rhdf5 from source, in that order...
-MR
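For anyone retracing these steps, here is a hedged sketch of the last two. The Makevars flags are an assumption based on the usual Homebrew layout, not the lines posted in this thread, and makevars_lines() is a hypothetical helper. The install calls mirror the BiocManager::install(..., type = "source") call from the original question, with force = TRUE added so an up-to-date package is rebuilt.

```r
# Hypothetical helper: build Makevars lines for a given OpenSSL install prefix.
# Flag names and values are an assumption (a common Homebrew pattern),
# not the lines used by the posters above.
makevars_lines <- function(prefix) {
  c(sprintf("CPPFLAGS=-I%s/include", prefix),
    sprintf("LDFLAGS=-L%s/lib", prefix))
}

# Example prefix only; `brew --prefix openssl` reports the real one.
cat(makevars_lines("/usr/local/opt/openssl"), sep = "\n")

# After adding such lines to ~/.R/Makevars, reinstall from source, in this order:
# BiocManager::install("Rhdf5lib", type = "source", force = TRUE)
# BiocManager::install("rhdf5", type = "source", force = TRUE)
```

Note that setting CPPFLAGS and LDFLAGS this way replaces R's defaults for every source install, so it may be worth removing the lines again once Rhdf5lib has been built.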
Oh, that's great. Thanks a lot for following up on this and posting your solution, Marcel Ramos!