However, since the capacity of the /tmp directory is limited, there is a risk of running out of disk space when handling large data.
Therefore, I would like to know if there is a way to set the output directory globally in advance so that /tmp is not used.
It seems that writeHDF5Array lets you choose the location with its filepath argument, but is there any way to specify the file location for DelayedArray's functions such as setAutoRealizationBackend("HDF5Array") and AutoRealizationSink?
For example, I thought I could use the path function to change the directory, but I couldn't.
I assumed that changing tempdir would change the temporary directory used inside DelayedArray, so I installed the unixtools package and used its set.tempdir function, but that didn't work either.
I think you need to look at setHDF5DumpDir() from the HDF5Array package.
Here's a little exploration of what happens when you use that function:
library(HDF5Array)
## define a location
hdf5_temp_dir <- "/tmp/testing/HDF5Array"
## the location must exist for HDF5Array to use it
dir.create(hdf5_temp_dir, recursive = TRUE)
## check it's empty
list.files(hdf5_temp_dir)
#> character(0)
setHDF5DumpDir(hdf5_temp_dir)
## setting this seems to create an empty H5 file automatically
list.files(hdf5_temp_dir)
#> [1] "auto00001.h5"
dump_file <- getHDF5DumpFile()
dump_file
#> [1] "/tmp/testing/HDF5Array/auto00001.h5"
file.size( dump_file )
#> [1] 800
B3 <- array(runif(2*3*4), dim=c(2,3,4))
B3 <- as(B3, "HDF5Array")
## now the temporary dump file is bigger as we've written something
## and HDF5Array is set to use another file for the next operation
file.size( dump_file )
#> [1] 6725
getHDF5DumpFile()
#> [1] "/tmp/testing/HDF5Array/auto00002.h5"
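To tie this back to setAutoRealizationBackend(), here is a minimal sketch of how the two settings might be combined. The dump_dir location is a hypothetical stand-in (a subdirectory of tempdir(), only so that the code runs anywhere); in practice it would point at a volume with enough free space. The sketch also assumes a recent DelayedArray/HDF5Array, where realize() called without a BACKEND argument uses the auto-realization backend.

```r
library(DelayedArray)
library(HDF5Array)

## hypothetical location; replace with a directory on a large volume
dump_dir <- file.path(tempdir(), "my_hdf5_dump")
dir.create(dump_dir, recursive = TRUE, showWarnings = FALSE)

## send automatically created HDF5 files to dump_dir
setHDF5DumpDir(dump_dir)
## ask DelayedArray to realize results as HDF5-backed arrays
setAutoRealizationBackend("HDF5Array")

## a small delayed operation, realized to disk
x <- DelayedArray(matrix(runif(20), nrow = 4))
y <- realize(x * 2)

## the realized data should live in a file under dump_dir
path(y)
list.files(dump_dir)
```

As far as I can tell, both settings apply to the whole R session, so they only need to be made once, before the first operation that writes to disk.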
Thank you, Mike Smith!
This is what I was looking for.
I've confirmed that when running DelayedArray, the intermediate HDF5 files are written to the directory I set, not /tmp.