It seems that a function read_sparse_block was implemented 3 months ago but I don't know the details.
Is this simply the same as the read_block function with the as.sparse=TRUE option, or is it an extract function for SparseArraySeeds?
Hi Koki,
Sorry again for the slow response.
read_sparse_block() has been around for years. It's a generic function defined in the DelayedArray package, with several methods defined in downstream packages like HDF5Array or TileDBArray. It is not meant to be called directly. You should always call read_block() instead. See ?S4Arrays::read_block for more information (read_block has moved from DelayedArray to S4Arrays).
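As a rough illustration of calling read_block() rather than read_sparse_block(), here is a minimal sketch. It assumes a recent Bioconductor release in which attaching DelayedArray also attaches S4Arrays, SparseArray, IRanges and Matrix. The toy matrix and the viewport are made up for illustration, and the class of the sparse result depends on the installed version.

```r
# Assumption: a recent Bioconductor release, where attaching DelayedArray
# also attaches S4Arrays, SparseArray, IRanges and Matrix.
library(DelayedArray)

# Toy 6 x 4 sparse matrix with two nonzero values (made up for illustration).
m <- Matrix::sparseMatrix(i = c(2, 5), j = c(1, 3), x = c(5, 7), dims = c(6, 4))

# Viewport covering rows 1-3 and columns 1-4 of the reference dimensions.
viewport <- ArrayViewport(dim(m), IRanges(start = c(1, 1), end = c(3, 4)))

# Dense read: the block comes back as an ordinary matrix.
dense_block <- read_block(m, viewport)

# Sparse read: the block comes back in a sparse representation
# (SparseArraySeed or SVT_SparseArray, depending on the installed version).
sparse_block <- read_block(m, viewport, as.sparse = TRUE)

dim(dense_block)
class(sparse_block)
```

Going through read_block() in both cases means client code should not need to change when the internal sparse representation does.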
That being said, I recently introduced the read_block_as_sparse() generic in the SparseArray package. It will be a replacement for read_sparse_block(). The difference is that read_block_as_sparse() will return a SparseArray object (typically an SVT_SparseArray), whereas read_sparse_block() returns a SparseArraySeed object. This change is part of the plan to use the new and efficient SVT_SparseArray representation everywhere internally in the DelayedArray framework to handle sparse data, instead of the old and inefficient SparseArraySeed representation. This is a work-in-progress. See https://github.com/Bioconductor/DelayedArray/blob/devel/TODO for a detailed roadmap for this transition.
In other words, this is DelayedArray's internal business and the impact on downstream packages and other client code should be minimal.
Best,
H.
Thanks for your reply.
So do you mean that the change from DelayedArray's read_sparse_block to SparseArray's read_block_as_sparse will speed up input/output of sparse arrays to/from HDF5?
Will this feature be available in DelayedArray in the upcoming BioC 3.18?
https://github.com/Bioconductor/DelayedArray/blob/devel/TODO
Calculation speed depends on many factors like size of the data, size of the blocks, sparseness of the blocks, what operations are performed on the blocks, available memory, disk speed, etc., so I don't want to promise anything. The hope is that we will see some speed improvements in some situations but not necessarily in all situations involving sparse data.
I'm lagging a little bit behind the initial plan so I can't promise that either.
Best,
H.
Ok, I understood.
How about "writing" to HDF5?
Are you going to implement write_block_as_sparse in the SparseArray package?
No. But I will need to modify the write_block() method for TENxRealizationSink implemented in HDF5Array to make it handle blocks that are SVT_SparseArray objects. I don't expect any significant performance improvement from that though.
H.