Sparse matrices in Bioconductor objects for single-cell analyses
1
0
Entering edit mode
sandmann.t ▴ 70
@sandmannt-11014
Last seen 14 months ago
United States

Dear Bioconductors, 

more and more publications include large single-cell RNA-seq datasets. For example, Keren-Shaul et al made count matrices with 34016 gene x 37248 samples (= cells) available on NCBI GEO. I am interested in using Bioconductor to analyze such data and was happy to find the single-cell analysis Bioconductor workflow and the excellent scater package.

The single-cell data are very sparse, with up to 90% of zero counts and can be very efficiently stored in sparse matrices, e.g. using the Matrix package. Yet, it seems that while scater's newSCESet function accepts a sparse matrix, it coerces it into a regular matrix right away, which requires much more memory (and storage space).

I can simply find a machine with lots of RAM or apply abundance filters before creating an SCESet, but I am curious: are there ways to use Bioconductor's infrastructure and take advantage of the sparse nature of the data? 

And in case the scater authors are listening: do you have any plans to take use sparse matrices?

Any recommendations are appreciated.

Thanks,

Thomas

scater simpleSingleCell workflows • 3.6k views
ADD COMMENT
3
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 19 hours ago
The city by the bay

Well, this question might as well have my name written on it.

Yes, we are planning to adapt the SCESet object to accept sparse matrices; see https://github.com/drisso/SingleCellExperiment for a class based on SummarizedExperiment (which happily takes sparse matrices, as well as disk-based representations such as instances of the HDF5Array class).

Most of the R code should then work interchangeably with dense, sparse and file-backed matrices. The C++ code will take some more effort but we have written an API (https://github.com/LTLA/beachmat) which allows package code to be written in a manner that is agnostic to the exact representation of the matrix.

All of these goodies should hopefully come out in the next release, sometime in September.

ADD COMMENT
0
Entering edit mode

That's great! Thanks a lot for sharing your progress, plans and especially the pointer to the SingleCellExperiment package. I am not surprised that you have even more awesome solutions in the works :-)

ADD REPLY

Login before adding your answer.

Traffic: 588 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6