Notes from the Bioc2017 BoF discussions, courtesy of moderator Davide Risso
Task 1: provide a unified representation of single-cell data
Challenges:
- Hundreds of scRNA-seq software tools
- Most R and Bioconductor packages define their own class
- Some extend SummarizedExperiment, some ExpressionSet
- Most packages don’t fully exploit the potential of SummarizedExperiment (e.g., an assay does not have to be a matrix)
Proposed solutions:
- Create a class for developers to extend: SingleCellExperiment
Useful Bioconductor packages and other resources:
Task 2: scale-up of existing tools / implementation of tools to handle large-scale datasets
Challenges:
- Existing tools scale only to thousands of cells
- 10X Genomics released a 1.3 million cell dataset
- Main problem: the data do not fit in memory!
Proposed solutions:
- HDF5 files + "chunk operations"
- Simple algorithms + approximate, scalable methods
- Provide API to perform common operations independent of data representation (in memory vs. on disk)
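The chunking and common-API ideas above can be sketched together. The block below is a minimal illustration in plain Python, not Bioconductor code: a hypothetical `ChunkedMatrix` interface exposes only an `iter_cell_chunks` method, and per-gene mean and variance are accumulated chunk by chunk with Welford's online algorithm, so the same function runs whether the data sit in memory or are streamed from disk. All class and function names are assumptions made for the example, and a CSV file stands in for an HDF5 file to keep the sketch dependency-free.

```python
import csv
import os
import tempfile
from typing import Iterator, List, Protocol, Tuple

Chunk = List[List[float]]  # a block of cells, each a list of per-gene values


class ChunkedMatrix(Protocol):
    """Hypothetical interface: a genes x cells matrix read as blocks of cells."""
    n_genes: int

    def iter_cell_chunks(self, chunk_size: int) -> Iterator[Chunk]: ...


class InMemoryMatrix:
    """Backend that holds every cell in a Python list."""

    def __init__(self, cells: Chunk) -> None:
        self.cells = cells
        self.n_genes = len(cells[0]) if cells else 0

    def iter_cell_chunks(self, chunk_size: int) -> Iterator[Chunk]:
        for start in range(0, len(self.cells), chunk_size):
            yield self.cells[start:start + chunk_size]


class CsvBackedMatrix:
    """On-disk stand-in for HDF5: one cell per CSV row, one gene per column."""

    def __init__(self, path: str, n_genes: int) -> None:
        self.path = path
        self.n_genes = n_genes

    def iter_cell_chunks(self, chunk_size: int) -> Iterator[Chunk]:
        chunk: Chunk = []
        with open(self.path, newline="") as handle:
            for row in csv.reader(handle):
                chunk.append([float(v) for v in row])
                if len(chunk) == chunk_size:
                    yield chunk  # only this block is held in memory
                    chunk = []
        if chunk:
            yield chunk


def gene_mean_var(matrix: ChunkedMatrix, chunk_size: int = 2) -> Tuple[List[float], List[float]]:
    """Per-gene mean and sample variance, streamed with Welford's algorithm."""
    n = 0
    mean = [0.0] * matrix.n_genes
    m2 = [0.0] * matrix.n_genes  # running sum of squared deviations
    for chunk in matrix.iter_cell_chunks(chunk_size):
        for cell in chunk:
            n += 1
            for g, x in enumerate(cell):
                delta = x - mean[g]
                mean[g] += delta / n
                m2[g] += delta * (x - mean[g])
    var = [s / (n - 1) if n > 1 else float("nan") for s in m2]
    return mean, var


def demo():
    """Run the same computation on both backends with toy data."""
    cells = [[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]]  # 3 cells x 2 genes
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "counts.csv")
        with open(path, "w", newline="") as handle:
            csv.writer(handle).writerows(cells)
        on_disk = gene_mean_var(CsvBackedMatrix(path, n_genes=2))
    in_memory = gene_mean_var(InMemoryMatrix(cells))
    return in_memory, on_disk
```

Because `gene_mean_var` depends only on the chunk iterator, a different storage backend could be swapped in without touching the algorithm, which is the kind of representation-independent API the notes ask for.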
Useful Bioconductor packages and other resources:
- TENxGenomics (Martin Morgan)
- beachmat (Aaron Lun)
- DelayedMatrixStats (Peter Hickey)
- BigDataAlgorithms
- HDF5Array (Herve Pages)
- restfulSE (Vince Carey)
Interested in contributing? Join the Slack channel: https://community-bioc.slack.com
Discussion points:
- Benchmark (canonical datasets)
- Splatter (simulations of scRNA-seq)
- What to do next?
- BigDataAlgorithms: define the scope and which functionalities we want
  a. Prior art in astronomy, etc.?
- Visualization?
- Multi assay?
  a. People are running single-cell assays that generate multiple types of data (e.g., RNA expression and methylation) from each single cell.
  b. Can store each assay in a SingleCellExperiment and then put these inside a MultiAssayExperiment to link up the row and column metadata.
- Multiple samples: a list of SingleCellExperiments vs. one giant joined SingleCellExperiment. Can we learn from flowSet?
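The multi-assay point above can be made concrete with a small sketch. This is plain Python, not the SingleCellExperiment or MultiAssayExperiment API: `AssayContainer` and `MultiAssay` are hypothetical stand-ins that only illustrate the linking idea, namely that each assay keeps its own features while a wrapper joins them through shared cell identifiers. The toy cell and feature names are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AssayContainer:
    """Hypothetical stand-in for one per-assay object (features x cells)."""
    features: List[str]
    cells: List[str]
    values: Dict[str, List[float]]  # cell ID -> one value per feature


@dataclass
class MultiAssay:
    """Hypothetical wrapper that links assays through shared cell IDs."""
    assays: Dict[str, AssayContainer] = field(default_factory=dict)

    def shared_cells(self) -> List[str]:
        """Cells measured in every assay, in the order of the first assay."""
        containers = list(self.assays.values())
        if not containers:
            return []
        common = set(containers[0].cells)
        for container in containers[1:]:
            common &= set(container.cells)
        return [cell for cell in containers[0].cells if cell in common]

    def profile(self, cell: str) -> Dict[str, Dict[str, float]]:
        """All measurements for one cell, grouped by assay name."""
        return {
            name: dict(zip(assay.features, assay.values[cell]))
            for name, assay in self.assays.items()
            if cell in assay.values
        }


# Toy data: RNA expression and methylation measured on overlapping cells.
rna = AssayContainer(
    features=["GeneA", "GeneB"],
    cells=["c1", "c2", "c3"],
    values={"c1": [3.0, 0.0], "c2": [0.0, 5.0], "c3": [1.0, 1.0]},
)
methylation = AssayContainer(
    features=["cpg1"],
    cells=["c2", "c3", "c4"],
    values={"c2": [0.8], "c3": [0.1], "c4": [0.5]},
)
experiment = MultiAssay({"rna": rna, "methylation": methylation})
```

Here `experiment.shared_cells()` returns the cells present in both assays, and `experiment.profile("c2")` collects that cell's RNA and methylation values in one place.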