Tutorial:BoF Bioc2017: Infrastructure for efficient storage and processing of large-scale single-cell genomics data
shepherl 4.1k
@lshep

Notes from the BoF discussions at Bioc2017, courtesy of moderator Davide Risso.

Task 1: provide a unified representation of single-cell data

Challenges:

Proposed solutions:

  • Create a class for developers to extend: SingleCellExperiment

Useful Bioconductor packages and other resources:

Task 2: scale-up of existing tools / implementation of tools to handle large-scale datasets

Challenges:

  • Existing tools scale to thousands of cells
  • 10x Genomics released a dataset of 1.3 million cells
  • Main problem: data at this scale does not fit in memory!

Proposed solutions:

  • HDF5 files + "chunk operations"
  • Simple algorithms + approximate, scalable methods
  • Provide API to perform common operations independent of data representation (in memory vs. on disk)
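As a rough illustration of the "chunk operations" and representation-independent API ideas above, here is a minimal sketch in plain Python (standard library only). Everything in it is hypothetical and made up for this example: the class names `InMemoryMatrix` and `OnDiskMatrix`, the `iter_chunks` method, and the `gene_means` function are not part of any Bioconductor package. A flat binary file of doubles stands in for an HDF5 file, so this shows only the pattern, not a real HDF5 backend.

```python
import os
import struct
import tempfile
from typing import Iterator, List, Sequence

Chunk = List[List[float]]  # rows are cells, columns are genes


class InMemoryMatrix:
    """Hypothetical cells-by-genes matrix held entirely in RAM."""

    def __init__(self, rows: Sequence[Sequence[float]]):
        self.rows = [list(r) for r in rows]
        self.n_genes = len(self.rows[0]) if self.rows else 0

    def iter_chunks(self, chunk_size: int) -> Iterator[Chunk]:
        for start in range(0, len(self.rows), chunk_size):
            yield self.rows[start:start + chunk_size]


class OnDiskMatrix:
    """Hypothetical on-disk matrix: raw little-endian doubles, one cell
    after another. A stand-in for HDF5, assumed only for this sketch."""

    def __init__(self, path: str, n_genes: int):
        self.path = path
        self.n_genes = n_genes

    @classmethod
    def write(cls, path: str, rows: Sequence[Sequence[float]]) -> "OnDiskMatrix":
        n_genes = len(rows[0])  # assumes at least one cell
        with open(path, "wb") as fh:
            for row in rows:
                fh.write(struct.pack(f"<{n_genes}d", *row))
        return cls(path, n_genes)

    def iter_chunks(self, chunk_size: int) -> Iterator[Chunk]:
        row_bytes = 8 * self.n_genes
        with open(self.path, "rb") as fh:
            while True:
                # Only chunk_size cells are in memory at any time.
                buf = fh.read(row_bytes * chunk_size)
                if not buf:
                    break
                n_rows = len(buf) // row_bytes
                values = struct.unpack(f"<{n_rows * self.n_genes}d", buf)
                yield [
                    list(values[i * self.n_genes:(i + 1) * self.n_genes])
                    for i in range(n_rows)
                ]


def gene_means(matrix, chunk_size: int = 2) -> List[float]:
    """Per-gene mean computed one chunk at a time. Works with any object
    that exposes n_genes and iter_chunks(chunk_size)."""
    totals = [0.0] * matrix.n_genes
    n_cells = 0
    for chunk in matrix.iter_chunks(chunk_size):
        for row in chunk:
            for j, value in enumerate(row):
                totals[j] += value
        n_cells += len(chunk)
    return [t / n_cells for t in totals] if n_cells else totals


if __name__ == "__main__":
    counts = [[1.0, 0.0, 3.0], [2.0, 0.0, 1.0], [0.0, 4.0, 2.0]]
    with tempfile.TemporaryDirectory() as tmp:
        on_disk = OnDiskMatrix.write(os.path.join(tmp, "counts.bin"), counts)
        # The same function is called on both representations.
        print(gene_means(InMemoryMatrix(counts)))
        print(gene_means(on_disk))
```

The point of the design is that `gene_means` never asks where the data lives: it only relies on the chunk iterator, so an in-memory and an on-disk backend can be swapped without touching the algorithm.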

Useful Bioconductor packages and other resources:

Interested in contributing? Join the slack channel: https://community-bioc.slack.com

Discussion points:

  1. Benchmark (canonical datasets)
  2. Splatter (simulations of scRNA-seq)
  3. What to do next?
  4. BigDataAlgorithms: define the scope and the functionalities we want
       a. Prior art in astronomy, etc.?
  5. Visualization?
  6. Multi assay?
       a. People are running single-cell assays that generate multiple types of data (e.g., RNA expression and methylation) from each single cell.
       b. Can store each assay in a SingleCellExperiment and then put them inside a MultiAssayExperiment to link up the row and column metadata.
  7. Multiple samples--list of SingleCellExperiments vs giant joined SingleCellExperiment. Can we learn from flowSet?