Anne
Hello,
Sorry for this very basic question: I have raw RNA-seq count data. The row names are genes, and the column names are short sequences (e.g., AAACCTGCAATCTACG.1). Aren't these supposed to be sample names? What is this file format called? (I couldn't find anything online, though I don't know what to search for.) The ultimate goal is to have a count matrix of genes vs. samples.
Thanks a lot for any help on this.
Yes, it is single-cell data. That makes sense, so one barcode corresponds to a sample (all barcodes are unique in the file). Thank you!!
The sequences do not correspond to a sample but rather to the identity of a single droplet, which (you hope) contains the read counts from one and only one cell.
Be sure to run the standard QC on this count matrix, e.g. following https://bioconductor.org/books/release/OSCA/. As Steve Lianoglou says, the columns are not samples but droplets. Ideally you captured a single cell per droplet, but a droplet can also be empty or contain a doublet/multiplet, and those need to be removed. Damaged and poor-quality cells need to be removed as well. OSCA will guide you through the essential steps.
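To make the per-droplet QC idea a bit more concrete, here is a minimal sketch in plain Python. It is not the OSCA workflow, only an illustration of the kind of metrics it starts from: total counts and number of detected genes per barcode. The toy table, the function name `per_barcode_qc`, and the cutoff `MIN_COUNTS` are all made up for this example; real thresholds should come from the data and the methods described in OSCA.

```python
import csv
import io

# Toy genes-by-barcodes table. Gene names, barcodes, and counts are invented
# for illustration and do not come from the original poster's file.
TOY_CSV = """gene,AAACCTGCAATCTACG.1,AAACCTGGTCGAACAG.1,AAACGGGAGTTGAGAT.1
GeneA,5,0,0
GeneB,3,1,0
GeneC,0,0,1
GeneD,12,0,0
"""


def per_barcode_qc(csv_text):
    """Return {barcode: (total_counts, genes_detected)} for a genes x barcodes table."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    barcodes = header[1:]  # first column holds the gene names
    totals = [0] * len(barcodes)
    detected = [0] * len(barcodes)
    for row in reader:
        if not row:
            continue  # skip blank lines
        for i, value in enumerate(row[1:]):
            count = int(value)
            totals[i] += count
            detected[i] += int(count > 0)
    return {bc: (totals[i], detected[i]) for i, bc in enumerate(barcodes)}


qc = per_barcode_qc(TOY_CSV)

# Hypothetical cutoff chosen only so the toy data shows a difference;
# barcodes below it are treated as candidate empty droplets.
MIN_COUNTS = 5
kept = [bc for bc, (total, _) in qc.items() if total >= MIN_COUNTS]

for bc, (total, n_genes) in qc.items():
    status = "keep" if bc in kept else "flag"
    print(f"{bc}\ttotal={total}\tgenes={n_genes}\t{status}")
```

A fixed count cutoff like this does not detect doublets or damaged cells; it only shows where the per-barcode numbers come from before the dedicated tools take over.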