Hi everybody
While following the vignette tutorial (https://stuartlab.org/signac/articles/pbmc_multiomic.html) to build a pipeline for analyzing RNA-seq and ATAC data, I wanted to know how and where the data are stored in the HDF5 object. This comes down to the groups of features called modalities, which are not explained in the tutorial. Here I share a simple explanation that helps you see which data are available to extract.
The 10x .h5 file holds feature-by-cell count matrices, such as gene expression and peaks, plus other metadata. One way to extract the expression data, or any other modality, is described below:
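First, load the HDF5 file:
library(Seurat)  # provides Read10X_h5
sc_hippo <- Read10X_h5("/folder/file_name_hdf5_file.h5")  # set your file name
For a multiome file this returns a list with one sparse matrix per modality. In the peaks matrix, each row represents a peak, a region predicted to be open chromatin.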
Next, list the modalities available in the object:
str(sc_hippo)
Output example:
$ Gene Expression:Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
.. ..@ i : int [1:8851990] 60 70 73 80 82 84 86 94 104 106 ...
.. ..@ p : int [1:2365] 0 2972 7234 9033 12814 18212 21573 22506 24986 28028 ...
$ Peaks :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
.. ..@ i : int [1:27141995] 113 356 364 415 424 535 555 629 640 787 ...
.. ..@ p : int [1:2365] 0 925 8634 14758 28329 48650 62511 74887 77451 87482 ...
The available modalities here are Gene Expression and Peaks; each modality appears as a list element marked with the $ sign.
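If you only want the modality names and sizes rather than the full str() dump, names() and dim() are usually enough. The sketch below assumes the object behaves like a plain named list of sparse matrices, which is what the str() output suggests. The list `toy`, its feature names and its barcodes are made up so the snippet runs without a 10x file.

```r
library(Matrix)

# Toy stand-in for the list returned by Read10X_h5 (hypothetical data):
# two sparse feature-by-cell matrices sharing the same three barcodes.
barcodes <- c("AAAC-1", "AAAG-1", "AACC-1")
toy <- list(
  `Gene Expression` = sparseMatrix(
    i = c(1, 2, 2), j = c(1, 2, 3), x = c(5, 1, 2),
    dims = c(2, 3),
    dimnames = list(c("GeneA", "GeneB"), barcodes)
  ),
  Peaks = sparseMatrix(
    i = c(1, 3, 4), j = c(1, 1, 3), x = c(2, 1, 4),
    dims = c(4, 3),
    dimnames = list(
      c("chr1-100-200", "chr1-500-900", "chr2-10-80", "chr2-300-450"),
      barcodes
    )
  )
)

# Names of the modalities stored in the list
modalities <- names(toy)
print(modalities)

# Number of features (row 1) and cells (row 2) for each modality
dims_per_modality <- sapply(toy, dim)
print(dims_per_modality)
```

With a real file, replace `toy` with `sc_hippo`.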
Finally, you can read a specific modality:
rna_counts <- sc_hippo$`Gene Expression`
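The peaks matrix can be pulled out the same way, and it may be worth checking that both matrices list the same cells in the same order before combining them downstream. This is only a minimal sketch on a hypothetical toy list (the names `toy` and `atac_counts`, the features and the barcodes are all invented for illustration), not the tutorial's own code.

```r
library(Matrix)

# Hypothetical toy list standing in for sc_hippo
barcodes <- c("AAAC-1", "AAAG-1", "AACC-1")
toy <- list(
  `Gene Expression` = sparseMatrix(
    i = c(1, 2), j = c(1, 3), x = c(5, 2),
    dims = c(2, 3),
    dimnames = list(c("GeneA", "GeneB"), barcodes)
  ),
  Peaks = sparseMatrix(
    i = c(1, 2), j = c(2, 3), x = c(1, 4),
    dims = c(2, 3),
    dimnames = list(c("chr1-100-200", "chr2-10-80"), barcodes)
  )
)

rna_counts  <- toy$`Gene Expression`  # backticks are needed because of the space
atac_counts <- toy$Peaks              # a simple name needs no backticks

# Both modalities should describe the same cells, in the same order
same_cells <- identical(colnames(rna_counts), colnames(atac_counts))
print(same_cells)

# Total counts per cell in each modality
rna_per_cell  <- Matrix::colSums(rna_counts)
atac_per_cell <- Matrix::colSums(atac_counts)
print(rna_per_cell)
print(atac_per_cell)
```

If `same_cells` comes back FALSE on real data, the two matrices would need to be subset to their shared barcodes before going further.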
CSC