Is there existing functionality to extract only the dataframes from AnnData HDF5 objects? That is, directly extract the equivalents of colData and rowData from .h5ad files.
The anndata package only seems to load the full object (plus, I very much dislike the fact that it is a wrapper for the Python package, i.e., it requires a Python installation). The HDF5Array package can import the matrix as a DelayedArray, but doesn't include the dataframes.
For now, I am using the following code, but would prefer if this functionality was packaged somewhere.
library(tidyverse)
library(rhdf5)

read_ad_df <- function (file, name) {
  x_attrs <- h5readAttributes(file, name)

  ## check requested entry is a dataframe
  ## TODO: do we need to check encoding-version?
  stopifnot(x_attrs[['encoding-type']] == "dataframe")

  ## rownames and columns in order
  idx_cols <- unlist(x_attrs[c("_index", "column-order")], use.names=FALSE)

  ## load the factor levels
  ## (assumes the levels are stored under <name>/__categories)
  x_levels <- h5read(file, str_c(name, "/__categories"))

  ## load dataframe
  h5read(file, name)[idx_cols] %>% as_tibble() %>%
    ## replace categorical columns with proper factors
    mutate(across(any_of(names(x_levels)), ~ {
      lv <- as.vector(x_levels[[cur_column()]])
      ## codes are zero-based; negative codes are treated as missing values
      factor(lv[replace(.x, .x < 0L, NA) + 1L], levels = lv)
    }))
}
where read_ad_df(FILE, "/obs") retrieves colData and read_ad_df(FILE, "/var") retrieves rowData.
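For reference, the categorical decoding step can be sketched on its own, without touching a file. This is only an illustration in base R: decode_categorical is a hypothetical helper name (not from any package), and it assumes that the stored codes are zero-based and that negative codes (such as -1) mark missing values.

```r
## Hypothetical helper (not part of any package): turn zero-based category
## codes into an R factor, keeping the stored order of the levels.
decode_categorical <- function(codes, categories, ordered = FALSE) {
  ## drop any array dimensions left over from reading HDF5 datasets
  codes <- as.integer(codes)
  categories <- as.character(categories)
  ## assumption: negative codes mark missing values
  codes[codes < 0L] <- NA_integer_
  ## shift to R's one-based indexing; NA codes stay NA
  factor(categories[codes + 1L], levels = categories, ordered = ordered)
}

## toy input standing in for one categorical column
f <- decode_categorical(c(0L, 2L, -1L, 1L), c("B cell", "T cell", "NK cell"))
print(f)
```

Passing levels explicitly keeps the level order as stored in the file and retains levels that happen not to occur in the column.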