Supercells with Basilisk fails with a dtype('float64') to dtype('int64') conversion error
I am getting a strange error about dtype('float64') to dtype('int64') conversion, which I don't think is related to the input data. Could something strange be happening in the R-to-Python connection?

Thanks a lot

Offending code

metacells_env <- BasiliskEnvironment(
  envname = "metacells_env",
  pkgname = "HPCell",
  packages = c("numpy==1.24.3"),  # Upgrade numpy to a version compatible with Python 3.10
  pip = c("metacells==0.9.4", "anndata==0.10.9")  # Use pip to install metacells and anndata
)

# Import the Python modules through reticulate
mc <- reticulate::import("metacells", delay_load = TRUE)
np <- reticulate::import("numpy", delay_load = TRUE)
anndata <- reticulate::import("anndata", delay_load = TRUE)

# Toy input: 20 x 10 matrix of integer Poisson counts
my_anndata <- anndata$AnnData(X = matrix(as.integer(rpois(200, lambda = 5)), nrow = 20, ncol = 10))

mc$pl$divide_and_conquer_pipeline(my_anndata, random_seed=123456)


set unnamed.var[selected_gene]: * -> False
set unnamed.var[rare_gene]: 0 true (0%) out of 36412 bools
set unnamed.var[rare_gene_module]: 36412 int32 elements with all outliers (100%)
set unnamed.obs[cells_rare_gene_module]: 4860 int32 elements with all outliers (100%)
set unnamed.obs[rare_cell]: 0 true (0%) out of 4860 bools
Error in py_call_impl(callable, call_args$unnamed, call_args$named) : 
  TypeError: Cannot cast scalar from dtype('float64') to dtype('int64') according to the rule 'safe'

R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: Red Hat Enterprise Linux 9.4 (Plow)

Matrix products: default
BLAS:   /stornext/System/data/software/rhel/9/base/tools/R/4.4.1/lib64/R/lib/ 
LAPACK: /home/users/allstaff/mangiola.s/.cache/R/basilisk/1.16.0/HPCell/0.3.7/metacells_env/lib/;  LAPACK version 3.9.0

 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            

time zone: Australia/Melbourne
tzcode source: system (glibc)

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] SummarizedExperiment_1.34.0 Biobase_2.64.0              GenomicRanges_1.56.1        GenomeInfoDb_1.40.1         IRanges_2.38.1             
 [6] S4Vectors_0.42.1            BiocGenerics_0.50.0         MatrixGenerics_1.16.0       matrixStats_1.4.1           basilisk_1.16.0            
[11] reticulate_1.39.0           CuratedAtlasQueryR_1.4.7    crew.cluster_0.3.2          HPCell_0.3.7                shinyBS_0.61.1             
[16] stringr_1.5.1               purrr_1.0.2                 tibble_3.2.1                glue_1.8.0                  targets_1.8.0.9003         
[21] duckdb_1.0.0-2              DBI_1.2.3                   dplyr_1.1.4                 arrow_17.0.0.1             

I had focused on the main data, but the problem was the class of the seed. This solves the problem:

mc$pl$divide_and_conquer_pipeline(my_anndata, random_seed=123456L)

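For the general audience: reticulate (which basilisk depends on and wraps) passes a plain R number such as 123456 to Python as a float, because R numbers are doubles by default. Integers must therefore be declared explicitly, so one needs 123456L rather than 123456. Mind the L suffix, which marks an R integer literal.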
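
To make the difference concrete on the Python side, here is a minimal sketch. It assumes reticulate's usual scalar conversion (R 123456 arrives as a Python float, R 123456L arrives as a Python int). It is plain Python, not metacells or NumPy code: accepts_as_integer is a hypothetical helper, and operator.index is only a stand-in for the strict integer check that NumPy's 'safe' casting rule applies to the seed.

```python
import operator

# What Python receives for each R literal (assumed reticulate conversion):
seed_from_r_numeric = 123456.0   # R: 123456   (a double)
seed_from_r_integer = 123456     # R: 123456L  (an integer)


def accepts_as_integer(value):
    """Return True if Python accepts `value` where a strict integer is required.

    Hypothetical helper for illustration only.
    """
    try:
        # Lossless-integer protocol: ints implement it, floats do not,
        # even when the float happens to hold a whole number.
        operator.index(value)
        return True
    except TypeError:
        return False


print(accepts_as_integer(seed_from_r_numeric))  # False
print(accepts_as_integer(seed_from_r_integer))  # True
```

The point is that 123456.0 is rejected even though it is a whole number: a strict check looks at the type, not the value, which is why the input data was never the issue.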


