fastMNN reproducibility across machines when using Docker
0
0
Entering edit mode
nils.eling ▴ 10
@nilseling-23204
Last seen 4 months ago
Switzerland

I'm using the batchelor::fastMNN function to integrate cells across batches as part of a book that I'm writing. I set up a Docker container for reproducibility purposes. While the results of the fastMNN function are reproducible locally inside the container, it gives different results when run on Github actions. I wrote tests to check the reproducibility of the results locally and it seems like some of the PCs are flipped on GA compared to the local results. I have the feeling similar issues arise when using runUMAP and saw online that there can be issues like this coming from irlba.

Is this to be expected when using different machines even though the computations are performed within the same Docker container?

library(batchelor)
set.seed(220228)
out <- fastMNN(spe, batch = spe$patient_id,
               auto.merge = TRUE,
               subset.row = rowData(spe)$use_channel,
               assay.type = "exprs")
## R version 4.3.1 (2023-06-16)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.2 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Etc/UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] testthat_3.1.9              SeuratObject_4.1.3         
##  [3] Seurat_4.3.0.1              BiocSingular_1.16.0        
##  [5] harmony_0.1.1               Rcpp_1.0.10                
##  [7] viridis_0.6.3               viridisLite_0.4.2          
##  [9] dittoSeq_1.12.0             cowplot_1.1.1              
## [11] scater_1.28.0               ggplot2_3.4.2              
## [13] scuttle_1.10.1              SpatialExperiment_1.10.0   
## [15] batchelor_1.16.0            SingleCellExperiment_1.22.0
## [17] SummarizedExperiment_1.30.2 Biobase_2.60.0             
## [19] GenomicRanges_1.52.0        GenomeInfoDb_1.36.1        
## [21] IRanges_2.34.1              S4Vectors_0.38.1           
## [23] BiocGenerics_0.46.0         MatrixGenerics_1.12.2      
## [25] matrixStats_1.0.0          
## 
## loaded via a namespace (and not attached):
##   [1] RcppAnnoy_0.0.20          splines_4.3.1            
##   [3] later_1.3.1               bitops_1.0-7             
##   [5] tibble_3.2.1              R.oo_1.25.0              
##   [7] polyclip_1.10-4           lifecycle_1.0.3          
##   [9] rprojroot_2.0.3           edgeR_3.42.4             
##  [11] globals_0.16.2            lattice_0.21-8           
##  [13] MASS_7.3-60               magrittr_2.0.3           
##  [15] limma_3.56.2              plotly_4.10.2            
##  [17] sass_0.4.6                rmarkdown_2.22           
##  [19] jquerylib_0.1.4           yaml_2.3.7               
##  [21] httpuv_1.6.11             sctransform_0.3.5        
##  [23] spatstat.sparse_3.0-1     sp_2.0-0                 
##  [25] reticulate_1.30           pbapply_1.7-0            
##  [27] RColorBrewer_1.1-3        ResidualMatrix_1.10.0    
##  [29] pkgload_1.3.2             abind_1.4-5              
##  [31] zlibbioc_1.46.0           Rtsne_0.16               
##  [33] purrr_1.0.1               R.utils_2.12.2           
##  [35] RCurl_1.98-1.12           GenomeInfoDbData_1.2.10  
##  [37] ggrepel_0.9.3             irlba_2.3.5.1            
##  [39] spatstat.utils_3.0-3      listenv_0.9.0            
##  [41] pheatmap_1.0.12           goftest_1.2-3            
##  [43] spatstat.random_3.1-5     dqrng_0.3.0              
##  [45] fitdistrplus_1.1-11       parallelly_1.36.0        
##  [47] DelayedMatrixStats_1.22.1 leiden_0.4.3             
##  [49] codetools_0.2-19          DropletUtils_1.20.0      
##  [51] DelayedArray_0.26.3       tidyselect_1.2.0         
##  [53] farver_2.1.1              ScaledMatrix_1.8.1       
##  [55] spatstat.explore_3.2-1    jsonlite_1.8.5           
##  [57] BiocNeighbors_1.18.0      ellipsis_0.3.2           
##  [59] progressr_0.13.0          ggridges_0.5.4           
##  [61] survival_3.5-5            tools_4.3.1              
##  [63] ica_1.0-3                 glue_1.6.2               
##  [65] gridExtra_2.3             xfun_0.39                
##  [67] dplyr_1.1.2               HDF5Array_1.28.1         
##  [69] withr_2.5.0               fastmap_1.1.1            
##  [71] rhdf5filters_1.12.1       fansi_1.0.4              
##  [73] digest_0.6.31             rsvd_1.0.5               
##  [75] R6_2.5.1                  mime_0.12                
##  [77] colorspace_2.1-0          scattermore_1.2          
##  [79] tensor_1.5                spatstat.data_3.0-1      
##  [81] R.methodsS3_1.8.2         utf8_1.2.3               
##  [83] tidyr_1.3.0               generics_0.1.3           
##  [85] data.table_1.14.8         httr_1.4.6               
##  [87] htmlwidgets_1.6.2         S4Arrays_1.0.4           
##  [89] uwot_0.1.14               pkgconfig_2.0.3          
##  [91] gtable_0.3.3              lmtest_0.9-40            
##  [93] XVector_0.40.0            brio_1.1.3               
##  [95] htmltools_0.5.5           bookdown_0.34            
##  [97] scales_1.2.1              png_0.1-8                
##  [99] knitr_1.43                rstudioapi_0.14          
## [101] reshape2_1.4.4            rjson_0.2.21             
## [103] nlme_3.1-162              cachem_1.0.8             
## [105] zoo_1.8-12                rhdf5_2.44.0             
## [107] stringr_1.5.0             KernSmooth_2.23-21       
## [109] parallel_4.3.1            miniUI_0.1.1.1           
## [111] vipor_0.4.5               desc_1.4.2               
## [113] pillar_1.9.0              grid_4.3.1               
## [115] vctrs_0.6.3               RANN_2.6.1               
## [117] promises_1.2.0.1          beachmat_2.16.0          
## [119] xtable_1.8-4              cluster_2.1.4            
## [121] waldo_0.5.1               beeswarm_0.4.0           
## [123] evaluate_0.21             magick_2.7.4             
## [125] cli_3.6.1                 locfit_1.5-9.8           
## [127] compiler_4.3.1            rlang_1.1.1              
## [129] crayon_1.5.2              future.apply_1.11.0      
## [131] labeling_0.4.2            plyr_1.8.8               
## [133] ggbeeswarm_0.7.2          stringi_1.7.12           
## [135] deldir_1.0-9              BiocParallel_1.34.2      
## [137] munsell_0.5.0             lazyeval_0.2.2           
## [139] spatstat.geom_3.2-1       Matrix_1.5-4.1           
## [141] patchwork_1.1.2           sparseMatrixStats_1.12.1 
## [143] future_1.32.0             Rhdf5lib_1.22.0          
## [145] shiny_1.7.4               highr_0.10               
## [147] ROCR_1.0-11               igraph_1.5.0             
## [149] bslib_0.5.0
batchelor • 1.0k views
ADD COMMENT
0
Entering edit mode

Is the local one a Mac? Not answering your question specifically, but I had it previously that despite full containerization I got different results from standalone sctransform on a local Mac compared to HPC and workstation. Very similar to: https://github.com/satijalab/sctransform/issues/68 -- never found out why, or at least idr.

ADD REPLY
0
Entering edit mode

Thanks for your input! Yes, the local one is a Mac. That actually reminds me that I could switch the OS on GA to Mac. Let's see if that helps.

ADD REPLY
0
Entering edit mode

You seem to be trying to get (numerical) reproducibility across platforms, which can be pretty hard (and can sometimes reveal issues with the code). Specifically, you're running into problems with using a (fast) PCA implemented in irlba. PCs are not defined up to sign, so I don't think you can expect that platforms (and possibly different BLAST) are guaranteed to have the same sign

ADD REPLY
0
Entering edit mode

Thanks for the reply! That's what I thought - just wanted to get an expert opinion on this issue. I'll still suggest the user to use the (default) fast PCA implementation and will skip all tests that check irlba results on CI.

ADD REPLY
0
Entering edit mode

Respectfully, I would work harder on making the test reproducible. Disabling the test should in my opinion only be done as a last resort.

For sign of PCA on a well-defined dataset, sometimes you may have a covariate and you get test cor(PC, covariate) and if the covariance is negative you can switch the sign on the PC.

ADD REPLY
0
Entering edit mode

Will do, thanks. Testing the sign of the PCA might get a bit tricky as this is the output of the fastMNN function and not directly a simple PCA. My next test will be to get Docker running on MacOS on GA and check if the results are reproducible.

ADD REPLY

Login before adding your answer.

Traffic: 743 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6