Hi all,
I am trying to annotate a single-cell dataset using a custom reference with SingleR.
I have previously annotated this dataset successfully using SingleR with other custom references, but they were smaller (~1 GB).
My dataset is ~15 GB, and the dataset I would like to use as a reference (chimpanzee middle temporal gyrus, from here: https://cellxgene.cziscience.com/collections/4dca242c-d302-4dba-a68f-4c61e7bad553) is ~12 GB.
When I try to run the SingleR command, R crashes even with 208 GB of memory allocated on my university's server.
Since the reference dataset I want to use is still a normal size for a single-cell experiment, I suspect others have run into a similar issue.
Does anyone have a solution?
Thanks!!
Elaine
P.S. I am not putting my code in since I'm not getting an error.
library(Seurat)
library(SingleCellExperiment)
library(SingleR)
library(EnsDb.Hsapiens.v86)

# Read in my dataset (SingleCellExperiment object)
PFC.merged.sce <- readRDS("sce_chimp_PFC.RDS")

# Read in the data I want to use as a reference with SingleR
chimp_MTG <- readRDS("sc-data/Jorstad/seur_chimp_MTG_Jorstad.rds")

# Convert from Seurat to a SingleCellExperiment object
seuMTG.chimp.sce <- as.SingleCellExperiment(chimp_MTG)

# Remove the Seurat object to free memory
rm(chimp_MTG)

# Convert Ensembl IDs to gene symbols to match my dataset
geneids <- mapIds(EnsDb.Hsapiens.v86,
                  keys = rownames(seuMTG.chimp.sce),
                  column = "SYMBOL",
                  keytype = "GENEID")

# Check that the mapping is in the same order as the rows (should be TRUE)
all(rownames(seuMTG.chimp.sce) == names(geneids))

# Drop genes without a symbol, then rename the remaining rows
keep <- !is.na(geneids)
geneids <- geneids[keep]
seuMTG.chimp.sce <- seuMTG.chimp.sce[keep, ]
rownames(seuMTG.chimp.sce) <- geneids

# Remove intermediate objects from the environment
rm(geneids, keep)

# SingleR command
pfc.chimp.mtg.pred <- SingleR(test = PFC.merged.sce,
                              ref = seuMTG.chimp.sce,
                              labels = seuMTG.chimp.sce$Cluster,
                              de.method = "wilcox")
# This is where it crashes after a few hours
sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.1 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] EnsDb.Hsapiens.v86_2.99.0 ensembldb_2.22.0
[3] AnnotationFilter_1.22.0 GenomicFeatures_1.50.3
[5] AnnotationDbi_1.60.0 pheatmap_1.0.12
[7] scran_1.26.2 tidySingleCellExperiment_1.8.2
[9] ttservice_0.4.0 scCustomize_2.0.1
[11] enrichR_3.2 ggpubr_0.6.0
[13] scuttle_1.8.4 reshape2_1.4.4
[15] scRNAseq_2.12.0 SingleCellExperiment_1.20.1
[17] lubridate_1.9.3 forcats_1.0.0
[19] stringr_1.5.0 dplyr_1.1.4
[21] purrr_1.0.1 readr_2.1.4
[23] tidyr_1.3.0 tibble_3.2.1
[25] ggplot2_3.4.4 tidyverse_2.0.0
[27] SingleR_2.0.0 SummarizedExperiment_1.28.0
[29] Biobase_2.58.0 GenomicRanges_1.50.2
[31] GenomeInfoDb_1.34.6 IRanges_2.32.0
[33] S4Vectors_0.36.1 BiocGenerics_0.44.0
[35] MatrixGenerics_1.10.0 matrixStats_0.63.0
[37] Seurat_5.0.0 SeuratObject_5.0.1
[39] sp_2.1-1
loaded via a namespace (and not attached):
[1] rappdirs_0.3.3 ggprism_1.0.4 rtracklayer_1.58.0
[4] scattermore_1.2 bit64_4.0.5 irlba_2.3.5.1
[7] DelayedArray_0.24.0 data.table_1.14.6 KEGGREST_1.38.0
[10] RCurl_1.98-1.9 generics_0.1.3 ScaledMatrix_1.6.0
[13] cowplot_1.1.1 RSQLite_2.2.20 RANN_2.6.1
[16] future_1.33.0 bit_4.0.5 tzdb_0.3.0
[19] spatstat.data_3.0-3 xml2_1.3.3 httpuv_1.6.8
[22] assertthat_0.2.1 hms_1.1.2 promises_1.2.0.1
[25] fansi_1.0.3 restfulr_0.0.15 progress_1.2.2
[28] dbplyr_2.3.0 igraph_1.5.1 DBI_1.1.3
[31] htmlwidgets_1.6.1 spatstat.geom_3.2-7 paletteer_1.5.0
[34] ellipsis_0.3.2 RSpectra_0.16-1 backports_1.4.1
[37] biomaRt_2.54.0 deldir_1.0-9 sparseMatrixStats_1.10.0
[40] vctrs_0.6.4 remotes_2.4.2.1 ROCR_1.0-11
[43] abind_1.4-5 cachem_1.0.6 withr_2.5.0
[46] progressr_0.14.0 vroom_1.6.1 sctransform_0.4.1
[49] GenomicAlignments_1.34.0 prettyunits_1.1.1 goftest_1.2-3
[52] cluster_2.1.4 ExperimentHub_2.6.0 dotCall64_1.1-0
[55] lazyeval_0.2.2 crayon_1.5.2 spatstat.explore_3.2-5
[58] edgeR_3.40.2 pkgconfig_2.0.3 nlme_3.1-161
[61] vipor_0.4.5 ProtGenerics_1.30.0 rlang_1.1.2
[64] globals_0.16.2 lifecycle_1.0.3 miniUI_0.1.1.1
[67] filelock_1.0.2 fastDummies_1.7.3 BiocFileCache_2.6.0
[70] rsvd_1.0.5 AnnotationHub_3.6.0 ggrastr_1.0.2
[73] polyclip_1.10-6 RcppHNSW_0.5.0 lmtest_0.9-40
[76] Matrix_1.6-3 carData_3.0-5 zoo_1.8-11
[79] beeswarm_0.4.0 ggridges_0.5.4 GlobalOptions_0.1.2
[82] png_0.1-8 viridisLite_0.4.1 rjson_0.2.21
[85] bitops_1.0-7 KernSmooth_2.23-20 spam_2.10-0
[88] Biostrings_2.66.0 blob_1.2.3 DelayedMatrixStats_1.20.0
[91] shape_1.4.6 parallelly_1.36.0 spatstat.random_3.2-1
[94] rstatix_0.7.2 ggsignif_0.6.4 beachmat_2.14.2
[97] scales_1.2.1 memoise_2.0.1 magrittr_2.0.3
[100] plyr_1.8.8 ica_1.0-3 zlibbioc_1.44.0
[103] compiler_4.2.2 dqrng_0.3.1 BiocIO_1.8.0
[106] RColorBrewer_1.1-3 fitdistrplus_1.1-11 snakecase_0.11.1
[109] Rsamtools_2.14.0 cli_3.6.0 XVector_0.38.0
[112] listenv_0.9.0 patchwork_1.1.3 pbapply_1.7-2
[115] MASS_7.3-58.1 tidyselect_1.2.0 stringi_1.7.12
[118] yaml_2.3.6 locfit_1.5-9.7 BiocSingular_1.14.0
[121] ggrepel_0.9.4 grid_4.2.2 tools_4.2.2
[124] timechange_0.2.0 future.apply_1.11.0 parallel_4.2.2
[127] circlize_0.4.15 rstudioapi_0.14 bluster_1.8.0
[130] metapod_1.6.0 janitor_2.2.0 gridExtra_2.3
[133] Rtsne_0.16 digest_0.6.31 BiocManager_1.30.19
[136] shiny_1.7.4 Rcpp_1.0.9 car_3.1-2
[139] broom_1.0.5 BiocVersion_3.16.0 later_1.3.0
[142] RcppAnnoy_0.0.21 WriteXLS_6.4.0 httr_1.4.7
[145] colorspace_2.1-0 XML_3.99-0.13 tensor_1.5
[148] reticulate_1.34.0 splines_4.2.2 statmod_1.5.0
[151] uwot_0.1.16 rematch2_2.1.2 spatstat.utils_3.0-4
[154] plotly_4.10.3 xtable_1.8-4 jsonlite_1.8.4
[157] R6_2.5.1 pillar_1.9.0 htmltools_0.5.4
[160] mime_0.12 glue_1.6.2 fastmap_1.1.0
[163] BiocParallel_1.32.5 BiocNeighbors_1.16.0 interactiveDisplayBase_1.36.0
[166] codetools_0.2-18 utf8_1.2.2 lattice_0.20-45
[169] spatstat.sparse_3.0-3 curl_5.1.0 ggbeeswarm_0.7.2
[172] leiden_0.4.3.1 limma_3.54.0 survival_3.5-0
[175] munsell_0.5.0 GenomeInfoDbData_1.2.9 gtable_0.3.1
Thank you, James!
I ended up running Seurat's FindVariableFeatures with nfeatures = 10000, then subsetting the object to just those variable genes, and it worked!
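For anyone landing here later, a minimal sketch of that step. It assumes the subsetting was applied to the reference Seurat object (`chimp_MTG` in the code above) before converting it to a SingleCellExperiment; the post does not say which object was subset. The helper name `subset_to_variable_features` and the choice of `selection.method = "vst"` (Seurat's default) are illustrative, not taken from the thread.

```r
library(Seurat)

# Hypothetical helper: keep only the top `n` variable genes of a Seurat object.
subset_to_variable_features <- function(seu, n = 10000) {
  # Rank genes by variability; "vst" is Seurat's default selection method
  seu <- FindVariableFeatures(seu, selection.method = "vst", nfeatures = n,
                              verbose = FALSE)
  # Drop every gene that is not among the selected variable features
  subset(seu, features = VariableFeatures(seu))
}

# Usage with the objects from the question (not run here):
# chimp_MTG <- subset_to_variable_features(chimp_MTG, n = 10000)
# seuMTG.chimp.sce <- as.SingleCellExperiment(chimp_MTG)
```

After this, the rest of the script (ID conversion and the SingleR call) can stay as it is. The reference now carries at most 10,000 genes, which should cut down the work done during marker detection.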
Thanks! Elaine