slow lookup in SNPlocs.Hsapiens.dbSNP151.GRCh38 (or any SNPlocs package)
Paul Shannon ▴ 470
@paul-shannon-5944
Last seen 2.4 years ago
United States

Calling snpsById with a vector of 4 rsids takes about 60 seconds. Calling snpsById 4 times via lapply, once per rsid, takes less than 7 seconds, and combining the results (in my case, into a data.frame) takes negligible time. This surprised me.
I realize, though I don't understand the details, that the first invocation of a SNPlocs method takes a long time, a minute or more, perhaps because large amounts of data are loaded into memory. For that reason, I call snpsById twice with the full vector below, so that the second timing (t1) excludes that start-up cost.

library(SNPlocs.Hsapiens.dbSNP151.GRCh38)
rsids <- c("rs11576415", "rs11584174", "rs12753774", "rs12754503")
# First call: includes the one-time start-up cost
t0 <- system.time(x0 <- snpsById(SNPlocs.Hsapiens.dbSNP151.GRCh38, rsids))
# Second call: all 4 rsids passed as one vector
t1 <- system.time(x1 <- snpsById(SNPlocs.Hsapiens.dbSNP151.GRCh38, rsids))
# Same 4 rsids, one snpsById call per rsid
t2 <- system.time(x2 <- lapply(rsids,
                  function(rsid) snpsById(SNPlocs.Hsapiens.dbSNP151.GRCh38, rsid)))
# Combine the per-rsid results into a single data.frame
do.call(rbind, lapply(x2, as.data.frame))
#   seqnames       pos strand  RefSNP_id alleles_as_ambig
# 1        1 161212418      * rs11576415                S
# 2        1 161242663      * rs11584174                Y
# 3        1 161271358      * rs12753774                R
# 4        1 161265868      * rs12754503                K

t0
#    user  system elapsed
# 129.208   9.718 141.967
t1
#    user  system elapsed
#  58.031   2.573  61.067
t2
#    user  system elapsed
#   2.954   2.182   5.146
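
The per-rsid workaround timed as t2 can be wrapped in a small helper. This is only a sketch: lookup_one_by_one and toy_lookup are hypothetical names, not part of the SNPlocs or BSgenome API, and toy_lookup is a stand-in so the example runs without the SNPlocs package installed. It illustrates the calling pattern only and says nothing about why the vectorized call is slower.

```r
# Hypothetical helper (not part of any Bioconductor package): look up ids
# one at a time and stack the per-id results into a single data.frame.
lookup_one_by_one <- function(ids, lookup_fun) {
  per_id <- lapply(ids, function(id) as.data.frame(lookup_fun(id)))
  do.call(rbind, per_id)
}

# Stand-in lookup so the sketch runs without SNPlocs installed.
# The two positions are copied from the output shown above.
toy_positions <- c(rs11576415 = 161212418, rs11584174 = 161242663)
toy_lookup <- function(id) {
  data.frame(RefSNP_id = id, pos = unname(toy_positions[id]))
}

res <- lookup_one_by_one(names(toy_positions), toy_lookup)
print(res)
```

With the real package loaded, the same helper would presumably be called as lookup_one_by_one(rsids, function(id) snpsById(SNPlocs.Hsapiens.dbSNP151.GRCh38, id)), which is the lapply pattern used for t2.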

sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] SNPlocs.Hsapiens.dbSNP151.GRCh38_0.99.20
 [2] BSgenome_1.60.0                         
 [3] rtracklayer_1.52.1                      
 [4] Biostrings_2.60.2                       
 [5] XVector_0.32.0                          
 [6] GenomicRanges_1.44.0                    
 [7] GenomeInfoDb_1.28.4                     
 [8] IRanges_2.26.0                          
 [9] S4Vectors_0.30.0                        
[10] BiocGenerics_0.38.0                     

loaded via a namespace (and not attached):
 [1] rstudioapi_0.13             zlibbioc_1.38.0            
 [3] GenomicAlignments_1.28.0    BiocParallel_1.26.2        
 [5] lattice_0.20-44             rjson_0.2.20               
 [7] tools_4.1.0                 grid_4.1.0                 
 [9] SummarizedExperiment_1.22.0 Biobase_2.52.0             
[11] matrixStats_0.60.1          yaml_2.2.1                 
[13] crayon_1.4.1                BiocIO_1.2.0               
[15] Matrix_1.3-4                GenomeInfoDbData_1.2.6     
[17] restfulr_0.0.13             bitops_1.0-7               
[19] RCurl_1.98-1.4              DelayedArray_0.18.0        
[21] compiler_4.1.0              MatrixGenerics_1.4.3       
[23] Rsamtools_2.8.0             XML_3.99-0.7
SNPlocs.Hsapiens.dbSNP151.GRCh38 • 1.1k views

Under some circumstances, lookup is so slow as to be unusable:

library(SNPlocs.Hsapiens.dbSNP155.GRCh38)
t0 <- system.time(snpsById(SNPlocs.Hsapiens.dbSNP155.GRCh38, "rs2639606"))

This call does not complete within 30 minutes.

> sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Monterey 12.4

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] SNPlocs.Hsapiens.dbSNP155.GRCh38_0.99.22 BSgenome_1.66.1                         
 [3] rtracklayer_1.58.0                       Biostrings_2.66.0                       
 [5] XVector_0.38.0                           GenomicRanges_1.50.2                    
 [7] GenomeInfoDb_1.34.4                      IRanges_2.32.0                          
 [9] S4Vectors_0.36.1                         BiocGenerics_0.44.0                     
[11] BiocManager_1.30.19                     

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.9                  lattice_0.20-45             prettyunits_1.1.1          
 [4] png_0.1-8                   Rsamtools_2.14.0            assertthat_0.2.1           
 [7] digest_0.6.31               utf8_1.2.2                  BiocFileCache_2.6.0        
[10] R6_2.5.1                    RSQLite_2.2.19              httr_1.4.4                 
[13] pillar_1.8.1                zlibbioc_1.44.0             rlang_1.0.6                
[16] GenomicFeatures_1.50.3      progress_1.2.2              lazyeval_0.2.2             
[19] curl_4.3.3                  rstudioapi_0.14             blob_1.2.3                 
[22] Matrix_1.5-3                BiocParallel_1.32.4         stringr_1.5.0              
[25] ProtGenerics_1.30.0         RCurl_1.98-1.9              bit_4.0.5                  
[28] biomaRt_2.54.0              DelayedArray_0.24.0         compiler_4.2.2             
[31] pkgconfig_2.0.3             tidyselect_1.2.0            KEGGREST_1.38.0            
[34] SummarizedExperiment_1.28.0 tibble_3.1.8                GenomeInfoDbData_1.2.9     
[37] codetools_0.2-18            matrixStats_0.63.0          XML_3.99-0.13              
[40] fansi_1.0.3                 crayon_1.5.2                dplyr_1.0.10               
[43] dbplyr_2.2.1                rappdirs_0.3.3              GenomicAlignments_1.34.0   
[46] bitops_1.0-7                grid_4.2.2                  lifecycle_1.0.3            
[49] DBI_1.1.3                   AnnotationFilter_1.22.0     magrittr_2.0.3             
[52] cli_3.4.1                   stringi_1.7.8               cachem_1.0.6               
[55] xml2_1.3.3                  filelock_1.0.2              ellipsis_0.3.2             
[58] vctrs_0.5.1                 generics_0.1.3              rjson_0.2.21               
[61] restfulr_0.0.15             ensembldb_2.22.0            tools_4.2.2                
[64] bit64_4.0.5                 Biobase_2.58.0              glue_1.6.2                 
[67] hms_1.1.2                   MatrixGenerics_1.10.0       parallel_4.2.2             
[70] fastmap_1.1.0               yaml_2.3.6                  AnnotationDbi_1.60.0       
[73] memoise_2.0.1               BiocIO_1.8.0               