Question

TxDb error: external pointer is not valid

0

Entering edit mode

O. William McClung • 0

@o-william-mcclung-22004

Last seen 5.5 years ago

United States

On my box within R I attempted to create a TxDb object from a GFF file with

human_txdb <- makeTxDbFromGFF(file="human.gff",format="gff3")

and this crashed the R process. (Presumably an out-of-memory error.) Here's my environment:

R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.2                  pillar_1.4.2               
 [3] compiler_3.6.1              GenomeInfoDb_1.20.0        
 [5] XVector_0.24.0              GenomicFeatures_1.36.4     
 [7] prettyunits_1.0.2           bitops_1.0-6               
 [9] tools_3.6.1                 zlibbioc_1.30.0            
[11] progress_1.2.2              zeallot_0.1.0              
[13] biomaRt_2.40.4              digest_0.6.20              
[15] bit_1.1-14                  lattice_0.20-38            
[17] RSQLite_2.1.2               memoise_1.1.0              
[19] tibble_2.1.3                pkgconfig_2.0.2            
[21] rlang_0.4.0                 Matrix_1.2-17              
[23] DelayedArray_0.10.0         DBI_1.0.0                  
[25] parallel_3.6.1              GenomeInfoDbData_1.2.1     
[27] rtracklayer_1.44.4          httr_1.4.1                 
[29] stringr_1.4.0               Biostrings_2.52.0          
[31] S4Vectors_0.22.1            vctrs_0.2.0                
[33] IRanges_2.18.2              hms_0.5.1                  
[35] grid_3.6.1                  stats4_3.6.1               
[37] bit64_0.9-7                 Biobase_2.44.0             
[39] R6_2.4.0                    AnnotationDbi_1.46.1       
[41] BiocParallel_1.18.1         XML_3.98-1.20              
[43] blob_1.2.0                  magrittr_1.5               
[45] matrixStats_0.55.0          GenomicAlignments_1.20.1   
[47] Rsamtools_2.0.0             backports_1.1.4            
[49] BiocGenerics_0.30.0         GenomicRanges_1.36.1       
[51] SummarizedExperiment_1.14.1 assertthat_0.2.1           
[53] stringi_1.4.3               RCurl_1.95-4.12            
[55] crayon_1.3.4

The box has 8 GB of memory and 8 cores. human.gff is 1.1 GB.

I then tried the same thing on an HPC cluster whose nodes have 16 cores and 64 GB of memory. This succeeded and within R I saved the result as an R object:

saveRDS(human_txdb,file = "human_txdb.rds")

I then scp'd human_txdb.rds to my box and within R executed

human_txdb <- readRDS(file="human_txdb.rds")

This succeeded. However

human_txdb

failed with

TxDb object:
Error: external pointer is not valid

As an experiment, I went back to the HPC cluster and changed the R script :

human_txdb <- makeTxDbFromGFF(file="human.gff",format="gff3")
saveRDS(human_txdb,file = "human_txdb.rds")
human_txdb <- readRDS(file = "human_txdb.rds")
human_txdb

This created a more explicit error:

Error in result_create(conn@ptr, statement) : 
  external pointer is not valid
Calls: <Anonymous> ... .local -> new -> initialize -> initialize -> result_create
Execution halted

Thus the problem seems not to be related to the scp. Here's the R environment on the HPC cluster node:

R version 3.5.1 (2018-07-02)
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
Running under: Scientific Linux release 6.10 (Carbon)

Matrix products: default
BLAS/LAPACK: /util/opt/anaconda/deployed-conda-envs/packages/bioconductor/envs/bioconductor-3.8/lib/R/lib/libRblas.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] systemPipeR_1.16.0          ShortRead_1.40.0           
 [3] GenomicAlignments_1.18.0    SummarizedExperiment_1.12.0
 [5] DelayedArray_0.8.0          matrixStats_0.54.0         
 [7] BiocParallel_1.16.2         Rsamtools_1.34.0           
 [9] Biostrings_2.50.1           XVector_0.22.0             
[11] GenomicFeatures_1.34.1      AnnotationDbi_1.44.0       
[13] Biobase_2.42.0              GenomicRanges_1.34.0       
[15] GenomeInfoDb_1.18.1         IRanges_2.16.0             
[17] S4Vectors_0.20.1            BiocGenerics_0.28.0        

loaded via a namespace (and not attached):
 [1] httr_1.4.0             edgeR_3.24.1           bit64_0.9-7           
 [4] splines_3.5.1          assertthat_0.2.0       latticeExtra_0.6-28   
 [7] RBGL_1.58.1            blob_1.1.1             GenomeInfoDbData_1.2.0
[10] Category_2.48.0        progress_1.2.2         backports_1.1.3       
[13] pillar_1.4.2           RSQLite_2.1.1          lattice_0.20-38       
[16] glue_1.3.0             limma_3.38.3           digest_0.6.18         
[19] checkmate_1.8.5        RColorBrewer_1.1-2     colorspace_1.4-0      
[22] Matrix_1.2-15          plyr_1.8.4             GSEABase_1.44.0       
[25] XML_3.98-1.16          pkgconfig_2.0.2        pheatmap_1.0.12       
[28] biomaRt_2.38.0         genefilter_1.64.0      zlibbioc_1.28.0       
[31] GO.db_3.7.0            purrr_0.3.2            xtable_1.8-3          
[34] scales_1.0.0           brew_1.0-6             tibble_2.0.1          
[37] annotate_1.60.0        ggplot2_3.1.0          lazyeval_0.2.1        
[40] survival_2.43-3        magrittr_1.5           crayon_1.3.4          
[43] memoise_1.1.0          hwriter_1.3.2          GOstats_2.48.0        
[46] graph_1.60.0           data.table_1.12.0      tools_3.5.1           
[49] prettyunits_1.0.2      hms_0.4.2              BBmisc_1.11           
[52] stringr_1.4.0          sendmailR_1.2-1        munsell_0.5.0         
[55] locfit_1.5-9.1         compiler_3.5.1         rlang_0.3.4           
[58] grid_3.5.1             RCurl_1.95-4.11        rjson_0.2.20          
[61] AnnotationForge_1.24.0 base64enc_0.1-3        bitops_1.0-6          
[64] gtable_0.2.0           DBI_1.0.0              R6_2.4.0              
[67] dplyr_0.8.0.1          rtracklayer_1.42.1     bit_1.1-14            
[70] Rgraphviz_2.26.0       stringi_1.2.4          BatchJobs_1.7         
[73] Rcpp_1.0.0             tidyselect_0.2.5

Any debugging hints will be gratefully received.

software error GenomicFeatures • 3.7k views

ADD COMMENT • link 5.6 years ago • updated 5.5 years ago O. William McClung • 0

0

Entering edit mode

I wanted to present an alternative based on the TxDb that I used since I was storing other information as well. for my example I was storing the genome as DNA string set and the GFF as well the TxDb for tracks information.

genome = redDNAStringSet("genome.fna") gff = readGFF("genome.gff") txdb = makeTxDbFromGFF("genome.gff", format="gff3") rds_data = list()

save everything in a list - as lists will put out contents of the TxDb not use a pointer

rds_data[[1]] = list(genome, gff, as.list(txdb)) saveRDS(rds_data, "fileName.RDS")

read back in the RDS you already saved

dat = readRDS("fileName.RDS") newTxDb = do.call(makeTxDb, dat[[1]][[3]])

This can be found in the methods section of the TxDB class functions. This way you can save multiple types at once, but is a little more complicated than just saveDb / loadDb

ADD REPLY • link 5.3 years ago SFCinferno • 0

0

Entering edit mode

O. William McClung • 0

@o-william-mcclung-22004

Last seen 5.5 years ago

United States

Thanks go to Martin Morgan for providing the correct answer. This works:

human_txdb <- makeTxDbFromGFF(file = "human.gff",format="gff3")
saveDb(human_txdb,file = "human_txdb.sqlite")    # export
human_txdb <- loadDb(file = "human_txdb.sqlite") # import

ADD COMMENT • link 5.5 years ago O. William McClung • 0

score 4 · Accepted Answer · 2019-09-26

4

Entering edit mode

Martin Morgan 25k

@martin-morgan-1513

Last seen 12 weeks ago

United States

Use saveDb() / loadDb() to save TxDb objects. The object is a reference to a sqlite data base. saveDb() saves the database. loadDb() loads a reference to the database and 'wraps' it in a new TxDb object.

ADD COMMENT • link 5.6 years ago Martin Morgan 25k