Hi,
I have an issue creating a tximeta transcriptome with BiocFileCache (R 4.0.1, tximeta 1.8.4 and a tximeta development checkout, on Ubuntu 20.04 LTS with SQLite 3.3). The cache file is touched but not filled. An rds file referencing the GTF file of the transcriptome is created. Together with Mike Love we narrowed it down to the saveDb step of creating the transcriptome (tximeta GitHub issue 56).
I tried multiple mounts on our file system. The only location that works is /tmp, which is really weird. All target locations are readable and writable by my user, and a small test write to the same cache location also works. The error and warning messages seem to indicate some sort of database locking issue; the SQLite error message "not an error" is not very helpful.
Has anybody encountered this before? Any help resolving this issue would be appreciated.
Regards, Judith
Writing to file system location
> bfc <- BiocFileCache::BiocFileCache("/scratch/pmaj_index_cache")
> txdb <- makeTxDbFromGFF(file=gtfPath, dataSource="EnsemblDbv97", organism="Parus major", chrominfo=chromInd)
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
> loc <- BiocFileCache::bfcnew(bfc, rname="testing", ext=".sqlite")
> saveDb(txdb, file=loc)
Error: Failed to copy all data:
not an error
In addition: Warning message:
Couldn't set synchronous mode: database is locked
Use `synchronous` = NULL to turn off this warning.
With /tmp location
> bfc <- BiocFileCache::BiocFileCache("/tmp/pmaj_index_cache")
using temporary cache /tmp/Rtmp2O0DyA/BiocFileCache
> loc <- BiocFileCache::bfcnew(bfc, rname="testing", ext=".sqlite")
> saveDb(txdb, file=loc)
TxDb object:
# Db type: TxDb
# Supporting package: GenomicFeatures
# Data source: EnsemblDbv97
# Organism: Parus major
# Taxonomy ID: 9157
# miRBase build ID: NA
# Genome: NA
# Nb of transcripts: 33036
# Db created by: GenomicFeatures package from Bioconductor
# Creation time: 2021-04-12 15:14:08 +0200 (Mon, 12 Apr 2021)
# GenomicFeatures version at creation time: 1.42.3
# RSQLite version at creation time: 2.2.6
# DBSCHEMAVERSION: 1.2
This is what the cache location looks like on the file system
total 51K
drwxr-xr-x 5 me domain users 39 Apr 12 19:26 ..
-rw-r--r-- 1 me domain users 347 Apr 12 19:26 a2b377a501945_a2b377a501945.rds
-rw-r--r-- 1 me domain users 20K Apr 12 19:26 BiocFileCache.sqlite
-rw-r--r-- 1 me domain users 0 Apr 12 19:26 a2b375abb18b7_a2b375abb18b7.sqlite
drwxr-xr-x 2 me domain users 5 Apr 12 19:26 .
On /tmp it looks like this
total 58M
-rw-r--r-- 1 me domain users 347 Apr 12 19:29 a2b3738290f91_a2b3738290f91.rds
drwx------ 3 me domain users 4.0K Apr 12 19:29 ..
-rw-r--r-- 1 me domain users 57M Apr 12 19:29 a2b3746738891_a2b3746738891.sqlite
-rw-r--r-- 1 me domain users 20K Apr 12 19:29 BiocFileCache.sqlite
drwxr-xr-x 2 me domain users 4.0K Apr 12 19:29 .
-rw-r--r-- 1 me domain users 669K Apr 12 19:29 a2b373c53219a_a2b373c53219a.rds
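Since the write only succeeds under /tmp, one possible stopgap is to let the database be written to a local temporary file first and then copy the finished file to the cache path. The sketch below is only an illustration of that idea: `write_via_tmp` is a hypothetical helper, not part of tximeta or BiocFileCache, and whether the copied file is then usable from the target file system is an assumption that would need checking.

```r
# Hypothetical helper: let `writer` create the file in a local temporary
# directory, then copy the finished file to `dest`.
# `writer` is any function that takes a file path and writes to it.
write_via_tmp <- function(writer, dest, tmpdir = tempdir()) {
  ext <- tools::file_ext(dest)
  tmp <- tempfile(tmpdir = tmpdir,
                  fileext = if (nzchar(ext)) paste0(".", ext) else "")
  on.exit(unlink(tmp), add = TRUE)   # remove the temporary copy afterwards

  writer(tmp)                        # write on the file system that works

  # Copy the completed file to the real target and verify the size matches.
  ok <- file.copy(tmp, dest, overwrite = TRUE)
  ok && file.exists(dest) && file.size(dest) == file.size(tmp)
}
```

With the objects from the transcripts above, this could be called as `write_via_tmp(function(p) AnnotationDbi::saveDb(txdb, file = p), loc)`. Any object returned by `saveDb` inside the helper would point at the temporary file, so the database would have to be reopened from `loc` (for example with `AnnotationDbi::loadDb(loc)`) afterwards.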
A simple test write using bfc to the same cache location gives no issue:
> bfc <- BiocFileCache::BiocFileCache("/scratch/pmaj_index_cache")
> loc <- BiocFileCache::bfcnew(bfc, rname="testing2")
> x <- 1:10
> save(x, file=loc)
> bfcinfo(bfc)
# A tibble: 1 x 10
rid rname create_time access_time rpath rtype fpath last_modified_t… etag
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <chr>
1 BFC1 testi… 2021-04-12… 2021-04-12… ~/.ca… rela… a2b3… NA NA
# … with 1 more variable: expires <dbl>
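The test above writes an ordinary R data file, so it does not exercise SQLite itself. To separate "SQLite cannot write on this mount" from "something in saveDb fails", a small probe that creates a throwaway SQLite database directly with DBI and RSQLite might help. This is a rough sketch: `can_write_sqlite` is a hypothetical helper name, and the table it writes is arbitrary.

```r
# Hypothetical helper: try to create a small SQLite database inside `dir`
# and report whether writing and reading back a table succeeded.
can_write_sqlite <- function(dir) {
  path <- tempfile("sqlite_probe_", tmpdir = dir, fileext = ".sqlite")
  con <- NULL
  on.exit({
    # Always close the connection (if one was opened) and remove the probe file.
    if (!is.null(con)) try(DBI::dbDisconnect(con), silent = TRUE)
    unlink(path)
  }, add = TRUE)

  tryCatch({
    con <- DBI::dbConnect(RSQLite::SQLite(), path)
    DBI::dbWriteTable(con, "probe", data.frame(x = 1:10))
    nrow(DBI::dbReadTable(con, "probe")) == 10L
  }, error = function(e) {
    message("SQLite write failed: ", conditionMessage(e))
    FALSE
  })
}
```

Running `can_write_sqlite("/scratch/pmaj_index_cache")` and `can_write_sqlite("/tmp")` side by side would show whether the failure already appears with a plain RSQLite connection. Warnings such as the "database is locked" message are not caught by this sketch and would still be printed.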
There is one warning when using native tximeta with /tmp (not the step-by-step version) that might be relevant, but I'm not sure where it is generated:
> se_s1 <- tximeta(samples_s1, useHub=FALSE)
importing quantifications
reading in files with read_tsv
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
found matching linked transcriptome:
[ Ensembl - Parus major - release 97 ]
building EnsDb with 'ensembldb' package
Importing GTF file ... OK
Processing metadata ... OK
Processing genes ...
.
.
.
Generating index ... OK
-------------
Verifying validity of the information in the database:
Checking transcripts ... OK
Checking exons ... OK
generating transcript ranges
Warning messages:
1: closing unused connection 3 (ftp://ftp.ensembl.org/pub/release-97/mysql/)
**3: call dbDisconnect() when finished working with a connection**
> sessionInfo()
R version 4.0.1 (2020-06-06)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] tximeta_1.9.6
loaded via a namespace (and not attached):
[1] MatrixGenerics_1.2.1 Biobase_2.50.0
[3] httr_1.4.2 jsonlite_1.7.2
[5] bit64_4.0.5 AnnotationHub_2.22.0
[7] shiny_1.6.0 assertthat_0.2.1
[9] interactiveDisplayBase_1.28.0 askpass_1.1
[11] BiocManager_1.30.12 stats4_4.0.1
[13] BiocFileCache_1.14.0 blob_1.2.1
[15] GenomeInfoDbData_1.2.4 Rsamtools_2.6.0
[17] yaml_2.2.1 progress_1.2.2
[19] BiocVersion_3.12.0 lattice_0.20-41
[21] pillar_1.5.1 RSQLite_2.2.6
[23] glue_1.4.2 digest_0.6.27
[25] GenomicRanges_1.42.0 promises_1.2.0.1
[27] XVector_0.30.0 htmltools_0.5.1.1
[29] httpuv_1.5.5 Matrix_1.2-18
[31] XML_3.99-0.6 pkgconfig_2.0.3
[33] biomaRt_2.46.3 zlibbioc_1.36.0
[35] purrr_0.3.4 xtable_1.8-4
[37] later_1.1.0.1 BiocParallel_1.24.1
[39] tibble_3.1.0 openssl_1.4.3
[41] AnnotationFilter_1.14.0 generics_0.1.0
[43] IRanges_2.24.1 ellipsis_0.3.1
[45] cachem_1.0.4 SummarizedExperiment_1.20.0
[47] GenomicFeatures_1.42.3 lazyeval_0.2.2
[49] BiocGenerics_0.36.0 magrittr_2.0.1
[51] crayon_1.4.1 mime_0.10
[53] memoise_2.0.0 fansi_0.4.2
[55] xml2_1.3.2 tools_4.0.1
[57] prettyunits_1.1.1 hms_1.0.0
[59] lifecycle_1.0.0 matrixStats_0.58.0
[61] stringr_1.4.0 S4Vectors_0.28.1
[63] DelayedArray_0.16.3 ensembldb_2.14.0
[65] AnnotationDbi_1.52.0 Biostrings_2.58.0
[67] compiler_4.0.1 GenomeInfoDb_1.26.7
[69] rlang_0.4.10 grid_4.0.1
[71] RCurl_1.98-1.3 tximport_1.18.0
[73] rstudioapi_0.13 rappdirs_0.3.3
[75] bitops_1.0-6 DBI_1.1.1
[77] curl_4.3 R6_2.5.0
[79] GenomicAlignments_1.26.0 dplyr_1.0.5
[81] rtracklayer_1.50.0 fastmap_1.1.0
[83] bit_4.0.4 utf8_1.2.1
[85] ProtGenerics_1.22.0 stringi_1.5.3
[87] parallel_4.0.1 Rcpp_1.0.6
[89] vctrs_0.3.7 dbplyr_2.1.1
[91] tidyselect_1.1.0
Thanks Judith for the follow-up and for the bug reports. It's useful for others who may find themselves in this situation.