I'm attempting to download some GTEx data with recount3. When the counts download begins, I get an "invalid 'description' argument" error, along with a warning that one of the URLs being referenced does not exist or is not available.
> library(recount3)
> human_projects <- available_projects()
2020-10-29 09:52:51 caching file sra.recount_project.MD.gz.
2020-10-29 09:52:51 caching file gtex.recount_project.MD.gz.
2020-10-29 09:52:51 caching file tcga.recount_project.MD.gz.
> proj_info <- subset(human_projects,
+ project == "BLOOD_VESSEL" & project_type == "data_sources"
+ )
> rse_blood_vessel <- create_rse(proj_info)
2020-10-29 09:52:54 downloading and reading the metadata.
2020-10-29 09:52:54 caching file gtex.gtex.BLOOD_VESSEL.MD.gz.
2020-10-29 09:52:54 caching file gtex.recount_project.BLOOD_VESSEL.MD.gz.
2020-10-29 09:52:55 caching file gtex.recount_qc.BLOOD_VESSEL.MD.gz.
2020-10-29 09:52:55 caching file gtex.recount_seq_qc.BLOOD_VESSEL.MD.gz.
2020-10-29 09:52:55 downloading and reading the feature information.
2020-10-29 09:52:55 caching file human.gene_sums.G026.gtf.gz.
2020-10-29 09:52:56 downloading and reading the counts: 1398 samples across 63856 features.
Error in file(file, "rt") : invalid 'description' argument
In addition: Warning message:
The 'url' <http://idies.jhu.edu/recount3/data/human/data_sources/gtex/gene_sums/EL/BLOOD_VESSEL/gtex.gene_sums.BLOOD_VESSEL.G026.gz> does not exist or is not available.
> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Pop!_OS 20.04 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8
[8] LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] recount3_1.0.0 SummarizedExperiment_1.20.0 Biobase_2.50.0 GenomicRanges_1.42.0 GenomeInfoDb_1.26.0 IRanges_2.24.0 S4Vectors_0.28.0
[8] BiocGenerics_0.36.0 MatrixGenerics_1.2.0 matrixStats_0.57.0
loaded via a namespace (and not attached):
[1] tidyselect_1.1.0 purrr_0.3.4 lattice_0.20-41 vctrs_0.3.4 generics_0.0.2 BiocFileCache_1.14.0 rtracklayer_1.50.0
[8] blob_1.2.1 XML_3.99-0.5 rlang_0.4.8 R.oo_1.24.0 pillar_1.4.6 withr_2.3.0 glue_1.4.2
[15] DBI_1.1.0 R.utils_2.10.1 BiocParallel_1.24.0 rappdirs_0.3.1 bit64_4.0.5 dbplyr_1.4.4 sessioninfo_1.1.1
[22] GenomeInfoDbData_1.2.4 lifecycle_0.2.0 zlibbioc_1.36.0 Biostrings_2.58.0 R.methodsS3_1.8.1 memoise_1.1.0 curl_4.3
[29] fansi_0.4.1 Rcpp_1.0.5 DelayedArray_0.16.0 XVector_0.30.0 bit_4.0.4 Rsamtools_2.6.0 digest_0.6.27
[36] dplyr_1.0.2 grid_4.0.3 cli_2.1.0 tools_4.0.3 bitops_1.0-6 magrittr_1.5 RCurl_1.98-1.2
[43] tibble_3.0.4 RSQLite_2.2.1 crayon_1.3.4 pkgconfig_2.0.3 ellipsis_0.3.1 Matrix_1.2-18 data.table_1.13.2
[50] assertthat_0.2.1 httr_1.4.2 rstudioapi_0.11 R6_2.5.0 GenomicAlignments_1.26.0 compiler_4.0.3
Hi Zach,
Thanks for reporting this. We'll look into it.
Best, Leo
Hi Zach,
We found the issue and are working on resolving it. Basically, we already have the data at IDIES, but some files are not in the right location. So the URL that recount3 is providing is the correct (intended) one, but we made a small mistake on the IDIES side that we are fixing right now. We'll let you know once this is resolved.
Best, Leo