Hi, I am trying to download TCGA projects with the GDCdownload function. I get an error when using method = "client", while method = "api" works fine. I would prefer the "client" method for downloading the TCGA data because it should be more stable than the "api" method. Below is the code to reproduce the error I encountered.
The following code is copied from the terminal:
> library(TCGAbiolinks)
> query <-
+ GDCquery(
+ project = "TCGA-ESCA",
+ data.category = "Transcriptome Profiling",
+ data.type = "Gene Expression Quantification",
+ workflow.type = "HTSeq - Counts"
+ )
--------------------------------------
o GDCquery: Searching in GDC database
--------------------------------------
Genome of reference: hg38
--------------------------------------------
oo Accessing GDC. This might take a while...
--------------------------------------------
ooo Project: TCGA-ESCA
--------------------
oo Filtering results
--------------------
ooo By data.type
ooo By workflow.type
----------------
oo Checking data
----------------
ooo Check if there are duplicated cases
ooo Check if there results for the query
-------------------
o Preparing output
-------------------
> GDCdownload(query, method = "client")
Downloading data for project TCGA-ESCA
trying URL 'https://gdc.cancer.gov/files/public/file/gdc-client_v1.5.0_Windows_x64_0.zip'
Content type 'application/zip' length 15221595 bytes (14.5 MB)
downloaded 14.5 MB
Error in unzip(basename(bin)) : invalid zip name argument
In addition: Warning message:
In if (grepl("^https?://", url)) { :
the condition has length > 1 and only the first element will be used
At that point the script stops: the TCGA project data is not downloaded and cannot be worked with. If I run the same code with method = "api", the download does work, but the "api" method is less stable.
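Both messages point the same way: the warning says the `url` tested with `grepl("^https?://", url)` has length > 1, and `unzip()` only accepts a single zip file name. The sketch below reproduces both symptoms with a two-element vector standing in for `bin`. The URLs are placeholders, and whether GDCdownload() really builds such a vector internally is an assumption drawn from the messages, not from the TCGAbiolinks source.

```r
# Hypothetical stand-in for `bin`: two placeholder URLs instead of one.
# Whether GDCdownload() really ends up with a vector like this is an assumption.
bin <- c("https://example.org/gdc-client_A.zip",
         "https://example.org/gdc-client_B.zip")

# grepl() is vectorised, so it returns one TRUE/FALSE per URL. An
# `if (grepl("^https?://", url))` on this vector has a length-2 condition:
# a warning in R 3.6 ("only the first element will be used"), an error from R 4.2.
cond <- grepl("^https?://", bin)
print(length(cond))

# utils::unzip() accepts exactly one zip file name, so a vector of two names
# is rejected with an error, which we capture here instead of stopping.
msg <- tryCatch(
  unzip(basename(bin)),
  error = function(e) conditionMessage(e)
)
print(msg)
```

If this reading is right, the failure lies in how the gdc-client download URL is resolved rather than in the query itself, which would also explain why the same query works with method = "api".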
Here is the output of sessionInfo():
> sessionInfo()
R version 3.6.2 (2019-12-12)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] TCGAbiolinks_2.15.3
loaded via a namespace (and not attached):
[1] pkgcond_0.1.0 colorspace_1.4-1
[3] ggsignif_0.6.0 selectr_0.4-2
[5] hwriter_1.3.2 testextra_0.1.0.1
[7] XVector_0.26.0 GenomicRanges_1.38.0
[9] ggpubr_0.2.5 ggrepel_0.8.1
[11] bit64_0.9-7 AnnotationDbi_1.48.0
[13] xml2_1.2.2 codetools_0.2-16
[15] splines_3.6.2 R.methodsS3_1.8.0
[17] doParallel_1.0.15 DESeq_1.38.0
[19] geneplotter_1.64.0 knitr_1.28
[21] jsonlite_1.6.1 Rsamtools_2.2.3
[23] km.ci_0.5-2 broom_0.5.5
[25] annotate_1.64.0 dbplyr_1.4.2
[27] png_0.1-7 R.oo_1.23.0
[29] readr_1.3.1 compiler_3.6.2
[31] httr_1.4.1 backports_1.1.5
[33] assertthat_0.2.1 Matrix_1.2-18
[35] limma_3.42.2 prettyunits_1.1.1
[37] tools_3.6.2 gtable_0.3.0
[39] glue_1.3.1 GenomeInfoDbData_1.2.2
[41] dplyr_0.8.4 ggthemes_4.2.0
[43] rappdirs_0.3.1 ShortRead_1.44.3
[45] Rcpp_1.0.3 Biobase_2.46.0
[47] vctrs_0.2.3 Biostrings_2.54.0
[49] nlme_3.1-144 rtracklayer_1.46.0
[51] iterators_1.0.12 xfun_0.12
[53] stringr_1.4.0 testthat_2.3.1
[55] rvest_0.3.5 lifecycle_0.1.0
[57] XML_3.99-0.3 edgeR_3.28.1
[59] zoo_1.8-7 postlogic_0.1.0.1
[61] zlibbioc_1.32.0 scales_1.1.0
[63] aroma.light_3.16.0 hms_0.5.3
[65] parallel_3.6.2 SummarizedExperiment_1.16.1
[67] RColorBrewer_1.1-2 curl_4.3
[69] memoise_1.1.0 gridExtra_2.3
[71] KMsurv_0.1-5 ggplot2_3.3.0
[73] downloader_0.4 biomaRt_2.42.0
[75] latticeExtra_0.6-29 stringi_1.4.6
[77] RSQLite_2.2.0 genefilter_1.68.0
[79] S4Vectors_0.24.3 foreach_1.4.8
[81] GenomicFeatures_1.38.2 BiocGenerics_0.32.0
[83] BiocParallel_1.20.1 GenomeInfoDb_1.22.0
[85] rlang_0.4.4 pkgconfig_2.0.3
[87] matrixStats_0.55.0 bitops_1.0-6
[89] lattice_0.20-38 purrr_0.3.3
[91] GenomicAlignments_1.22.1 bit_1.1-15.2
[93] tidyselect_1.0.0 plyr_1.8.5
[95] magrittr_1.5 R6_2.4.1
[97] IRanges_2.20.2 generics_0.0.2
[99] DelayedArray_0.12.2 DBI_1.1.0
[101] mgcv_1.8-31 pillar_1.4.3
[103] survival_3.1-8 RCurl_1.98-1.1
[105] tibble_2.1.3 EDASeq_2.20.0
[107] crayon_1.3.4 survMisc_0.5.5
[109] purrrogress_0.1.1 BiocFileCache_1.10.2
[111] jpeg_0.1-8.1 progress_1.2.2
[113] locfit_1.5-9.1 grid_3.6.2
[115] sva_3.34.0 data.table_1.12.8
[117] blob_1.2.1 digest_0.6.25
[119] xtable_1.8-4 tidyr_1.0.2
[121] R.utils_2.9.2 openssl_1.4.1
[123] stats4_3.6.2 munsell_0.5.0
[125] survminer_0.4.6 parsetools_0.1.2
[127] askpass_1.1
Could someone please tell me what I am doing wrong and how I can fix this?
Thanks a lot!
Just to add: the exact same problem also occurs on macOS Catalina 10.15.4 (R 3.6.3, platform x86_64-apple-darwin15.6.0, 64-bit). I suggest using the "api" method and downloading the files in chunks; that seems to work like a charm.
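A minimal sketch of that workaround, assuming a `query` object built with GDCquery() as in the original post. `download_in_chunks` is a hypothetical helper name and the chunk size of 10 is an arbitrary example value; `files.per.chunk` is the GDCdownload() argument that splits an API download into batches of that many files.

```r
# Hypothetical helper (not part of TCGAbiolinks) wrapping the "api + chunks" approach.
download_in_chunks <- function(query, chunk_size = 10, directory = "GDCdata") {
  # method = "api" avoids the gdc-client step that fails above;
  # files.per.chunk requests the files in batches of `chunk_size`.
  TCGAbiolinks::GDCdownload(
    query,
    method = "api",
    directory = directory,
    files.per.chunk = chunk_size
  )
}

# Usage, with `query` created by GDCquery() as in the post:
# download_in_chunks(query, chunk_size = 10)
```

Smaller chunks mean more requests but less work lost when a single request fails, which is presumably why chunking makes the "api" method behave more reliably.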