Hi all, I really need some help. I am trying to run GDCquery_Maf which worked fine until yesterday. Now I get the following error:
Error in GDCquery(paste0("TCGA-", tumor), data.category = "Simple Nucleotide Variation", :
Please set a valid workflow.type argument from the list below:
=> Aliquot Ensemble Somatic Variant Merging and Masking
The command I used is below. Any help would be greatly appreciated.
maf <- GDCquery_Maf(tumor = "COAD", pipelines = "mutect2")
sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
locale:
[1] LC_CTYPE=en_SG.UTF-8 LC_NUMERIC=C LC_TIME=en_SG.UTF-8
[4] LC_COLLATE=en_SG.UTF-8 LC_MONETARY=en_SG.UTF-8 LC_MESSAGES=en_SG.UTF-8
[7] LC_PAPER=en_SG.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_SG.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] TCGAbiolinks_2.20.1 maftools_2.8.05 survivalROC_1.0.3
[4] rms_6.2-0 SparseM_1.81 Hmisc_4.6-0
[7] Formula_1.2-4 lattice_0.20-45 biomaRt_2.48.3
[10] plotROC_2.2.1 survminer_0.4.9 ggpubr_0.4.0
[13] pheatmap_1.0.12 glmnet_4.1-3 Matrix_1.4-0
[16] survival_3.2-13 vsn_3.60.0 DESeq2_1.32.0
[19] limma_3.50.0 SummarizedExperiment_1.24.0 Biobase_2.54.0
[22] GenomicRanges_1.46.1 GenomeInfoDb_1.30.1 IRanges_2.28.0
[25] S4Vectors_0.32.3 BiocGenerics_0.40.0 MatrixGenerics_1.6.0
[28] matrixStats_0.61.0 EnhancedVolcano_1.10.0 ggrepel_0.9.1
[31] forcats_0.5.1 stringr_1.4.0 dplyr_1.0.8
[34] purrr_0.3.4 readr_2.1.2 tidyr_1.2.0
[37] tibble_3.1.6 ggplot2_3.3.5 tidyverse_1.3.1
loaded via a namespace (and not attached):
[1] utf8_1.2.2 R.utils_2.11.0 tidyselect_1.1.2
[4] RSQLite_2.2.11 AnnotationDbi_1.56.2 htmlwidgets_1.5.4
[7] grid_4.1.2 BiocParallel_1.28.3 munsell_0.5.0
[10] codetools_0.2-18 preprocessCore_1.54.0 withr_2.5.0
[13] colorspace_2.0-3 filelock_1.0.2 ggalt_0.4.0
[16] knitr_1.38 rstudioapi_0.13 ggsignif_0.6.3
[19] Rttf2pt1_1.3.10 labeling_0.4.2 GenomeInfoDbData_1.2.7
[22] hwriter_1.3.2 KMsurv_0.1-5 farver_2.1.0
[25] bit64_4.0.5 downloader_0.4 vctrs_0.3.8
[28] generics_0.1.2 TH.data_1.1-0 xfun_0.30
[31] BiocFileCache_2.0.0 EDASeq_2.26.1 markdown_1.1
[34] R6_2.5.1 ggbeeswarm_0.6.0 locfit_1.5-9.5
[37] bitops_1.0-7 cachem_1.0.6 DelayedArray_0.20.0
[40] assertthat_0.2.1 BiocIO_1.2.0 vroom_1.5.7
[43] scales_1.1.1 multcomp_1.4-18 nnet_7.3-16
[46] beeswarm_0.4.0 gtable_0.3.0 ash_1.0-15
[49] affy_1.70.0 sandwich_3.0-1 rlang_1.0.2
[52] MatrixModels_0.5-0 genefilter_1.74.1 splines_4.1.2
[55] rtracklayer_1.52.1 rstatix_0.7.0 extrafontdb_1.0
[58] broom_0.7.12 checkmate_2.0.0 yaml_2.3.5
[61] BiocManager_1.30.16 abind_1.4-5 modelr_0.1.8
[64] GenomicFeatures_1.44.2 backports_1.4.1 gridtext_0.1.4
[67] extrafont_0.17 tools_4.1.2 affyio_1.62.0
[70] ellipsis_0.3.2 RColorBrewer_1.1-2 Rcpp_1.0.8.3
[73] plyr_1.8.7 base64enc_0.1-3 progress_1.2.2
[76] zlibbioc_1.40.0 RCurl_1.98-1.6 prettyunits_1.1.1
[79] rpart_4.1-15 cowplot_1.1.1 zoo_1.8-9
[82] haven_2.4.3 cluster_2.1.2 fs_1.5.2
[85] magrittr_2.0.2 data.table_1.14.2 reprex_2.0.1
[88] mvtnorm_1.1-3 aroma.light_3.22.0 hms_1.1.1
[91] TCGAbiolinksGUI.data_1.12.0 xtable_1.8-4 XML_3.99-0.9
[94] jpeg_0.1-9 readxl_1.4.0 gridExtra_2.3
[97] shape_1.4.6 compiler_4.1.2 maps_3.4.0
[100] KernSmooth_2.23-20 crayon_1.5.1 R.oo_1.24.0
[103] htmltools_0.5.2 tzdb_0.3.0 ggtext_0.1.1
[106] geneplotter_1.70.0 lubridate_1.8.0 DBI_1.1.2
[109] dbplyr_2.1.1 proj4_1.0-11 MASS_7.3-54
[112] rappdirs_0.3.3 ShortRead_1.50.0 car_3.0-12
[115] cli_3.2.0 R.methodsS3_1.8.1 parallel_4.1.2
[118] km.ci_0.5-2 pkgconfig_2.0.3 GenomicAlignments_1.28.0
[121] foreign_0.8-81 xml2_1.3.3 foreach_1.5.2
[124] annotate_1.72.0 vipor_0.4.5 XVector_0.34.0
[127] rvest_1.0.2 digest_0.6.29 Biostrings_2.62.0
[130] cellranger_1.1.0 survMisc_0.5.5 htmlTable_2.4.0
[133] restfulr_0.0.13 curl_4.3.2 Rsamtools_2.8.0
[136] quantreg_5.88 rjson_0.2.21 lifecycle_1.0.1
[139] nlme_3.1-152 jsonlite_1.8.0 carData_3.0-5
[142] fansi_1.0.3 pillar_1.7.0 ggrastr_1.0.1
[145] KEGGREST_1.34.0 fastmap_1.1.0 httr_1.4.2
[148] glue_1.6.2 png_0.1-7 iterators_1.0.14
[151] bit_4.0.4 stringi_1.7.6 blob_1.2.2
[154] polspline_1.1.19 latticeExtra_0.6-29 memoise_2.0.1
Hello,
I get the same error. The function still worked for me yesterday, but not today (no change in my R version or TCGAbiolinks package version). So probably something changed in the TCGAbiolinks database? I experience similar discrepancies when querying gene expression data with the function GDCquery().
Hi, I got the same error. You should use:
query1 <- GDCquery(
  project = "TCGA-COAD",
  data.category = "Simple Nucleotide Variation",
  data.type = "Masked Somatic Mutation",
  legacy = FALSE
)
# Use the same directory for download and prepare
GDCdownload(query1, directory = "GDCdata/")
muts <- GDCprepare(query1, directory = "GDCdata/")
This gives you hg38 data by default (I think Benedek is right that the TCGAbiolinks database changed). I tried it and obtained the same mutation datasets with both GDCquery and GDCquery_Maf. The problem is that the object returned by GDCquery_Maf is not compatible with the GDCprepare function, so we cannot get a single file.
Barbara
Hi Barbara,
My problem now is when I try to query this way:
Output:
And then:
I get this error message:
Checking the results of the query:
Output:
So although I specify primary tumors, I think it returns the normal cases as well (duplicate samples). This way I cannot even match the IDs to the TCGA sample barcodes to remove duplicate samples... Everything worked fine with the GDCquery() function until yesterday. :( Also, all the ID columns are unique (509 unique elements) and the cases column does not contain any values, so it is impossible to find the duplicate samples.
Hi Benedek, the code below worked on my side. You also need to set the directory argument in GDCprepare.
It's weird but I still get the same error message:
Output:
I have TCGAbiolinks version 2.14.1 and R version 3.6.1 (2019-07-05).
Update: with a newer version of R (4.2) it works just as it does for you. Thanks for the help!
Hi Benedek, I ran your code and it works. Maybe, as Tiago suggested, the problem is the directory argument of GDCprepare. Bye. Barbara
I still got the same error message with R version 3.6, but with a newer R version it works well. Thanks for your help!
library(TCGAbiolinks)
query_SNV <- GDCquery(
  project = "TCGA-GBM",
  data.category = "Simple Nucleotide Variation",
  data.type = "Masked Somatic Mutation",
  workflow.type = "Aliquot Ensemble Somatic Variant Merging and Masking"
)
GDCdownload(query_SNV)
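On the duplicate-sample concern raised earlier in the thread: once TCGA barcodes are available (MAF files carry them in the Tumor_Sample_Barcode column), the sample type can be read from the barcode itself, since characters 1-12 identify the patient and characters 14-15 give the sample type code ("01" primary tumor, "11" solid tissue normal). The following is only a rough sketch in base R. The barcodes are made up for illustration, and keeping the first barcode per patient is an assumption, not a rule from TCGAbiolinks.

```r
# Hypothetical barcodes for illustration only; real ones would come from,
# e.g., the Tumor_Sample_Barcode column of a prepared MAF data frame.
barcodes <- c(
  "TCGA-AA-0001-01A-01D-0001-10",  # patient 0001, primary tumor
  "TCGA-AA-0001-11A-01D-0001-10",  # patient 0001, solid tissue normal
  "TCGA-AA-0002-01A-01D-0002-10",  # patient 0002, primary tumor (vial A)
  "TCGA-AA-0002-01B-01D-0002-10"   # patient 0002, primary tumor (vial B)
)

# Split each barcode into patient ID and two-digit sample type code
samples <- data.frame(
  barcode     = barcodes,
  patient     = substr(barcodes, 1, 12),
  sample_type = substr(barcodes, 14, 15),
  stringsAsFactors = FALSE
)

# Keep primary tumors only
primary <- samples[samples$sample_type == "01", ]

# Assumption: keep the first barcode seen for each patient
primary_unique <- primary[!duplicated(primary$patient), ]

# Patients that appear with more than one barcode
dup_patients <- unique(samples$patient[duplicated(samples$patient)])
```

Which barcode to keep when a patient has several primary tumor aliquots depends on the analysis, so the "first one wins" rule above should be replaced with whatever criterion suits your data.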