clinicalData() is not returning all columns from cBioPortal
In the TCGA Breast pan-cancer data set there is a clinical annotation column “SUBTYPE” (which contains BRCA_LumA, BRCA_LumB, BRCA_Basal, BRCA_Her2, BRCA_Normal).

My little R script is failing to get this column. How can I get these data?

cbio <- cBioPortal()
x <- clinicalData(cbio, "brca_tcga_pan_can_atlas_2018")

> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Scientific Linux 7.3 (Nitrogen)

Matrix products: default
BLAS/LAPACK: /apps/prod/easybuild/sl7.x86_64/software/OpenBLAS/0.3.9-GCC-9.3.0/lib/

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] cBioPortalData_2.0.10       MultiAssayExperiment_1.14.0 SummarizedExperiment_1.18.2 DelayedArray_0.14.1         matrixStats_0.57.0          Biobase_2.48.0              GenomicRanges_1.40.0        GenomeInfoDb_1.24.2         IRanges_2.22.2             
[10] S4Vectors_0.26.1            BiocGenerics_0.34.0         AnVIL_1.0.3                 dplyr_1.0.4                

loaded via a namespace (and not attached):
 [1] httr_1.4.2                bit64_4.0.5               jsonlite_1.7.1            splines_4.0.2             assertthat_0.2.1          askpass_1.1               TCGAutils_1.8.1           BiocFileCache_1.12.1      blob_1.2.1                Rsamtools_2.4.0          
[11] GenomeInfoDbData_1.2.3    RTCGAToolbox_2.18.0       progress_1.2.2            yaml_2.2.1                pillar_1.5.1              RSQLite_2.2.0             lattice_0.20-41           glue_1.4.2                limma_3.46.0              digest_0.6.25            
[21] XVector_0.28.0            rvest_0.3.6               Matrix_1.3-2              XML_3.99-0.5              pkgconfig_2.0.3           biomaRt_2.44.4            zlibbioc_1.34.0           purrr_0.3.4               RCircos_1.2.1             rapiclient_0.1.3         
[31] BiocParallel_1.22.0       openssl_1.4.2             tibble_3.1.0              generics_0.1.0            ellipsis_0.3.1            GenomicFeatures_1.40.1    survival_3.2-3            RJSONIO_1.3-1.4           magrittr_2.0.1            crayon_1.3.4             
[41] memoise_1.1.0             fansi_0.4.1               xml2_1.3.2                prettyunits_1.1.1         tools_4.0.2               data.table_1.14.0         hms_0.5.3                 formatR_1.7               lifecycle_0.2.0           stringr_1.4.0            
[51] Biostrings_2.56.0         AnnotationDbi_1.50.3      lambda.r_1.2.4            compiler_4.0.2            rlang_0.4.10              futile.logger_1.4.3       debugme_1.1.0             grid_4.0.2                GenomicDataCommons_1.12.0 RCurl_1.98-1.2           
[61] rstudioapi_0.11           rappdirs_0.3.1            bitops_1.0-6              DBI_1.1.1                 curl_4.3                  R6_2.4.1                  GenomicAlignments_1.24.0  rtracklayer_1.48.0        bit_4.0.4                 utf8_1.1.4               
[71] futile.options_1.0.1      readr_1.4.0               stringi_1.5.3             RaggedExperiment_1.12.0   Rcpp_1.0.5                vctrs_0.3.6               dbplyr_1.4.4              tidyselect_1.1.0
andreas.wernitznig,

There has been an update to this by Marcel Ramos, who also replied below. Try again with cBioPortalData in Bioc-devel (package version 2.13.4). I've incorporated information from SAMPLE_ID in the datasets to map and build the SummarizedExperiment objects. You should now get an object that looks like the following:

> (mae <- cBioDataPack("brain_cptac_2020"))
A MultiAssayExperiment object of 7 listed
 experiments with user-defined names and respective classes.
 Containing an ExperimentList class object of length 7:
 [1] cna: SummarizedExperiment with 19380 rows and 190 columns
 [2] linear_cna: SummarizedExperiment with 19380 rows and 190 columns
 [3] mrna_seq_v2_rsem_zscores_ref_all_samples: SummarizedExperiment with 18209 rows and 188 columns
 [4] mrna_seq_v2_rsem: SummarizedExperiment with 18209 rows and 188 columns
 [5] mutations: RaggedExperiment with 9951 rows and 200 columns
 [6] protein_quantification_zscores: SummarizedExperiment with 6429 rows and 218 columns
 [7] protein_quantification: SummarizedExperiment with 6429 rows and 218 columns
 experiments() - obtain the ExperimentList instance
 colData() - the primary/phenotype DataFrame
 sampleMap() - the sample coordination DataFrame
 `$`, `[`, `[[` - extract colData columns, subset, or experiment
 *Format() - convert into a long or wide DataFrame
 assays() - convert ExperimentList to a SimpleList of matrices
 exportClass() - save data to flat files

These changes are in the latest version of cBioPortalData in Bioc-devel (package version 2.13.4).

Hi Andreas,

Please make sure that you have the latest release version of cBioPortalData (2.2.11) and that BiocManager::valid() returns TRUE. You can also remove the cache so that the data are downloaded again:

removeDataCache(api = cbio,
    studyId = "brca_tcga_pan_can_atlas_2018", dry.run = FALSE
)
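The version and installation checks mentioned above could be scripted roughly as follows. This is only a sketch: `pkg_at_least()` is a hypothetical helper written for this example (it is not part of cBioPortalData or BiocManager), and the `BiocManager::valid()` call is wrapped in `tryCatch()` on the assumption that it may fail without network access.

```r
# Hypothetical helper: TRUE if `pkg` is installed at version `minimum` or newer.
pkg_at_least <- function(pkg, minimum) {
  installed <- tryCatch(utils::packageVersion(pkg), error = function(e) NULL)
  !is.null(installed) && installed >= package_version(minimum)
}

if (!pkg_at_least("cBioPortalData", "2.2.11")) {
  message("cBioPortalData is missing or older than 2.2.11; ",
          "consider BiocManager::install(\"cBioPortalData\")")
}

# BiocManager::valid() returns TRUE when the installed packages are consistent
# with the Bioconductor version in use; otherwise it reports the offenders.
if (requireNamespace("BiocManager", quietly = TRUE)) {
  installation_ok <- tryCatch(
    isTRUE(suppressWarnings(BiocManager::valid())),
    error = function(e) NA  # e.g. repositories not reachable
  )
  message("BiocManager::valid() is TRUE: ", installation_ok)
}
```

If either check fails, updating the installation before clearing the cache is probably the better first step, since a stale package version would reproduce the problem even with fresh data.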

<details> <summary> reprex here </summary>

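library(cBioPortalData)
cbio <- cBioPortal()
x <- clinicalData(cbio, "brca_tcga_pan_can_atlas_2018")
# tabulate the SUBTYPE column to confirm that it is returned
table(x$SUBTYPE)
#>  BRCA_Basal   BRCA_Her2   BRCA_LumA   BRCA_LumB BRCA_Normal 
#>         171          78         499         197          36
sessionInfo()
#> R version 4.0.5 Patched (2021-03-31 r80179)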
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 20.10
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/blas/
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> attached base packages:
#> [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
#> [8] methods   base     
#> other attached packages:
#>  [1] cBioPortalData_2.2.11       MultiAssayExperiment_1.16.0
#>  [3] SummarizedExperiment_1.20.0 Biobase_2.50.0             
#>  [5] GenomicRanges_1.42.0        GenomeInfoDb_1.26.7        
#>  [7] IRanges_2.24.1              S4Vectors_0.28.1           
#>  [9] BiocGenerics_0.36.0         MatrixGenerics_1.2.1       
#> [11] matrixStats_0.58.0          AnVIL_1.2.0                
#> [13] dplyr_1.0.5                
#> loaded via a namespace (and not attached):
#>  [1] bitops_1.0-6              fs_1.5.0                 
#>  [3] bit64_4.0.5               progress_1.2.2           
#>  [5] httr_1.4.2                GenomicDataCommons_1.14.0
#>  [7] tools_4.0.3               backports_1.2.1          
#>  [9] utf8_1.2.1                R6_2.5.0                 
#> [11] DBI_1.1.1                 withr_2.4.1              
#> [13] tidyselect_1.1.0          prettyunits_1.1.1        
#> [15] TCGAutils_1.10.0          bit_4.0.4                
#> [17] curl_4.3                  compiler_4.0.3           
#> [19] rvest_1.0.0               formatR_1.9              
#> [21] xml2_1.3.2                DelayedArray_0.16.3      
#> [23] rtracklayer_1.50.0        readr_1.4.0              
#> [25] askpass_1.1               rappdirs_0.3.3           
#> [27] rapiclient_0.1.3          RCircos_1.2.1            
#> [29] Rsamtools_2.6.0           stringr_1.4.0            
#> [31] digest_0.6.27             rmarkdown_2.7            
#> [33] XVector_0.30.0            pkgconfig_2.0.3          
#> [35] htmltools_0.5.1.1         styler_1.4.1             
#> [37] dbplyr_2.1.1              fastmap_1.1.0            
#> [39] limma_3.46.0              highr_0.8                
#> [41] rlang_0.4.10              RSQLite_2.2.6            
#> [43] generics_0.1.0            jsonlite_1.7.2           
#> [45] BiocParallel_1.24.1       RCurl_1.98-1.3           
#> [47] magrittr_2.0.1            GenomeInfoDbData_1.2.4   
#> [49] futile.logger_1.4.3       Matrix_1.3-2             
#> [51] Rcpp_1.0.6                fansi_0.4.2              
#> [53] lifecycle_1.0.0           stringi_1.5.3            
#> [55] yaml_2.2.1                RaggedExperiment_1.14.1  
#> [57] RJSONIO_1.3-1.4           zlibbioc_1.36.0          
#> [59] BiocFileCache_1.14.0      grid_4.0.3               
#> [61] blob_1.2.1                crayon_1.4.1             
#> [63] lattice_0.20-41           Biostrings_2.58.0        
#> [65] splines_4.0.3             GenomicFeatures_1.42.3   
#> [67] hms_1.0.0                 knitr_1.32               
#> [69] pillar_1.6.0              biomaRt_2.46.3           
#> [71] futile.options_1.0.1      reprex_2.0.0             
#> [73] XML_3.99-0.6              glue_1.4.2               
#> [75] evaluate_0.14             lambda.r_1.2.4           
#> [77] data.table_1.14.0         vctrs_0.3.7              
#> [79] openssl_1.4.3             purrr_0.3.4              
#> [81] assertthat_0.2.1          cachem_1.0.4             
#> [83] xfun_0.22                 survival_3.2-10          
#> [85] tibble_3.1.0              RTCGAToolbox_2.20.0      
#> [87] GenomicAlignments_1.26.0  AnnotationDbi_1.52.0     
#> [89] memoise_2.0.0             ellipsis_0.3.1

Created on 2021-04-28 by the reprex package (v2.0.0)
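To check quickly whether a clinical column such as SUBTYPE actually came back, something along these lines may help. It is a minimal sketch: `tabulate_clinical()` is a hypothetical helper, and the `toy` data frame is made-up illustration data standing in for the `clinicalData()` result, so that the example runs without network access.

```r
# Hypothetical helper: tabulate one clinical column, failing loudly if absent.
tabulate_clinical <- function(clin, column = "SUBTYPE") {
  if (!column %in% names(clin)) {
    stop("Column '", column, "' not found; first available columns: ",
         paste(head(names(clin), 10), collapse = ", "))
  }
  # useNA = "ifany" also counts samples without a value in this column
  table(clin[[column]], useNA = "ifany")
}

# Made-up stand-in for the clinicalData() result (illustration only)
toy <- data.frame(
  patientId = c("P1", "P2", "P3", "P4"),
  SUBTYPE   = c("BRCA_LumA", "BRCA_LumA", "BRCA_Basal", NA),
  stringsAsFactors = FALSE
)
tabulate_clinical(toy)

# With the real data (needs network access and cBioPortalData):
# cbio <- cBioPortal()
# x <- clinicalData(cbio, "brca_tcga_pan_can_atlas_2018")
# tabulate_clinical(x)
```

The explicit error lists the first few columns that were returned, which makes it easier to tell an outdated cache or package version apart from a simple typo in the column name.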




