Question

How to obtain clinical data from TCGA via Bioconductor GenomicDataCommons

Hashirama ▴ 10

Germany

Dear community,

I am totally new to TCGA and Bioconductor, and I am confused about how to obtain more clinical data (e.g. for survival analysis, gender, RNA-seq read count data, ...) for some cases I have. For every "patient" I have:

gdc_file_uuid (e.g. 52F6329C-CDC6-4196-A4A0-58952332905C)
filename (e.g. UNCID_1552290.d6b7779f-a245-48ee-b9a8-2570c023a531.sorted_genome_alignments.bam)
case_uuid (e.g. 2be42cc2-9b97-4821-afc2-d1e42eb3932d)

How can I use this in the R package GenomicDataCommons to get more clinical data?

I would be grateful for any help!

Kind regards, Hashirama

TCGA GenomicDataCommons • 1.5k views

updated 3.4 years ago by Robert Castelo ★ 3.4k • written 3.4 years ago by Hashirama ▴ 10

cross-posted: https://www.biostars.org/p/9499402

3.4 years ago by Robert Castelo ★ 3.4k

score 1 · Accepted Answer · 2021-11-30

The GenomicDataCommons package can take a set of case UUIDs and return quite a bit of clinical detail. See available_expand(cases()) for the types of data that can be returned. Here is some code to get you started:

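library(GenomicDataCommons)

# Query the GDC "cases" endpoint for one case UUID and expand the nested clinical records
cases() %>%
  expand(c('diagnoses','demographic','diagnoses.pathology_details')) %>%
  GenomicDataCommons::filter(case_id %in% c("2be42cc2-9b97-4821-afc2-d1e42eb3932d")) %>%
  results() %>%              # send the query and fetch the matching cases
  tibble::as_tibble() %>%    # one row per case; nested records stay as list columns
  dplyr::glimpse()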

Results:

Rows: 1
Columns: 22
$ id                      <chr> "2be42cc2-9b97-4821-afc2-d1e42eb3932d"
$ slide_ids               <named list> <"9a182c4a-6085-4829-a3d0-c46114f0875b", "4236…
$ submitter_slide_ids     <named list> <"TCGA-HZ-7926-01Z-00-DX1", "TCGA-HZ-79…
$ disease_type            <chr> "Ductal and Lobular Neoplasms"
$ analyte_ids             <named list> <"05fce9a0-fa4d-4a30-ad33-a4f04bf84abf"…
$ submitter_id            <chr> "TCGA-HZ-7926"
$ submitter_analyte_ids   <named list> <"TCGA-HZ-7926-01A-11R", "TCGA-HZ-7926-10A-01W…
$ aliquot_ids             <named list> <"1925e7c2-1730-48a4-8257-772fc4448d9b"…
$ submitter_aliquot_ids   <named list> <"TCGA-HZ-7926-10A-01D-2153-01", "TCGA-HZ-7926…
$ diagnoses               <named list> [<data.frame[1 x 28]>]
$ diagnosis_ids           <named list> "f172c483-6888-5e06-9e5c-0b2bb4be64dd"
$ created_datetime        <lgl> NA
$ sample_ids              <named list> <"8b7bd592-74f0-48e3-9e21-8005ab8d419e"…
$ demographic             <df[,14]> <data.frame[1 x 14]>
$ submitter_sample_ids    <named list> <"TCGA-HZ-7926-01A", "TCGA-HZ-7926-10A"…
$ submitter_diagnosis_ids <named list> "TCGA-HZ-7926_diagnosis"
$ primary_site            <chr> "Pancreas"
$ updated_datetime        <chr> "2019-08-06T14:42:37.317113-05:00"
$ case_id                 <chr> "2be42cc2-9b97-4821-afc2-d1e42eb3932d"
$ portion_ids             <named list> <"de913076-84e6-4ed7-8f2f-16cdd2a7f7b0"…
$ state                   <chr> "released"
$ submitter_portion_ids   <named list> <"TCGA-HZ-7926-01A-11", "TCGA-HZ-7926-1…
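
In the output above, demographic comes back as a nested data frame and diagnoses as a list of data frames, so they need to be unpacked before they are useful for something like survival analysis. The following is a minimal sketch of one way to do that, not a definitive recipe. It assumes the query still returns the structure shown above, and the column names it looks for (gender, vital_status, days_to_death, days_to_last_follow_up, age_at_diagnosis) are assumptions about which GDC fields are populated for this case, which is why they are selected with intersect() rather than hard-coded.

```r
library(GenomicDataCommons)

# Same query as above, kept as a plain list instead of being piped into glimpse()
res <- cases() %>%
  expand(c('diagnoses', 'demographic')) %>%
  GenomicDataCommons::filter(case_id %in% c("2be42cc2-9b97-4821-afc2-d1e42eb3932d")) %>%
  results()

# demographic: a data frame with one row per case
demo <- res$demographic

# diagnoses: a list with one data frame per case; take the first (only) case here
dx <- res$diagnoses[[1]]

# Inspect what is actually available before relying on any particular column
names(demo)
names(dx)

# Assumed field names: keep only those that are present in the returned data
demo_cols <- intersect(c("gender", "vital_status", "days_to_death"), names(demo))
dx_cols   <- intersect(c("days_to_last_follow_up", "age_at_diagnosis"), names(dx))

demo[, demo_cols, drop = FALSE]
dx[, dx_cols, drop = FALSE]
```

For more than one case, the package also provides gdc_clinical(case_ids), which takes a character vector of case UUIDs and returns a list of data frames (main, diagnoses, exposures, demographic), which may be more convenient than unpacking the nested columns by hand.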