Hi,
I have a bunch of UUIDs from TCGA RNA-seq samples and would like to get the metadata for them. Apparently you can get some basic info by going to https://gdc-portal.nci.nih.gov/search/c?filters=%7B%22op%22:%22and%22,%22content%22:%5B%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.program.name%22,%22value%22:%5B%22TCGA%22%5D%7D%7D%5D%7D&facetTab=cases and clicking on export, but that has failed for me for the past two days. If you follow the links for one sample, you can end up at https://gdc-portal.nci.nih.gov/cases/0004d251-3f70-4395-b175-c94c2f5b1b81 where you can download the clinical info for that sample.
I've looked through TCGAbiolinks and couldn't find a way to query GDC when all I have is a UUID such as FFA5FFF7-6301-4CD8-8E63-A4D8294D1B0E. Is there a way to do so with TCGAbiolinks? If not, how do you suggest I proceed?
Thank you,
Leo
> packageVersion('TCGAbiolinks')
[1] ‘2.2.1’
Hi Tiago,
The uuid I have is from https://github.com/nellore/runs/blob/105c86de2ef91846f015f5b8285a7d6e29e0fcfc/tcga/tcga_batch_0.manifest#L236 which was created with https://github.com/nellore/runs/blob/105c86de2ef91846f015f5b8285a7d6e29e0fcfc/tcga/true_manifest.py that uses the output from https://github.com/nellore/runs/blob/105c86de2ef91846f015f5b8285a7d6e29e0fcfc/tcga/tcga_file_list.py.
And hm... it's a shame that the barcode/UUID API no longer exists. I'm guessing that you are talking about https://wiki.nci.nih.gov/display/TCGA/TCGA+Barcode+to+UUID+Web+Service+User%27s+Guide, right?
Best,
Leo
Yes, that is the old API. It is not working anymore.
For FFA5FFF7-6301-4CD8-8E63-A4D8294D1B0E, if you have this manifest line:
https://github.com/nellore/runs/blob/105c86de2ef91846f015f5b8285a7d6e29e0fcfc/tcga/tcga_batch_0.manifest#L236
you can use it to get the file name (which is the submitter_id) and search for that in GDC.
/Datasets/tcga/TCGA-COAD/28033279-cc74-4775-afdf-2497f6ddb55c/analysis/154aa297-0890-4fde-a8c1-2058a4c65b28/data/UNCID_2212217.4a01323f-408b-4e74-8686-ee6d4d076ee8.110302_UNC6-RDR300211_00066_FC_62J5EAAXX_3.tar.gz 0 FFA5FFF7-6301-4CD8-8E63-A4D8294D1B0E
There is no function to map UUID to barcode in TCGAbiolinks, but since they mapped the UUID to the file id, we could create a table, though I believe that is too much work. Did you send an email to the GDC team (https://gdc.cancer.gov/contact-us)? They might have a solution.
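To illustrate the manifest-based lookup described above, here is a minimal sketch in base R that splits one manifest line into its parts, so the file name can be searched for in GDC. The helper name `parse_manifest_line` is hypothetical, and the assumption that each line has exactly three whitespace-separated fields (path, a numeric field, UUID) is inferred only from the example line quoted in this thread.

```r
# Hypothetical helper: split one manifest line into path, file name and UUID.
# Assumption: three whitespace-separated fields, as in the example line above;
# the meaning of the middle field is not documented in this thread.
parse_manifest_line <- function(line) {
  fields <- strsplit(trimws(line), "[[:space:]]+")[[1]]
  stopifnot(length(fields) == 3)
  list(
    path      = fields[1],
    file_name = basename(fields[1]),  # the part to search for in GDC
    uuid      = fields[3]
  )
}

# The example line quoted in this thread.
line <- paste(
  "/Datasets/tcga/TCGA-COAD/28033279-cc74-4775-afdf-2497f6ddb55c/analysis/154aa297-0890-4fde-a8c1-2058a4c65b28/data/UNCID_2212217.4a01323f-408b-4e74-8686-ee6d4d076ee8.110302_UNC6-RDR300211_00066_FC_62J5EAAXX_3.tar.gz",
  "0",
  "FFA5FFF7-6301-4CD8-8E63-A4D8294D1B0E"
)

rec <- parse_manifest_line(line)
print(rec$file_name)
print(rec$uuid)
```

Applying this with `lapply()` over `readLines()` of the manifest would give the UUID-to-file-name table mentioned above, provided every line follows the same three-field layout.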
I was able to create a function to map to barcode; maybe that helps you.
What type of metadata do you want?
Awesome! Thanks!
I'm not super familiar with TCGA, but well, basically we would like to get all the metadata associated with a given RNA-seq sample. That is, information about the person (clinical?) and the RNA-seq sample itself if there is any. Is there other information you think might be useful?
Actually, you are getting all that is available, but there are some marker papers that have already studied some of the samples. Maybe you can use them.
Thanks for the help... Just to update for others who may need this, line 8 of the code has changed to:
baseURL <- ifelse(legacy,"https://api.gdc.cancer.gov/legacy/files/?","https://api.gdc.cancer.gov/files/?")
* EDIT * this code currently does not accurately translate legacy UUIDs to barcodes. I manually checked using the GDC legacy archive. Please use the code explained in Sean Davis' blog (https://seandavi.github.io/2017/12/genomicdatacommons-example-uuid-to-tcga-and-target-barcode-translation/) for accurate translation of legacy IDs to barcodes.
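For readers wondering what kind of request sits behind the `baseURL` line, here is a minimal sketch that only builds a GDC files-endpoint URL and does not contact the server. The helper name `build_gdc_files_url` is hypothetical; filtering on `file_id` with the `in` operator, requesting the `cases.samples.submitter_id` field, and lowercasing the UUIDs are assumptions about how one might phrase the query, not the code referred to above. It does not check whether the legacy endpoint is still served, and given the EDIT note it should not be relied on for translating legacy UUIDs.

```r
# Hypothetical helper: build (but do not send) a GDC files-endpoint query URL.
build_gdc_files_url <- function(uuids, legacy = FALSE) {
  base_url <- ifelse(legacy,
                     "https://api.gdc.cancer.gov/legacy/files/?",
                     "https://api.gdc.cancer.gov/files/?")
  # Assumption: UUIDs are matched in lowercase.
  values <- paste0('"', tolower(uuids), '"', collapse = ",")
  # GDC filter JSON of the form {"op":"in","content":{"field":...,"value":[...]}}
  filters <- sprintf(
    '{"op":"in","content":{"field":"file_id","value":[%s]}}',
    values
  )
  paste0(
    base_url,
    "filters=", utils::URLencode(filters, reserved = TRUE),
    "&fields=cases.samples.submitter_id",  # assumed field holding the barcode
    "&format=JSON",
    "&size=", length(uuids)
  )
}

url <- build_gdc_files_url("FFA5FFF7-6301-4CD8-8E63-A4D8294D1B0E")
print(url)
```

The resulting URL could then be fetched with any HTTP client and the JSON response inspected by hand before trusting the mapping.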