I downloaded the count dataset for Esophagus tissues from recount and found that it contains 198 samples, which differs from what I got from the TCGA website, where I could only download 185 samples. Can you let me know where this difference comes from?
Leo pointed me in the direction of your question. I'm not positive where/how Firebrowse filtered their data (as I'm less familiar with that resource and didn't look into it) to reach 185 samples; however, I think I may have found the answer to your question. It looks as though the 185 samples are tumor type samples, while the other 13 are 'normal types' (see code below). TCGA barcode explanations can be further explored here.
## download metadata for recount TCGA data
md <- recount::all_metadata('TCGA')
## just look at the Esophageal samples
md_sub <- subset(md, md$xml_primary_pathology_tumor_tissue_site == "Esophagus")
## take a look at sample type information
## Tumor types range from 01 - 09, normal types from 10 - 19 and control samples from 20 - 29
table(md_sub$gdc_cases.samples.sample_type_id)

 01  06  11
184   1  13
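To make the tumor/normal split explicit, here is a minimal sketch that groups the two-digit sample type codes into the broad classes described in the comment above. It does not download anything: `sample_type_id` is a toy vector that stands in for `md_sub$gdc_cases.samples.sample_type_id`, with counts copied from the table, and `classify_sample_type()` is a hypothetical helper written for this illustration, not part of recount.

```r
## Toy stand-in for md_sub$gdc_cases.samples.sample_type_id
## (assumption: counts copied from the table above, no download needed)
sample_type_id <- rep(c("01", "06", "11"), times = c(184, 1, 13))

## Hypothetical helper: map a two-digit TCGA sample type code to a broad class
## 01 - 09 = tumor, 10 - 19 = normal, 20 - 29 = control
classify_sample_type <- function(id) {
    code <- as.integer(id)
    cut(code,
        breaks = c(0, 9, 19, 29),
        labels = c("tumor", "normal", "control"))
}

## Count samples per broad class
sample_class <- classify_sample_type(sample_type_id)
table(sample_class)
```

With the real metadata, the same helper could be applied to `md_sub$gdc_cases.samples.sample_type_id`, and `md_sub[sample_class == "tumor", ]` would then keep only the tumor samples, which should correspond to the 185 you saw.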
Let me (er...us) know if anything is unclear or you have further questions!
Hi,
Are you using the recount Bioconductor package? If so, please follow the posting guidelines http://www.bioconductor.org/help/support/posting-guide/. Specifically, post some reproducible code and session information.
If you are using the recount website, can you post the link to the file you downloaded?
Finally, can you provide reproducible information for how you got 185 samples from the TCGA website?
Thanks,
Leonardo
Hi Leonardo,
Thank you for your reply. I downloaded the file from the recount website; the link I used is HERE
The figure of 185 samples comes from Firebrowse and the GDC Data Portal for TCGA, where I checked the sample count and downloaded the data.
Thanks,
Jungsoo