Analysis using TCGAbiolinks
fawazfebin ▴ 60
@fawazfebin-14053

Hi 

I would like to analyse all the available triple-negative breast cancer data sets in TCGA using TCGAbiolinks. I couldn't find an option for this cancer subtype in the manual. Can anyone help, please? Thanks in advance.

@antoniocolaprico-14504
USA/ Florida/ University of Miami Hospi…

Hi fawazfebin, thank you for your interest in our tool TCGAbiolinks, and thanks to pb.panigrahi86 for helping to find a solution. Below is the code to obtain the triple-negative TCGA-BRCA samples; I will also add it to our manual. If you have any other questions or issues, please write here or at https://github.com/BioinformaticsFMRP/TCGAbiolinks/issues, where, if Tiago (tiagochst) or I cannot answer, you may get a prompt response from the GitHub community as well. Have a nice day and good work. Best, Antonio.

library(TCGAbiolinks)

#-------------------  4.1 Parameter Definition                 --------------------

CancerProject <- "TCGA-BRCA"
DataDirectory <- paste0("GDC_",gsub("-","_",CancerProject))
FileNameData <- paste0(DataDirectory, "_","Illumina HiSeq",".rda")

# Query all Illumina HiSeq RNA-Seq results for the project (legacy archive, hg19)
# to obtain the list of available sample barcodes
query <- GDCquery(project = CancerProject, 
                  data.category = "Gene expression",
                  data.type = "Gene expression quantification",
                  platform = "Illumina HiSeq", 
                  file.type = "results",
                  experimental.strategy = "RNA-Seq",
                  legacy = TRUE)

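# Barcodes of all samples returned by the query
samplesDown <- query$results[[1]]$cases

# Subtype annotation table for BRCA
dataAssy.sub <- TCGAquery_subtype(tumor = gsub("TCGA-","",CancerProject))

# Rows that are negative for each receptor
dataERneg <- dataAssy.sub[dataAssy.sub$ER.Status %in% "Negative",]
dataPRneg <- dataAssy.sub[dataAssy.sub$PR.Status %in% "Negative",]
dataHER2neg <- dataAssy.sub[dataAssy.sub$HER2.Final.Status %in% "Negative",]

# Triple negative: patients negative for ER, PR and HER2
dataTNBC <- Reduce(intersect, list(dataERneg$patient, 
                                   dataPRneg$patient,
                                   dataHER2neg$patient))

# Primary solid tumor (TP) barcodes
dataSmTP <- TCGAquery_SampleTypes(barcode = samplesDown,
                                  typesample = "TP")

# Solid tissue normal (NT) barcodes
dataSmNT <- TCGAquery_SampleTypes(barcode = samplesDown,
                                  typesample = "NT")

# Tumor barcodes whose patient ID (first 12 characters) is in the TNBC list
dataSmTP_TNBC <- dataSmTP[substr(dataSmTP,1,12) %in% dataTNBC]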

# Query again, restricted to the TNBC primary tumors plus the normal tissue samples
queryDown <- GDCquery(project = CancerProject, 
                      data.category = "Gene expression",
                      data.type = "Gene expression quantification",
                      platform = "Illumina HiSeq", 
                      file.type = "results",
                      barcode = c(dataSmTP_TNBC, dataSmNT),
                      experimental.strategy = "RNA-Seq",
                      legacy = TRUE)

GDCdownload(query = queryDown,
            directory = DataDirectory)

dataPrep <- GDCprepare(query = queryDown, 
                       save = TRUE, 
                       directory =  DataDirectory,
                       save.filename = FileNameData)
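
The barcode handling in the code above can be illustrated without contacting the GDC. The following is a minimal sketch in base R, assuming standard TCGA aliquot barcodes in which characters 1 to 12 identify the patient and characters 14 to 15 hold the sample-type code (01 for primary solid tumor, 11 for solid tissue normal). The barcodes, the patient list, and the helper names patient_id and sample_type_code are invented for illustration and are not part of TCGAbiolinks.

```r
# Hypothetical barcodes for illustration only (not real query results).
barcodes <- c("TCGA-AA-0001-01A-11R-0001-07",  # primary tumor
              "TCGA-AA-0001-11A-11R-0001-07",  # matched normal
              "TCGA-BB-0002-01A-11R-0002-07",  # primary tumor
              "TCGA-CC-0003-01A-11R-0003-07")  # primary tumor

# Hypothetical patient IDs flagged as triple negative.
tnbc_patients <- c("TCGA-AA-0001", "TCGA-CC-0003")

# Characters 1-12 of a barcode identify the patient.
patient_id <- function(barcode) substr(barcode, 1, 12)

# Characters 14-15 hold the sample-type code.
sample_type_code <- function(barcode) substr(barcode, 14, 15)

tumor  <- barcodes[sample_type_code(barcodes) == "01"]
normal <- barcodes[sample_type_code(barcodes) == "11"]

# Keep only tumor samples that belong to TNBC patients.
tumor_tnbc <- tumor[patient_id(tumor) %in% tnbc_patients]

print(tumor_tnbc)
print(normal)
```

Matching on the 12-character patient ID is what links the patient-level subtype table to the sample-level barcodes returned by the query.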

 

 

Many thanks, Antonio, for the specific code. Is gene expression the only data type available for triple-negative breast cancer?

 

I got the following warning message while running one of the commands above:

> queryDown <- GDCquery(project = CancerProject, 
+                       data.category = "Gene expression",
+                       data.type = "Gene expression quantification",
+                       platform = "Illumina HiSeq", 
+                       file.type = "results",
+                       barcode = c(dataSmTP_TNBC, dataSmTP),
+                       experimental.strategy = "RNA-Seq",
+                       legacy = TRUE)
--------------------------------------
o GDCquery: Searching in GDC database
--------------------------------------
Genome of reference: hg19
--------------------------------------------
oo Accessing GDC. This might take a while...
--------------------------------------------
ooo Project: TCGA-BRCA
--------------------
oo Filtering results
--------------------
ooo By platform
ooo By experimental.strategy
ooo By data.type
ooo By file.type
ooo By barcode
----------------
oo Checking data
----------------
ooo Check if there are duplicated cases
Warning: There are more than one file for the same case. Please verify query results. You can use the command View(getResults(query)) in rstudio
ooo Check if there results for the query
-------------------
o Preparing output
-------------------

Is there anything I need to resolve at this point? Many thanks in advance.
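
The warning itself suggests inspecting getResults(query). One way to see which barcodes are affected is sketched below in base R on a toy table that stands in for the real results. The cases column mirrors query$results[[1]]$cases from the code above; the file_name column and all values are assumptions made for illustration.

```r
# Toy stand-in for the table returned by getResults(queryDown).
# 'cases' mirrors query$results[[1]]$cases; 'file_name' is an assumed column.
results <- data.frame(
  cases = c("TCGA-AA-0001-01A-11R-0001-07",
            "TCGA-AA-0001-01A-11R-0001-07",
            "TCGA-BB-0002-01A-11R-0002-07"),
  file_name = c("a.results", "b.results", "c.results"),  # hypothetical names
  stringsAsFactors = FALSE
)

# Barcodes that occur more than once in the query results.
dup_cases <- unique(results$cases[duplicated(results$cases)])

# All rows belonging to those barcodes, for manual inspection.
dup_rows <- results[results$cases %in% dup_cases, ]

print(dup_rows)
```

Whether the extra files should be kept or dropped depends on what they are, so this only locates them.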

 

 

 

 

 

Thanks for this detailed response, Antonio. Has TCGAquery_subtype() been updated somehow? The data frame that I retrieve does not contain the columns you mention: "ER.Status", "PR.Status", "HER2.Final.Status". Is there any other way I can retrieve this information? I should say, I also tried

TCGA_MolecularSubtype("TCGA-60-2721-01A-01R-0851-07") # just using the vignette example barcode

But this returned an empty data frame for some reason. I tried other barcodes, with the same result. I am not sure what is going wrong, so any help would be greatly appreciated!
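
When expected columns are missing, a first diagnostic step is to list which ones are absent and to search the returned column names for likely equivalents. The sketch below does this in base R on a toy data frame. The column names in it are invented and may not match what any version of TCGAquery_subtype() returns.

```r
# Toy stand-in for the data frame returned by TCGAquery_subtype();
# these column names are invented for illustration.
subtype <- data.frame(
  patient = c("TCGA-AA-0001", "TCGA-BB-0002"),
  er_status_by_ihc = c("Negative", "Positive"),
  some_other_column = c(1, 2),
  stringsAsFactors = FALSE
)

wanted <- c("ER.Status", "PR.Status", "HER2.Final.Status")

# Which of the expected columns are absent?
missing_cols <- setdiff(wanted, colnames(subtype))

# Case-insensitive search for column names containing er, pr or her2
# as a separate token (delimited by '.', '_' or the ends of the name).
candidates <- grep("(^|[._])(er|pr|her2)([._]|$)", colnames(subtype),
                   ignore.case = TRUE, value = TRUE)

print(missing_cols)
print(candidates)
```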

Best,

Ralitsa

@pbpanigrahi86-14641

From clinical data, you have to filter samples whose ER/PR/HER2 status is negative. Once you get sample ids, you can use these to fetch data for these samples.

Meanwhile, I will see if I can provide you sample code for doing that.
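
The filtering step described here can be sketched on a toy clinical table in base R. The status column names follow those used elsewhere in this thread; the patient IDs and values are invented for illustration.

```r
# Toy clinical table; patient IDs and values are invented.
clinical <- data.frame(
  patient = c("P1", "P2", "P3", "P4"),
  ER.Status = c("Negative", "Negative", "Positive", "Negative"),
  PR.Status = c("Negative", "Negative", "Negative", NA),
  HER2.Final.Status = c("Negative", "Positive", "Negative", "Negative"),
  stringsAsFactors = FALSE
)

status_cols <- c("ER.Status", "PR.Status", "HER2.Final.Status")

# Logical matrix: TRUE where a status is exactly "Negative".
# %in% returns FALSE for NA, so missing values never count as negative.
is_neg <- sapply(clinical[status_cols], function(x) x %in% "Negative")

# Triple negative = negative in all three columns.
is_tnbc <- rowSums(is_neg) == length(status_cols)
tnbc_ids <- clinical$patient[is_tnbc]

print(tnbc_ids)
```

The resulting IDs can then be matched against sample barcodes to select the data to download.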

Thanks for the help. I will be grateful if you could provide the code.