I'm trying to download SKCM melanoma samples into R using the TCGAbiolinks package.
I need the RNA-seq expression matrix along with the sample metadata, such as tumour purity, LDH levels, and a few other features. Pretty basic.
This is the code right from the beginning:
library(TCGAbiolinks)
GDCprojects = getGDCprojects()
TCGAbiolinks:::getProjectSummary("TCGA-SKCM")
query_TCGA = GDCquery(
  project = "TCGA-SKCM",
  data.category = "Transcriptome Profiling",
  data.type = "Gene Expression Quantification",
  experimental.strategy = "RNA-Seq",
  workflow.type = "STAR - Counts",
  sample.type = c("Primary Tumor")) # picked primary
skcm_res = getResults(query_TCGA) # make results as table
GDCdownload(query = query_TCGA)
tcga_data = GDCprepare(query_TCGA)
However, I'm getting the error below and I don't understand what the problem is. I tried using metastatic samples instead of primary ones, and I tried omitting the data.type
parameter; neither worked. What does this error mean and how do I fix it? Thank you.
> tcga_data = GDCprepare(query_TCGA)
|=================================================================================|100% Completed after 24 s
Error in `vectbl_as_col_location()`:
! Can't subset columns past the end.
ℹ Locations 2, 3, and 4 don't exist.
ℹ There is only 1 column.
Run `rlang::last_error()` to see where the error occurred.
There were 50 or more warnings (use warnings() to see the first 50)
Note: if there is another recommended way to download this type of data, for example with other packages, I'd be more than glad to hear about it. Thanks!
You can try the Genomic Data Commons Data Portal at https://portal.gdc.cancer.gov/ or the GenomicDataCommons R/Bioconductor package.
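As a rough sketch of the GenomicDataCommons route, something like the following should build a manifest of the STAR - Counts expression files for TCGA-SKCM and download a few of them. It follows the pattern in the package vignette. The field names and values in the filters are assumptions based on that pattern, and the choice to fetch only the first three files is just to keep the test run small. It needs network access to the GDC API.

```r
library(GenomicDataCommons)

# Start from the GDC "files" endpoint and narrow it down step by step.
# Field names below are assumed to match the GDC files endpoint.
q <- files()
q <- filter(q, cases.project.project_id == "TCGA-SKCM")
q <- filter(q, type == "gene_expression")
q <- filter(q, analysis.workflow_type == "STAR - Counts")

# Retrieve the manifest (one row per matching file) as a data frame.
ge_manifest <- manifest(q)

# Download only the first few files as a trial run; gdcdata() returns
# the local path(s) of the downloaded file(s).
trial_ids <- head(ge_manifest$id, 3)
fnames <- lapply(trial_ids, gdcdata)
```

Each downloaded file is a per-sample tab-separated counts table, so the expression matrix still has to be assembled by reading the files and joining them on the gene identifier column.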