GDCprepare error in TCGAbiolinks
fawazfebin ▴ 60
@fawazfebin-14053

Hi, I was trying to run a differential expression analysis using TCGAbiolinks. The tumour data were queried and downloaded with GDCquery and GDCdownload, but GDCprepare then failed. The code, the rows it printed, and the error message follow.

library(TCGAbiolinks)

query.colon.cancer <- GDCquery(
  project = "TCGA-COAD",
  legacy = TRUE,
  data.category = "Gene expression",
  data.type = "Gene expression quantification",
  experimental.strategy = "RNA-Seq",
  sample.type = "Primary solid Tumor",
  file.type = "normalized_results"
)

GDCdownload(query.colon.cancer, files.per.chunk = 200)

prep.colon.cancer <- GDCprepare(
  query = query.colon.cancer,
  save = TRUE,
  summarizedExperiment = TRUE,
  save.filename = "COLON_CANCER.rda"
)

tags cases experimental_strategy
298 c("normalized", "gene", "v2") TCGA-A6-2682-01A-01R-1410-07 RNA-Seq
381 c("normalized", "gene", "v2") TCGA-A6-2682-01A-01R-1410-07 RNA-Seq
290 c("gene", "normalized", "v2") TCGA-A6-2684-01A-01R-1410-07 RNA-Seq
437 c("gene", "normalized", "v2") TCGA-A6-2684-01A-01R-1410-07 RNA-Seq
197 c("gene", "normalized", "v2") TCGA-A6-2685-01A-01R-1410-07 RNA-Seq
234 c("gene", "normalized", "v2") TCGA-A6-2685-01A-01R-1410-07 RNA-Seq
400 c("gene", "normalized", "v2") TCGA-AA-3492-01A-01R-1410-07 RNA-Seq
450 c("gene", "normalized", "v2") TCGA-AA-3492-01A-01R-1410-07 RNA-Seq
304 c("gene", "normalized", "v2") TCGA-AA-3495-01A-01R-1410-07 RNA-Seq
431 c("gene", "normalized", "v2") TCGA-AA-3495-01A-01R-1410-07 RNA-Seq
159 c("gene", "normalized", "v2") TCGA-AA-3502-01A-01R-1410-07 RNA-Seq
161 c("gene", "normalized", "v2") TCGA-AA-3502-01A-01R-1410-07 RNA-Seq
413 c("gene", "normalized", "v2") TCGA-AA-3506-01A-01R-1410-07 RNA-Seq
478 c("gene", "normalized", "v2") TCGA-AA-3506-01A-01R-1410-07 RNA-Seq
231 c("gene", "normalized", "v2") TCGA-AA-3509-01A-01R-1410-07 RNA-Seq
253 c("gene", "normalized", "v2") TCGA-AA-3509-01A-01R-1410-07 RNA-Seq
102 c("gene", "normalized", "v2") TCGA-AA-A01P-01A-21R-A083-07 RNA-Seq
185 c("gene", "normalized", "v2") TCGA-AA-A01P-01A-21R-A083-07 RNA-Seq
34 c("gene", "normalized", "v2") TCGA-AA-A01X-01A-21R-A083-07 RNA-Seq
175 c("gene", "normalized", "v2") TCGA-AA-A01X-01A-21R-A083-07 RNA-Seq
101 c("gene", "normalized", "v2") TCGA-AA-A01Z-01A-11R-A083-07 RNA-Seq
454 c("gene", "normalized", "v2") TCGA-AA-A01Z-01A-11R-A083-07 RNA-Seq
382 c("gene", "normalized", "v2") TCGA-AZ-4313-01A-01R-1410-07 RNA-Seq
446 c("gene", "normalized", "v2") TCGA-AZ-4313-01A-01R-1410-07 RNA-Seq
133 c("gene", "normalized", "v2") TCGA-AZ-4315-01A-01R-1410-07 RNA-Seq
194 c("gene", "normalized", "v2") TCGA-AZ-4315-01A-01R-1410-07 RNA-Seq
22 c("gene", "normalized", "v2") TCGA-AZ-4614-01A-01R-1410-07 RNA-Seq
412 c("gene", "normalized", "v2") TCGA-AZ-4614-01A-01R-1410-07 RNA-Seq
233 c("normalized", "gene", "v2") TCGA-AZ-4615-01A-01R-1410-07 RNA-Seq
432 c("normalized", "gene", "v2") TCGA-AZ-4615-01A-01R-1410-07 RNA-Seq
306 c("gene", "normalized", "v2") TCGA-AZ-4684-01A-01R-1410-07 RNA-Seq
475 c("gene", "normalized", "v2") TCGA-AZ-4684-01A-01R-1410-07 RNA-Seq
198 c("gene", "normalized", "v2") TCGA-CA-5256-01A-01R-1410-07 RNA-Seq
395 c("gene", "normalized", "v2") TCGA-CA-5256-01A-01R-1410-07 RNA-Seq
174 c("gene", "normalized", "v2") TCGA-CK-4951-01A-01R-1410-07 RNA-Seq
465 c("gene", "normalized", "v2") TCGA-CK-4951-01A-01R-1410-07 RNA-Seq
237 c("gene", "normalized", "v2") TCGA-CM-4747-01A-01R-1410-07 RNA-Seq
387 c("gene", "normalized", "v2") TCGA-CM-4747-01A-01R-1410-07 RNA-Seq

Error in GDCprepare(query = query.colon.cancer, save = TRUE, summarizedExperiment = TRUE, : There are samples duplicated. We will not be able to prepare it
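The rows printed before the error all come in pairs that share the same barcode in the `cases` column, which seems to be what the message is complaining about. A minimal base-R sketch of how such pairs can be listed is below. The `results` data frame is only an illustrative stand-in (four rows copied from the output above plus one made-up barcode), not the real query table.

```r
# Illustrative stand-in for the query results table (an assumption, not real output).
# The first four barcodes are copied from the printed rows; the last one is made up.
results <- data.frame(
  cases = c("TCGA-A6-2682-01A-01R-1410-07",
            "TCGA-A6-2682-01A-01R-1410-07",
            "TCGA-A6-2684-01A-01R-1410-07",
            "TCGA-A6-2684-01A-01R-1410-07",
            "TCGA-XX-0000-01A-01R-0000-07"),  # hypothetical unique barcode
  experimental_strategy = "RNA-Seq",
  stringsAsFactors = FALSE
)

# Count how often each barcode occurs and keep those seen more than once.
barcode_counts <- table(results$cases)
dup_barcodes <- names(barcode_counts[barcode_counts > 1])
print(dup_barcodes)

# Show every row involved in a duplicate (both copies, not only the second one).
dup_rows <- results[results$cases %in% dup_barcodes, ]
print(dup_rows)
```

Using `%in%` rather than `duplicated()` alone shows both members of each pair, which makes it easier to compare the two files before deciding which one to keep.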

sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] TCGAbiolinks_2.12.6

loaded via a namespace (and not attached):
[1] colorspace_1.4-1 ggsignif_0.6.0 selectr_0.4-1
[4] rjson_0.2.20 hwriter_1.3.2 circlize_0.4.8
[7] XVector_0.24.0 GenomicRanges_1.36.1 GlobalOptions_0.1.1
[10] clue_0.3-57 ggpubr_0.2.3 matlab_1.0.2
[13] ggrepel_0.8.1 bit64_0.9-7 AnnotationDbi_1.46.1
[16] xml2_1.2.2 codetools_0.2-16 splines_3.6.1
[19] R.methodsS3_1.7.1 doParallel_1.0.15 DESeq_1.36.0
[22] geneplotter_1.62.0 knitr_1.25 zeallot_0.1.0
[25] jsonlite_1.6 Rsamtools_2.0.3 km.ci_0.5-2
[28] broom_0.5.2 annotate_1.62.0 cluster_2.1.0
[31] png_0.1-7 R.oo_1.22.0 BiocManager_1.30.7
[34] readr_1.3.1 compiler_3.6.1 httr_1.4.1
[37] backports_1.1.5 assertthat_0.2.1 Matrix_1.2-17
[40] lazyeval_0.2.2 limma_3.40.6 prettyunits_1.0.2
[43] tools_3.6.1 gtable_0.3.0 glue_1.3.1
[46] GenomeInfoDbData_1.2.1 dplyr_0.8.3 ggthemes_4.2.0
[49] ShortRead_1.42.0 Rcpp_1.0.2 Biobase_2.44.0
[52] vctrs_0.2.0 Biostrings_2.52.0 nlme_3.1-141
[55] rtracklayer_1.44.4 iterators_1.0.12 xfun_0.10
[58] stringr_1.4.0 rvest_0.3.4 lifecycle_0.1.0
[61] XML_3.98-1.20 edgeR_3.26.8 zoo_1.8-6
[64] zlibbioc_1.30.0 scales_1.0.0 aroma.light_3.14.0
[67] hms_0.5.1 parallel_3.6.1 SummarizedExperiment_1.14.1
[70] RColorBrewer_1.1-2 curl_4.2 ComplexHeatmap_2.0.0
[73] memoise_1.1.0 gridExtra_2.3 KMsurv_0.1-5
[76] ggplot2_3.2.1 downloader_0.4 biomaRt_2.40.5
[79] latticeExtra_0.6-28 stringi_1.4.3 RSQLite_2.1.2
[82] highr_0.8 genefilter_1.66.0 S4Vectors_0.22.1
[85] foreach_1.4.7 GenomicFeatures_1.36.4 BiocGenerics_0.30.0
[88] BiocParallel_1.18.1 shape_1.4.4 GenomeInfoDb_1.20.0
[91] rlang_0.4.0 pkgconfig_2.0.3 matrixStats_0.55.0
[94] bitops_1.0-6 lattice_0.20-38 purrr_0.3.2
[97] GenomicAlignments_1.20.1 bit_1.1-14 tidyselect_0.2.5
[100] plyr_1.8.4 magrittr_1.5 R6_2.4.0
[103] IRanges_2.18.3 generics_0.0.2 DelayedArray_0.10.0
[106] DBI_1.0.0 mgcv_1.8-28 pillar_1.4.2
[109] survival_2.44-1.1 RCurl_1.95-4.12 tibble_2.1.3
[112] EDASeq_2.18.0 crayon_1.3.4 survMisc_0.5.5
[115] GetoptLong_0.1.7 progress_1.2.2 locfit_1.5-9.1
[118] grid_3.6.1 sva_3.32.1 data.table_1.12.4
[121] blob_1.2.0 ConsensusClusterPlus_1.48.0 digest_0.6.21
[124] xtable_1.8-4 tidyr_1.0.0 R.utils_2.9.0
[127] stats4_3.6.1 munsell_0.5.0 survminer_0.4.6

I have not been able to remove the rows for the duplicated files by specifying their row numbers. Can anyone help, please? Many thanks in advance.
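For reference, one possible way around row numbers is to filter on the barcode itself with a logical index. The sketch below runs on a mock object and makes two assumptions that should be checked against the real query: that the file table sits in `query$results[[1]]`, and that it has a `cases` column as in the printed output. The object `query` and the barcode `TCGA-XX-0000-01A-01R-0000-07` are hypothetical and used only for illustration.

```r
# Mock query object (assumed layout: a list whose `results` element holds
# one data frame). Row names mimic the labels printed in the output above.
query <- list(results = list(data.frame(
  cases = c("TCGA-A6-2682-01A-01R-1410-07",
            "TCGA-A6-2682-01A-01R-1410-07",
            "TCGA-A6-2684-01A-01R-1410-07",
            "TCGA-A6-2684-01A-01R-1410-07",
            "TCGA-XX-0000-01A-01R-0000-07"),  # hypothetical unique barcode
  experimental_strategy = "RNA-Seq",
  row.names = c("298", "381", "290", "437", "1"),
  stringsAsFactors = FALSE
)))

res <- query$results[[1]]

# TRUE for the first occurrence of each barcode, FALSE for later repeats.
keep <- !duplicated(res$cases)

# Write the filtered table back; drop = FALSE keeps it a data frame.
query$results[[1]] <- res[keep, , drop = FALSE]

print(query$results[[1]])
```

This keeps whichever copy of each barcode happens to come first, which is an arbitrary choice. If the two files for a sample differ, it may be better to inspect them and pick one deliberately before running GDCprepare on the filtered query.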

tcgabiolinks gdcprepare differential expression • 1.6k views