Hi, I was trying to do a differential expression analysis using TCGAbiolinks. The tumour data was queried and downloaded using GDCquery and GDCdownload. The GDCprepare command gave the following error.
library(TCGAbiolinks)
query.colon.cancer <- GDCquery(project = "TCGA-COAD", legacy = TRUE, data.category = "Gene expression", data.type = "Gene expression quantification", experimental.strategy = "RNA-Seq", sample.type = "Primary solid Tumor", file.type = "normalized_results")
GDCdownload(query.colon.cancer, files.per.chunk = 200)
prep.colon.cancer <- GDCprepare(query = query.colon.cancer, save = TRUE, summarizedExperiment = TRUE, save.filename = "COLON_CANCER.rda")
row | tags | cases | experimental_strategy |
---|---|---|---|
298 | c("normalized", "gene", "v2") | TCGA-A6-2682-01A-01R-1410-07 | RNA-Seq |
381 | c("normalized", "gene", "v2") | TCGA-A6-2682-01A-01R-1410-07 | RNA-Seq |
290 | c("gene", "normalized", "v2") | TCGA-A6-2684-01A-01R-1410-07 | RNA-Seq |
437 | c("gene", "normalized", "v2") | TCGA-A6-2684-01A-01R-1410-07 | RNA-Seq |
197 | c("gene", "normalized", "v2") | TCGA-A6-2685-01A-01R-1410-07 | RNA-Seq |
234 | c("gene", "normalized", "v2") | TCGA-A6-2685-01A-01R-1410-07 | RNA-Seq |
400 | c("gene", "normalized", "v2") | TCGA-AA-3492-01A-01R-1410-07 | RNA-Seq |
450 | c("gene", "normalized", "v2") | TCGA-AA-3492-01A-01R-1410-07 | RNA-Seq |
304 | c("gene", "normalized", "v2") | TCGA-AA-3495-01A-01R-1410-07 | RNA-Seq |
431 | c("gene", "normalized", "v2") | TCGA-AA-3495-01A-01R-1410-07 | RNA-Seq |
159 | c("gene", "normalized", "v2") | TCGA-AA-3502-01A-01R-1410-07 | RNA-Seq |
161 | c("gene", "normalized", "v2") | TCGA-AA-3502-01A-01R-1410-07 | RNA-Seq |
413 | c("gene", "normalized", "v2") | TCGA-AA-3506-01A-01R-1410-07 | RNA-Seq |
478 | c("gene", "normalized", "v2") | TCGA-AA-3506-01A-01R-1410-07 | RNA-Seq |
231 | c("gene", "normalized", "v2") | TCGA-AA-3509-01A-01R-1410-07 | RNA-Seq |
253 | c("gene", "normalized", "v2") | TCGA-AA-3509-01A-01R-1410-07 | RNA-Seq |
102 | c("gene", "normalized", "v2") | TCGA-AA-A01P-01A-21R-A083-07 | RNA-Seq |
185 | c("gene", "normalized", "v2") | TCGA-AA-A01P-01A-21R-A083-07 | RNA-Seq |
34 | c("gene", "normalized", "v2") | TCGA-AA-A01X-01A-21R-A083-07 | RNA-Seq |
175 | c("gene", "normalized", "v2") | TCGA-AA-A01X-01A-21R-A083-07 | RNA-Seq |
101 | c("gene", "normalized", "v2") | TCGA-AA-A01Z-01A-11R-A083-07 | RNA-Seq |
454 | c("gene", "normalized", "v2") | TCGA-AA-A01Z-01A-11R-A083-07 | RNA-Seq |
382 | c("gene", "normalized", "v2") | TCGA-AZ-4313-01A-01R-1410-07 | RNA-Seq |
446 | c("gene", "normalized", "v2") | TCGA-AZ-4313-01A-01R-1410-07 | RNA-Seq |
133 | c("gene", "normalized", "v2") | TCGA-AZ-4315-01A-01R-1410-07 | RNA-Seq |
194 | c("gene", "normalized", "v2") | TCGA-AZ-4315-01A-01R-1410-07 | RNA-Seq |
22 | c("gene", "normalized", "v2") | TCGA-AZ-4614-01A-01R-1410-07 | RNA-Seq |
412 | c("gene", "normalized", "v2") | TCGA-AZ-4614-01A-01R-1410-07 | RNA-Seq |
233 | c("normalized", "gene", "v2") | TCGA-AZ-4615-01A-01R-1410-07 | RNA-Seq |
432 | c("normalized", "gene", "v2") | TCGA-AZ-4615-01A-01R-1410-07 | RNA-Seq |
306 | c("gene", "normalized", "v2") | TCGA-AZ-4684-01A-01R-1410-07 | RNA-Seq |
475 | c("gene", "normalized", "v2") | TCGA-AZ-4684-01A-01R-1410-07 | RNA-Seq |
198 | c("gene", "normalized", "v2") | TCGA-CA-5256-01A-01R-1410-07 | RNA-Seq |
395 | c("gene", "normalized", "v2") | TCGA-CA-5256-01A-01R-1410-07 | RNA-Seq |
174 | c("gene", "normalized", "v2") | TCGA-CK-4951-01A-01R-1410-07 | RNA-Seq |
465 | c("gene", "normalized", "v2") | TCGA-CK-4951-01A-01R-1410-07 | RNA-Seq |
237 | c("gene", "normalized", "v2") | TCGA-CM-4747-01A-01R-1410-07 | RNA-Seq |
387 | c("gene", "normalized", "v2") | TCGA-CM-4747-01A-01R-1410-07 | RNA-Seq |
**Error in GDCprepare(query = query.colon.cancer, save = TRUE, summarizedExperiment = TRUE, : There are samples duplicated. We will not be able to prepare it**
sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] TCGAbiolinks_2.12.6
loaded via a namespace (and not attached):
[1] colorspace_1.4-1 ggsignif_0.6.0 selectr_0.4-1
[4] rjson_0.2.20 hwriter_1.3.2 circlize_0.4.8
[7] XVector_0.24.0 GenomicRanges_1.36.1 GlobalOptions_0.1.1
[10] clue_0.3-57 ggpubr_0.2.3 matlab_1.0.2
[13] ggrepel_0.8.1 bit64_0.9-7 AnnotationDbi_1.46.1
[16] xml2_1.2.2 codetools_0.2-16 splines_3.6.1
[19] R.methodsS3_1.7.1 doParallel_1.0.15 DESeq_1.36.0
[22] geneplotter_1.62.0 knitr_1.25 zeallot_0.1.0
[25] jsonlite_1.6 Rsamtools_2.0.3 km.ci_0.5-2
[28] broom_0.5.2 annotate_1.62.0 cluster_2.1.0
[31] png_0.1-7 R.oo_1.22.0 BiocManager_1.30.7
[34] readr_1.3.1 compiler_3.6.1 httr_1.4.1
[37] backports_1.1.5 assertthat_0.2.1 Matrix_1.2-17
[40] lazyeval_0.2.2 limma_3.40.6 prettyunits_1.0.2
[43] tools_3.6.1 gtable_0.3.0 glue_1.3.1
[46] GenomeInfoDbData_1.2.1 dplyr_0.8.3 ggthemes_4.2.0
[49] ShortRead_1.42.0 Rcpp_1.0.2 Biobase_2.44.0
[52] vctrs_0.2.0 Biostrings_2.52.0 nlme_3.1-141
[55] rtracklayer_1.44.4 iterators_1.0.12 xfun_0.10
[58] stringr_1.4.0 rvest_0.3.4 lifecycle_0.1.0
[61] XML_3.98-1.20 edgeR_3.26.8 zoo_1.8-6
[64] zlibbioc_1.30.0 scales_1.0.0 aroma.light_3.14.0
[67] hms_0.5.1 parallel_3.6.1 SummarizedExperiment_1.14.1
[70] RColorBrewer_1.1-2 curl_4.2 ComplexHeatmap_2.0.0
[73] memoise_1.1.0 gridExtra_2.3 KMsurv_0.1-5
[76] ggplot2_3.2.1 downloader_0.4 biomaRt_2.40.5
[79] latticeExtra_0.6-28 stringi_1.4.3 RSQLite_2.1.2
[82] highr_0.8 genefilter_1.66.0 S4Vectors_0.22.1
[85] foreach_1.4.7 GenomicFeatures_1.36.4 BiocGenerics_0.30.0
[88] BiocParallel_1.18.1 shape_1.4.4 GenomeInfoDb_1.20.0
[91] rlang_0.4.0 pkgconfig_2.0.3 matrixStats_0.55.0
[94] bitops_1.0-6 lattice_0.20-38 purrr_0.3.2
[97] GenomicAlignments_1.20.1 bit_1.1-14 tidyselect_0.2.5
[100] plyr_1.8.4 magrittr_1.5 R6_2.4.0
[103] IRanges_2.18.3 generics_0.0.2 DelayedArray_0.10.0
[106] DBI_1.0.0 mgcv_1.8-28 pillar_1.4.2
[109] survival_2.44-1.1 RCurl_1.95-4.12 tibble_2.1.3
[112] EDASeq_2.18.0 crayon_1.3.4 survMisc_0.5.5
[115] GetoptLong_0.1.7 progress_1.2.2 locfit_1.5-9.1
[118] grid_3.6.1 sva_3.32.1 data.table_1.12.4
[121] blob_1.2.0 ConsensusClusterPlus_1.48.0 digest_0.6.21
[124] xtable_1.8-4 tidyr_1.0.0 R.utils_2.9.0
[127] stats4_3.6.1 munsell_0.5.0 survminer_0.4.6
I am unable to remove the rows corresponding to the duplicated files by specifying their row numbers. Can anyone help, please? Many thanks in advance.
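
To make the goal concrete, here is a minimal sketch of dropping rows that share a barcode by value rather than by row number. It runs on a toy data frame in base R only. The `file_id` column, its values, and the third barcode are hypothetical. It is also an assumption, not something confirmed above, that the real table printed with the error is the one stored in `query.colon.cancer$results[[1]]`.

```r
# Toy stand-in for the query results table (hypothetical data; the real table
# is assumed, not confirmed, to live in query.colon.cancer$results[[1]]).
results <- data.frame(
  file_id = c("f1", "f2", "f3", "f4", "f5"),  # hypothetical file identifiers
  cases = c("TCGA-A6-2682-01A-01R-1410-07",
            "TCGA-A6-2682-01A-01R-1410-07",
            "TCGA-A6-2684-01A-01R-1410-07",
            "TCGA-A6-2684-01A-01R-1410-07",
            "TCGA-XX-0000-01A-01R-0000-07"),  # hypothetical unique barcode
  stringsAsFactors = FALSE
)

# duplicated() flags the second and later occurrences of each barcode,
# so negating it keeps only the first row per sample.
dedup <- results[!duplicated(results$cases), ]

# Check that no barcode appears more than once after filtering.
stopifnot(anyDuplicated(dedup$cases) == 0)
print(dedup)
```

If the real results table has the same layout, the same filter could presumably be applied to it before calling GDCdownload and GDCprepare. Note that this keeps whichever file happens to come first for each sample; deciding which of the two files is the correct one to keep is a separate question that this sketch does not address.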