Dear List,
I ran a query and received the warning "Check if there are duplicated cases". Later, when downloading, the program told me "There are samples duplicated. We will not be able to prepare it".
How do I get rid of the duplicate files? My code and session information follow:
> library(TCGAbiolinks)
> library(xlsx)
> library(DT)
> library(edgeR)
> library(org.Hs.eg.db)
>
> query.cnv <- GDCquery(project = "TCGA-LUAD", data.category = "Copy Number Variation", data.type = "Gene Level Copy Number",platform="Affymetrix SNP 6.0",legacy=FALSE)
--------------------------------------
o GDCquery: Searching in GDC database
--------------------------------------
Genome of reference: hg38
--------------------------------------------
oo Accessing GDC. This might take a while...
--------------------------------------------
ooo Project: TCGA-LUAD
--------------------
oo Filtering results
--------------------
ooo By platform
ooo By data.type
----------------
oo Checking data
----------------
ooo Check if there are duplicated cases
Warning: There are more than one file for the same case. Please verify query results. You can use the command View(getResults(query)) in rstudio
ooo Check if there results for the query
-------------------
o Preparing output
-------------------
> sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Mojave 10.14.6
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] org.Hs.eg.db_3.14.0 AnnotationDbi_1.56.1 IRanges_2.28.0 S4Vectors_0.32.2
[5] Biobase_2.54.0 BiocGenerics_0.40.0 edgeR_3.36.0 limma_3.50.0
[9] DT_0.19 xlsx_0.6.5 TCGAbiolinks_2.23.1
loaded via a namespace (and not attached):
[1] bitops_1.0-7 matrixStats_0.61.0 bit64_4.0.5
[4] filelock_1.0.2 progress_1.2.2 httr_1.4.2
[7] GenomeInfoDb_1.30.0 tools_4.1.1 utf8_1.2.2
[10] R6_2.5.1 DBI_1.1.1 colorspace_2.0-2
[13] tidyselect_1.1.1 prettyunits_1.1.1 bit_4.0.4
[16] curl_4.3.2 compiler_4.1.1 rvest_1.0.2
[19] xml2_1.3.2 DelayedArray_0.20.0 scales_1.1.1
[22] readr_2.0.2 rappdirs_0.3.3 stringr_1.4.0
[25] digest_0.6.28 R.utils_2.11.0 XVector_0.34.0
[28] pkgconfig_2.0.3 htmltools_0.5.2 MatrixGenerics_1.6.0
[31] dbplyr_2.1.1 fastmap_1.1.0 highr_0.9
[34] htmlwidgets_1.5.4 rlang_0.4.12 RSQLite_2.2.8
[37] generics_0.1.1 jsonlite_1.7.2 dplyr_1.0.7
[40] R.oo_1.24.0 RCurl_1.98-1.5 magrittr_2.0.1
[43] GenomeInfoDbData_1.2.7 Matrix_1.3-4 Rcpp_1.0.7
[46] munsell_0.5.0 fansi_0.5.0 lifecycle_1.0.1
[49] R.methodsS3_1.8.1 stringi_1.7.5 SummarizedExperiment_1.24.0
[52] zlibbioc_1.40.0 plyr_1.8.6 BiocFileCache_2.2.0
[55] grid_4.1.1 blob_1.2.2 crayon_1.4.2
[58] lattice_0.20-45 Biostrings_2.62.0 xlsxjars_0.6.1
[61] hms_1.1.1 KEGGREST_1.34.0 locfit_1.5-9.4
[64] knitr_1.36 pillar_1.6.4 GenomicRanges_1.46.0
[67] TCGAbiolinksGUI.data_1.14.0 biomaRt_2.50.0 XML_3.99-0.8
[70] glue_1.5.0 downloader_0.4 data.table_1.14.2
[73] png_0.1-7 vctrs_0.3.8 tzdb_0.2.0
[76] gtable_0.3.0 purrr_0.3.4 tidyr_1.1.4
[79] assertthat_0.2.1 cachem_1.0.6 ggplot2_3.3.5
[82] xfun_0.28 tibble_3.1.6 rJava_1.0-5
[85] memoise_2.0.0 ellipsis_0.3.2
>
Thanks and best wishes,
Rich
Richard Friedman,
Columbia University