Problem
I'm running a script that uses pathview to generate pathway visualisations for a list of interesting pathways. This works for most of my pathways, but for some of them I get an error.
Error message
Error in cut.default(node.summary, data.cuts, include.lowest = TRUE, right = F) :
'breaks' are not unique
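For context, base R's `cut()` stops with this message whenever the vector of break points contains repeated values. The sketch below reproduces the message without pathview. It assumes, without having checked the pathview source, that pathview spaces its colour-scale breaks evenly across each `limit` range. Under that assumption, a zero-width range such as the `cpd = c(1, 1)` limit in the call below would collapse every break to the same value, so that setting may be worth ruling out.

```r
# Minimal base-R illustration of "'breaks' are not unique":
# cut() rejects any vector of breaks that contains repeated values.
x <- c(0.5, 1, 1.5)

# Evenly spaced breaks over a non-degenerate range are all distinct.
ok_breaks <- seq(-3, 8, length.out = 11)
ok_bins <- cut(x, ok_breaks, include.lowest = TRUE, right = FALSE)

# Hypothetical degenerate range c(1, 1): seq() returns 11 copies of 1,
# so cut() signals the same error as in the question.
bad_breaks <- seq(1, 1, length.out = 11)
msg <- tryCatch(
  cut(x, bad_breaks, include.lowest = TRUE, right = FALSE),
  error = function(e) conditionMessage(e)
)
print(msg)
```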
Reproducible example
# command
pv_out = pathview(
  gene.data = geneData,
  pathway.id = "05120",
  species = "hsa",
  same.layer = TRUE,
  out.suffix = "Epithelial_cell_signaling_in_Helicobacter_pylori_infection",
  limit = list(
    gene = c(-3, 8),
    cpd = c(1, 1)
  )
)
# where "geneData" is the following:
> geneData
lfc_mean
51606 2.2439377
537 1.9637509
4790 0.4912091
10312 2.2125236
1839 5.5535664
533 1.1882902
529 0.9639135
526 1.6909496
245972 4.5743062
6868 2.5705107
528 1.4247033
5603 3.4382452
50848 1.8549717
9114 1.3987230
3576 7.4946791
5970 1.1179397
3725 3.8582940
6300 -2.7698144
# resulting error
Error in cut.default(node.summary, data.cuts, include.lowest = TRUE, right = F) :
'breaks' are not unique
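A printed data frame is awkward to paste back into R. The same `geneData` can be rebuilt in code (for example from the output of `dput(geneData)`) so the example runs as-is. The sketch below assumes, from the printout, that `geneData` is a one-column data frame named `lfc_mean` with Entrez gene IDs as row names. It also checks that the IDs are unique and that all values fall inside `limit$gene = c(-3, 8)`.

```r
# Rebuild the printed geneData: one column (lfc_mean), Entrez IDs as row
# names (structure assumed from the printout above).
geneData <- data.frame(
  lfc_mean = c(2.2439377, 1.9637509, 0.4912091, 2.2125236, 5.5535664,
               1.1882902, 0.9639135, 1.6909496, 4.5743062, 2.5705107,
               1.4247033, 3.4382452, 1.8549717, 1.3987230, 7.4946791,
               1.1179397, 3.8582940, -2.7698144),
  row.names = c("51606", "537", "4790", "10312", "1839", "533", "529",
                "526", "245972", "6868", "528", "5603", "50848", "9114",
                "3576", "5970", "3725", "6300")
)

# Sanity checks before calling pathview(): no duplicated Entrez IDs,
# and the observed range of values compared with limit$gene = c(-3, 8).
stopifnot(!anyDuplicated(rownames(geneData)))
print(range(geneData$lfc_mean))
```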
Thanks in advance for your help.
Session info
# sessionInfo()
R version 4.0.5 (2021-03-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_GB.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] pathview_1.30.1 ggnewscale_0.4.5 forcats_0.5.1
[4] stringr_1.4.0 dplyr_1.0.3 purrr_0.3.4
[7] readr_1.4.0 tidyr_1.1.2 tibble_3.0.6
[10] ggplot2_3.3.3 tidyverse_1.3.0 biomaRt_2.46.3
[13] org.Hs.eg.db_3.12.0 AnnotationDbi_1.52.0 IRanges_2.24.1
[16] S4Vectors_0.28.1 Biobase_2.50.0 BiocGenerics_0.36.0
[19] clusterProfiler_3.18.1
loaded via a namespace (and not attached):
[1] readxl_1.3.1 shadowtext_0.0.7 backports_1.2.1
[4] fastmatch_1.1-0 BiocFileCache_1.14.0 plyr_1.8.6
[7] igraph_1.2.6 splines_4.0.5 BiocParallel_1.24.1
[10] GenomeInfoDb_1.26.2 digest_0.6.27 GOSemSim_2.16.1
[13] viridis_0.5.1 GO.db_3.12.1 magrittr_2.0.1
[16] memoise_2.0.0 Biostrings_2.58.0 annotate_1.68.0
[19] graphlayouts_0.7.1 modelr_0.1.8 matrixStats_0.58.0
[22] askpass_1.1 enrichplot_1.10.2 prettyunits_1.1.1
[25] colorspace_2.0-0 blob_1.2.1 rvest_0.3.6
[28] rappdirs_0.3.3 ggrepel_0.9.1 haven_2.3.1
[31] crayon_1.4.0 RCurl_1.98-1.2 jsonlite_1.7.2
[34] graph_1.68.0 scatterpie_0.1.5 genefilter_1.72.1
[37] survival_3.2-7 glue_1.4.2 polyclip_1.10-0
[40] gtable_0.3.0 zlibbioc_1.36.0 XVector_0.30.0
[43] DelayedArray_0.16.1 Rgraphviz_2.34.0 scales_1.1.1
[46] DOSE_3.16.0 DBI_1.1.1 Rcpp_1.0.6
[49] viridisLite_0.3.0 xtable_1.8-4 progress_1.2.2
[52] bit_4.0.4 httr_1.4.2 fgsea_1.16.0
[55] RColorBrewer_1.1-2 ellipsis_0.3.1 pkgconfig_2.0.3
[58] XML_3.99-0.5 farver_2.0.3 dbplyr_2.0.0
[61] locfit_1.5-9.4 tidyselect_1.1.0 labeling_0.4.2
[64] rlang_0.4.10 reshape2_1.4.4 munsell_0.5.0
[67] cellranger_1.1.0 tools_4.0.5 cachem_1.0.1
[70] downloader_0.4 cli_2.3.0 generics_0.1.0
[73] RSQLite_2.2.3 broom_0.7.4 fastmap_1.1.0
[76] bit64_4.0.5 fs_1.5.0 tidygraph_1.2.0
[79] KEGGREST_1.30.1 ggraph_2.0.5 KEGGgraph_1.50.0
[82] DO.db_2.9 xml2_1.3.2 compiler_4.0.5
[85] rstudioapi_0.13 curl_4.3 png_0.1-7
[88] reprex_1.0.0 tweenr_1.0.1 geneplotter_1.68.0
[91] stringi_1.5.3 lattice_0.20-41 Matrix_1.3-2
[94] vctrs_0.3.6 pillar_1.4.7 lifecycle_0.2.0
[97] BiocManager_1.30.10 data.table_1.13.6 cowplot_1.1.1
[100] bitops_1.0-6 GenomicRanges_1.42.0 qvalue_2.22.0
[103] R6_2.5.0 gridExtra_2.3 MASS_7.3-53
[106] assertthat_0.2.1 SummarizedExperiment_1.20.0 openssl_1.4.3
[109] DESeq2_1.30.0 withr_2.4.1 GenomeInfoDbData_1.2.4
[112] hms_1.0.0 grid_4.0.5 rvcheck_0.1.8
[115] MatrixGenerics_1.2.1 ggforce_0.3.3 lubridate_1.7.9.2
Progress
It looks like when fetching data from BioMart, some Ensembl gene IDs produce multiple rows with the same Entrez gene ID. This could be part of the problem.
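If duplicated Entrez IDs turn out to matter, one way to find and collapse them before building the pathview input is sketched below. The toy table is hypothetical. Its column names mirror the biomaRt attributes `ensembl_gene_id` and `entrezgene_id`, and `lfc` stands in for a per-gene log fold change. Averaging is only one choice; keeping the row with the largest absolute change is another.

```r
# Hypothetical BioMart-style mapping: two Ensembl IDs share Entrez ID 100,
# and one Ensembl ID has no Entrez mapping.
mapped <- data.frame(
  ensembl_gene_id = c("ENSG_A", "ENSG_B", "ENSG_C", "ENSG_D"),
  entrezgene_id   = c(100, 100, 200, NA),
  lfc             = c(1.0, 3.0, -0.5, 2.0)
)

# Report Entrez IDs that occur more than once (ignoring missing IDs).
dup_ids <- unique(mapped$entrezgene_id[duplicated(mapped$entrezgene_id) &
                                         !is.na(mapped$entrezgene_id)])
print(dup_ids)

# Drop unmapped rows, then average lfc per Entrez ID so each ID appears once.
mapped <- mapped[!is.na(mapped$entrezgene_id), ]
collapsed <- aggregate(lfc ~ entrezgene_id, data = mapped, FUN = mean)

# Reshape into a one-column, Entrez-keyed data frame like geneData.
gene_data <- data.frame(
  lfc_mean = collapsed$lfc,
  row.names = as.character(collapsed$entrezgene_id)
)
print(gene_data)
```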