I'm using ROntoTools functions to fetch KEGG Graphs and pathway names. There are pathways returned by the function "keggPathwayNames" that are missing from the set of pathways returned by keggPathwayGraphs.
kegg_mmu <- ROntoTools::keggPathwayGraphs(
organism = "mmu", updateCache = TRUE
)
kpn <- ROntoTools::keggPathwayNames(
organism = "mmu", updateCache = TRUE
)
> length(kegg_mmu)
[1] 222
>
> length(kpn)
[1] 343
>
> length(setdiff(names(kpn), names(kegg_mmu)))
[1] 121
>
> setdiff(names(kpn), names(kegg_mmu))[1:10]
[1] "path:mmu00010" "path:mmu00020" "path:mmu00030" "path:mmu00040" "path:mmu00051" "path:mmu00052" "path:mmu00053" "path:mmu00061" "path:mmu00062"
[10] "path:mmu00071"
I'm not sure whether there is any rhyme or reason to the missing pathways. For example, path:mmu00020 is the citric acid cycle.
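One rough way to look for a pattern is to tabulate the missing IDs by the leading digits of their map numbers. This is only a sketch: the vector below is copied from the ten IDs printed above, and the names `missing_ids`, `map_number` and `id_block` are made up for illustration. In a live session the full vector would come from `setdiff(names(kpn), names(kegg_mmu))`.

```r
# The ten IDs shown in the setdiff() output above (not the full set of 121)
missing_ids <- c("path:mmu00010", "path:mmu00020", "path:mmu00030",
                 "path:mmu00040", "path:mmu00051", "path:mmu00052",
                 "path:mmu00053", "path:mmu00061", "path:mmu00062",
                 "path:mmu00071")

# Drop the "path:" prefix and the organism code, leaving the five-digit map number
map_number <- sub("^path:[a-z]+", "", missing_ids)

# Group by the first two digits of the map number and count
id_block <- substr(map_number, 1, 2)
table(id_block)
```

If the counts concentrate in a few blocks, that would suggest the omissions follow KEGG's own numbering rather than being random.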
Any help in obtaining a complete set of pathways would be greatly appreciated.
See below for session info:
> sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.5
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats4 grid stats graphics grDevices utils datasets methods base
other attached packages:
[1] pheatmap_1.0.12 ggfortify_0.4.14 openxlsx_4.2.5 reticulate_1.25 ggrepel_0.9.1 ggplot2_3.3.6 dplyr_1.0.9
[8] tibble_3.1.7 tidyr_1.2.0 edgeR_3.38.1 limma_3.52.1 org.Mm.eg.db_3.15.0 AnnotationDbi_1.58.0 IRanges_2.30.0
[15] S4Vectors_0.34.0 Biobase_2.56.0 ROntoTools_2.24.0 Rgraphviz_2.40.0 KEGGgraph_1.56.0 KEGGREST_1.36.2 boot_1.3-28
[22] graph_1.74.0 BiocGenerics_0.42.0
loaded via a namespace (and not attached):
[1] httr_1.4.3 pkgload_1.2.4 bit64_4.0.5 jsonlite_1.8.0 splines_4.2.0 here_1.0.1
[7] brio_1.1.3 assertthat_0.2.1 statmod_1.4.36 blob_1.2.3 GenomeInfoDbData_1.2.8 pillar_1.7.0
[13] RSQLite_2.2.14 lattice_0.20-45 glue_1.6.2 RColorBrewer_1.1-3 XVector_0.36.0 colorspace_2.0-3
[19] Matrix_1.4-1 XML_3.99-0.10 pkgconfig_2.0.3 zlibbioc_1.42.0 purrr_0.3.4 scales_1.2.0
[25] generics_0.1.2 ellipsis_0.3.2 cachem_1.0.6 withr_2.5.0 cli_3.3.0 magrittr_2.0.3
[31] crayon_1.5.1 memoise_2.0.1 fansi_1.0.3 tools_4.2.0 lifecycle_1.0.1 stringr_1.4.0
[37] munsell_0.5.0 locfit_1.5-9.5 zip_2.2.0 Biostrings_2.64.0 compiler_4.2.0 GenomeInfoDb_1.32.2
[43] rlang_1.0.2 RCurl_1.98-1.7 rstudioapi_0.13 bitops_1.0-7 testthat_3.1.4 gtable_0.3.0
[49] curl_4.3.2 DBI_1.1.2 R6_2.5.1 gridExtra_2.3 fastmap_1.1.0 bit_4.0.4
[55] utf8_1.2.2 rprojroot_2.0.3 desc_1.4.1 stringi_1.7.6 parallel_4.2.0 Rcpp_1.0.8.3
[61] vctrs_0.4.1 png_0.1-7 tidyselect_1.1.2
One thing I have observed is that the KEGG website provides a KGML download link for some of the missing pathways but not for others. For example:
Ribosome / path:mmu03010 -- https://www.kegg.jp/pathway/mmu03010 does provide a KGML download link.
On the other hand, Biosynthesis of Amino Acids, which is also missing, does not offer a KGML download link:
Biosynthesis of Amino Acids -- https://www.kegg.jp/pathway/mmu01230
On the third hand, I can access the KGML for Biosynthesis of Amino Acids using KEGGgraph.
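For reference, a minimal sketch of how the KGML can be fetched with KEGGgraph. `fetch_kgml` is a hypothetical helper written for this post, not a KEGGgraph function; it only wraps `retrieveKGML()` and `parseKGML()`, and it needs network access to KEGG when called.

```r
# Hypothetical helper (not part of KEGGgraph): download one pathway's KGML
# to a temporary file and parse it into a KEGGPathway object.
fetch_kgml <- function(map_id, organism = "mmu") {
  kgml_file <- tempfile(fileext = ".xml")
  KEGGgraph::retrieveKGML(pathwayid = map_id, organism = organism,
                          destfile = kgml_file)
  list(file = kgml_file, pathway = KEGGgraph::parseKGML(kgml_file))
}

# Requires network access; "01230" is Biosynthesis of Amino Acids.
# aa <- fetch_kgml("01230")
# aa$pathway
```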
Another update:
Even though I can download the files for the missing pathways, they don't seem to load correctly.
For the ribosome this makes sense, as the "Pathway" does not appear to have edges.
For Biosynthesis of Amino Acids, it seems like it would make sense to have edges when "non-genes", i.e., metabolites, are included.
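To check this, the graph built from a downloaded KGML file can be compared with and without non-gene entries. The sketch below assumes the file is already on disk; `count_kgml_graph` is a hypothetical helper, and it returns `NA` for a variant when `parseKGML2Graph()` fails on that file. Whether `genesOnly = FALSE` actually produces edges depends on what relations the KGML contains.

```r
# Hypothetical helper: node and edge counts for one KGML file, parsed once
# with gene entries only and once with all entry types kept.
count_kgml_graph <- function(kgml_file) {
  sizes <- function(genes_only) {
    tryCatch({
      g <- KEGGgraph::parseKGML2Graph(kgml_file, genesOnly = genes_only)
      c(nodes = graph::numNodes(g), edges = graph::numEdges(g))
    }, error = function(e) c(nodes = NA_integer_, edges = NA_integer_))
  }
  rbind(genes_only = sizes(TRUE), all_entries = sizes(FALSE))
}

# Example call, given a KGML file saved locally (the path is a placeholder):
# count_kgml_graph("mmu01230.xml")
```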