Hello, I'm a developer of the OmnipathR package, and I wrote a function to access certain KEGG data:
library(OmnipathR)
csav00900 <- kegg_pathway_download('csav00900', process = FALSE)
csav00900
$entries
# A tibble: 113 × 3
kgml_id kegg_id genesymbol
<chr> <chr> <chr>
1 65 cpd:C05859 C05859
2 66 cpd:C06081 C06081
3 67 path:csav00010 Glycolysis / Gluconeogenesis
4 68 path:csav00900 TITLE:Terpenoid backbone biosynthesis
5 69 cpd:C00022 C00022
6 70 cpd:C00024 C00024
7 71 cpd:C00332 C00332
8 72 cpd:C00118 C00118
9 73 cpd:C11434 C11434
10 74 cpd:C11435 C11435
# 103 more rows
# Use `print(n = ...)` to see more rows
$relations
# A tibble: 88 × 6
source target type effect arrow relation_id
<chr> <chr> <chr> <chr> <chr> <chr>
1 67 85 maplink compound 70 csav00900:1
2 85 86 ECrel compound 71 csav00900:2
3 67 86 maplink compound 70 csav00900:3
4 86 89 ECrel compound 90 csav00900:4
5 89 84 ECrel compound 91 csav00900:5
6 84 115 ECrel compound 92 csav00900:6
7 115 114 ECrel compound 93 csav00900:7
8 78 79 ECrel compound 99 csav00900:8
9 67 78 maplink compound 72 csav00900:9
10 79 80 ECrel compound 73 csav00900:10
# 78 more rows
# Use `print(n = ...)` to see more rows
I also recommend installing OmnipathR directly from GitHub, because it is a large package and we release updates and bug fixes there more often and faster than we manage to publish them here on Bioconductor. The BioC 3.19 version should also work fine, though.
library(remotes)
install_github('saezlab/OmnipathR')
I haven't checked whether the returned data is correct and suitable for use, as I'm not familiar with plants. I do know that with process = TRUE an empty data frame is returned, probably because we attempt to translate identifiers and our UniProt-based ID translation apparently doesn't work for C. sativa; or maybe because we are not able to translate the metabolite IDs correctly. If you see what should be translated here and how, please let me know.
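To see which kinds of identifiers would need translating, one option is to tabulate the prefixes in the kegg_id column (cpd for compounds, path for linked pathway maps, and so on). A minimal base R sketch; the helper names are made up here and are not part of OmnipathR:

```r
# Hypothetical helper (not an OmnipathR function): extract the database
# prefix, i.e. the part before the first colon, from KEGG identifiers.
# Identifiers without a colon are returned unchanged.
kegg_id_prefix <- function(kegg_id) {
    sub(':.*$', '', kegg_id)
}

# Hypothetical helper: count the entries of a pathway by identifier prefix.
# `entries` is expected to have a `kegg_id` column, like the `$entries`
# table returned by kegg_pathway_download(..., process = FALSE).
count_kegg_id_types <- function(entries) {
    table(kegg_id_prefix(entries$kegg_id))
}

# Usage:
# count_kegg_id_types(csav00900$entries)
```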
There is also this function:
kegg_info('csav00900')
$id
[1] "csav00900"
$name
[1] "Terpenoid backbone biosynthesis - Cannabis sativa (hemp)"
$desc
[1] "Terpenoids, also known as isoprenoids, are a large class of natural products consisting of isoprene (C5) units. There are two biosynthetic pathways, the mevalonate pathway [MD:M00095] and the non-mevalonate pathway or the MEP/DOXP pathway [MD:M00096], for the terpenoid building blocks: isopentenyl diphosphate (IPP) and dimethylallyl diphosphate (DMAPP). The action of prenyltransferases then generates higher-order building blocks: geranyl diphosphate (GPP), farsenyl diphosphate (FPP), and geranylgeranyl diphosphate (GGPP), which are the precursors of monoterpenoids (C10), sesquiterpenoids (C15), and diterpenoids (C20), respectively. Condensation of these building blocks gives rise to the precursors of sterols (C30) and carotenoids (C40). The MEP/DOXP pathway is absent in higher animals and fungi, but in green plants the MEP/DOXP and mevalonate pathways co-exist in separate cellular compartments. The MEP/DOXP pathway, operating in the plastids, is responsible for the formation of essential oil monoterpenes and linalyl acetate, some sesquiterpenes, diterpenes, and carotenoids and phytol. The mevalonate pathway, operating in the cytosol, gives rise to triterpenes, sterols, and most sesquiterpenes."
$pubmed
[1] "12777052" "16262699" "9858571" "24375100"
$diseases
NULL
$rel_pathways
[1] "Glycolysis / Gluconeogenesis" "Steroid biosynthesis" "Ubiquinone and other terpenoid-quinone biosynthesis" "Cysteine and methionine metabolism"
[5] "N-Glycan biosynthesis" "Monoterpenoid biosynthesis" "Diterpenoid biosynthesis" "Carotenoid biosynthesis"
[9] "Zeatin biosynthesis" "Sesquiterpenoid and triterpenoid biosynthesis"
$module
NULL
Meanwhile, I realized that most of the IDs in the genesymbol column are KEGG compound IDs, for example C05859. At the moment there is no function in OmnipathR that can translate these to other IDs, such as PubChem. A simple but slow solution:
library(magrittr)
library(purrr)
library(stringr)
library(rvest)
library(dplyr)
dbget_url <- 'https://www.genome.jp/dbget-bin/www_bget?compound+%s'
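# Look up the PubChem ID of one KEGG compound ID on its DBGET page;
# anything that does not look like a compound ID is returned unchanged
compound_kegg2pubchem <- function(kegg_cid) {
    kegg_cid %>%
    {`if`(
        str_detect(., '^C\\d+$'),
        sprintf(dbget_url, .) %>%
            read_html %>%
            html_elements('table.w1') %>%
            # keep only the table that holds the PubChem cross-reference
            keep(~str_detect(html_text2(.x), 'PubChem')) %>%
            html_element('a') %>%
            html_text2 %>%
            keep(~nchar(.x) > 0L),
        .
    )}
}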
csav00900 <- kegg_pathway_download('csav00900', process = FALSE)
csav00900$entries %<>% mutate(pubchem = map_chr(genesymbol, compound_kegg2pubchem))
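A possibly faster route, sketched under assumptions and not tested against this pathway: the KEGG REST API has a conv operation that returns the whole KEGG compound to PubChem mapping as one tab-separated table, so a single request could replace one page download per compound. As far as I know, the PubChem IDs that KEGG links to are substance IDs (SIDs), not compound CIDs. The function names below are made up for illustration, and the parser assumes the KEGG ID comes first in each row.

```r
# Hypothetical helper: parse the two-column, tab-separated text returned by
# the KEGG REST `conv` operation, e.g. rows like "cpd:C00001<TAB>pubchem:...".
# Assumption: KEGG ID in the first column, PubChem ID in the second.
parse_kegg_conv <- function(txt) {
    tab <- read.delim(
        text = txt, header = FALSE, sep = '\t', quote = '',
        col.names = c('kegg_cid', 'pubchem'),
        colClasses = 'character'
    )
    # strip the database prefixes so the IDs match the `genesymbol` column
    tab$kegg_cid <- sub('^cpd:', '', tab$kegg_cid)
    tab$pubchem <- sub('^pubchem:', '', tab$pubchem)
    tab
}

# Hypothetical helper: download the full compound-to-PubChem table once.
kegg_pubchem_table <- function(
    url = 'https://rest.kegg.jp/conv/pubchem/compound'
) {
    parse_kegg_conv(paste(readLines(url, warn = FALSE), collapse = '\n'))
}

# Hypothetical helper: add a `pubchem` column by lookup; entries that are not
# KEGG compound IDs get NA. If a compound has several rows in the mapping,
# only the first one is used.
add_pubchem <- function(entries, mapping) {
    entries$pubchem <- mapping$pubchem[
        match(entries$genesymbol, mapping$kegg_cid)
    ]
    entries
}

# Usage (needs network access):
# mapping <- kegg_pubchem_table()
# csav00900$entries <- add_pubchem(csav00900$entries, mapping)
```

Unlike the version above, this leaves pathway names and other non-compound entries as NA in the pubchem column instead of copying them over.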
To access KEGG data, the developer of OmnipathR proposed an alternative way in the post below this one, but regarding your question on KEGGgraph: I would download the KGML file to a 'normal' folder first (and thus not use a TMP folder/file, nor save it in an R library system file [location])! Then load that downloaded file, and continue with the things you would like to do.
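Roughly, that could look like the sketch below. It has not been tested against this pathway: the folder name is an arbitrary example, the helper names are made up, and the URL assumes the KEGG REST get operation with the kgml option.

```r
# Hypothetical helper: URL of the KGML file of a pathway in the KEGG REST API.
kgml_url <- function(pathway_id) {
    sprintf('https://rest.kegg.jp/get/%s/kgml', pathway_id)
}

# Hypothetical helper: download the KGML into a regular folder
# (created if missing) and return the path of the saved file.
download_kgml <- function(pathway_id, dir = file.path('~', 'kegg_kgml')) {
    dir <- path.expand(dir)
    dir.create(dir, showWarnings = FALSE, recursive = TRUE)
    destfile <- file.path(dir, paste0(pathway_id, '.xml'))
    download.file(kgml_url(pathway_id), destfile, mode = 'wb')
    destfile
}

# Usage (needs network access and the KEGGgraph package):
# kgml_file <- download_kgml('csav00900')
# pathway <- KEGGgraph::parseKGML(kgml_file)
# graph <- KEGGgraph::parseKGML2Graph(kgml_file)
```

Keeping the file in a folder of your own also means you can open it in a text editor and check that the download is a complete XML document before parsing it.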