Hi
I am trying to use KEGGgraph to create an adjacency matrix of human KEGG pathways for my random walk model. I have performed the KEGG enrichment analyses and obtained a list of pathways (e.g. hsa04612.xml). I then tried to use parseKGML2Graph to parse the KGML file into a graph object. My question is about setting one of the parameters: expandGenes.
According to the user manual, if expandGenes = TRUE, the function outputs a list of nodes that each have a unique KEGG ID. These nodes also include homologs of the gene products/proteins that are in the pathway. See the example below:
> q = parseKGML2DataFrame('hsa04612.xml',expandGenes=T)
> q
      from       to  type      subtype
1  hsa:972  hsa:972 PPrel state change
2 hsa:3108 hsa:3108 PPrel state change
3 hsa:3108 hsa:3109 PPrel state change
4 hsa:3108 hsa:3111 PPrel state change
5 hsa:3108 hsa:3112 PPrel state change
6 hsa:3108 hsa:3113 PPrel state change
> g <- parseKGML2Graph('hsa04612.xml',expandGenes=T)
> nodes(g)
[1] "hsa:972" "hsa:3108" "hsa:3109"
[4] "hsa:3111" "hsa:3112" "hsa:3113"
[7] "hsa:3115" "hsa:3117" "hsa:3118"
[10] "hsa:3119" "hsa:3122" "hsa:3123"
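To make the goal concrete, here is a minimal base R sketch of how an edge table shaped like `q` above could be turned into an adjacency matrix. The small `edges` table only mimics the first rows of the output above, and `edges_to_adjacency` is a hypothetical helper name, not a KEGGgraph function.

```r
# Toy edge table mimicking the first rows of
# parseKGML2DataFrame(..., expandGenes = TRUE) shown above.
edges <- data.frame(
  from = c("hsa:972", "hsa:3108", "hsa:3108", "hsa:3108"),
  to   = c("hsa:972", "hsa:3108", "hsa:3109", "hsa:3111"),
  stringsAsFactors = FALSE
)

# Illustrative helper (an assumption, not part of KEGGgraph):
# builds a 0/1 adjacency matrix with gene IDs as row and column names.
edges_to_adjacency <- function(edges, directed = TRUE) {
  ids <- sort(unique(c(edges$from, edges$to)))
  adj <- matrix(0L, nrow = length(ids), ncol = length(ids),
                dimnames = list(ids, ids))
  # Index by (from, to) name pairs; duplicated edges collapse to a single 1.
  adj[cbind(edges$from, edges$to)] <- 1L
  # For an undirected walk, mirror every edge across the diagonal.
  if (!directed) adj <- pmax(adj, t(adj))
  adj
}

adj <- edges_to_adjacency(edges)
```

Rows are the "from" genes and columns the "to" genes. As far as I understand, calling as(g, "matrix") on the graph object should give a comparable matrix, but I have not verified that it behaves the same way.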
If expandGenes = FALSE, the nodes are labelled with numeric (index) IDs instead. When I look at the node data, a single node can contain multiple genes (hsa IDs).
> q = parseKGML2DataFrame('hsa04612.xml',expandGenes=F)
> q
  from to  type      subtype
1   18 17 PPrel state change
2   19 17 PPrel   activation
3   23 21 PPrel   activation
4   23 22 PPrel   activation
5   23 20 PPrel   activation
6   24 18 PPrel   activation
7   25 27 PPrel   activation
8   26 27 PPrel state change
9   27 18 PPrel state change
> g <- parseKGML2Graph('hsa04612.xml',expandGenes=F)
> nodes(g)
[1] "17" "18" "19" "20" "21" "22" "23" "24" "25" "26" "27" "28" "29" "30" "33"
[16] "34" "35" "36" "37" "38" "39" "40" "41" "42" "43" "44" "45" "46" "47" "48"
[31] "49" "50" "51" "52" "53" "54" "55" "56" "57"
#output for one of the nodes
`49`
KEGG Node (Entry '49'):
------------------------------------------------------------
[ displayName ]: TAP1, ABC17, ABCB2, APT1, D6S114E, PSF-1, PSF1, RING4, TAP1*0102N, TAP1N...
[ Name ]: hsa:6890,hsa:6891
[ Type ]: gene
[ Link ]: https://www.kegg.jp/dbget-bin/www_bget?hsa:6890+hsa:6891
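To illustrate the one-to-many problem, below is a rough base R sketch of how entry-level edges could be expanded into gene-level edges by pairing every gene in the source entry with every gene in the target entry. I assume this is roughly what expandGenes = TRUE does, but I have not confirmed it. Only the gene IDs for entry '49' come from the output above; the IDs for entry '50', the edge between the two entries, and the helper name `expand_entry_edges` are made up for illustration.

```r
# Splitting the Name field of entry '49' into individual gene IDs.
genes_49 <- strsplit("hsa:6890,hsa:6891", ",", fixed = TRUE)[[1]]

# Hypothetical entry-to-gene mapping; the IDs for entry '50' are placeholders.
entry_genes <- list(
  "49" = genes_49,
  "50" = c("gene:X", "gene:Y")
)

# Hypothetical entry-level edge (not taken from the real pathway).
entry_edges <- data.frame(from = "49", to = "50", stringsAsFactors = FALSE)

# Illustrative helper (an assumption, not part of KEGGgraph):
# replaces each entry-level edge with all gene-to-gene combinations.
expand_entry_edges <- function(entry_edges, entry_genes) {
  pieces <- lapply(seq_len(nrow(entry_edges)), function(i) {
    expand.grid(
      from = entry_genes[[entry_edges$from[i]]],
      to   = entry_genes[[entry_edges$to[i]]],
      stringsAsFactors = FALSE
    )
  })
  # Drop gene pairs that arise from more than one entry-level edge.
  unique(do.call(rbind, pieces))
}

gene_edges <- expand_entry_edges(entry_edges, entry_genes)
```

With this kind of expansion every gene ID becomes its own node, so the resulting edge table has unique identifiers that can be used directly as row and column names.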
Also, the example in the manual includes the comment '## only for expert use'.
Is there a reason why expandGenes=FALSE is not recommended?
I am only interested in the genes/proteins that are in the pathway, but the mapping between nodes and hsa IDs is not one-to-one any more. How should I proceed to build an adjacency matrix with unique identifiers?
Thank you very much for your help!