KEGGREST mapping between genes and pathways
@zhangjianhai-12955

Hello,

I want to run KEGG pathway enrichment on my rice genes. I tried "clusterProfiler", but it expects Entrez Gene IDs as input and my gene IDs are RAPIDs (e.g. Os02g0617800). I want to keep using the RAPIDs, so I have to write my own enrichment functions. To do this, I need the mapping between RAPIDs and pathways. How can I get this mapping from KEGGREST?

Regards.

@gordon-smyth
WEHI, Melbourne, Australia

You don't need to convert your IDs. KEGG already provides a mapping between the Japanese rice genome pathways and RAPIDs.

For example, you can use the kegga function in the limma package with species.KEGG="dosa" and it will use RAPIDs directly. To see which RAPIDs KEGG is using, have a look at the gene to pathway annotation:

> library(limma)
> GK <- getGeneKEGGLinks(species.KEGG="dosa")
> head(GK)
           GeneID      PathwayID
1 Os01t0118000-01 path:dosa00010
2 Os01t0147900-01 path:dosa00010
3 Os01t0160100-01 path:dosa00010
4 Os01t0190400-01 path:dosa00010
5 Os01t0191700-01 path:dosa00010
6 Os01t0276700-01 path:dosa00010
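
A minimal sketch of the kegga call described above. The wrapper name `rice_kegga` is hypothetical and only for illustration; `kegga`, `topKEGG` and the `species.KEGG` argument are from limma. Note that the IDs passed in would need to be in the same form as the GeneID column shown above, including the "-01" suffix.

```r
# Hypothetical wrapper: KEGG enrichment for rice genes given as KEGG "dosa" IDs.
# Needs the limma package and internet access when it is called, because
# kegga() downloads the gene-pathway links and pathway names from KEGG.
rice_kegga <- function(de, universe = NULL) {
  limma::kegga(de, universe = universe, species.KEGG = "dosa")
}

# Example call (commented out because it contacts the KEGG server):
# k <- rice_kegga(c("Os01t0118000-01", "Os01t0147900-01"))
# limma::topKEGG(k)
```

If no universe is supplied, kegga takes all genes that have at least one pathway annotation as the background.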

Alternatively, to get NCBI Entrez Gene IDs instead:

> GK.Entrez <- getGeneKEGGLinks(species.KEGG="osa")
> head(GK.Entrez)
     GeneID     PathwayID
1 107275630 path:osa00010
2 107277365 path:osa00010
3   4324066 path:osa00010
4   4324263 path:osa00010
5   4324666 path:osa00010
6   4325027 path:osa00010
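
Since the question asked about KEGGREST specifically, here is a sketch of how the same links might be fetched with `KEGGREST::keggLink`. The helper `links_to_df` is hypothetical, and the offline stand-in vector only mimics the shape I would expect keggLink to return (a named character vector with organism-prefixed gene IDs as names).

```r
# Turn the named character vector returned by KEGGREST::keggLink()
# into a two-column data frame like the one from getGeneKEGGLinks().
links_to_df <- function(links) {
  data.frame(
    GeneID    = sub("^[a-z]+:", "", names(links)),  # drop the "dosa:" prefix
    PathwayID = unname(links),
    stringsAsFactors = FALSE
  )
}

# Real call (needs the KEGGREST package and internet access):
# links <- KEGGREST::keggLink("pathway", "dosa")

# Offline stand-in with the assumed shape, using IDs from the output above:
links <- c("dosa:Os01t0118000-01" = "path:dosa00010",
           "dosa:Os01t0147900-01" = "path:dosa00010")
GK2 <- links_to_df(links)
```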

This is very useful, but why is there a "-01" or "-00" at the end of the gene IDs (e.g. Os01t0191700-01)?


You'd have to ask KEGG rather than me. Presumably they are transcript version numbers. It might be fine to remove them.
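
Before removing the suffixes, it may be worth checking which suffixes occur and whether stripping them would merge IDs that are currently distinct. A small sketch in base R, using three IDs from the output above as stand-ins for the full GeneID column:

```r
# Stand-in for GK$GeneID from getGeneKEGGLinks(species.KEGG = "dosa")
ids <- c("Os01t0118000-01", "Os01t0147900-01", "Os01t0191700-01")

# Which suffixes are present, and how often?
suffix <- sub("^.*-", "", ids)
table(suffix)

# Strip the trailing "-NN" and check for collisions.
stripped <- sub("-[0-9]+$", "", ids)
anyDuplicated(stripped)  # 0 means no two distinct IDs were merged
```

On the full table the same gene appears once per pathway, so the collision check is best run on `unique(GK$GeneID)`.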


I see. Thank you very much for the reply!


All the gene IDs contain a "t" (Os01t0118000) rather than a "g" (Os05g0532600). Why is that?


The ID with "t" is the transcript ID. The ID with "g" is the locus ID. See for example the gene annotation file you can download from here:

https://rapdb.dna.affrc.go.jp/download/irgsp1.html


I am working with locus IDs. To convert transcript IDs to locus IDs, should I replace the "t" with a "g"? Based on your link, it seems so.
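
A sketch of that conversion in base R, assuming the ID pattern seen in this thread holds throughout (a two-digit chromosome number after "Os", then "t", then an optional "-NN" suffix). The function name `transcript_to_locus` is hypothetical, and the "-02" ID in the toy table is made up to show how duplicate rows collapse.

```r
# Convert transcript-style IDs to locus-style IDs,
# e.g. "Os01t0118000-01" -> "Os01g0118000".
transcript_to_locus <- function(ids) {
  ids <- sub("-[0-9]+$", "", ids)       # drop the "-01"/"-00" suffix
  sub("^(Os[0-9]{2})t", "\\1g", ids)    # swap the "t" after the chromosome number for "g"
}

# Toy stand-in for the table from getGeneKEGGLinks(species.KEGG = "dosa");
# the "-02" row is a made-up second transcript of the same locus.
GK <- data.frame(
  GeneID    = c("Os01t0118000-01", "Os01t0118000-02", "Os01t0147900-01"),
  PathwayID = "path:dosa00010",
  stringsAsFactors = FALSE
)

# Several transcripts can map to one locus, so remove duplicate rows afterwards.
GK.locus <- unique(transform(GK, GeneID = transcript_to_locus(GeneID)))
```

A locus-level table like this could then be supplied to kegga through its `gene.pathway` argument, so that locus IDs are matched directly.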


It seems that 50 of the transcript IDs returned by "getGeneKEGGLinks" are not present in the file at the link you shared. Do you know where the transcript IDs in "limma" come from?
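
For anyone wanting to reproduce this kind of check, a sketch using `setdiff`. Both vectors here are toy stand-ins: `rapdb_ids` is an arbitrary subset chosen for illustration and says nothing about which IDs are actually absent from the RAP-DB file.

```r
# Stand-in for unique(GK$GeneID) from getGeneKEGGLinks(species.KEGG = "dosa")
kegg_ids <- c("Os01t0118000-01", "Os01t0147900-01", "Os01t0160100-01")

# Hypothetical stand-in for the transcript IDs read from the RAP-DB annotation file
rapdb_ids <- c("Os01t0118000-01", "Os01t0160100-01")

# IDs used by KEGG that do not appear in the annotation file
missing_ids <- setdiff(kegg_ids, rapdb_ids)
length(missing_ids)
missing_ids
```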
