I want to run KEGG pathway enrichment on my rice genes. I tried "clusterProfiler", but it expects Entrez IDs as input and my gene IDs are RAPIDs (e.g. Os02g0617800). I want to keep using the RAPIDs, so I would have to write my own enrichment functions. To do this, I need the mapping between RAPIDs and pathways. How can I get this mapping from KEGGREST?
You don't need to convert your IDs. KEGG already provides a mapping between the Japanese rice genome pathways and RAPIDs.
For example, you can use the kegga function in the limma package with species.KEGG="dosa" and it will use RAPIDs directly. To see which RAPIDs KEGG is using, have a look at the gene to pathway annotation:
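A minimal sketch of that lookup and test, wrapped in a helper so that nothing is downloaded until it is called. The helper name run_rice_kegga and its arguments are made up for illustration; only getGeneKEGGLinks, kegga and topKEGG are limma functions. It assumes limma is installed, that an internet connection is available, and that the input IDs are written in the same format KEGG uses for "dosa".

```r
# Sketch only: run_rice_kegga is a hypothetical helper, not part of limma.
# It needs the limma package and internet access at call time.
run_rice_kegga <- function(de_ids, universe = NULL, n_top = 20L) {
  # Gene-to-pathway links as KEGG reports them for "dosa";
  # a data.frame with columns GeneID and PathwayID.
  links <- limma::getGeneKEGGLinks(species.KEGG = "dosa")

  # Flag input IDs that KEGG does not know (often an ID format mismatch).
  unknown <- setdiff(de_ids, links$GeneID)
  if (length(unknown) > 0) {
    warning(length(unknown), " input IDs not found in the KEGG 'dosa' links")
  }

  # Over-representation test on KEGG pathways using KEGG's own rice IDs.
  res <- limma::kegga(de_ids, universe = universe, species.KEGG = "dosa")

  list(links = links, top = limma::topKEGG(res, number = n_top))
}

# Example call (not run here; my_de_ids is a placeholder character vector):
# out <- run_rice_kegga(my_de_ids)
# head(out$links)   # shows which IDs KEGG is using
# out$top           # top enriched pathways
```

Looking at head(out$links) first is worthwhile, since it shows the exact ID format that the input vector has to match.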
This is very useful, but why is there a "-01" or "-00" at the end of the gene IDs (e.g. Os01t0191700-01)?
You'd have to ask KEGG rather than me. Presumably they are transcript version numbers. It might be fine to remove them.
I see. Thank you very much for your reply!
All the gene ids have a "t" (Os01t0118000) inside rather than a "g" (Os05g0532600). Why is that?
The ID with "t" is the transcript ID. The ID with "g" is the locus ID. See for example the gene annotation file you can download from here:
https://rapdb.dna.affrc.go.jp/download/irgsp1.html
I am working with locus IDs. To convert transcript IDs to locus IDs, should I replace "t" with "g"? Based on your link, it seems so.
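For what it's worth, here is a rough sketch of that conversion in base R. It assumes every transcript ID follows the pattern seen in the examples in this thread (Os, two-digit chromosome, "t", seven digits, optional "-NN" suffix) and that swapping "t" for "g" gives the matching locus ID, which is worth checking against the RAP-DB annotation file. The function name transcript_to_locus is made up for illustration.

```r
# Sketch only: transcript_to_locus is a hypothetical helper.
# Assumes IDs look like Os01t0191700-01; anything else is returned unchanged.
transcript_to_locus <- function(ids) {
  # Keep the chromosome prefix and the 7-digit number, swap "t" for "g",
  # and drop the optional "-NN" transcript suffix.
  sub("^(Os[0-9]{2})t([0-9]{7})(-[0-9]+)?$", "\\1g\\2", ids)
}

tx <- c("Os01t0191700-01", "Os01t0191700-02", "Os01t0118000-00")

# Several transcripts can share one locus, so de-duplicate afterwards.
loci <- unique(transcript_to_locus(tx))
print(loci)
```

Because several transcripts can collapse onto one locus, a gene-to-pathway table converted this way should be de-duplicated on the (locus, pathway) pairs before it is used for enrichment.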
It seems that 50 of the transcript IDs returned by "getGeneKEGGLinks" are not present in the file you linked. Do you know where the transcript IDs used by "limma" come from?