Dear BioC developers,
I intend to map gene expression data onto KEGG pathways.
In more detail, I performed a DE analysis on gene expression data from
an hgu95av2 chip and want to color particular genes in the
corresponding pathways.
I found that the KEGGSOAP package already implements excellent
access to the KEGG API, and I honestly appreciate the work that has
been done here.
However, the function mark.pathway.by.objects requires KEGG gene IDs
or at least KEGG orthology (KO) terms, while there is no way to map
hgu95av2 probe IDs onto KEGG gene IDs or KO terms (not in hgu95av2.db,
keggorth, KEGG.db, etc.).
I wonder why only selected functions of the KEGG API are
integrated in the KEGGSOAP package, and especially why the "bconv"
utility, which maps foreign identifiers onto KEGG identifiers, is not
included.
With "bconv" it would be easy for me to map hgu95av2 probe IDs onto
ENSEMBL/UNIGENE/UNIPROT/etc. IDs (via hgu95av2.db) and then onto KEGG
IDs (via bconv).
In addition, the original mark.pathway.by.objects function from the
KEGG API accepts EC numbers as input, which is not supported by the
corresponding KEGGSOAP function.
Could you please explain why these limitations exist and how the
KEGGSOAP package could be extended to cover all of the functions of
the KEGG API?
Currently, my workaround is as follows:
(1) map the probe IDs onto ENSEMBL IDs (using hgu95av2.db) for the
selected genes
(2) Separately, I retrieve all KEGG entries for the particular
pathway using "get.genes.by.pathway" and "bget" from
KEGGSOAP
(3) Then, I parse each of these entries for its ENSEMBL ID and KO
ID to create a dictionary ENSEMBL -> KO
(4) I map the IDs from (1) onto KO using the dictionary from (3)
This works, but it is cumbersome and, above all, time consuming
(because of (3), which needs one retrieved entry per pathway gene).
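
To make steps (3) and (4) concrete, here is a minimal sketch of the
dictionary approach, written in Python only to show the logic. It does
not call KEGGSOAP or the KEGG API. The records are simplified
placeholders that imitate the flat-file layout returned for a gene
entry (an ORTHOLOGY line and an "Ensembl:" line under DBLINKS, which is
my assumption about the layout), and all IDs, the probe names and the
function names are made up for illustration.

```python
import re

# Placeholder records imitating a KEGG gene entry (assumed layout, fake IDs).
SAMPLE_ENTRIES = [
    "\n".join([
        "ENTRY       1001              CDS       T01001",
        "ORTHOLOGY   K00001  placeholder orthology term",
        "DBLINKS     NCBI-GeneID: 1001",
        "            Ensembl: ENSG00000000001",
        "///",
    ]),
    "\n".join([
        "ENTRY       1002              CDS       T01001",
        "ORTHOLOGY   K00002  another placeholder term",
        "DBLINKS     NCBI-GeneID: 1002",
        "            Ensembl: ENSG00000000002",
        "///",
    ]),
    # An entry without an ORTHOLOGY line, which the parser should skip.
    "\n".join([
        "ENTRY       1003              CDS       T01001",
        "DBLINKS     NCBI-GeneID: 1003",
        "            Ensembl: ENSG00000000003",
        "///",
    ]),
]

# Accept both "ORTHOLOGY   K00001" and "ORTHOLOGY   KO: K00001".
KO_RE = re.compile(r"^ORTHOLOGY\s+(?:KO:\s*)?(K\d{5})", re.MULTILINE)
ENSEMBL_RE = re.compile(r"Ensembl:\s*(ENSG\d+)")


def build_ensembl_to_ko(entries):
    """Step (3): parse each entry text into an ENSEMBL -> KO dictionary."""
    mapping = {}
    for text in entries:
        ko = KO_RE.search(text)
        ensembl = ENSEMBL_RE.search(text)
        if ko and ensembl:  # keep only entries that carry both IDs
            mapping[ensembl.group(1)] = ko.group(1)
    return mapping


def map_probes_to_ko(probe_to_ensembl, ensembl_to_ko):
    """Step (4): chain probe -> ENSEMBL (from step 1) with ENSEMBL -> KO."""
    return {
        probe: ensembl_to_ko[ensembl]
        for probe, ensembl in probe_to_ensembl.items()
        if ensembl in ensembl_to_ko
    }


# Step (1) result, here hard-coded with made-up probe IDs.
probe_to_ensembl = {
    "probe_A_at": "ENSG00000000001",
    "probe_B_at": "ENSG00000000003",  # has no KO term in the sample data
}

ensembl_to_ko = build_ensembl_to_ko(SAMPLE_ENTRIES)
probe_to_ko = map_probes_to_ko(probe_to_ensembl, ensembl_to_ko)
```

The parsing itself is cheap; in my case the time goes into retrieving
the entries, so caching the ENSEMBL -> KO dictionary per pathway would
at least avoid repeating step (2).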
Yours faithfully,
Ludwig Geistlinger
(Research for an ongoing diploma thesis)
(University of Cape Town, Institute of Infectious Diseases)