Many NAs using TAIR IDs for plotting KEGG pathways using pathview
Arctic • 0
@arctic-22506

Dear all,

I am fairly new to the pathview package (v1.34.00) and to plotting KEGG pathways. A high fraction (~30%-50%) of genes are plotted as NA when I plot KEGG pathways using TAIR or ENTREZ IDs. However, when I check the KEGG pages for these genes, they appear to list corresponding TAIR IDs. For instance:

KEGG "Photosynthesis" pathway [ath00195]:

1. Plotting gene psbA using KEGG ID:

library(pathview)

ath00195 <- pathview(gene.data = c("ArthCp002"), pathway.id = "ath00195", species = "ath", gene.idtype = "KEGG", na.col = "purple" )

Returns plot with psbA in red

2. Plotting gene psbA using TAIR ID:

ath00195 <- pathview(gene.data = c("ATCG00020"), pathway.id = "ath00195", species = "ath", gene.idtype = "TAIR", na.col = "purple" )

Returns error: "Error in mol.sum(gene.data, gene.idmap) : no ID can be mapped!"

3. KEGG page for psbA appears to list ATCG00020 as its TAIR ID:

https://www.genome.jp/dbget-bin/www_bget?ath:ArthCp002

Many thanks in advance for your reply and help,

KEGG TAIR_IDs KEGGdzPathwaysGEO pathview • 1.5k views
@james-w-macdonald-5106

If you provide pathview with a TAIR ID, it will convert it to an NCBI Gene ID, which is the main ID type used by KEGG. Unfortunately, there isn't a mapping for that ID:

> library(org.At.tair.db)
> select(org.At.tair.db, "ATCG00020", "ENTREZID", "TAIR")
'select()' returned 1:1 mapping between keys and columns
       TAIR ENTREZID
1 ATCG00020     <NA>

The page for that gene on arabidopsis.org doesn't appear to provide an NCBI Gene ID, and searching at NCBI returns nothing as well, so it appears not to have an NCBI Gene ID.
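
To see how many of your own TAIR IDs are affected before calling pathview, you can run the same conversion over the whole input vector and count the NAs. This is only a minimal sketch: `unmapped_summary` is a hypothetical helper written for this post, not part of pathview or AnnotationDbi, and the second ID in the offline illustration is a made-up placeholder.

```r
# Summarise how many IDs came back as NA after an ID conversion.
# `mapped` is a named vector: names = input IDs, values = converted IDs.
unmapped_summary <- function(mapped) {
  missing <- is.na(mapped)
  list(
    n_total    = length(mapped),
    n_unmapped = sum(missing),
    fraction   = if (length(mapped)) mean(missing) else NA_real_,
    unmapped   = names(mapped)[missing]
  )
}

# Offline illustration. ATCG00020 is the unmapped ID from this thread;
# PLACEHOLDER_ID and "12345" are invented values, not real annotation.
example_map <- c(ATCG00020 = NA, PLACEHOLDER_ID = "12345")
str(unmapped_summary(example_map))

# Real lookup, run only if the annotation package is installed.
if (requireNamespace("org.At.tair.db", quietly = TRUE)) {
  my_ids <- c("ATCG00020")  # replace with your own TAIR IDs
  mapped <- AnnotationDbi::mapIds(org.At.tair.db::org.At.tair.db,
                                  keys = my_ids, column = "ENTREZID",
                                  keytype = "TAIR", multiVals = "first")
  print(unmapped_summary(mapped))
}
```

The IDs listed under `unmapped` are the ones that would be dropped or shown as NA when pathview is called with gene.idtype = "TAIR".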


Hello James, thank you for your reply, and apologies for my delayed response. I follow your explanation that the key conversion fails in this example. But wouldn't the NCBI Gene ID listed on the KEGG page (here, 844802) be the ID we are looking for? In other words, is this a dictionary update issue, or are other factors in play? Doesn't KEGG provide dictionaries that can be used for this conversion? Thanks again,


You can get the mapping from KEGG, and perhaps that's how pathview should do it. But for now it uses the org.At.tair.db package, which is built using data we can download from arabidopsis.org. And if you go to arabidopsis.org and search on that ID, there doesn't appear to be an NCBI Gene ID listed. It may be that KEGG maps the TAIR ID to UniProt and then to NCBI Gene ID, but that is way more complicated than we have the bandwidth to attempt. As it stands, generating the annotation packages the way we do right now is somewhere around 80 hours of work, and it's hard to come by the FTE to do that right before each release, which is a busy time already.
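
For anyone who wants to try the KEGG-side mapping themselves, the following is a rough sketch rather than what pathview does internally. It assumes the KEGGREST package is installed and that `keggConv("ncbi-geneid", "ath")` returns a named character vector with KEGG gene IDs as names and NCBI Gene IDs as values. `strip_prefix` and `kegg_entrez_table` are hypothetical helpers written for this post. The offline demo uses the ArthCp002 / 844802 pairing reported from the KEGG page earlier in the thread.

```r
# Drop the database prefix KEGG puts in front of IDs ("ath:", "ncbi-geneid:").
strip_prefix <- function(x) sub("^[^:]+:", "", x)

# Turn a keggConv()-style named vector (names = KEGG gene IDs,
# values = NCBI Gene IDs) into a two-column lookup table.
kegg_entrez_table <- function(conv) {
  data.frame(kegg   = strip_prefix(names(conv)),
             entrez = strip_prefix(unname(conv)),
             stringsAsFactors = FALSE)
}

# Offline illustration with the pairing mentioned in this thread.
demo <- kegg_entrez_table(c("ath:ArthCp002" = "ncbi-geneid:844802"))
print(demo)

# Live query: needs network access, so it only runs in an interactive session.
if (interactive() && requireNamespace("KEGGREST", quietly = TRUE)) {
  conv <- KEGGREST::keggConv("ncbi-geneid", "ath")
  tab  <- kegg_entrez_table(conv)
  # KEGG gene IDs for the Entrez IDs of interest; these can be passed to
  # pathview() with gene.idtype = "KEGG".
  kegg_ids <- tab$kegg[tab$entrez %in% c("844802")]
  print(kegg_ids)
}
```

Note that this table starts from NCBI Gene IDs, so it does not by itself rescue a TAIR ID that has no Entrez mapping in org.At.tair.db; it only shows where the KEGG ID to NCBI Gene ID dictionary can be obtained.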
