pathway ID in KEGG.db
Ed ▴ 230
@ed-4683
Last seen 10.2 years ago
Hi there,

I found that the number of pathways in KEGGPATHID2NAME is 390, while the number in KEGGPATHID2EXTID is 3152. Am I missing something? BTW, the pathway IDs used in this package seem inconsistent too.

Thanks.

Ed

    > ?KEGGPATHID2NAME
    > xx <- as.list(KEGGPATHID2NAME)
    > if(length(xx) > 0){
    + # get the value for the first key
    + xx[[1]]
    + # Get the values for a few keys
    + if(length(xx) >= 3){
    + xx[1:3]
    + }
    + }
    $`00010`
    [1] "Glycolysis / Gluconeogenesis"

    $`00020`
    [1] "Citrate cycle (TCA cycle)"

    $`00030`
    [1] "Pentose phosphate pathway"

    > length(xx)
    [1] 390

    ?KEGGPATHID2EXTID
    xx <- as.list(KEGGPATHID2EXTID)
    if(length(xx) > 0){
    # Get the value of the first key
    xx[[1]]
    # Get the values for multiget for a few keys
    if(length(xx) >= 3){
    xx[1:3]
    }
    }
    > length(xx)
    [1] 3152

    > sessionInfo()
    R version 2.14.1 (2011-12-22)
    Platform: x86_64-pc-mingw32/x64 (64-bit)

    locale:
    [1] LC_COLLATE=Chinese (Simplified)_People's Republic of China.936
    [2] LC_CTYPE=Chinese (Simplified)_People's Republic of China.936
    [3] LC_MONETARY=Chinese (Simplified)_People's Republic of China.936
    [4] LC_NUMERIC=C
    [5] LC_TIME=Chinese (Simplified)_People's Republic of China.936

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] KEGG.db_2.6.1 RSQLite_0.11.1 DBI_0.2-5
    [4] AnnotationDbi_1.16.18 Biobase_2.14.0

    loaded via a namespace (and not attached):
    [1] IRanges_1.12.5 tools_2.14.1
Pathways cycle • 2.0k views
Marc Carlson ★ 7.2k
@marc-carlson-2264
Last seen 8.3 years ago
United States
Hi Ed,

The confusion arises because this is an "apples to oranges" comparison. There are 390 unique KEGG IDs in the DB, which means that there are only 390 KEGG names for those pathways. However, those pathway IDs are matched up to over 3000 different external gene IDs, which is why the second mapping is so much larger. You can learn more about the different mappings by reading their manual pages like this:

    help("KEGGPATHID2NAME")
    help("KEGGPATHID2EXTID")

Marc
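One way to see what each number in the question measures is to count keys and values separately. The sketch below uses small toy lists standing in for `as.list(KEGGPATHID2NAME)` and `as.list(KEGGPATHID2EXTID)`; the entries are illustrative only and are not taken from the package. It relies on the fact that `length()` applied to a list counts its elements (the keys of the mapping), not the values stored under them.

```r
# Toy stand-ins for as.list(KEGGPATHID2NAME) and as.list(KEGGPATHID2EXTID).
# The entries are illustrative assumptions, not data from KEGG.db.
path2name <- list(
  "00010" = "Glycolysis / Gluconeogenesis",
  "00020" = "Citrate cycle (TCA cycle)"
)
path2ext <- list(
  "hsa00010" = c("10327", "124", "125"),
  "hsa00020" = c("1431", "1737"),
  "mmu00010" = c("11522", "11529")
)

# length() on a list counts its elements, i.e. the keys of the mapping
n_name_keys <- length(path2name)
n_ext_keys  <- length(path2ext)

# Number of distinct values (gene IDs) pooled across all keys
n_genes <- length(unique(unlist(path2ext)))

# Number of values stored under each individual key
genes_per_key <- lengths(path2ext)

print(c(name_keys = n_name_keys, ext_keys = n_ext_keys, genes = n_genes))
print(genes_per_key)
```

With the real package loaded, the same calls could be applied to the `xx` objects from the question to compare the key count with the number of distinct gene IDs.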
Hi Marc,

KEGGPATHID2EXTID maps KEGG pathway identifiers to Entrez Gene identifiers, and the following are examples of those identifiers. I guess the more than 3000 identifiers come from different organisms rather than being gene IDs.

    > xx <- as.list(KEGGPATHID2EXTID)
    > head(names(xx), 20)
     [1] "hsa00232" "hsa00983" "hsa01100" "hsa00230" "hsa05340" "hsa04514"
     [7] "hsa05412" "hsa04010" "hsa04012" "hsa04062" "hsa04150" "hsa04210"
    [13] "hsa04370" "hsa04380" "hsa04510" "hsa04530" "hsa04620" "hsa04630"
    [19] "hsa04660" "hsa04662"
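The guess about organisms can be checked by tabulating the letter prefix of each key. This is a minimal sketch in base R: the first three keys are copied from the output above, while the `mmu` and `sce` keys are hypothetical additions used only to make the table non-trivial. It assumes every key has the form of a lowercase organism code followed by digits.

```r
# First three keys come from the thread; the mmu/sce keys are hypothetical.
keys <- c("hsa00232", "hsa00983", "hsa01100",
          "mmu00010", "mmu00020", "sce00010")

# Drop the trailing digits so that only the organism prefix remains
prefix <- sub("[0-9]+$", "", keys)

# Count how many keys belong to each organism prefix
prefix_counts <- table(prefix)
print(prefix_counts)
```

On the real mapping, `keys` would be `names(xx)` from the session above.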
Hi Ed,

I think the answer you are looking for is "they are both". KEGGPATHID2EXTID maps "KEGG IDs" to "gene IDs" for a wide range of specific organisms. When KEGG was a new package, the KEGG-style organism prefixes were left prepended to these IDs to emphasize this detail. So your example shows some of the IDs for human (hsa). But if you look at ALL of the IDs, you will find some that map to other organisms as well (and these have different prefixes).

Also, you may notice that not all of the KEGG pathway IDs will necessarily be represented in KEGGPATHID2EXTID. There is of course substantial overlap, but it's worth mentioning that being matched to a gene ID is not a prerequisite for being a KEGG ID.

Marc
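To compare the two mappings on equal terms, the organism prefix can be stripped from the KEGGPATHID2EXTID keys and the remaining bare pathway IDs compared with the KEGGPATHID2NAME keys. The sketch below uses toy key vectors (assumptions for illustration, not package data) and base R only.

```r
# Toy key sets; with KEGG.db loaded these would be
# names(as.list(KEGGPATHID2NAME)) and names(as.list(KEGGPATHID2EXTID)).
name_ids <- c("00010", "00020", "00030")
ext_keys <- c("hsa00010", "hsa00020", "mmu00010")

# Remove the leading organism letters and collapse duplicates across organisms
bare_ids <- unique(sub("^[a-z]+", "", ext_keys))

# Pathway IDs that have a name but no gene mapping in any organism
unmapped <- setdiff(name_ids, bare_ids)

# Sanity check in the other direction: mapped IDs that lack a name
unnamed <- setdiff(bare_ids, name_ids)

print(unmapped)
print(unnamed)
```

In this toy data, `00030` has a name but no gene mapping, which mirrors the point that being matched to a gene ID is not a prerequisite for being a KEGG ID.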