"graphite" Biocarta 'native' graphs different from Biocarta web site?
@hamid-bolouri-4258
United States
Graphite's native Biocarta pathways seem to have a different node list than the one given by the Biocarta "PROTEIN LIST" link on Biocarta pathway pages (presumably what the pathway authors consider the 'true' pathway membership).

There seem to be two categories of difference:

(1) Some genes listed by Biocarta are absent from graphite's version (see the ??? marks in the example below).

(2) Because the native-format nodes are annotated in various ways, a node conversion is necessary. In particular, Biocarta's "PROTEIN LIST" gives _specific_ members of enzyme families, whereas graphite seems to replace EC numbers with all family members. However, I have trouble explaining why some enzymes are on or off the list (see the --- marks in the example below).

Am I misinterpreting things? If not, is there any way to get pathway graphs whose node lists more closely match what Biocarta lists online?

Thanks,
Hamid Bolouri
-- http://labs.fhcrc.org/bolouri

Example:

> biocarta[["epo signaling pathway"]]
"epo signaling pathway" pathway from BioCarta
Number of nodes = 10
Number of edges = 24
Type of identifiers = native
Retrieved on = 2011-05-12
> nodes(biocarta[["epo signaling pathway"]])
 [1] "EntrezGene:2056"            "EntrezGene:2057"
 [3] "EntrezGene:2885"            "EntrezGene:3265"
 [5] "EntrezGene:6464"            "EntrezGene:6654"
 [7] "EnzymeConsortium:2.7.1.112" "EnzymeConsortium:3.1.3.48"
 [9] "EnzymeConsortium:3.1.4.11"  "STAT5"
> PE <- convertIdentifiers(biocarta[["epo signaling pathway"]],type="entrez")
> nodes(PE)
 [1] "2056"   "2057"   "2885"   "3265"   "6464"   "6654"   "52"     "993"
 [9] "994"    "995"    "1843"   "1844"   "1845"   "1846"   "1847"   "1848"
[17] "1849"   "1850"   "1852"   "5770"   "5777"   "5778"   "5781"   "5787"
[25] "5788"   "5792"   "5795"   "5797"   "5798"   "5799"   "5801"   "5803"
[33] "8555"   "8556"   "11072"  "11221"  "56940"  "80824"  "84867"  "5330"
[41] "5331"   "5332"   "5333"   "5335"   "5336"   "23236"  "84812"  "113026"
> PS <- convertIdentifiers(biocarta[["epo signaling pathway"]],type="symbol")
> nodes(PS)
 [1] "EPO"    "EPOR"   "GRB2"   "HRAS"   "SHC1"   "SOS1"   "ACP1"   "CDC25A"
 [9] "CDC25B" "CDC25C" "DUSP1"  "DUSP2"  "DUSP3"  "DUSP4"  "DUSP5"  "DUSP6"
[17] "DUSP7"  "DUSP8"  "DUSP9"  "PTPN1"  "PTPN6"  "PTPN7"  "PTPN11" "PTPRB"
[25] "PTPRC"  "PTPRF"  "PTPRJ"  "PTPRM"  "PTPRN"  "PTPRN2" "PTPRR"  "PTPRZ1"
[33] "CDC14B" "CDC14A" "DUSP14" "DUSP10" "DUSP22" "DUSP16" "PTPN5"  "PLCB2"
[41] "PLCB3"  "PLCB4"  "PLCD1"  "PLCG1"  "PLCG2"  "PLCB1"  "PLCD4"  "PLCD3"

Compare the above with what I get from:
http://www.biocarta.com/pathfiles/PathwayProteinList.asp?showPFID=69
(NB: the header is mine, and I reordered the table to group similar cases.)

GeneDescription | EntrezID | HBcomment
erythropoietin | 2056 | ***
erythropoietin receptor | 2057 | ***
growth factor receptor-bound protein 2 | 2885 | ***
son of sevenless homolog 1 (Drosophila) | 6654 | ***
v-Ha-ras Harvey rat sarcoma viral oncogene homolog | 3265 | ***
signal transducer and activator of transcription 5A | 6776 | ***
signal transducer and activator of transcription 5B | 6777 | ***
SHC (Src homology 2 domain containing) transforming protein 1 | 6464 | ***
v-fos FBJ murine osteosarcoma viral oncogene homolog | 2353 | ???
v-raf-1 murine leukemia viral oncogene homolog 1 | 5894 | ???
ELK1, member of ETS oncogene family | 2002 | ???
jun oncogene | 3725 | ???
casein kinase 2, alpha 1 polypeptide | 1457 | ???
Janus kinase 2 (a protein tyrosine kinase) | 3717 | ???
mitogen-activated protein kinase 3 | 5595 | ---
mitogen-activated protein kinase 8 | 5599 | ---
mitogen-activated protein kinase kinase 1 | 5604 | ---
phospholipase C, gamma 1 | 5335 | ok
protein tyrosine phosphatase, non-receptor type 6 | 5777 | ok

HBcomment: *** == in graphite; ??? == missing from graphite; --- == specific enzymes in Biocarta are mapped to large (and unrelated?) families in graphite.

> sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] graphite_1.2.0       AnnotationDbi_1.18.1 Biobase_2.16.0
[4] BiocGenerics_0.2.0   RSQLite_0.11.1       DBI_0.2-5
[7] graph_1.34.0

loaded via a namespace (and not attached):
[1] IRanges_1.14.3     org.Hs.eg.db_2.7.1 stats4_2.15.0      tools_2.15.0
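The comparison in the table can be restated as set operations in a few lines of base R. This is a minimal sketch: both ID vectors are copied by hand from the Biocarta table and from nodes(PE) above rather than retrieved programmatically, so it only summarises what the post already shows.

```r
# Entrez IDs from the Biocarta "PROTEIN LIST" table above (EPO signaling pathway)
biocarta_ids <- c("2056", "2057", "2885", "6654", "3265", "6776", "6777",
                  "6464", "2353", "5894", "2002", "3725", "1457", "3717",
                  "5595", "5599", "5604", "5335", "5777")

# Node names of PE, i.e. the graphite pathway after convertIdentifiers(..., type = "entrez")
graphite_ids <- c("2056", "2057", "2885", "3265", "6464", "6654", "52", "993",
                  "994", "995", "1843", "1844", "1845", "1846", "1847", "1848",
                  "1849", "1850", "1852", "5770", "5777", "5778", "5781", "5787",
                  "5788", "5792", "5795", "5797", "5798", "5799", "5801", "5803",
                  "8555", "8556", "11072", "11221", "56940", "80824", "84867", "5330",
                  "5331", "5332", "5333", "5335", "5336", "23236", "84812", "113026")

shared        <- intersect(biocarta_ids, graphite_ids)  # listed by both
only_biocarta <- setdiff(biocarta_ids, graphite_ids)    # listed online, absent after conversion
only_graphite <- setdiff(graphite_ids, biocarta_ids)    # extra members of the expanded EC families

cat("shared:", length(shared),
    "| only Biocarta:", length(only_biocarta),
    "| only graphite:", length(only_graphite), "\n")
```

With these vectors the three sets hold 8, 11 and 40 IDs. Note that only_biocarta also contains STAT5A and STAT5B (6776, 6777): the native "STAT5" node does not appear among the converted Entrez nodes, so those two genes seem to be lost at the conversion step rather than missing from the native graph.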
Transcription Pathways Leukemia graphite • 2.8k views
@hamid-bolouri-4258
United States
Hello,

Can anyone tell me how to use DEGraph with the pathways in NCIgraphData?

The DEGraph demo:

> data("Loi2008_DEGraphVignette", package="DEGraph")
> classData <- classLoi2008
> exprData <- exprLoi2008
> annData <- annLoi2008
> grList <- grListKEGG
> res <- testOneGraph(grList[[1]],exprData,classData,verbose=T,prop=0.2)

works fine for me. But when I replace grList with NCI.cyList from NCIgraphData:

> library(NCIgraphData)
> data("NCI-cyList")
> NCI.cyList[[1]]
A graphNEL graph with directed edges
Number of Nodes = 35
Number of Edges = 40

I get this error:

> res <- testOneGraph(NCI.cyList[[1]],exprData,classData,verbose=T,prop=0.2)
Keeping genes in the graph *and* the expression data set...
 35 genes of the graph were not found in the expression data set:
 chr [1:35] "6749854621221256793-pid_m_25632-674985462-829166685-pid_m_100726" ...
 227 genes of the expression data set are absent from the graph:
 chr [1:227] "31" "32" "207" "208" "355" "356" "369" "572" ...
Error: all.equal(dataGN, graphGN) is not TRUE
Keeping genes in the graph *and* the expression data set...done

I get the same error with 'reactome.cyList' graphs and with graphs generated by 'parseNCInetwork'.

Thanks,
Hamid Bolouri

> sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] NCIgraphData_0.99.4 DEGraph_1.8.0       R.utils_1.12.1
[4] R.oo_1.9.3          R.methodsS3_1.2.2

loaded via a namespace (and not attached):
 [1] BiocGenerics_0.2.0 graph_1.34.0       grid_2.15.0        KEGGgraph_1.12.0
 [5] lattice_0.20-6     mvtnorm_0.9-9992   NCIgraph_1.4.0     RBGL_1.32.0
 [9] RCurl_1.91-1.1     RCytoscape_1.6.3   Rgraphviz_1.34.1   rrcov_1.3-01
[13] stats4_2.15.0      tools_2.15.0       XML_3.9-4.1        XMLRPC_0.2-4
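One way to see why the all.equal() check fails is to compare the identifier formats on the two sides directly. The sketch below uses only the identifiers printed in the error message; looks_like_entrez() and overlap_report() are hypothetical helpers, not DEGraph functions, and treating an all-digit string as an Entrez Gene ID is an assumption.

```r
# Hypothetical helper: TRUE for identifiers made only of digits (Entrez-style)
looks_like_entrez <- function(ids) grepl("^[0-9]+$", ids)

# Hypothetical helper: share of Entrez-style IDs on each side, and the overlap
overlap_report <- function(graph_ids, data_ids) {
  c(graph_entrez_like = mean(looks_like_entrez(graph_ids)),
    data_entrez_like  = mean(looks_like_entrez(data_ids)),
    shared            = length(intersect(graph_ids, data_ids)))
}

# Identifiers copied from the error message above; in a live session these
# would be nodes(NCI.cyList[[1]]) and rownames(exprData)
graph_nodes <- "6749854621221256793-pid_m_25632-674985462-829166685-pid_m_100726"
expr_ids    <- c("31", "32", "207", "208", "355", "356", "369", "572")

print(overlap_report(graph_nodes, expr_ids))
```

This reports 0, 1 and 0: none of the graph identifiers look like gene IDs, all of the expression-set identifiers do, and the two sets share nothing.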
I'm CC'ing the maintainer of DEGraph...

Dan
Hi Hamid,

2012/6/7 Hamid Bolouri <hbolouri at fhcrc.org>:
>>library(NCIgraphData)
>>data("NCI-cyList")
>> NCI.cyList[[1]]
> A graphNEL graph with directed edges
> Number of Nodes = 35
> Number of Edges = 40

The graphs in NCI-cyList cannot be used directly with DEGraph: they are raw representations of the NCI BioPAX files. In particular, the nodes of the graph do not correspond to genes:

> library(graph)
> nodes(NCI.cyList[[1]])
[1] "6749854621221256793-pid_m_25632-674985462-829166685-pid_m_100726"
[2] "6749854621221256792-pid_m_25631-674985462-829166685-pid_m_100726"
[3] "674985462-829169511-pid_m_100441-674985462-829143405-pid_m_101095"
[4] "674985462-829168394-pid_m_100592-674985462-829143405-pid_m_101095"
[5] "674985462-829169605-pid_m_100410-674985462-829143405-pid_m_101095"
[...]

which is why testOneGraph cannot associate the graph nodes with exprData. The NCIgraph package converts these raw graphs into gene graphs that can be used with DEGraph:

library('NCIgraph')
grList <- getNCIPathways(cyList=NCI.cyList, verbose=verbose)$pList

On my computer, testOneGraph currently fails on NCIgraph objects that have zero or one gene in exprData. I couldn't figure out why this happens now, since it didn't when I wrote the package. It will be fixed in the next release. In the meantime, if you want to test all the pathways in grList, you can check whether

length(intersect(translateNCI2GeneID(gr), rownames(exprData))) > 1

and call testOneGraph only if it is TRUE, returning NULL otherwise. For example, in the Loi2008 demo, replace

if(min(length(nodes(gr)),length(gr@edgeData@data))>0)

by

if(min(length(nodes(gr)),length(gr@edgeData@data))>0 && length(intersect(translateNCI2GeneID(gr), rownames(exprData))) > 1)

Note that only 11 of the 460 networks in grList have strictly more than one gene in common with the exprData of Loi2008.

Best,
Laurent

--
Laurent Jacob
Department of Statistics
UC Berkeley
http://cbio.ensmp.fr/~ljacob
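The same workaround can be wrapped in a small helper so it is applied to every pathway in grList. This is a sketch: test_if_overlap() is a hypothetical helper, not part of DEGraph or NCIgraph. It takes the gene IDs and the test as arguments so that the helper itself needs only base R; the commented lines show how it might be called with translateNCI2GeneID() and testOneGraph() as used in this thread.

```r
# Hypothetical helper: run `run_test` only if the pathway shares at least
# `min_shared` genes with the expression data (min_shared = 2 matches "> 1").
test_if_overlap <- function(gene_ids, expr_ids, run_test, min_shared = 2) {
  n_shared <- length(intersect(gene_ids, expr_ids))
  if (n_shared < min_shared) {
    return(NULL)  # zero or one gene in common: skip this pathway
  }
  run_test()
}

# Possible use, with NCIgraph and DEGraph loaded and grList built as above:
# res <- lapply(grList, function(gr)
#   test_if_overlap(translateNCI2GeneID(gr), rownames(exprData),
#                   function() testOneGraph(gr, exprData, classData,
#                                           verbose = TRUE, prop = 0.2)))
```

Passing the test as a function means testOneGraph() is only evaluated when the overlap check passes. The demo's own check on nodes and edges would still be kept alongside this guard.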
It works!

Thanks very much indeed Laurent.

Best wishes;
Hamid

-- http://labs.fhcrc.org/bolouri
2012/6/8 Hamid Bolouri <hbolouri at fhcrc.org>:
> It works!
>
> Thanks very much indeed Laurent.

Great, I'm glad this helped. Let me know if you encounter other problems.

Best,
Laurent

--
Laurent Jacob
Department of Statistics
UC Berkeley
http://cbio.ensmp.fr/~ljacob
Hello,

On Fri, Jun 8, 2012 at 4:30 PM, laurent jacob <laurent.jacob at gmail.com> wrote:
> Hi Hamid,
>
> 2012/6/7 Hamid Bolouri <hbolouri at fhcrc.org>:
>
>>>library(NCIgraphData)
>>>data("NCI-cyList")
>>> NCI.cyList[[1]]
>> A graphNEL graph with directed edges
>> Number of Nodes = 35
>> Number of Edges = 40
>
> The graphs NCI-cyList cannot directly be used with DEGraph: they are
> raw representations of the NCI biopax files. In particular, the nodes
> of the graph do not correspond to genes:

Does this package use a BioPAX parser that could be used for other BioPAX data? Does it read Level 2, Level 3, or both?

I'm looking for a BioPAX parser and would be willing to help build one if none exists.

Thanks!

Take care
Oliver

--
Oliver Ruebenacker
Bioinformatics Consultant (http://www.knowomics.com/wiki/Oliver_Ruebenacker)
Knowomics, The Bioinformatics Network (http://www.knowomics.com)
SBPAX: Turning Bio Knowledge into Math Models (http://www.sbpax.org)
Hi Oliver,

2012/6/9 Oliver Ruebenacker <curoli at gmail.com>:
> Hello,
>
> Does this package use a BioPAX parser that could be used for other
> BioPAX data? Does it read level 2 or level 3 or both?
>
> I'm looking for a BioPAX parser and would be willing to help build
> one if none exists.

I think Rredland (http://bioconductor.org/packages/2.4/bioc/html/Rredland.html) parses BioPAX in R.

I didn't use it for this package because I would then also have needed to convert the parsed structure into graphNEL objects, which was not straightforward. Instead, I read the BioPAX files in Cytoscape, then used RCytoscape (http://bioconductor.org/packages/release/bioc/html/RCytoscape.html) to read the networks built by Cytoscape.

Best,
Laurent

--
Laurent Jacob
Department of Statistics
UC Berkeley
http://cbio.ensmp.fr/~ljacob
Hello Laurent,

Thanks for the response. To my knowledge, Rredland is no longer maintained. Since I am very familiar with Java (and OpenRDF Sesame), I was thinking of using rJava to drive Sesame.

Can you explain some more why you chose not to use Rredland? It seems almost certainly relevant to the design of a BioPAX package. Did the issue have to do with separating the actual reaction network from other types of information?

Thanks!

Take care
Oliver

--
Oliver Ruebenacker
Bioinformatics Consultant (http://www.knowomics.com/wiki/Oliver_Ruebenacker)
Knowomics, The Bioinformatics Network (http://www.knowomics.com)
SBPAX: Turning Bio Knowledge into Math Models (http://www.sbpax.org)
Hi Oliver,

2012/6/10 Oliver Ruebenacker <curoli at gmail.com>:
> Can you explain some more why you chose not to use Rredland? It
> seems almost certainly relevant to the design of a BioPAX package.
> Did the issue have to do with separating the actual reaction network
> from other types of information?

If I remember correctly, I wasn't sure how to reconstruct the network structure from the Rredland output, in particular how to recover the edges from the parsed BioPAX statements. It's not that the parsing done by Rredland was problematic; rather, additional work (which seemed non-trivial to me at the time) was required to convert the output to graph objects.

Are you planning to develop a Bioconductor package or an independent Java parser? If you plan on using Java, you may want to look at what the MSKCC people did for their Cytoscape plugin, which I used for my own package: http://cbio.mskcc.org/cytoscape/plugins/biopax/

Best,
Laurent

--
Laurent Jacob
Department of Statistics
UC Berkeley
http://cbio.ensmp.fr/~ljacob
Hello Laurent,

On Sun, Jun 10, 2012 at 4:21 PM, laurent jacob <laurent.jacob at gmail.com> wrote:
> Hi Oliver,
>
> 2012/6/10 Oliver Ruebenacker <curoli at gmail.com>:
>
>> Can you explain some more why you chose not to use Rredland? It
>> seems almost certainly relevant to the design of a BioPAX package.
>> Did the issue have to do with separating the actual reaction network
>> from other types of information?
>
> If I remember correctly, I wasn't sure how to reconstruct the network
> structure from the Rredland output, in particular how to recover the
> edges from the parsed BioPAX statements. It's not that the parsing done
> by Rredland was problematic; rather, additional work (which seemed
> non-trivial to me at the time) was required to convert the output to
> graph objects.

What kind of graph are you constructing? Is it a bipartite graph where every physical entity is a node and every reaction is a node, and you connect every reaction with its reactants, catalysts and products? In BioPAX Level 2, getting that graph was quite tricky, but Level 3 is much easier (although catalysts and modulators are still a bit awkward).

> Are you planning to develop a Bioconductor package or an independent
> Java parser? If you plan on using Java, you may want to look at what
> the MSKCC people did for their Cytoscape plugin, which I used for my
> own package: http://cbio.mskcc.org/cytoscape/plugins/biopax/

I'd love to submit to Bioconductor, if that is not too difficult.

The Cytoscape plugin is based on PaxTools, the official BioPAX Java library. The reason I'm not using PaxTools is that I'm combining BioPAX data with other data, such as SBPAX, which is not yet part of BioPAX (although hopefully it will be soon) and is therefore not (yet) supported by PaxTools.

Take care
Oliver

--
Oliver Ruebenacker
Bioinformatics Consultant (http://www.knowomics.com/wiki/Oliver_Ruebenacker)
Knowomics, The Bioinformatics Network (http://www.knowomics.com)
SBPAX: Turning Bio Knowledge into Math Models (http://www.sbpax.org)
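For the bipartite graph Oliver describes, the core step is turning parsed statements into reaction-centred edges. A minimal base-R sketch, assuming the statements have already been parsed into a subject/predicate/object table and that the BioPAX Level 3 property URIs have been shortened to left, right, controller and controlled. The table layout and helper names are hypothetical; this is not how Rredland or the Cytoscape BioPAX plugin works, and it ignores reversible conversions and Level 2's participant indirection.

```r
# Hypothetical input: parsed statements as (subject, predicate, object) rows.
# conv1 is a conversion A + B -> AB; cat1 is a catalysis of conv1 by E1.
triples <- data.frame(
  subject   = c("conv1", "conv1", "conv1", "cat1", "cat1"),
  predicate = c("left", "left", "right", "controller", "controlled"),
  object    = c("A", "B", "AB", "E1", "conv1"),
  stringsAsFactors = FALSE
)

# Build typed edge rows; rep() keeps this valid when `from` is empty
edge_rows <- function(from, to, role) {
  data.frame(from = from, to = to, role = rep(role, length(from)),
             stringsAsFactors = FALSE)
}

bipartite_edges <- function(tr) {
  s <- tr$subject; p <- tr$predicate; o <- tr$object
  # Assumes every conversion runs left-to-right
  inputs  <- edge_rows(o[p == "left"], s[p == "left"], "reactant")
  outputs <- edge_rows(s[p == "right"], o[p == "right"], "product")
  # A control node links its controller entity to the conversion it controls
  controlled_by_id <- setNames(o[p == "controlled"], s[p == "controlled"])
  controls <- edge_rows(o[p == "controller"],
                        unname(controlled_by_id[s[p == "controller"]]),
                        "controller")
  rbind(inputs, outputs, controls)
}

edges <- bipartite_edges(triples)
print(edges)
```

The result has one edge per participant (A and B into conv1, conv1 out to AB, E1 into conv1), so entity and reaction nodes alternate along every path.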
Hi Oliver,

2012/6/10 Oliver Ruebenacker <curoli at gmail.com>:
> What kind of graph are you constructing? Is it a bipartite graph
> where every physical entity is a node and every reaction is a node,
> and you connect every reaction with its reactants, catalysts and
> products?

Ultimately, the graph I construct has nodes corresponding exclusively to genes, with only one node per gene, and edges corresponding to expected correlations between gene expression levels. For example, if the protein encoded by gene A activates a complex that promotes the expression of gene B, I draw a positive edge between gene A and gene B. As a first step, however, I load the graph that Cytoscape builds from the BioPAX files. An example of such a graph is given in the vignette: http://bioconductor.org/packages/2.8/bioc/html/NCIgraph.html

> In BioPAX Level 2, getting that graph was quite tricky, but
> Level 3 is much easier (although catalysts and modulators are still a
> bit awkward).

The NCI PID people sent me BioPAX Level 2 data; I don't know whether Level 3 is available for all their networks.

>> Are you planning to develop a Bioconductor package or an independent
>> Java parser? If you plan on using Java, you may want to look at what
>> the MSKCC people did for their Cytoscape plugin, which I used for my
>> own package: http://cbio.mskcc.org/cytoscape/plugins/biopax/
>
> I'd love to submit to Bioconductor, if that is not too difficult.

Great, good luck with the development.

Best,
Laurent

--
Laurent Jacob
Department of Statistics
UC Berkeley
http://cbio.ensmp.fr/~ljacob
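Laurent's example (gene A activates a complex that promotes expression of gene B, giving a positive edge from A to B) amounts to collapsing non-gene intermediates into signed gene-to-gene edges. A minimal base-R sketch of that idea, assuming signed interactions (+1 activates/promotes, -1 inhibits) are already available and that the sign of a two-step path is the product of its step signs. The data and collapse_to_genes() are hypothetical; this is not NCIgraph's actual conversion rule, and it only collapses one layer of intermediates.

```r
# Hypothetical signed interactions: +1 = activates/promotes, -1 = inhibits
interactions <- data.frame(
  from = c("geneA", "complexX", "geneC"),
  to   = c("complexX", "geneB", "complexX"),
  sign = c(1, 1, -1),
  stringsAsFactors = FALSE
)
genes <- c("geneA", "geneB", "geneC")

# Collapse one layer of non-gene intermediates (e.g. complexes) into
# gene -> gene edges whose sign is the product of the two step signs
collapse_to_genes <- function(inter, genes) {
  from_gene <- inter$from %in% genes
  to_gene   <- inter$to %in% genes
  out  <- inter[from_gene & to_gene, c("from", "to", "sign")]  # direct gene -> gene
  up   <- inter[from_gene & !to_gene, ]                       # gene -> intermediate
  down <- inter[!from_gene & to_gene, ]                       # intermediate -> gene
  for (i in seq_len(nrow(up))) {
    hits <- down[down$from == up$to[i], ]
    if (nrow(hits) > 0) {
      out <- rbind(out, data.frame(from = up$from[i], to = hits$to,
                                   sign = up$sign[i] * hits$sign,
                                   stringsAsFactors = FALSE))
    }
  }
  rownames(out) <- NULL
  out
}

gene_graph <- collapse_to_genes(interactions, genes)
print(gene_graph)
```

Here geneA gets a positive edge to geneB and geneC, which inhibits the same complex, gets a negative one. If several paths connect the same pair of genes, this sketch emits duplicate edges; keeping one node and one edge per gene pair would need an extra aggregation step.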