how to store multiple relationships between nodes in graphNEL?
Paul Shannon ★ 1.1k
I'd like to find the best way to record multiple relationships between nodes in a graphNEL object. The data for my graph comes from DIP, the Database of Interacting Proteins, where many protein interactions have several kinds of evidence.

In other settings I represent this as multiple edges, but another solution is needed here, since graphNEL is designed for at most one edge between nodes. So I am improvising, packing any number of experimental methods into a token-separated list in a single edge's edgeData. Here is an example of one pair of yeast proteins observed by three different methods:

    edgeData(g, 'YCR084C', 'YBR112C', attr='edgeType')
    $`YCR084C|YBR112C`
    [1] "Immunoprecipitation::Affinity chromatography::Gel filtration chromatography"

Could I be doing this a better way? And how could I best store the PubMed ID associated with each method?

Thank you,

- Paul
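For reference, the token-separated packing described above can be round-tripped in base R with paste() and strsplit(). This is only a sketch of the workaround as described in the post: the variable names are illustrative, and the "::" separator is taken from the example output.

```r
# Evidence types for one protein pair, as in the example above.
evidence <- c("Immunoprecipitation",
              "Affinity chromatography",
              "Gel filtration chromatography")

# Pack into one token-separated string (the form stored in edgeData).
packed <- paste(evidence, collapse = "::")

# Unpack again; fixed = TRUE treats "::" literally rather than as a regex.
unpacked <- strsplit(packed, "::", fixed = TRUE)[[1]]
```

One weakness of this scheme is that it breaks as soon as a method name contains the separator itself.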
Seth Falcon ★ 7.4k
Paul Shannon <pshannon at systemsbiology.org> writes:

> I'd like to find the best way to record multiple relationships
> between nodes in a graphNEL object. The data for my graph comes
> from DIP, the Database of Interacting Proteins, where many protein
> interactions have several kinds of evidence. In other settings I
> represent this as multiple edges, but another solution is needed
> here, since graphNEL is designed for at most one edge between nodes.

One possibility might be a list of graphNEL objects all with the same node set. You could also explore a more structured approach and implement a multiGraph class.

> So I am improvising, packing any number of experimental methods into
> a token-separated list in a single edge's edgeData. Here is an
> example of one pair of yeast proteins observed by three different
> methods:
>
>     edgeData(g, 'YCR084C', 'YBR112C', attr='edgeType')
>     $`YCR084C|YBR112C`
>     [1] "Immunoprecipitation::Affinity chromatography::Gel filtration chromatography"

I think you can avoid the token-separation game, but maybe I'm missing something. The edge attributes can be any R object, even, say, a character vector with length greater than 1 ;-)

So why not have

    edgeData(g, 'YCR084C', 'YBR112C', attr='edgeType')
    $`YCR084C|YBR112C`
    [1] "Immunoprecipitation" "Affinity chromatography" "Gel filtration chromatography"

I'm not familiar with this data so I don't know if that makes sense or is what you want.

Another option for using the edge attributes might be to use a list (or even an S4 class) with named components -- but here it isn't clear whether simply using additional edge attributes might be better. For example, you could store a logical value for each edge type:

  - Define edge attributes: type1, type2, type3.
  - For each edge, the value of edge attributes type1-3 is TRUE or FALSE depending on whether this edge is of that type.

> Could I be doing this a better way? And how could I best store the
> pubmed id associated with each method?

Not sure I'm following you, what does "each method" refer to here? But perhaps the above gets you going?

+ seth

--
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
http://bioconductor.org
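The two edge-attribute suggestions in this reply might look roughly like the sketch below. It assumes the graph package is installed. The vector is wrapped in list() on the assumption that the edgeData replacement method expects a value of length one (or one value per edge), so check how the value comes back in your installed version. The attribute names type1 to type3 are the placeholders from the reply, not real DIP categories.

```r
library(graph)

# Minimal graph with the single interaction from the question.
g <- new("graphNEL", nodes = c("YCR084C", "YBR112C"), edgemode = "undirected")
g <- addEdge("YCR084C", "YBR112C", g)

# Option 1: one attribute holding a character vector of methods.
# An attribute needs a default before it can be set on an edge.
edgeDataDefaults(g, attr = "edgeType") <- NA_character_
evidence <- c("Immunoprecipitation",
              "Affinity chromatography",
              "Gel filtration chromatography")
# Assumption: a multi-element value for a single edge must be wrapped
# in list() so that it is treated as one value.
edgeData(g, "YCR084C", "YBR112C", attr = "edgeType") <- list(evidence)
stored <- edgeData(g, "YCR084C", "YBR112C", attr = "edgeType")[[1]]

# Option 2: one logical attribute per evidence type, defaulting to FALSE.
for (a in c("type1", "type2", "type3")) {
  edgeDataDefaults(g, attr = a) <- FALSE
}
edgeData(g, "YCR084C", "YBR112C", attr = "type1") <- TRUE
is_type1 <- edgeData(g, "YCR084C", "YBR112C", attr = "type1")[[1]]
is_type2 <- edgeData(g, "YCR084C", "YBR112C", attr = "type2")[[1]]  # default
```

Option 2 keeps every value a simple scalar, at the cost of one attribute per evidence type.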
Hi Seth,

Thanks for a bunch of good suggestions. On your specific question:

Me:
>> And how could I best store the pubmed id associated with each method?

You:
> Not sure I'm following you, what does "each method" refer to here?

Yep, 'method' is ambiguous. Sorry. I meant to say:

1) Not infrequently a protein-protein interaction is identified by multiple experimental methods, which gives it extra credibility.

2) A PubMed ID is often included in archived interaction data, providing a reference (a paper) for each underlying experimental method. In Cytoscape, we often make these clickable links, so you can examine the abstract and decide whether you want to include the interaction in your network.

In general, I am trying to figure out the best (and quickest) way to adapt graphNEL to store possibly many relationships between two nodes in a graph, wherein each relationship may have several attributes to describe it. For each experimentally derived relationship, there may be a PubMed ID, a confidence score, some indication of the scale of the experiment, etc.

I needed (and you suggested) a quick solution for now. As I understand it, you suggest that two more thorough-going strategies may be worth considering: use a list of S4 objects, or create a new multiGraph class. When I have a little time, I will look into these.

By the way, the graph package is a delight, as is the way it meshes with RBGL. For example, I just figured out how easy it is to find putative relationships between previously unconnected nodes, using a reference graph (in my case, DIP), the graph package, and RBGL:

    subGraph(sp.between(g, node1, node2)[[1]]$path, g)

That's lovely!

- Paul
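The shortest-path one-liner from Paul's reply can be tried on a toy graph as sketched below. It assumes both graph and RBGL are installed. The node names A to E are placeholders rather than DIP identifiers. The post accesses the path as `$path`, and this sketch assumes the full element name is `path_detail` in the installed RBGL version.

```r
library(graph)
library(RBGL)

# Toy reference graph: two routes from A to C, a short one via B
# and a longer one via D and E.
ref <- new("graphNEL", nodes = c("A", "B", "C", "D", "E"),
           edgemode = "undirected")
ref <- addEdge(c("A", "B", "A", "D", "E"),
               c("B", "C", "D", "E", "C"), ref)

# Shortest path between two nodes that are not directly connected.
sp <- sp.between(ref, "A", "C")
path_nodes <- sp[[1]]$path_detail

# Subgraph of the reference graph induced by the nodes on that path.
linking <- subGraph(path_nodes, ref)
```

The resulting subgraph contains the intermediate nodes that link the two query nodes in the reference graph.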
Seth Falcon ★ 7.4k
Paul Shannon <pshannon at systemsbiology.org> writes:

> Yep, 'method' is ambiguous. Sorry. I meant to say:
>
> 1) Not infrequently a protein-protein interaction is identified
>    by multiple experimental methods which give it extra credibility.
>
> 2) A Pubmed ID is often included in archived interaction data,
>    providing a reference (a paper) for each underlying experimental
>    method. In cytoscape, we often make these clickable links, so you
>    can examine the abstract, and decide whether you want to include
>    the interaction in your network.

So to expand on my suggested workaround:

  - Define an edge attribute for each experimental method: m1, m2, m3.
  - Have the default for each be:
    list(exists=FALSE, PMIDS=character(0), score=as.numeric(NA))
  - Assume that if an edge is present, one of the attributes has its exists component set to TRUE; otherwise the edge would not be there.

> be worth considering: use a list of S4 objects, or create a new
> multiGraph class. When I have a little time, I will look into
> these.

It seems to me that what you really want is a nice way to represent multigraphs (one node set with a set of edge sets).

> By the way, the graph package is a delight, as is the way it meshes
> with RBGL.
>
> For example, I just figured out how easy it is to find putative
> relationships between previously unconnected nodes, using a
> reference graph (in my case, DIP), the graph package, and RBGL:
>
>     subGraph(sp.between (g, node1, node2)[[1]]$path, g)
>
> That's lovely!

Excellent :-) Glad to hear that.

+ seth
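A minimal sketch of the per-method record workaround described in this reply, assuming the graph package is installed. The attribute names m1 to m3 and the default record come from the reply. The PMID string and the score are hypothetical placeholders, not real data. The record is wrapped in list() on the assumption that a multi-element value for a single edge must be passed as one list element.

```r
library(graph)

g <- new("graphNEL", nodes = c("YCR084C", "YBR112C"), edgemode = "undirected")
g <- addEdge("YCR084C", "YBR112C", g)

# Default record from the reply: no evidence, no PMIDs, no score.
no_evidence <- list(exists = FALSE, PMIDS = character(0),
                    score = as.numeric(NA))
for (m in c("m1", "m2", "m3")) {
  edgeDataDefaults(g, attr = m) <- no_evidence
}

# Hypothetical record for method m1 (placeholder PMID and score).
m1_record <- list(exists = TRUE, PMIDS = "PMID_PLACEHOLDER", score = 0.5)
edgeData(g, "YCR084C", "YBR112C", attr = "m1") <- list(m1_record)

# m1 was set explicitly; m2 was never set, so it falls back to the default.
m1 <- edgeData(g, "YCR084C", "YBR112C", attr = "m1")[[1]]
m2 <- edgeData(g, "YCR084C", "YBR112C", attr = "m2")[[1]]
```

Because every method shares the same record shape, code that walks the edges can test the exists component without special-casing missing evidence.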
Hi Seth,

> It seems to me that what you really want is a nice way to represent
> multigraphs (one node set with a set of edge sets).

You're right: a multigraph seems like the best solution. How hard would it be to build such a class so that it is compatible with the graph class and RBGL? If the timing works out, I might be able to help.

One small suggestion: it might be nice if 'directed' could (at least optionally) be an attribute on each edge, rather than on the graph as a whole.

- Paul
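Short of a dedicated class, the "one node set with a set of edge sets" idea can be approximated with a named list of graphNEL objects that share the same nodes, as suggested earlier in the thread. The sketch below assumes the graph package is installed. The third node name, PROT_X, is a made-up placeholder, and layers is just an illustrative variable name.

```r
library(graph)

# Shared node set; PROT_X is a hypothetical placeholder node.
proteins <- c("YCR084C", "YBR112C", "PROT_X")
evidence <- c("Immunoprecipitation",
              "Affinity chromatography",
              "Gel filtration chromatography")

# One undirected graphNEL per experimental method, all on the same nodes.
layers <- lapply(evidence, function(m) {
  new("graphNEL", nodes = proteins, edgemode = "undirected")
})
names(layers) <- evidence

# The example pair was observed by all three methods.
for (m in evidence) {
  layers[[m]] <- addEdge("YCR084C", "YBR112C", layers[[m]])
}
# A hypothetical second interaction seen by one method only.
layers[["Immunoprecipitation"]] <-
  addEdge("YBR112C", "PROT_X", layers[["Immunoprecipitation"]])

# Which methods support a given pair?
supports <- function(from, to) {
  hit <- vapply(layers, function(lg) any(isAdjacent(lg, from, to)),
                logical(1))
  names(layers)[hit]
}
pair_support <- supports("YCR084C", "YBR112C")
x_support <- supports("YBR112C", "PROT_X")
```

Each layer remains an ordinary graphNEL, so existing graph and RBGL functions can be applied to one evidence type at a time.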
Seth Falcon ★ 7.4k
Paul Shannon <pshannon at systemsbiology.org> writes:

> Hi Seth,
>
>> It seems to me that what you really want is a nice way to represent
>> multigraphs (one node set with a set of edge sets).
>
> You're right: a multigraph seems like the best solution. How hard
> would it be to build such a class so that it is compatible with the
> graph class and RBGL?

Yes, the goal would be a class that integrates with methods in the graph and RBGL packages. As for how hard it will be, I'm not sure.

> If the timing works out, I might be able to help.

That would be great. I don't expect to be able to take a closer look at this before the BioC2007 conference, but things could change.

> One small suggestion: it might be nice if 'directed' could (at least
> optionally) be an attribute on each edge, rather than on the graph
> as a whole.

There are algorithms that work on undirected graphs and algorithms that work on directed graphs. Introducing mixed graphs would complicate matters. What would be nice about such mixed graphs?

+ seth