paxtoolsr how to extract Modificationfeature related to a node
2
1
Entering edit mode
Guido Leoni ▴ 200
@guido-leoni-3328
Last seen 10.3 years ago
European Union

Dear All

I built a small network with paxtoolsr

genes <- c("AKT1", "MTOR")
t1 <- graphPc(source=genes)

and I correctly saved a biopax file with the results of my query

saveXML(t1,file="parser.owl")

Looking into my network I have (for example for AKT1) several nodes of the same gene corresponding to the protein with different combinations of ModificationFeatures. Each modificated node has a specific ID field 

Is there a way to extract Modification feature related to one node?

I try quering the Pathway Commons Web Services with the ID of nodes 

query<-getPc("http://purl.org/pc2/4/Protein_7e40725729befee922644c6245676e02")

But for different ID associated to AKT i retrieve always the same ModificationFeature

Any suggestion could be very helpful

Thank you very much

Guido

pathway paxtools • 1.6k views
ADD COMMENT
1
Entering edit mode
cannin ▴ 20
@cannin-6903
Last seen 3.3 years ago
United States

This example should pull out information about protein modification features for a given gene. It uses the traverse() command in paxtoolsr. Since the results from traverse is an XML document you need to use XPath with xpathSApply().

library(paxtoolsr)

# Get the protein Uniprot ID for a given HGNC gene symbol
id <- idMapping("AKT1")

# Covert the Uniprot ID to a URI that would be found in the Pathway Commons database
uri <- paste0("http://identifiers.org/uniprot/", id)

# Get URIs for only the ModificationFeatures of the protein
xml <- traverse(uri=uri, path="ProteinReference/entityFeature:ModificationFeature")

# Extract all the URIs
uris <- xpathSApply(xml, "//value/text()", xmlValue)

results <- NULL

# For each URI get the modification position, status, and type
for(i in 1:length(uris)) {
    tmpXml <- traverse(uri=uris[i], path="ModificationFeature/featureLocation:SequenceSite/sequencePosition")
    modPos <- xpathSApply(tmpXml, "//value/text()", xmlValue)

    tmpXml <- traverse(uri=uris[i], path="ModificationFeature/featureLocation:SequenceSite/positionStatus")
    modPosStatus <- xpathSApply(tmpXml, "//value/text()", xmlValue)

    tmpXml <- traverse(uri=uris[i], path="ModificationFeature/modificationType/term")
    modPosType <- xpathSApply(tmpXml, "//value/text()", xmlValue)

    tmpResults <- c(modPos, modPosStatus, modPosType)
    results <- rbind(results, tmpResults)
}

colnames(results) <- c("modificationPosition", "modificationPositionStatus", "modificationType")
ADD COMMENT
0
Entering edit mode
Guido Leoni ▴ 200
@guido-leoni-3328
Last seen 10.3 years ago
European Union

Thank for your answer 

Unfortunately It solve partially my problem. With your solution I obtain all the modificationFeatures related to the gene AKT1.

On the other hand I can have an AKT1 protein that has for example 2 of these modification. My final goal is to have a list of all the proteins annotated for a gene with their modificationFeatures.

For example In the network created for AKT1,MTOR I have the AKT1 proteins:

Protein_e9285d649a9462c9f5994835bdc846d9 with 2 ModificationFeatures( Phosphoserine473 and Phosphothreonine308)

Protein_c3c09444ef92f46585df73f96b89a4b6 with only 1 (Phosphoserine473)

My final goal is to obtain a list with these protein ID and their modificationFeatures.

 

today I was able to obtain my goal with the code pasted below that utilizes your package combined with another ones. But the problem is that the other package converts the biopax model in a dataframe and this conversion it could be quite long for big networks.

Best 

Guido

 

library(paxtoolsr)
library(rBiopaxParser)
library(RCurl)
#download path of interactions from CPTA2
genes <- c("AKT1","MTOR")
t1 <- graphPc(source=genes,kind="NEIGHBORHOOD",format="BIOPAX")

#save the biopax file
saveXML(t1,file="net.owl")
biopax<-readBiopax("net.owl")

#select the protein name in biopax
protein_name<-selectInstances(biopax,class="Protein")
protein_name_unique<-unique(protein_name$id)

#for each protein name
data <- data.frame()
for (i in 1:length(protein_name_unique)) {
  #extract the modification features
  protein_feat<-selectInstances(biopax, id = protein_name_unique[i],property="feature" ) 
  protein_name<-selectInstances(biopax,id=protein_name_unique[i],property="name")
  protein_feat2<-subset(protein_feat,grepl("ModificationFeature",protein_feat$property_attr_value))  
  #print(protein_name_unique[i])
  #print (nrow(protein_feat2))
  #if there are some
  if (nrow(protein_feat2)>0) {
    #we download the biopax of modification feature from biopax
    collect_vect<-character()
    for (z in 1:nrow(protein_feat2)) {
      print(protein_feat2$id[z])
      print(protein_feat2$property_attr_value[z])
     uri<-paste("http://www.pathwaycommons.org/pc2/",substr(protein_feat2$property_attr_value[z],2,53),sep="")
      print(uri)
      tmpXML<-getURI(uri)
      tmpresults <- xmlTreeParse(tmpXML, useInternalNodes = TRUE)
      saveXML(tmpresults,file="tmp.owl")
      biopaxtmp<-readBiopax("tmp.owl")
      type_feat<-selectInstances(biopaxtmp,class="SequenceModificationVocabulary",property="term")
      position_feat<-selectInstances(biopaxtmp,class="SequenceSite",property="sequencePosition")
      string_feat<-paste(position_feat$property_value,type_feat$property_value,sep=" ")   
      #print (string_feat)
      collect_vect<-append(collect_vect,string_feat)
    }
  sort(collect_vect)
  line0<-paste(collect_vect,collapse=";")
  line01<-paste(protein_name_unique[i],line0,sep="\t")
  line<-paste(protein_name$property_value,line01,sep="\t")
  #protein_name_unique[i]
  #collect_vect
  #print(collect_vect)
  write(line,file="myfile.txt",append=TRUE)
  }
  else {
    line<-paste(protein_name$property_value,protein_name_unique[i],sep="\t")
    write(line,file="myfile.txt",append=TRUE)
    
  }
  
 
}

 

ADD COMMENT

Login before adding your answer.

Traffic: 673 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6