IPI to entrez id
viritha kaza
@viritha-kaza-4318
Hi group,

I would like to convert a list of 3000 IPIs to gene symbols and Entrez IDs, e.g. IPI00658210 to 57599, WDR48. I want a 1:1 mapping. Could anyone suggest a function and a package that could help me with this?

Thank you in advance,
Viritha
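Whatever mapping source is used, it may help to clean the input list first, so that duplicates, stray whitespace or version suffixes do not show up as spurious misses. The base-R sketch below assumes an IPI accession is "IPI" followed by eight digits, optionally with a ".N" version suffix; `normalize_ipi()` is a hypothetical helper and the sample values are made up (in practice they would come from `scan()`).

```r
# Hypothetical raw input; in practice something like
# scan("ipi_test.txt", what = character(), sep = "\n", quote = "")
raw_ids <- c(" IPI00658210.3", "IPI00658210", "ipi00055954", "", "IPI00221338")

# Assumption: a valid IPI accession is "IPI" followed by eight digits,
# optionally carrying a ".N" version suffix.
normalize_ipi <- function(ids) {
  ids <- toupper(trimws(ids))               # drop stray whitespace, unify case
  ids <- sub("\\.[0-9]+$", "", ids)         # strip a trailing version suffix
  ids <- ids[grepl("^IPI[0-9]{8}$", ids)]   # keep only well-formed accessions
  unique(ids)                               # query each ID once
}

ipi_ids <- normalize_ipi(raw_ids)
print(ipi_ids)
```

Comparing `length(raw_ids)` with `length(ipi_ids)` gives a quick idea of how many "misses" were really formatting problems rather than missing annotation.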
@manca-marco-path-4295
Hi Viritha,

I found this old answer to a similar question and I think it still applies:

<<
Not sure that it is this easy. IPIs are protein identifiers, whereas GO categories classify genes. Neither the mapping from protein to gene nor the mapping from gene to GO category is 1:1. GO categories also form a hierarchy, so there are significant decisions to be made when representing IPI identifiers in a pie chart of GO terms.

Bioconductor maintains 'org' and 'GO' database packages that provide the necessary link between IPI protein IDs and GO categories, via Entrez Gene IDs. Code might look like:

    ## once only, to install packages
    source('http://bioconductor.org/biocLite.R')
    biocLite(c('org.Hs.eg.db', 'GO.db'))

    ## from IPI to ENTREZ id, not 1:1
    library(org.Hs.eg.db)
    ipi2eg = revmap(eapply(org.Hs.eg.db, names))  ## NOT 1:1 map

    ## Assume ipiIds is, e.g., c('IPI00008860', 'IPI00019922')
    egIds = revmap(ipi2eg[ipiIds])

    ## get GO terms, also not 1:1
    goIds = eapply(org.Hs.egGO[names(egIds)], names)

You're still left with the problem of resolving multiple mappings and the hierarchical relationship between GO terms.

Martin
>>

All the best,
Marco

--
Marco Manca, MD
University of Maastricht
Faculty of Health, Medicine and Life Sciences (FHML)
Cardiovascular Research Institute (CARIM)
Mailing address: PO Box 616, 6200 MD Maastricht (The Netherlands)
Visiting address: Experimental Vascular Pathology group, Dept of Pathology - Room 5.08, Maastricht University Medical Center, P. Debyelaan 25, 6229 HX Maastricht
E-mail: m.manca at maastrichtuniversity.nl
Office telephone: +31(0)433874633
Personal mobile: +31(0)626441205
Twitter: @markomanka
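Because neither step is 1:1, one way to approach the "1:1" request is to keep exactly one row per IPI but collapse all candidate genes into a single delimited field and record how many there are, so that ambiguous IDs can be reviewed rather than silently resolved. A minimal base-R sketch, assuming a long-format table with one row per IPI/gene pair; the IDs are placeholders, not real annotation, and `collapse()` is a hypothetical helper.

```r
# Hypothetical long-format mapping: one row per (IPI, gene) pair.
# These IDs are placeholders, not real annotation.
map <- data.frame(
  ipi    = c("IPI_A", "IPI_A", "IPI_B", "IPI_C"),
  entrez = c("111",   "222",   "333",   NA),
  symbol = c("GENE1", "GENE2", "GENE3", NA),
  stringsAsFactors = FALSE
)

# Join the distinct, non-missing values of x with ";"
collapse <- function(x) paste(sort(unique(x[!is.na(x)])), collapse = ";")

ids <- unique(map$ipi)
one_row <- data.frame(
  ipi     = ids,
  entrez  = vapply(ids, function(i) collapse(map$entrez[map$ipi == i]),
                   character(1), USE.NAMES = FALSE),
  symbol  = vapply(ids, function(i) collapse(map$symbol[map$ipi == i]),
                   character(1), USE.NAMES = FALSE),
  # number of distinct Entrez IDs per IPI: 0 = unmapped, >1 = ambiguous
  n_genes = vapply(ids, function(i) {
              e <- map$entrez[map$ipi == i]
              length(unique(e[!is.na(e)]))
            }, integer(1), USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)
print(one_row)
```

Rows with `n_genes > 1` are the ones that still need a decision (for example keeping the first candidate, or the one with an HGNC symbol); which rule is appropriate depends on the downstream analysis.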
viritha kaza
@viritha-kaza-4318
Hi, thanks for the reply.

As Samuel suggested, I used the file at ftp://ftp.ebi.ac.uk/pub/databases/IPI/current/ipi.HUMAN.xrefs.gz. For the IDs I did not find there, I used the following code. Although I still don't get a 1:1 mapping, I do get the Entrez ID and the gene symbol. The file ipi_test.txt contains the list of IPIs that I want to convert.

Code:

    source('http://bioconductor.org/biocLite.R')
    biocLite("biomaRt")
    library("biomaRt")
    ensembl = useMart("ensembl", dataset = "hsapiens_gene_ensembl")
    ipi = scan("ipi_test.txt", what = character(), sep = '\n', quote = "")
    ipi_entrez = getBM(attributes = c("ipi", "entrezgene", "hgnc_symbol"),
                       filters = "ipi", values = ipi, mart = ensembl)
    write.table(ipi_entrez, "ipi_entrez_test.txt", sep = '\t')

I am still missing a few. Is there any other method, or should I conclude that those IPI numbers have no corresponding gene symbols?

Thanks,
Viritha
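To see exactly which IDs were missed, the query vector can be compared with the `ipi` column of the `getBM()` result. In this sketch `ipi` and `ipi_entrez` stand in for the objects in the code above: the IDs are taken from this thread, but the table is a toy stand-in for real `getBM()` output, and empty results are assumed to be encoded as `NA` or `""`.

```r
# Stand-ins for the objects in the code above: 'ipi' is the query vector and
# 'ipi_entrez' plays the role of the data.frame returned by getBM().
ipi <- c("IPI00658210", "IPI00055954", "IPI00221338")
ipi_entrez <- data.frame(
  ipi         = "IPI00658210",
  entrezgene  = 57599,
  hgnc_symbol = "WDR48",
  stringsAsFactors = FALSE
)

# IDs that produced no row at all
unmatched <- setdiff(ipi, ipi_entrez$ipi)

# Rows that came back with neither an Entrez ID nor a symbol
# (assumed here to be encoded as NA or "")
empty <- with(ipi_entrez,
              is.na(entrezgene) & (is.na(hgnc_symbol) | hgnc_symbol == ""))

unresolved <- union(unmatched, ipi_entrez$ipi[empty])
print(unresolved)
```

`writeLines(unresolved, "ipi_unmatched.txt")` would then save the list so the remaining IDs can be checked by hand.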
Hi,

Could you give us a list of 10 unmatched IDs?

BR

--
Samuel GRANJEAUD   granjeau at tagc.univ-mrs.fr
INSERM - ICIM - TAGC   Tel: +33 (0)491 82 87 11/24
http://tagc.univ-mrs.fr   Fax: +33 (0)491 82 87 01
http://icim.marseille.inserm.fr/proteomique
Hi Samuel,

These are some of the IDs for which I did not get a match:

    IPI00055954
    IPI00221338
    IPI00465149
    IPI00554793
    IPI00028262
    IPI00412977
    IPI00105532
    IPI00411514
    IPI00746388
    IPI00419266

Thanks,
Viritha
Hi,

It looks like all your IDs have been suppressed from IPI. You can still find them in UniParc. If your list is only 10 items, you would do better to retrieve the Gene IDs by hand; http://www.ebi.ac.uk/uniparc/ lists a few links for querying UniParc.

Here is the code I used to check your IDs. Sorry, there is some ugly Perl, first to reformat the XML and then to extract the interesting fields.

http://www.xaprb.com/blog/2006/10/05/five-great-perl-programming-techniques-to-make-your-life-fun-again/

Regards.

    ~$ cat myId.txt
    IPI00055954
    IPI00221338
    IPI00465149
    IPI00554793
    IPI00028262
    IPI00412977
    IPI00105532
    IPI00411514
    IPI00746388
    IPI00419266
    ~$ wget "http://www.ebi.ac.uk/Tools/dbfetch/dbfetch?db=uniparc&id=IPI00055954+IPI00221338+IPI00465149+IPI00554793+IPI00028262+IPI00412977+IPI00105532+IPI00411514+IPI00746388+IPI00419266&format=default&style=default&Retrieve=Retrieve" -O myIPI.xml
    :~$ perl -pe 's/>\s+<dbref/>\n<dbref g;="" s="" \=""/>/\/>\n/g; s/>\n</g'/>/, $_); print join(" - ", map($t{$_}, qw(id version created last active))),"\n"' | grep -f myId.txt | sort
    "IPI00028262" - "1" - "2003-03-14" - "2009-06-17" - "N"
    "IPI00055954" - "1" - "2003-03-14" - "2003-10-03" - "N"
    "IPI00055954" - "2" - "2003-11-07" - "2007-07-14" - "N"
    "IPI00055954" - "3" - "2007-08-08" - "2007-10-24" - "N"
    "IPI00055954" - "4" - "2007-11-13" - "2009-07-09" - "N"
    "IPI00105532" - "1" - "2003-03-14" - "2006-03-03" - "N"
    "IPI00105532" - "2" - "2006-04-04" - "2006-10-06" - "N"
    "IPI00105532" - "3" - "2006-11-02" - "2009-09-03" - "N"
    "IPI00221338" - "1" - "2003-04-10" - "2003-10-03" - "N"
    "IPI00221338" - "2" - "2003-11-07" - "2005-02-05" - "N"
    "IPI00221338" - "3" - "2005-03-07" - "2005-08-02" - "N"
    "IPI00221338" - "4" - "2005-09-06" - "2006-09-06" - "N"
    "IPI00221338" - "5" - "2006-10-06" - "2006-10-06" - "N"
    "IPI00221338" - "6" - "2006-11-02" - "2007-10-24" - "N"
    "IPI00221338" - "7" - "2007-11-13" - "2008-09-02" - "N"
    "IPI00221338" - "8" - "2008-09-25" - "2009-06-17" - "N"
    "IPI00411514" - "1" - "2004-06-02" - "2007-12-05" - "N"
    "IPI00411514" - "2" - "2008-01-16" - "2010-06-17" - "N"
    "IPI00412977" - "1" - "2004-06-02" - "2010-04-26" - "N"
    "IPI00419266" - "1" - "2004-07-01" - "2010-04-26" - "N"
    "IPI00465149" - "1" - "2004-10-04" - "2005-02-05" - "N"
    "IPI00465149" - "2" - "2005-03-07" - "2005-05-10" - "N"
    "IPI00465149" - "3" - "2005-06-03" - "2009-07-09" - "N"
    "IPI00554793" - "1" - "2005-04-04" - "2007-01-17" - "N"
    "IPI00554793" - "2" - "2007-02-21" - "2009-06-17" - "N"
    "IPI00746388" - "1" - "2006-05-16" - "2007-10-24" - "N"
    "IPI00746388" - "2" - "2007-11-13" - "2009-02-12" - "N"
    "IPI00746388" - "3" - "2009-03-03" - "2009-06-17" - "N"

--
Samuel GRANJEAUD
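The listing can also be checked programmatically: an ID is still live only if at least one of its versions is flagged active. A base-R sketch that parses three of the lines shown above, assuming "Y" is the active flag (only "N" appears in this output):

```r
# Three lines copied from the listing above (fields: id, version, created, last, active)
lines <- c(
  '"IPI00055954" - "3" - "2007-08-08" - "2007-10-24" - "N"',
  '"IPI00055954" - "4" - "2007-11-13" - "2009-07-09" - "N"',
  '"IPI00412977" - "1" - "2004-06-02" - "2010-04-26" - "N"'
)

# Drop the quotes, split on the " - " separator, and build a data.frame
fields   <- strsplit(gsub('"', "", lines, fixed = TRUE), " - ", fixed = TRUE)
versions <- as.data.frame(do.call(rbind, fields), stringsAsFactors = FALSE)
names(versions) <- c("id", "version", "created", "last", "active")

# Assumption: "Y" marks an active version. An ID is retired if none is active.
is_active <- tapply(versions$active == "Y", versions$id, any)
retired   <- names(is_active)[!is_active]

# Latest 'last' date per ID (ISO dates sort correctly as strings)
last_date <- tapply(versions$last, versions$id, max)
print(retired)
print(last_date)
```

Applied to the full listing, this would flag all ten IDs as retired, consistent with the conclusion that they were suppressed from IPI.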
Hi Viritha,

These things can never be 1:1, but you can fairly easily cram them all into one large data.frame like this:

    library(org.Hs.eg.db)
    allAnnots <- merge(toTable(org.Hs.egPROSITE), toTable(org.Hs.egGO),
                       by.x = "gene_id", by.y = "gene_id")
    head(allAnnots)

Once you have done this, you may notice that not only are these mappings almost never (if ever) 1:1, but that this could have been even worse if I had used the GO2ALL mappings (and I probably should have, but I can't really tell, because I have almost no information about what you want to do).

Anyhow, I hope this helps, but if you have a more specific use for this information that you are willing to talk about, then we might be able to give you a better answer.

Marc
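The merged table grows quickly because `merge()` performs an inner join: for each `gene_id`, every row from the first table is paired with every row from the second. A toy sketch with two hypothetical tables (column names loosely modelled on the `toTable()` output above; the values are placeholders, not real annotation):

```r
# Two hypothetical tables keyed by gene_id; values are placeholders.
domains <- data.frame(gene_id = c("1", "1", "2"),
                      ipi_id  = c("IPI_A", "IPI_B", "IPI_C"),
                      stringsAsFactors = FALSE)
go <- data.frame(gene_id = c("1", "1", "1", "2"),
                 go_id   = c("go_1", "go_2", "go_3", "go_4"),
                 stringsAsFactors = FALSE)

# Inner join on the shared key: each gene contributes
# (its rows in 'domains') x (its rows in 'go') rows.
merged <- merge(domains, go, by = "gene_id")

# Both toy tables contain the same genes, so their counts line up element-wise.
per_gene <- table(domains$gene_id) * table(go$gene_id)
print(nrow(merged))   # 2 * 3 for gene "1" plus 1 * 1 for gene "2"
print(per_gene)
```

With real annotation, genes that carry many GO terms and several IPI entries dominate this product, which is why the merged table ends up so far from 1:1.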