from using biomaRt and r10kcod
Weiwei Shi ★ 1.2k
@weiwei-shi-1407
Hi, there:

I happened to re-address this question of mapping codelink probe ids to human entrezgene ids. I describe my question using an example:

Using the r10kcod package, you can find probe "GE16490" mapped to "502674", which I assume is a rat entrezgene id. However, when I use biomaRt to convert all rat entrezgene ids in this array to human ones, I found the following maps involving 502674:

         id MappedID rat.count human.count
4167 296197    11034         1           2
7021 502674    11034         1           2

So, basically, 296197, 502674 and 11034 are all associated with the protein "destrin". To be accurate, 296197 is a rat protein which is similar to destrin.

However, as shown in
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=search&DB=gene
the other two (11034 and *502674*) are human ids (if I am wrong here, please correct me).

So my questions are:

1. Is 502674 a rat entrezgene id or a human one?

2. Is r10kcod wrong, or is ncbi wrong, or is my understanding wrong (I assume the last one :)

3. I found many many-2-many maps in this process of converting rat to human entrezgene ids. Like the following:

> t0[t0[,1]== 396527,]
         id MappedID rat.count human.count
6608 396527    54576         9           4
6609 396527    54575         9           4
6610 396527    54600         9           4
6611 396527    54577         9           4
6612 396527    54578         9           4
6613 396527    54579         9           4
6614 396527    54657         9           4
6615 396527    54659         9           4
6616 396527    54658         9           4

> t0[t0[,2]== 54576,]
         id MappedID rat.count human.count
2494 113992    54576         9           4
6608 396527    54576         9           4
6617 396551    54576         9           4
6626 396552    54576         9           4

> t0[t0[,2]== 54577,]
         id MappedID rat.count human.count
2497 113992    54577         9           4
6611 396527    54577         9           4
6620 396551    54577         9           4
6629 396552    54577         9           4

So, basically all the ids are related to different polypeptides associated with the UDP glucuronosyltransferase 1 family. Are there other situations that cause these many2many mappings?

Sorry for the long questions,

Regards,

--
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.

"Did you always know?"
"No, I did not. But I believed..."
---Matrix III
r10kcod probe PROcess convert codelink
@james-w-macdonald-5106
United States
Weiwei Shi wrote:
> Hi, there:
>
> I happened to re-address this question of codelink probe id to human
> entrezgene id. I describe my question using an example:
>
> by using r10kcod package, you can find probe "GE16490" mapped to
> "502674", which I assume it is rat entrezgene id. However, when I use
> biomaRt to convert all rat entrezgene id in this array to human ones,
> I found the following maps involving 502674:
>
>          id MappedID rat.count human.count
> 4167 296197    11034         1           2
> 7021 502674    11034         1           2
>
> so, basically, 296197, 502674 and 11034 are all associated with
> protein "destrin". To be accurate, 296197 is a rat protein which is
> similar to destrin.
>
> However, as shown in
> http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=search&DB=gene
> , the other two (11034 and *502674*) are human ids (if I am wrong
> here, please correct me).
>
> so my questions are:
>
> 1. whether 502674 is a rat entrezgene id or human one?

When I search on that term I get a rat gene. In fact, a quick sample of the IDs below indicates that the rat IDs are all rat, and the human IDs are all human.

> 2. r10kcod is wrong or ncbi is wrong or my understanding is wrong (i
> assume the last one :)

I think you might have become confused if you did a bunch of queries, and thought that 502674 came up as Rattus norvegicus instead of Homo sapiens on NCBI.

> 3. i found many many-2-many maps in this process of rat to human
> entrezgene ids. Like the following:
>
> > t0[t0[,1]== 396527,]
>
>          id MappedID rat.count human.count
> 6608 396527    54576         9           4
> 6609 396527    54575         9           4
> 6610 396527    54600         9           4
> 6611 396527    54577         9           4
> 6612 396527    54578         9           4
> 6613 396527    54579         9           4
> 6614 396527    54657         9           4
> 6615 396527    54659         9           4
> 6616 396527    54658         9           4
>
> > t0[t0[,2]== 54576,]
>
>          id MappedID rat.count human.count
> 2494 113992    54576         9           4
> 6608 396527    54576         9           4
> 6617 396551    54576         9           4
> 6626 396552    54576         9           4
>
> > t0[t0[,2]== 54577,]
>
>          id MappedID rat.count human.count
> 2497 113992    54577         9           4
> 6611 396527    54577         9           4
> 6620 396551    54577         9           4
> 6629 396552    54577         9           4
>
> so, basically all the ids are related to different polypeptides
> associated with UDP glucuronosyltransferase 1 family. Are there some
> other situations causing this many2many mappings?

Not sure I understand the question. Are you asking if there are duplicate Entrez Gene Ids that map to the same or very similar genes? In my experience, yes. In addition, when you are looking at homology mappings it isn't uncommon for a gene in one species to map to several closely related genes in another (since they are mapped by homology, and the closely related genes are often nearly identical in sequence).

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623

**********************************************************
Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues.
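One way to see which rows are one-to-many, many-to-one or many-to-many is to count how often each ID occurs on each side of the mapping. The base R sketch below is only an illustration: it uses a small subset of the rows quoted in this thread (not the full t0), and it assumes that rat.count and human.count are simply the number of rows in which a given rat or human ID appears, which is consistent with the printed values but is not stated in the original post. The `type` column is a name introduced here for illustration.

```r
# Toy subset of the rat -> human rows quoted in this thread (NOT the full t0).
# Column names follow the printed output; integer IDs avoid formatting issues.
map <- data.frame(
  id       = c(296197L, 502674L, rep(396527L, 9), 113992L, 396551L, 396552L),
  MappedID = c(11034L, 11034L,
               54576L, 54575L, 54600L, 54577L, 54578L, 54579L,
               54657L, 54659L, 54658L,
               54576L, 54576L, 54576L)
)

# Drop exact duplicate pairs so they do not inflate the counts.
map <- unique(map)

# Assumed meaning of the count columns: rows per rat ID and rows per human ID.
rat.n   <- table(map$id)
human.n <- table(map$MappedID)
map$rat.count   <- as.integer(rat.n[as.character(map$id)])
map$human.count <- as.integer(human.n[as.character(map$MappedID)])

# Label each row by mapping type, read in the rat -> human direction.
map$type <- ifelse(map$rat.count == 1 & map$human.count == 1, "one-to-one",
            ifelse(map$rat.count == 1, "many-to-one",
            ifelse(map$human.count == 1, "one-to-many", "many-to-many")))

print(map[map$id == 502674L, ])
print(table(map$type))
```

In this subset, 502674 shares its human ID with one other rat ID (many-to-one), which is what a homology mapping would produce for two closely related rat genes. Because only some rows are included, the counts for 113992, 396551 and 396552 are smaller here than in the full table.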
James W. MacDonald wrote:
> I think you might have become confused if you did a bunch of queries,
> and thought that 502674 came up as Rattus norvegicus instead of Homo
> sapiens on NCBI.

Ack. Make that '502674 came up as Homo sapiens instead of Rattus norvegicus'.

Best,

Jim
Hi,

502674 is a human gene id, I think. Then the source that put it into human gene ids is r10kcod? headache...:(

My last question is: I found many2many maps and looked into one case, which turns out to be associated with protein quaternary structure. I am wondering if there are other cases causing this many2many. From Jim's answer, similarity definitely is one of them.

Best,

Weiwei
Diego Diez ▴ 760
@diego-diez-4520
Japan
Hi Weiwei and James,

(sorry Weiwei, the first time I sent this email only to you when my intention was to send it to the list too).

On May 15, 2007, at 5:29 AM, Weiwei Shi wrote:
> Hi, there:
>
> I happened to re-address this question of codelink probe id to human
> entrezgene id. I describe my question using an example:
>
> by using r10kcod package, you can find probe "GE16490" mapped to
> "502674", which I assume it is rat entrezgene id. However, when I use
> biomaRt to convert all rat entrezgene id in this array to human ones,
> I found the following maps involving 502674:
>
>          id MappedID rat.count human.count
> 4167 296197    11034         1           2
> 7021 502674    11034         1           2

I'm not too familiar with the biomaRt package, but I guess that what this result is telling you is that you have two rat entrez ids, 296197 and 502674 (each appearing only once), which map to one human entrez id, 11034 (appearing twice, one time for each rat id).

> so, basically, 296197, 502674 and 11034 are all associated with
> protein "destrin". To be accurate, 296197 is a rat protein which is
> similar to destrin.
>
> However, as shown in
> http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=search&DB=gene
> , the other two (11034 and *502674*) are human ids (if I am wrong
> here, please correct me).

Well, for me, searching 502674 using Entrez Gene brings up a link to the Destrin rat gene:

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=search&term=502674

Clicking on this entry I can see the information about the Dstn (destrin) gene. At the bottom of the page there are mappings to different sequences (Related sequences). One is CB785830.1 and the other CF111187.1. The latter one is the one used in r10kcod to map from Codelink probe to Genbank,

GE16490 -> CF111187.1

and then this is used to map to Entrez Gene, if I understand a little how AnnBuilder works (that may not be the case). Of course, I also use the mappings provided by the manufacturer from probe ids to Entrez Gene and Unigene, but for this particular probe there is no such mapping in the current mappings provided (last updated March 31, 2006, so they are pretty old).

In fact, in those files there is also information about homologues in the other two organisms (from human, mouse and rat), and in the human probes that map to Entrez Gene 11034 I can find that they map to rat Entrez Gene 502674, in agreement with the biomaRt results.

> so my questions are:
>
> 1. whether 502674 is a rat entrezgene id or human one?

I would definitely say that it is a rat id.

> 2. r10kcod is wrong or ncbi is wrong or my understanding is wrong (i
> assume the last one :)

Neither is wrong from my point of view, but let us first see if we are seeing the same thing when we look for 502674 in Entrez Gene.

> 3. i found many many-2-many maps in this process of rat to human
> entrezgene ids. Like the following:
>
> > t0[t0[,1]== 396527,]
>
>          id MappedID rat.count human.count
> 6608 396527    54576         9           4
> 6609 396527    54575         9           4
> 6610 396527    54600         9           4
> 6611 396527    54577         9           4
> 6612 396527    54578         9           4
> 6613 396527    54579         9           4
> 6614 396527    54657         9           4
> 6615 396527    54659         9           4
> 6616 396527    54658         9           4
>
> > t0[t0[,2]== 54576,]
>
>          id MappedID rat.count human.count
> 2494 113992    54576         9           4
> 6608 396527    54576         9           4
> 6617 396551    54576         9           4
> 6626 396552    54576         9           4
>
> > t0[t0[,2]== 54577,]
>
>          id MappedID rat.count human.count
> 2497 113992    54577         9           4
> 6611 396527    54577         9           4
> 6620 396551    54577         9           4
> 6629 396552    54577         9           4
>
> so, basically all the ids are related to different polypeptides
> associated with UDP glucuronosyltransferase 1 family. Are there some
> other situations causing this many2many mappings?

As for this, James has already answered (thanks for that). The probes are 30 base pairs long, so it is not strange but, on the contrary, very common to find those probes mapping to multiple genes that can have related or unrelated functions. It is less common in the Codelink arrays to have multiple probes mapping to the same gene, but again, you can have multiple probes mapping to different Genbank ids that correspond to the same Entrez Gene identifier. The fact that different paralogous and orthologous sequences, and sometimes even unrelated sequences, can share the same 30 base pair oligonucleotide makes this a very complex problem with no easy solution.

Regards,

Diego.

-----------------------------------------------
Diego Diez, PhD.

Bioknowledge systems, Kanehisa lab.
Bioinformatics center,
Institute for Chemical Research,
Kyoto University.
Gokasho, Uji, Kyoto 611-0011 JAPAN.

e-mail: diez at kuicr.kyoto-u.ac.jp
url: http://web.kuicr.kyoto-u.ac.jp/~diez
tlf: +81-774-38-3296
fax: +81-774-38-3269
-----------------------------------------------

> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
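The two-step chain described above (probe to Genbank accession, then Genbank accession to Entrez Gene) can be sketched in base R as a join on the accession. This is a minimal illustration, not the actual AnnBuilder procedure: the data frame names are made up here, and the probe "GE_EXAMPLE" is hypothetical, added only to show how two probes with different Genbank ids can end up on the same Entrez Gene identifier. The two accessions and the gene id 502674 are the ones mentioned in this thread.

```r
# Probe -> Genbank. GE16490 -> CF111187.1 is quoted in the thread;
# GE_EXAMPLE is a HYPOTHETICAL probe used only for illustration.
probe2gb <- data.frame(
  probe   = c("GE16490", "GE_EXAMPLE"),
  genbank = c("CF111187.1", "CB785830.1"),
  stringsAsFactors = FALSE
)

# Genbank -> Entrez Gene, following the two "Related sequences"
# mentioned for the rat Dstn entry (502674).
gb2entrez <- data.frame(
  genbank = c("CF111187.1", "CB785830.1"),
  entrez  = c(502674L, 502674L),
  stringsAsFactors = FALSE
)

# Chain the two steps by joining on the Genbank accession.
probe2entrez <- merge(probe2gb, gb2entrez, by = "genbank")

# Count distinct probes per gene; a value above 1 means several probes
# reach the same Entrez Gene id through different accessions.
probes.per.gene <- tapply(probe2entrez$probe, probe2entrez$entrez,
                          function(p) length(unique(p)))

print(probe2entrez)
print(probes.per.gene)
```

The reverse situation (one probe reaching several genes) would show up the same way by counting distinct `entrez` values per `probe`.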
Hi,

I just checked 502674 again and found it is a rat gene id. Oops... But I remember seeing it shown as a human one yesterday afternoon, more than once, so I raised that question. Probably it was due to some web browser problem (I am using Safari on Mac), or I must have been "drunk" :)

Sorry about that,

Weiwei