GenBank RefSeq conversion


Sean Davis 21k

@sean-davis-490


United States

On Mon, Jun 2, 2008 at 5:03 AM, Eleni Christodoulou <elenichri at gmail.com> wrote:
> Thank you guys,
>
> I saw your answers this morning. I downloaded the package "org.Hs.eg.db",
> but I am struggling a bit with the use of the commands. I am trying for
> example:
> x <- mget("AA868688",org.Hs.egACCNUM2EG)
> and I get the following error:
> Error in .checkKeys(value, Lkeys(x), x@ifnotfound) :
>   invalid key "AA868688"

This means that there is no entry for AA868688.

> This happens with all the GenBank identifiers that I am trying to convert to
> Entrez Gene IDs. What am I doing wrong?

You are not doing anything wrong. NCBI supplies GenBank accession numbers for what are essentially full-length transcripts that are associated with a gene. However, if you look up the accession above, it is an EST, and NCBI does not provide accession-to-gene conversion directly for such non-full-length accessions. So, you have a couple of options:

1) Use the Stanford SOURCE website to do the conversion for you. It will use UniGene mappings to do so.

2) Build your own annotation package using SQLForge. This option will supply you with the mappings that you want in R and in the data structure of the other annotation packages.

Hope that helps.

Sean

> On Fri, May 30, 2008 at 7:27 PM, Marc Carlson <mcarlson at fhcrc.org> wrote:
>>
>> Sean Davis wrote:
>>>
>>> On Fri, May 30, 2008 at 8:53 AM, Eleni Christodoulou
>>> <elenichri at gmail.com> wrote:
>>>>
>>>> Hello all!
>>>>
>>>> I was trying to convert RefSeq accession numbers to GenBank accession
>>>> numbers (or the opposite). I think that there must exist a library that
>>>> does this job automatically... Does anyone know anything relevant to this?
>>>
>>> Hi, Eleni. There is no direct relationship between RefSeq and GenBank
>>> numbers. A given RefSeq may or may not be represented by exactly one
>>> GenBank accession. In fact, a RefSeq may not represent any "real"
>>> sequence, but can be a composite of several "real" sequences. As an
>>> example, see here:
>>>
>>> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?val=NM_007294.2
>>>
>>> It looks like this RefSeq is actually composed of 4 different
>>> sequences from GenBank (if I am reading the record correctly).
>>>
>>> The only way I know to deal with this (at least in the general case)
>>> is to go through Entrez Gene (or the Ensembl equivalent of a gene) to
>>> find those accessions in GenBank and RefSeq that share a common Gene
>>> ID. You can do this using the annotation package for the organism of
>>> interest, I think. Steffen or others might be able to comment on how
>>> to do this using biomaRt.
>>>
>>> Sean
>>
>> What Sean mentioned should work to at least let you connect the dots.
>>
>> As an example, for human you could use the package "org.Hs.eg.db" and then
>> use the following mappings to get what you want:
>>
>> 1st use "org.Hs.egACCNUM2EG" to get Entrez Gene IDs for your GenBank
>> accessions.
>>
>> And then use "org.Hs.egREFSEQ" to get RefSeq IDs for your Entrez Gene IDs.
>>
>> Marc
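For anyone following along, the two-step lookup Marc describes, combined with a lookup that tolerates accessions with no entry (such as the EST above), might be sketched roughly as below. This is only an illustration under a few assumptions: `accession_to_refseq` is a hypothetical helper name that is not part of any package, and the toy environments and their IDs (`ACC_A`, `1001`, `NM_TOY1`, `NP_TOY1`) are made up so the example runs in plain R without any annotation package installed. The key point is `ifnotfound = NA`, which makes `mget()` return `NA` for unknown keys instead of stopping with an "invalid key" error.

```r
# Sketch: GenBank accession -> Entrez Gene ID -> RefSeq ID.
# accession_to_refseq is a hypothetical helper, not part of any package.
accession_to_refseq <- function(accessions, acc2eg, eg2refseq) {
  # Step 1: accession -> Entrez Gene ID(s).
  # ifnotfound = NA yields NA for unknown keys instead of an error.
  gene_ids <- mget(accessions, acc2eg, ifnotfound = NA)

  # Step 2: Entrez Gene ID(s) -> RefSeq ID(s), one result per accession.
  lapply(gene_ids, function(ids) {
    ids <- ids[!is.na(ids)]
    if (length(ids) == 0) return(NA_character_)
    refseq <- unlist(mget(as.character(ids), eg2refseq, ifnotfound = NA),
                     use.names = FALSE)
    refseq <- unique(refseq[!is.na(refseq)])
    if (length(refseq) == 0) NA_character_ else refseq
  })
}

# Toy stand-ins for the two maps; every ID below is invented for illustration.
toy_acc2eg <- new.env()
assign("ACC_A", "1001", envir = toy_acc2eg)
toy_eg2refseq <- new.env()
assign("1001", c("NM_TOY1", "NP_TOY1"), envir = toy_eg2refseq)

# "ACC_MISSING" has no entry, so it comes back as NA rather than an error.
result <- accession_to_refseq(c("ACC_A", "ACC_MISSING"),
                              toy_acc2eg, toy_eg2refseq)
print(result)
```

With the real package loaded (`library(org.Hs.eg.db)`), the same helper could presumably be called as `accession_to_refseq("AA868688", org.Hs.egACCNUM2EG, org.Hs.egREFSEQ)`, assuming the annotation maps accept `ifnotfound` the same way base R environments do. For an EST with no gene mapping, that would be expected to give `NA` rather than the error, which at least makes it easy to see which accessions still need SOURCE or a custom SQLForge package.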

Annotation GO convert biomaRt

