On Mon, Jun 2, 2008 at 5:03 AM, Eleni Christodoulou <elenichri at gmail.com> wrote:
> Thank you guys,
>
> I saw your answers this morning. I downloaded the package "org.Hs.eg.db",
> but I am struggling a bit with the use of the commands. I am trying, for
> example:
> x <- mget("AA868688",org.Hs.egACCNUM2EG)
> and I get the following error:
> Error in .checkKeys(value, Lkeys(x), x@ifnotfound) :
> invalid key "AA868688"
This means that the map has no entry for AA868688.
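
A minimal sketch of how to look up a batch of accessions without stopping at the first unmapped one. To stay runnable without the annotation package, it uses a toy environment (`toy_map`, a hypothetical stand-in for `org.Hs.egACCNUM2EG`) with invented accessions and IDs; only AA868688 comes from the message above.

```r
# Toy stand-in for org.Hs.egACCNUM2EG: accession -> Entrez Gene ID.
# The ACC_FULL_* accessions and the IDs are invented for illustration only.
toy_map <- new.env()
assign("ACC_FULL_1", "1001", envir = toy_map)
assign("ACC_FULL_2", "1002", envir = toy_map)

keys <- c("ACC_FULL_1", "AA868688", "ACC_FULL_2")

# Without ifnotfound, mget() stops with an error at the first missing key.
# With ifnotfound = NA, missing keys come back as NA instead.
hits <- mget(keys, envir = toy_map, ifnotfound = NA)

# Separate the keys that mapped from those that did not.
is_missing <- vapply(hits, function(v) all(is.na(v)), logical(1))
mapped     <- unlist(hits[!is_missing])
unmapped   <- names(hits)[is_missing]
```

With org.Hs.eg.db loaded, the corresponding call should be `mget(keys, org.Hs.egACCNUM2EG, ifnotfound = NA)`; the accessions left in `unmapped` are the ones that need another route, such as the options listed below.
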
> This happens with all the GenBank identifiers that I am trying to convert to
> Entrez Gene IDs. What am I doing wrong?
You are not doing anything wrong. NCBI supplies GenBank accession
numbers for what are essentially full-length transcripts associated
with a gene. The accession above, however, is an EST, and NCBI does
not directly provide accession-to-gene conversion for such
non-full-length accessions. So you have a couple of options:
1) Use the Stanford SOURCE website to do the conversion for you. It
will use UniGene mappings to do so.
2) Build your own annotation package using SQLForge. This option
will supply you with the mappings that you want in R and in the data
structure of the other annotation packages.
Hope that helps.
Sean
> On Fri, May 30, 2008 at 7:27 PM, Marc Carlson <mcarlson at fhcrc.org> wrote:
>>
>> Sean Davis wrote:
>>>
>>> On Fri, May 30, 2008 at 8:53 AM, Eleni Christodoulou
>>> <elenichri at gmail.com> wrote:
>>>
>>>>
>>>> Hello all!
>>>>
>>>> I was trying to convert RefSeq accession numbers to GenBank accession
>>>> numbers (or the opposite). I think that there must exist a library that
>>>> does this job automatically... Does anyone know anything relevant to this?
>>>>
>>>
>>> Hi, Eleni. There is no direct relationship between RefSeq and GenBank
>>> numbers. A given RefSeq may or may not be represented by exactly one
>>> GenBank accession. In fact, a RefSeq may not represent any "real"
>>> sequence, but can be a composite of several "real" sequences. As an
>>> example, see here:
>>>
>>> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?val=NM_007294.2
>>>
>>> It looks like this RefSeq is actually composed of 4 different
>>> sequences from GenBank (if I am reading the record correctly).
>>>
>>> The only way I know to deal with this (at least in the general case)
>>> is to go through Entrez Gene (or the Ensembl equivalent of a gene) to
>>> find those accessions in GenBank and RefSeq that share a common Gene
>>> ID. You can do this using the annotation package for the organism of
>>> interest, I think. Steffen or others might be able to comment on how
>>> to do this using biomaRt.
>>>
>>> Sean
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>>
>>
>> What Sean mentioned should work to at least let you connect the dots.
>>
>> As an example, for human you could use the package "org.Hs.eg.db" and then
>> use the following mappings to get what you want:
>>
>> First, use "org.Hs.egACCNUM2EG" to get Entrez Gene IDs for your GenBank
>> accessions.
>>
>> Then use "org.Hs.egREFSEQ" to get RefSeq IDs for your Entrez Gene IDs.
>>
>>
>> Marc
>
>
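
The two-step route described in the quoted messages (GenBank accession to Entrez Gene ID, then Entrez Gene ID to RefSeq IDs) might be sketched as follows. This is only an illustration under stated assumptions: `acc2eg` and `eg2refseq` are toy environments standing in for `org.Hs.egACCNUM2EG` and `org.Hs.egREFSEQ`, the helper `genbank_to_refseq` is hypothetical, and every ID except AA868688 is invented.

```r
# Toy stand-ins for the two maps (all IDs invented):
#   acc2eg    plays the role of org.Hs.egACCNUM2EG (accession -> Entrez Gene ID)
#   eg2refseq plays the role of org.Hs.egREFSEQ    (Entrez Gene ID -> RefSeq IDs)
acc2eg <- list2env(list(GB_0001 = "1001", GB_0002 = "1002"))
eg2refseq <- list2env(list(
  "1001" = c("NM_TOY_1", "NP_TOY_1"),
  "1002" = "NM_TOY_2"
))

# Hypothetical helper: chain the two lookups, returning NA for dead ends.
genbank_to_refseq <- function(accessions, acc2eg, eg2refseq) {
  # Step 1: accession -> Entrez Gene ID (NA when the accession is unmapped).
  gene_ids <- mget(accessions, envir = acc2eg, ifnotfound = NA)
  lapply(gene_ids, function(ids) {
    ids <- ids[!is.na(ids)]
    if (length(ids) == 0) return(NA_character_)
    # Step 2: Entrez Gene ID -> all RefSeq IDs recorded for that gene.
    refseqs <- mget(as.character(ids), envir = eg2refseq, ifnotfound = NA)
    out <- unique(unlist(refseqs, use.names = FALSE))
    out <- out[!is.na(out)]
    if (length(out) == 0) NA_character_ else out
  })
}

result <- genbank_to_refseq(c("GB_0001", "AA868688", "GB_0002"),
                            acc2eg, eg2refseq)
```

The result is a list with one element per accession, and an element can hold several RefSeq IDs. That is consistent with the point made earlier in the thread: the link runs through a shared Gene ID, so it is not a one-to-one conversion.
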