Is there any way to map a set of EST IDs to GenBank accessions?


Peter Waltman ▴ 10

@peter-waltman-3920

Last seen 10.5 years ago

Hi,

I'm trying to annotate a custom cDNA array available from GEO (specifically, GDS1761). It uses a custom annotation (GPL1290) that lists the ESTs mapped to each spot. The first 6 rows are:

ID | NAME | CLONE_ID | 5ACC | 3ACC | GB_LIST
1 | SID W 60204, Homo sapiens C2H2 zinc finger protein pseudogene, mRNA sequence [5':T39154, 3':T40438] | IMAGE:60204 | T39154 | T40438 | T39154,T40438
2 | EST Chr.X [60298, (D), 5':T39213, 3':T40480] | IMAGE:60298 | T39213 | T40480 | T39213,T40480
3 | RPL3 Ribosomal protein L3 Chr.22 [60436, (EW), 5':T39295, 3':T40510] | IMAGE:60436 | T39295 | T40510 | T39295,T40510
4 | ESTSID 60474, [5':T39311, 3':T40516] | IMAGE:60474 | T39311 | T40516 | T39311,T40516
5 | SID 60218, [5':T39165, 3':T40450] | IMAGE:60218 | T39165 | T40450 | T39165,T40450
6 | EST Chr.18 [60268, (IR), 5':T39192, 3':T40467] | IMAGE:60268 | T39192 | T40467 | T39192,T40467

The T##### IDs are the EST accessions, which can be queried at NCBI. For example, for T39213 (the 5' accession in row 2), the query

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Search&db=Nucleotide&term=T39213

finds one result:

http://www.ncbi.nlm.nih.gov/nucest/T39213.1?ordinalpos=1&itool=EntrezSystem2.PEntrez.Sequence.Sequence_ResultsPanel.Sequence_RVDocSum

The returned record includes a GenBank GI number (646973 in this case) that can be used to find the gene annotation, but I can't figure out how to do this for a large number of EST accessions (>9600).

Any suggestions?

Thanks!
Peter

Annotation Homo sapiens annotate • 1.3k views

Updated 15.1 years ago by James F. Reid ▴ 610 • written 15.1 years ago by Peter Waltman ▴ 10


James F. Reid ▴ 610

@james-f-reid-3148

Last seen 10.5 years ago

Hi Peter,

You could use the makeDBPackage function from the AnnotationDbi package with baseMapType = "gb" for GenBank entries, which here would be either the 3' or the 5' accession numbers of the clones (i.e. 5':T39154, 3':T40438 from your first entry). You could simply use the 3' ones (3ACC), but if you want to use both, you could run it twice to highlight inconsistencies in gene assignment for any given entry.

HTH.
J.

In reply to Peter Waltman's message of 02/02/2010 11:09.
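A minimal sketch, in Python, of the file preparation this suggestion implies. It assumes the GPL1290 table has been saved as tab-delimited text with the GEO comment lines removed, and that makeDBPackage reads a headerless two-column file of probe ID and accession (worth confirming on its help page). The helper names `probe_to_accession` and `to_tsv` are hypothetical, and the rows are the six shown in the question.

```python
import csv
import io

# Column names and the six example rows as quoted in the question.
HEADER = ["ID", "NAME", "CLONE_ID", "5ACC", "3ACC", "GB_LIST"]
ROWS = [
    ["1", "SID W 60204, Homo sapiens C2H2 zinc finger protein pseudogene, "
          "mRNA sequence [5':T39154, 3':T40438]",
     "IMAGE:60204", "T39154", "T40438", "T39154,T40438"],
    ["2", "EST Chr.X [60298, (D), 5':T39213, 3':T40480]",
     "IMAGE:60298", "T39213", "T40480", "T39213,T40480"],
    ["3", "RPL3 Ribosomal protein L3 Chr.22 [60436, (EW), 5':T39295, 3':T40510]",
     "IMAGE:60436", "T39295", "T40510", "T39295,T40510"],
    ["4", "ESTSID 60474, [5':T39311, 3':T40516]",
     "IMAGE:60474", "T39311", "T40516", "T39311,T40516"],
    ["5", "SID 60218, [5':T39165, 3':T40450]",
     "IMAGE:60218", "T39165", "T40450", "T39165,T40450"],
    ["6", "EST Chr.18 [60268, (IR), 5':T39192, 3':T40467]",
     "IMAGE:60268", "T39192", "T40467", "T39192,T40467"],
]

# Stand-in for the contents of the saved platform table (assumed tab-delimited).
GPL_TEXT = "\n".join("\t".join(fields) for fields in [HEADER] + ROWS)


def probe_to_accession(table_text, column):
    """Return (probe ID, accession) pairs for one accession column.

    Spots with an empty value in that column are skipped.
    """
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    pairs = []
    for row in reader:
        accession = (row.get(column) or "").strip()
        if accession:
            pairs.append((row["ID"], accession))
    return pairs


def to_tsv(pairs):
    """Format pairs as headerless, tab-delimited lines: probe ID, accession."""
    return "".join(f"{probe}\t{accession}\n" for probe, accession in pairs)


# One mapping per clone end, so the two runs can be compared afterwards.
three_prime = probe_to_accession(GPL_TEXT, "3ACC")
five_prime = probe_to_accession(GPL_TEXT, "5ACC")

print(to_tsv(three_prime), end="")
```

Saving `to_tsv(three_prime)` and `to_tsv(five_prime)` to two separate files gives one input per run, which keeps the 3' and 5' gene assignments apart for the consistency check described above.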

15.1 years ago by James F. Reid ▴ 610
