Help with getBM using EST ids from zebrafish
Scott Ochsner (@scott-ochsner-599)
Dear list,

I am trying to retrieve ZFIN annotation for each of my 6000+ zebrafish EST IDs using biomaRt. As an example of the EST IDs, here is the output from NCBI's EST search for CO349769:

1. DR_AOV_FL01_G08 adult ovary full-length (TLL) Danio rerio cDNA, mRNA sequence
gi|49431086|gb|CO349769.1|[49431086]

As a further note, searching UniGene succeeds with "CO349769" but not with "49431086".

I've set up the following:

> library(biomaRt)
> ensembl <- useMart("ensembl", dataset = "drerio_gene_ensembl")

Question 1: If ESTs are mapped (based on previous posts, it looks like they are not), what is the appropriate filter in the line of code below?

> map <- getBM(attributes = c("zfin_id", "zfin_symbol"), filters = "?", values = "CO349769", mart = ensembl)

Question 2: If it turns out ESTs are not mapped, I'll probably have to go through UniGene to obtain an ID I can use with biomaRt. Does anyone know of a way to batch-search UniGene? 6000+ IDs is a lot to search one by one.

Thanks for any help,

> sessionInfo()
R version 2.8.0 (2008-10-20)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] splines tools stats graphics grDevices utils datasets methods base

other attached packages:
[1] biomaRt_1.16.0 genefilter_1.22.0 survival_2.34-1 geneplotter_1.20.0 annotate_1.20.0 xtable_1.5-4 AnnotationDbi_1.4.0 lattice_0.17-15
[9] limma_2.16.2 affy_1.20.0 Biobase_2.2.0

loaded via a namespace (and not attached):
[1] affyio_1.10.0 DBI_0.2-4 grid_2.8.0 KernSmooth_2.22-22 preprocessCore_1.4.0 RColorBrewer_1.0-2 RCurl_0.91-0
[8] RSQLite_0.7-1 XML_1.94-0.1

Scott A. Ochsner, Ph.D.
NURSA Bioinformatics
Molecular and Cellular Biology
Baylor College of Medicine
Houston, TX. 77030
phone: 713-798-6227
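Whether the zebrafish mart offers any EST-related filter for Question 1 can be checked against the mart's own filter listing instead of guessed. A minimal sketch, assuming biomaRt's listFilters() returns a data frame with "name" and "description" columns; find_filters() and its search pattern are hypothetical choices, and the demo rows are made up rather than a real Ensembl listing.

```r
# Hypothetical helper (not part of biomaRt): scan a filter listing for
# EST/UniGene-related entries. `filters` is ASSUMED to be a data frame with
# "name" and "description" columns, as returned by biomaRt::listFilters(mart).
find_filters <- function(filters, pattern = "unigene|genbank|\\bests?\\b") {
  hits <- grepl(pattern, filters$name, ignore.case = TRUE, perl = TRUE) |
    grepl(pattern, filters$description, ignore.case = TRUE, perl = TRUE)
  out <- filters[hits, c("name", "description"), drop = FALSE]
  rownames(out) <- NULL
  out
}

# Offline demo on a made-up listing (illustrative rows, not real Ensembl output).
demo <- data.frame(
  name        = c("ensembl_gene_id", "zfin_id", "unigene", "chromosome_name"),
  description = c("Ensembl Gene ID(s)", "ZFIN ID(s)", "UniGene ID(s)", "Chromosome name"),
  stringsAsFactors = FALSE
)
found <- find_filters(demo)
print(found)
```

Against the live mart this would be called as find_filters(listFilters(ensembl)); an empty result would suggest that no EST filter is offered, which points toward the UniGene route in Question 2.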
Tags: Annotation, Danio rerio, biomaRt
Wolfgang Huber (@wolfgang-huber-3550), EMBL
Dear Scott,

I would be surprised if the Ensembl BioMart provided this information, given the nature and size of EST-to-gene mapping "data". But, as always, I would be happy to be surprised.

You can download the complete UniGene clusters from ftp://ftp.ncbi.nih.gov/repository/UniGene/Danio_rerio/ and it is easy with R (or indeed Perl, Python, etc.) to parse, e.g., the file Dr.data for your EST IDs and extract the corresponding LocusLink IDs.

Also, "CO349769" and "g49431086" are both found and map to the fth1 gene.

Best wishes
Wolfgang

----------------------------------------------------
Wolfgang Huber, EMBL-EBI, http://www.ebi.ac.uk/huber
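Parsing Dr.data locally, as suggested above, also covers the batch problem in Question 2: once the file is read into a table, all 6000+ IDs can be matched in a single vectorised step. A minimal base-R sketch, assuming the record layout described in the code comments (one "KEY value" pair per line, records ending in "//", scalar ID/GENE/LOCUSLINK lines, and "SEQUENCE ACC=...; NID=...;" lines per member sequence); the exact field names are an assumption, and parse_unigene(), lookup_ests() and the toy records are hypothetical, not real UniGene data. Since both "CO349769" and "g49431086" are reported as found, the lookup tries the accession first and falls back to the NID, adding the "g" prefix to bare GI numbers such as "49431086".

```r
# Minimal sketch for mapping EST IDs via a local copy of UniGene's Dr.data.
# ASSUMED record layout: one "KEY   value" pair per line, records ending in
# "//", scalar ID / GENE / LOCUSLINK lines, and one
# "SEQUENCE    ACC=...; NID=...; ..." line per member sequence.

# Pull "NAME=value" out of semicolon-separated SEQUENCE fields (NA if absent).
extract_field <- function(x, name) {
  m <- regmatches(x, regexec(paste0(name, "=([^;]+)"), x))
  vapply(m, function(g) if (length(g) == 2) g[2] else NA_character_, character(1))
}

# Hypothetical parser: one output row per member sequence of each cluster.
parse_unigene <- function(lines) {
  key <- sub("^(\\S+).*$", "\\1", lines)
  val <- trimws(sub("^\\S+", "", lines))
  # Record number of every line: 1 + number of "//" terminators before it.
  rec <- cumsum(c(1L, head(key == "//", -1)))
  pick <- function(k) {               # one value per record for a scalar field
    v <- rep(NA_character_, max(rec))
    sel <- key == k
    v[rec[sel]] <- val[sel]
    v
  }
  cluster <- pick("ID"); gene <- pick("GENE"); locus <- pick("LOCUSLINK")
  is_seq <- key == "SEQUENCE"
  r <- rec[is_seq]
  data.frame(
    cluster   = cluster[r],
    gene      = gene[r],
    locuslink = locus[r],
    acc       = sub("\\.[0-9]+$", "", extract_field(val[is_seq], "ACC")),  # drop ".1" version
    nid       = extract_field(val[is_seq], "NID"),
    stringsAsFactors = FALSE
  )
}

# Hypothetical batch lookup: try the accession first, then the NID.
lookup_ests <- function(map, ids) {
  key <- sub("\\.[0-9]+$", "", ids)
  # Bare GI numbers get the "g" prefix used in NID fields (e.g. g49431086).
  key <- ifelse(grepl("^[0-9]+$", key), paste0("g", key), key)
  hit <- match(key, map$acc)
  miss <- is.na(hit)
  hit[miss] <- match(key[miss], map$nid)
  out <- data.frame(query = ids, map[hit, c("cluster", "gene", "locuslink")],
                    stringsAsFactors = FALSE)
  rownames(out) <- NULL
  out
}

# Toy records for illustration only (made-up IDs, not real UniGene data).
toy <- c(
  "ID          Dr.1",
  "GENE        geneA",
  "LOCUSLINK   1001",
  "SEQUENCE    ACC=AA000001.1; NID=g1000001; SEQTYPE=EST",
  "SEQUENCE    ACC=AA000002.1; NID=g1000002; SEQTYPE=EST",
  "//",
  "ID          Dr.2",
  "GENE        geneB",
  "LOCUSLINK   1002",
  "SEQUENCE    ACC=AA000003.2; NID=g1000003; SEQTYPE=mRNA",
  "//"
)
map <- parse_unigene(toy)
res <- lookup_ests(map, c("AA000002", "1000003", "ZZ999999"))
print(res)
```

On the real file this would be map <- parse_unigene(readLines("Dr.data")) followed by lookup_ests(map, my_est_ids), where my_est_ids is a hypothetical character vector of the 6000+ IDs; a compressed download can be read with readLines(gzfile(...)). LocusLink IDs carried over to Entrez Gene, so the resulting IDs could then be used with getBM() after checking listFilters() for the matching Entrez Gene filter name.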