Gene location (Base pair number)
Tim Smith ★ 1.1k
@tim-smith-1532
Last seen 10.2 years ago
Hi,

I want the exact base-pair locations for several genes (e.g. WNT16 in the human Wnt pathway). Which Bioconductor package should I use?

Thanks!
@martin-morgan-1513
Last seen 4 months ago
United States
Hi Tim --

One suggestion is to use the org.Hs.eg.db package. The 'eg' means that the information is keyed off Entrez ids, so you need to map your SYMBOL to EG

    egid = revmap(org.Hs.egSYMBOL)[["WNT16"]]

and then retrieve location information

    org.Hs.egCHRLOC[[egid]]
    org.Hs.egCHRLOCEND[[egid]]

For many symbols, symids, one might

    egids = mappedLkeys(revmap(org.Hs.egSYMBOL)[symids])
    as.list(org.Hs.egCHRLOC[egids])

etc. Some book-keeping might be needed to ensure correct symid -> egid -> CHRLOC mapping.

Martin
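The book-keeping mentioned above could look roughly like the sketch below. It is base R only and works on toy lookup tables rather than the real package maps: `sym2eg`, `eg2start`, `eg2end` and `locate_symbols` are hypothetical names, the Entrez ID for WNT16 and the `GENEX` entries are made up for illustration, and the WNT16 coordinates are the ones reported later in this thread. In practice the tables would come from calls such as `as.list(org.Hs.egCHRLOC[egids])`.

```r
# Toy stand-ins for the annotation maps (assumptions, not package output).
sym2eg <- list(WNT16 = "51384", GENEX = c("900001", "900002"))
eg2start <- list("51384" = c(120752656, 120756325),
                 "900001" = 1000, "900002" = 5000)
eg2end <- list("51384" = c(120768394, 120768394),
               "900001" = 2000, "900002" = 7500)

# Build one row per (symbol, Entrez ID, reported location) so that the
# symid -> egid -> CHRLOC relationship stays explicit.
locate_symbols <- function(symids, sym2eg, eg2start, eg2end) {
  rows <- lapply(symids, function(sym) {
    egids <- sym2eg[[sym]]
    if (is.null(egids)) {
      # Unmapped symbol: keep a placeholder row so the gap is visible.
      return(data.frame(symbol = sym, egid = NA_character_,
                        start = NA_real_, end = NA_real_,
                        stringsAsFactors = FALSE))
    }
    per_gene <- lapply(egids, function(eg) {
      starts <- eg2start[[eg]]
      ends <- eg2end[[eg]]
      if (is.null(starts) || is.null(ends)) {
        # Entrez ID without location data.
        starts <- NA_real_
        ends <- NA_real_
      }
      data.frame(symbol = sym, egid = eg, start = starts, end = ends,
                 stringsAsFactors = FALSE)
    })
    do.call(rbind, per_gene)
  })
  do.call(rbind, rows)
}

res <- locate_symbols(c("WNT16", "GENEX", "UNKNOWN"),
                      sym2eg, eg2start, eg2end)
print(res)
```

Keeping the result in long form (one row per location) avoids silently dropping symbols that map to several Entrez IDs or to none.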
Tim Smith ★ 1.1k
@tim-smith-1532
Last seen 10.2 years ago
Hi Martin,

Thanks for that. I tried your code and got:

    > egid = revmap(org.Hs.egSYMBOL)[["WNT16"]]
    > org.Hs.egCHRLOC[[egid]]
            7         7
    120752656 120756325
    > org.Hs.egCHRLOCEND[[egid]]
            7         7
    120768394 120768394

However, if I go to the NCBI site (http://www.ncbi.nlm.nih.gov/sites/entrez) and search for 'WNT16', I get the following information for WNT16:

    Chromosome: 7; Location: 7q31
    Annotation: Chromosome 7, NC_000007.13 (120965421..120981158)

Why is there a discrepancy between the values returned from Bioconductor (UCSC?) and NCBI? Is there anything I can do that will get me a match with the NCBI location numbers?

Thanks!
Hi Tim --

Tim Smith wrote:
> Why is there a discrepancy between the values returned from bioconductor (UCSC?) and NCBI? Is there anything I can do that will get me a match with the NCBI location numbers?

In general the answer is that the Bioconductor annotation packages are a snapshot of particular data resources, whereas web-based retrievals capture the data that is current when you access it. A corollary is that the only way to know what data is available today from NCBI is to visit the NCBI site (today, and not tomorrow or yesterday). The details of when snapshots are taken can be found on the help pages, e.g.,

    ?org.Hs.egCHRLOC

or interactively, e.g.,

    org.Hs.eg.db_dbInfo()

The biomaRt package is also useful to explore, in terms of retrieving web-based annotations.

Martin
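A web-based lookup with biomaRt might be sketched as follows. The wrapper name `fetch_gene_location` is hypothetical; the mart, dataset, filter and attribute names are the usual Ensembl BioMart ones, and the coordinates returned will be on whatever assembly the Ensembl mart serves on the day of the query, so they need not match any number quoted in this thread.

```r
# Hypothetical helper: look up gene-level coordinates for HGNC symbols.
# `mart` is a Mart object created with biomaRt::useMart().
fetch_gene_location <- function(symbols, mart) {
  biomaRt::getBM(
    attributes = c("hgnc_symbol", "chromosome_name",
                   "start_position", "end_position", "strand"),
    filters = "hgnc_symbol",
    values = symbols,
    mart = mart
  )
}

# The query needs the biomaRt package and network access, so it is only
# attempted in an interactive session where the package is installed.
if (interactive() && requireNamespace("biomaRt", quietly = TRUE)) {
  mart <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")
  print(fetch_gene_location("WNT16", mart))
}
```

Because the result depends on the live service, it is worth recording the query date and assembly alongside any coordinates that are saved.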
Hi Tim,

Tim Smith wrote:
> Why is there a discrepancy between the values returned from bioconductor (UCSC?) and NCBI? Is there anything I can do that will get me a match with the NCBI location numbers?

This is because they use different reference assemblies:

  - NCBI is now using the Genome Reference Consortium Human Build 37 (GRCh37),
  - UCSC is still using hg18 (at UCSC, GRCh37 is called the hg19 assembly).

Unfortunately it's hard to figure out which assembly is used for the org.Hs.egCHRLOC or org.Hs.egCHRLOCEND maps. The man page says:

    Mappings were based on data provided by: UCSC Genome Bioinformatics (Homo sapiens) ( ftp://hgdownload.cse.ucsc.edu/goldenPath/currentGenomes/Homo_sapiens ) on 2008-Sep3

and if you connect (by anonymous FTP) to hgdownload.cse.ucsc.edu, you'll be able to see that the Homo_sapiens folder is actually a symlink to hg18:

    hpages@thinkpad:~$ ftp hgdownload.cse.ucsc.edu
    Connected to hgdownload.cse.ucsc.edu.
    220 FTP Server ready.
    Name (hgdownload.cse.ucsc.edu:hpages): anonymous
    331 Anonymous login ok, send your complete email address as your password
    Password:
    230 User anonymous logged in.
    Remote system type is UNIX.
    Using binary mode to transfer files.
    ftp> cd goldenPath/currentGenomes
    250 CWD command successful
    ftp> ls
    200 PORT command successful
    150 Opening ASCII mode data connection for file list
    dr-xr-xr-x 2 ftp ftp 4096 May 11 17:18 .
    dr-xr-xr-x 128 ftp ftp 4096 Jun 17 00:03 ..
    lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Anolis_carolinensis -> ../anoCar1
    lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Anopheles_gambiae -> ../anoGam1
    lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Apis_mellifera -> ../apiMel2
    lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Bos_taurus -> ../bosTau4
    lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Branchiostoma_floridae -> ../braFlo1
    lr--r--r-- 1 ftp ftp 9 Sep 3 2008 Caenorhabditis_brenneri -> ../caePb2
    lr--r--r-- 1 ftp ftp 12 Sep 3 2008 Caenorhabditis_briggsae -> ../cbJul2002
    lr--r--r-- 1 ftp ftp 6 Sep 3 2008 Caenorhabditis_elegans -> ../ce2
    lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Caenorhabditis_japonica -> ../caeJap1
    lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Caenorhabditis_remanei -> ../caeRem3
    lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Callithrix_jacchus -> ../calJac1
    lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Canis_familiaris -> ../canFam2
    lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Cavia_porcellus -> ../cavPor3
    lr--r--r-- 1 ftp ftp 6 Sep 3 2008 Ciona_intestinalis -> ../ci2
    lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Danio_rerio -> ../danRer5
    lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Drosophila_ananassae -> ../droAna2
    lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Drosophila_erecta -> ../droEre1
    lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Drosophila_grimshawi -> ../droGri1
    lr--r--r-- 1 ftp ftp 6 Sep 3 2008 Drosophila_melanogaster -> ../dm3
    lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Drosophila_mojavensis -> ../droMoj2
    lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Drosophila_persimilis -> ../droPer1
    lr--r--r-- 1 ftp ftp 6 Sep 3 2008 Drosophila_pseudoobscura -> ../dp3
    lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Drosophila_sechellia -> ../droSec1
    lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Drosophila_simulans -> ../droSim1
    lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Drosophila_virilis -> ../droVir2
    lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Drosophila_yakuba -> ../droYak2
    lr--r--r-- 1 ftp ftp 10 Dec 4 2008 Equus_caballus -> ../equCab2
    lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Felis_catus -> ../felCat3
    lr--r--r-- 1 ftp ftp 6 Sep 3 2008 Fugu_rubripes -> ../fr2
    lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Gallus_gallus -> ../galGal3
    lr--r--r-- 1 ftp ftp 7 May 11 17:18 Homo_sapiens -> ../hg18
    lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Monodelphis_domestica -> ../monDom4
    lr--r--r-- 1 ftp ftp 6 Sep 3 2008 Mus_musculus -> ../mm9
    lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Ornithorhynchus_anatinus -> ../ornAna1
    lr--r--r-- 1 ftp ftp 10 Nov 7 2008 Oryzias_latipes -> ../oryLat2
    lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Pan_troglodytes -> ../panTro2
    lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Petromyzon_marinus -> ../petMar1
    lr--r--r-- 1 ftp ftp 6 Sep 3 2008 Rattus_norvegicus -> ../rn4
    lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Rhesus_macaque -> ../rheMac2
    lr--r--r-- 1 ftp ftp 12 Sep 3 2008 SARS_coronavirus -> ../scApr2003
    lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Saccharomyces_cereviciae -> ../sacCer1
    lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Saccharomyces_cerevisiae -> ../sacCer1
    lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Strongylocentrotus_purpuratus -> ../strPur2
    lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Taeniopygia_guttata -> ../taeGut1
    lr--r--r-- 1 ftp ftp 6 Sep 3 2008 Takifugu_rubripes -> ../fr2
    lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Tetraodon_nigroviridis -> ../tetNig1
    lr--r--r-- 1 ftp ftp 10 Sep 3 2008 Xenopus_tropicalis -> ../xenTro1
    226 Transfer complete

The problem is that this symlink could be changed at any time, so the information provided in the org.Hs.egCHRLOC man page will become meaningless sooner or later...

Cheers,
H.

--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
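As a rough check on the assembly explanation, the two sets of WNT16 coordinates quoted in this thread can be compared directly. The sketch below is plain arithmetic in base R; the variable names are illustrative, and taking the widest extent across the two reported transcripts is an assumption about how the package values relate to the single NCBI gene span.

```r
# Coordinates quoted in this thread for WNT16 on chromosome 7.
bioc_starts <- c(120752656, 120756325)                 # org.Hs.egCHRLOC
bioc_ends   <- c(120768394, 120768394)                 # org.Hs.egCHRLOCEND
ncbi        <- c(start = 120965421, end = 120981158)   # NC_000007.13

# Assumption: use the widest extent across the reported transcripts.
bioc <- c(start = min(bioc_starts), end = max(bioc_ends))

# Offset between the two coordinate systems at each end of the gene.
shift <- ncbi - bioc
print(shift)   # start 212765, end 212764
```

Both ends move by about 212.76 kb, which is what one would expect if the same gene is being reported on two different assemblies rather than annotated differently. The 1 bp difference between the two offsets may come from differing start-coordinate conventions, but that is a guess and not something established in this thread.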