biomaRt: retrieve total chromosome lengths
@de-bondt-an-7114-prdbe-1572
An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20061027/364a0dcb/attachment.pl
@james-w-macdonald-5106
Hi An,

De Bondt, An-7114 [PRDBE] wrote:
> Hi,
>
> How can I retrieve, for a certain organism (e.g. human), the total length of
> each of its chromosomes using biomaRt?
>
> library(biomaRt)
> mart <- useMart("ensembl")
> mart <- useDataset("hsapiens_gene_ensembl", mart)
> chr.lengths <- ???
>
> Thanks in advance!
> An

Well, this doesn't agree exactly with what I see on this webpage:

http://www.ornl.gov/sci/techresources/Human_Genome/posters/chromosome/faqs.shtml

But it is pretty close. Of course I am finding the end of the 'last' transcript on a given chromosome rather than the end of the chromosome itself, so there will likely be differences. However, I don't see an attribute that looks like it gives chromosomal information without first being mapped through a gene, so I don't know if you can get exactly what you want.

If there is a way, Steffen Durinck will undoubtedly know what it is, but I haven't seen a response from him as yet.

Anyway, here is what I did.

    > mart <- useMart("ensembl", "hsapiens_gene_ensembl")
    Checking attributes and filters ... ok
    > a <- getBM("hsapiens_gene_ensembl_structure.transcript_chrom_end",
    +            "chromosome_name", c(1:21, "x","y"), mart, output="list")
    > sapply(a[[1]], max)
            1         2         3         4         5
    247197891 242713278 199439629 191246650 180727832
            6         7         8         9        10
    170735623 158630410 146252219 140191642 135347681
           11        12        13        14        15
    134361903 132289533 114110907 106354309 100334282
           16        17        18        19        20
     88771793  78646005  76106388  63802660  62429769
           21         x         y
     46935585 154908521  57767721

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
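The idea in this answer (take the largest transcript end coordinate on each chromosome) can be sketched with a more recent biomaRt interface roughly as follows. This is only a sketch under stated assumptions, not the code that was run in the thread: the attribute name `transcript_end`, the helper names `fetch_transcript_ends` and `last_transcript_end`, and the toy data frame are all illustrative, and attribute names differ between Ensembl releases, so it is worth checking `listAttributes(mart)` first.

```r
# Query Ensembl for transcript end positions. This needs the biomaRt package
# and network access, so the function is defined here but not called.
# Assumption: the attribute is named "transcript_end" in the release you use.
fetch_transcript_ends <- function(chromosomes = c(1:22, "X", "Y")) {
  mart <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")
  biomaRt::getBM(
    attributes = c("chromosome_name", "transcript_end"),
    filters    = "chromosome_name",
    values     = chromosomes,
    mart       = mart
  )
}

# Largest transcript end per chromosome: a lower bound on the chromosome
# length within one genome build, not the length itself.
last_transcript_end <- function(df) {
  ends_by_chr <- split(df$transcript_end, df$chromosome_name)
  vapply(ends_by_chr, max, numeric(1))
}

# Toy input with made-up coordinates, in the shape getBM() would return.
toy <- data.frame(
  chromosome_name  = c("1", "1", "2", "X"),
  transcript_end   = c(100, 300, 250, 90),
  stringsAsFactors = FALSE
)
approx_lengths <- last_transcript_end(toy)
print(approx_lengths)
```

With a live connection one would call `last_transcript_end(fetch_transcript_ends())` instead of using the toy data frame.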
Hi An,

There is no way to retrieve the chromosome lengths with biomaRt when used with Ensembl. The closest you'll get with biomaRt is to subtract the position of the 'first' transcript from the position of the 'last' transcript.

If you want to use the Ensembl data to get this information (you'll need to do some browser clicking), you can select your species of interest at http://www.ensembl.org/

for hsapiens: http://www.ensembl.org/Homo_sapiens/index.html

then select a chromosome, e.g.: http://www.ensembl.org/Homo_sapiens/mapview?chr=1

and there you'll get the length.

Cheers,
Steffen

--
Steffen Durinck, Ph.D.
Oncogenomics Section, Pediatric Oncology Branch
National Cancer Institute, National Institutes of Health
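The "last minus first" approximation mentioned in this reply can be written down as a small helper. This is a hedged sketch: the function name `transcript_span`, the column names `transcript_start` and `transcript_end`, and the coordinates in the toy data frame are assumptions for illustration, not values or names taken from the thread.

```r
# Per chromosome: end of the last transcript minus start of the first one.
# This measures the stretch covered by annotated transcripts, so it will
# underestimate the true chromosome length.
transcript_span <- function(df) {
  starts <- vapply(split(df$transcript_start, df$chromosome_name), min, numeric(1))
  ends   <- vapply(split(df$transcript_end,   df$chromosome_name), max, numeric(1))
  ends[names(starts)] - starts
}

# Toy input with made-up coordinates.
toy <- data.frame(
  chromosome_name  = c("1", "1", "2"),
  transcript_start = c(10, 150, 50),
  transcript_end   = c(100, 300, 250),
  stringsAsFactors = FALSE
)
spans <- transcript_span(toy)
print(spans)
```

Because the first transcript rarely starts at position 1, this span is always somewhat smaller than the largest transcript end alone.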
@de-bondt-an-7114-prdbe-1572
Hi Steffen,
Hi Jim,

Thanks for your suggestions!
To avoid hard-coding, I will indeed retrieve the end position of the last transcript on each chromosome. In relative terms, this is pretty close to the real length of the chromosome.

An
Another simple solution is to use information from UCSC, which uses the same chromosome sequences as Ensembl, at least for human and mouse, and probably for many other species. As an example, for the human genome build from March 2006 (called hg18 by UCSC), one can simply download this file and read it into R:

http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/chromInfo.txt.gz

It is a tab-delimited file whose first column holds the chromosome name ('chr1', 'chr2', etc.) and whose second column holds the total base count for that chromosome.

Sean
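Reading such a file into R could look roughly like the sketch below. The helper names `parse_chrom_info` and `download_chrom_info` are hypothetical, and the two-row sample text uses made-up names and sizes purely to show the layout (tab-delimited, no header, name in column 1, size in column 2); it is not real UCSC data. The download helper is defined but not called, since it needs network access.

```r
# Parse a chromInfo-style table: tab-delimited, no header line,
# chromosome name in column 1 and total base count in column 2.
# `src` may be a file path (gzip-compressed files are fine) or a connection.
parse_chrom_info <- function(src) {
  info <- read.delim(src, header = FALSE, stringsAsFactors = FALSE)
  sizes <- info[[2]]
  names(sizes) <- info[[1]]
  sizes
}

# Download the gzipped file to a temporary location, then parse it.
download_chrom_info <- function(url) {
  tmp <- tempfile(fileext = ".txt.gz")
  on.exit(unlink(tmp))
  download.file(url, tmp, mode = "wb", quiet = TRUE)
  parse_chrom_info(tmp)
}

# Made-up sample in the same layout, so the parser can be tried offline.
sample_text <- "chrA\t1000\tfileA\nchrB\t250\tfileB\n"
sizes <- parse_chrom_info(textConnection(sample_text))
print(sizes)
```

For real data one would pass the chromInfo URL given above to `download_chrom_info()`; the result is a named vector, so `sizes["chr1"]` style lookups work directly.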
They are of course included in all Bioc chip annotation packages, for example:

    > hgu95av2CHRLENGTHS
            1         2         3         4         5         6         7         8
    246127941 243615958 199344050 191731959 181034922 170914576 158545518 146308819
            9        10        11        12        13        14        15        16
    136372045 135037215 134482954 132078379 113042980 105311216 100256656  90041932
           17        18        19        20        21        22         X         Y
     81860266  76115139  63811651  63741868  46976097  49396972 153692391  50286555
            M
        16571

So while one can get them from the web, there are alternatives.

--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
rgentlem at fhcrc.org
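To see how close the transcript-based estimate is to these lengths, the numbers already posted in this thread can be put side by side. The sketch below copies four chromosomes from the `sapply(a[[1]], max)` output and from the `hgu95av2CHRLENGTHS` output; the variable names are illustrative. One caveat is an assumption on my part: the two sets of numbers may well come from different genome builds, which would explain why some estimates exceed the listed length.

```r
# Largest transcript end per chromosome, copied from the biomaRt output
# earlier in the thread (names upper-cased to match the annotation package).
transcript_max <- c("1" = 247197891, "21" = 46935585,
                    "X" = 154908521, "Y" = 57767721)

# Matching entries copied from the hgu95av2CHRLENGTHS output.
chr_lengths <- c("1" = 246127941, "21" = 46976097,
                 "X" = 153692391, "Y" = 50286555)

# Relative difference of the estimate with respect to the listed length.
reference <- chr_lengths[names(transcript_max)]
rel_diff <- (transcript_max - reference) / reference
print(round(rel_diff, 4))
```

A positive value means the transcript-based figure is larger than the listed chromosome length, which cannot happen within a single build and is a hint that the coordinate systems differ.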