export function alters ranges in output BED file
d r
@d-r-5459
Israel
Hello,

I am attempting to use the export() function to generate a BED file from a GRanges object. However, the ranges in the output file are altered so that one is subtracted from the start coordinate. For example, this GRanges object:

    [987] 3 [37035154, 37035155] + | Class 4 MLH1 c.116+1G>A
    [988] 3 [37067241, 37067242] + | Class 4 MLH1 c.1153C>T
    [989] 3 [37067125, 37067126] + | Class 4 MLH1 c.1039-2A>T
    [990] 3 [37067125, 37067126] + | Class 4 MLH1 c.1039-2A>G
    [991] 3 [37061954, 37061955] + | Class 4 MLH1 c.1038+1G>C

results in this output:

    3 37067240 37067242 . 0 +
    3 37067124 37067126 . 0 +
    3 37067124 37067126 . 0 +
    3 37061953 37061955 . 0 +

Since I intend to later search for intersections between the ranges in the BED file and variants in a VCF file (using Tabix), I am afraid that this subtraction may lead to false positives.

What is the reason for this subtraction from the start, and is there any way to suppress it?

Thanks in advance,
Dolev Rahat

sessionInfo:

    R version 3.1.0 (2014-04-10)
    Platform: x86_64-w64-mingw32/x64 (64-bit)

    locale:
    [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
    [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
    [5] LC_TIME=English_United States.1252

    attached base packages:
    [1] parallel  stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
    [1] rtracklayer_1.24.2   GenomicRanges_1.16.4 GenomeInfoDb_1.0.2   IRanges_1.22.10      BiocGenerics_0.10.0
    [6] BiocInstaller_1.14.2 stringr_0.6.2

    loaded via a namespace (and not attached):
     [1] BatchJobs_1.3           BBmisc_1.7              BiocParallel_0.6.1      Biostrings_2.32.1
     [5] bitops_1.0-6            brew_1.0-6              BSgenome_1.32.0         checkmate_1.4
     [9] codetools_0.2-9         DBI_0.3.0               digest_0.6.4            fail_1.2
    [13] foreach_1.4.2           GenomicAlignments_1.0.6 iterators_1.0.7         Rcpp_0.11.2
    [17] RCurl_1.95-4.3          Rsamtools_1.16.1        RSQLite_0.11.4          sendmailR_1.1-2
    [21] stats4_3.1.0            tools_3.1.0             XML_3.98-1.1            XVector_0.4.0
    [25] zlibbioc_1.10.0
@herve-pages-1542
Seattle, WA, United States
Hi Dolev,

This is due to different conventions for representing ranges:

- Bioconductor uses 1-based starting and ending positions for ranges.
- The BED format and other UCSC file formats use 0-based starting positions and 1-based ending positions for ranges: http://genome.ucsc.edu/FAQ/FAQformat.html#format1

The import() and export() functions in rtracklayer are aware of this and make the correction for you: export() subtracts 1 from each start when writing BED, and import() adds it back when reading.

Hope this helps,
H.

On 09/14/2014 07:42 AM, do r wrote:
> What is the reason for this subtraction from the start and is there
> any way to suppress it?

--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
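The correction Hervé describes is simple arithmetic. The sketch below is plain Python, not rtracklayer's implementation, and the helper names `granges_to_bed` and `bed_to_granges` are hypothetical. It shows that only the start moves by one, that the number of bases covered is unchanged, and that converting back recovers the original range. Applied to three of the ranges from the question, it reproduces the exported BED coordinates.

```python
from typing import Tuple

# Illustration only, not rtracklayer's code:
#   GRanges ranges are 1-based and closed:   [start, end]
#   BED intervals are 0-based and half-open: [chromStart, chromEnd)

def granges_to_bed(start: int, end: int) -> Tuple[int, int]:
    """Convert a 1-based closed range to a BED 0-based half-open interval."""
    return start - 1, end  # only the start shifts; the end is unchanged

def bed_to_granges(chrom_start: int, chrom_end: int) -> Tuple[int, int]:
    """Convert a BED 0-based half-open interval back to a 1-based closed range."""
    return chrom_start + 1, chrom_end

# Three of the chromosome 3 ranges from the GRanges printout in the question.
ranges = [(37067241, 37067242), (37067125, 37067126), (37061954, 37061955)]

for start, end in ranges:
    bed_start, bed_end = granges_to_bed(start, end)
    # The number of bases covered is the same in both conventions.
    assert end - start + 1 == bed_end - bed_start
    # Converting back recovers the original range.
    assert bed_to_granges(bed_start, bed_end) == (start, end)
    print(f"3\t{bed_start}\t{bed_end}")
```

Running it prints the chromosome, start and end columns of three of the BED lines shown in the question (37067240/37067242, 37067124/37067126, 37061953/37061955).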
John Blischak
@john-blischak-6562
Hi Dolev,

On Sun, Sep 14, 2014 at 9:42 AM, do r <dolevrahat at gmail.com> wrote:
> However, the ranges in the output file are altered so that the start
> coordinate is subtracted by one,
>
> Since I intend to later search for intersections between the
> ranges in the BED file and variants in a VCF file (using Tabix), I am
> afraid that this subtraction may lead to false positives.
>
> What is the reason for this subtraction from the start and is there
> any way to suppress it?

You do not want to suppress this behavior. This is how BED files are formatted: the coordinates are 0-based, with the start position inclusive and the end position exclusive. This is the format that BEDTools will expect when you perform your intersections.

Here are some links to learn more:

http://www.genome.ucsc.edu/FAQ/FAQformat.html#format1
http://genome.ucsc.edu/FAQ/FAQtracks#tracks1
https://www.biostars.org/p/89341/#89406
https://www.biostars.org/p/84686/

Best,
John
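On the false-positive concern: the sketch below (plain Python; the helper names are hypothetical) tests a BED interval against single-base VCF records, whose POS field is 1-based. It assumes, for illustration only, that each variant occupies one base, i.e. [POS - 1, POS) in 0-based half-open terms. For the BED line 3 37067240 37067242 from the question, the half-open overlap test and the 1-based closed test on the original range [37067241, 37067242] agree at every position, so a tool that honors the BED convention should not report an extra hit at 37067240. Tabix's `-p bed` preset indexes BED files as 0-based half-open; an off-by-one hit would only appear if the BED start were misread as a 1-based position.

```python
# Assumption for illustration: each VCF record is a single base at POS (1-based),
# i.e. the half-open interval [POS - 1, POS) in 0-based coordinates.

def bed_overlaps_pos(chrom_start: int, chrom_end: int, pos: int) -> bool:
    """Half-open overlap test between BED [chrom_start, chrom_end) and a 1-based POS."""
    var_start, var_end = pos - 1, pos
    return var_start < chrom_end and chrom_start < var_end

def closed_contains_pos(start: int, end: int, pos: int) -> bool:
    """The same question asked in the 1-based closed GRanges convention."""
    return start <= pos <= end

# BED line from the question, exported from the GRanges range [37067241, 37067242].
bed_start, bed_end = 37067240, 37067242
gr_start, gr_end = 37067241, 37067242

for pos in range(37067238, 37067246):
    hit = bed_overlaps_pos(bed_start, bed_end, pos)
    # Both conventions give the same answer, including at 37067240 (no hit).
    assert hit == closed_contains_pos(gr_start, gr_end, pos)
    print(pos, hit)
```

Only positions 37067241 and 37067242 are reported as hits, matching the original GRanges range.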