I need to do operations on ranges from a circular sequence (chrM). My ranges sometimes spans the "end-start" gap, but I still would like to do overlap, unions and other operations on them.
I hoped that GenomicRanges could do this since I found a promising "isCircular"-flag. However I am not able create a GenomicRanges-object that has a range covering the end-start boundary. It is interpreted as having a negative width.
I was not able to find any examples with GenomicRanges used in this way, so I wonder if this functionality is implemented, and if so what do I do wrong?
Here is a really simple example where I try to create a GenomicRanges with one range covering the gap.
library(GenomicRanges) chrMseqinfo = Seqinfo("chrM", 16569, isCircular = TRUE, genome="GRCH38") df = data.frame(chr="chrM", start=16560, end=10, strand="+") gr = makeGRangesFromDataFrame(df=df, seqinfo=chrMseqinfo) # error, "negative widths are not allowed" sum(gr@ranges@width) # should be 20
> sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: OS X El Capitan 10.11.6
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] GenomicRanges_1.30.1 GenomeInfoDb_1.14.0 IRanges_2.12.0 S4Vectors_0.16.0
[5] BiocGenerics_0.24.0
loaded via a namespace (and not attached):
[1] Rcpp_0.12.15 knitr_1.18 bindr_0.1
[4] XVector_0.18.0 magrittr_1.5 zlibbioc_1.24.0
[7] R6_2.2.2 rlang_0.1.6 dplyr_0.7.4
[10] tools_3.4.2 yaml_2.1.16 assertthat_0.2.0
[13] tibble_1.3.4 bindrcpp_0.2 GenomeInfoDbData_1.0.0
[16] purrr_0.2.4 tidyr_0.7.2 bitops_1.0-6
[19] RCurl_1.95-4.10 glue_1.2.0 compiler_3.4.2
[22] pkgconfig_2.0.1
Thanks for you help.
It works now.
I was not aware of the trick of setting the range-end to more than the length of my sequence.
vegard.
Hi, I am stuck again.
I am using the setdiff method on a range crossing the start-end boundary and I am not able to produce correct results. It seems that setdiff discards any information on "the other" side.
Here is a small example, fragmentA crosses the boundary and I try to cut it in half with enzymeA. The part of fragmentA on the other side (more than the length of chrM) is not included.
The last range is cut short, I expect the end to be 17000.
For these operations I did not use GNCList since my understanding is that it adds efficiency but no extra functionality. Please correct me if I am wrong.
vegard.