Semantics of GenomicRanges gaps()
0
0
Entering edit mode
@alex-gutteridge-2935
Last seen 10.2 years ago
United States
I just have a quick question/comment about the behaviour of the gaps function in the GenomicRanges package, particularly with how it handles * strand ranges. The current behaviour is as below: > range = GRanges(seqnames="1",IRanges(start=300,end=700), strand="*", > seqinfo=Seqinfo("1",seqlengths=1000)) > range GRanges with 1 range and 0 metadata columns: seqnames ranges strand <rle> <iranges> <rle> [1] 1 [300, 700] * --- seqlengths: 1 1000 > gaps(range) GRanges with 4 ranges and 0 metadata columns: seqnames ranges strand <rle> <iranges> <rle> [1] 1 [ 1, 1000] + [2] 1 [ 1, 1000] - [3] 1 [ 1, 299] * [4] 1 [701, 1000] * --- seqlengths: 1 1000 My interpretation of "*" as a strand identifier is that it means the range covers both + and - strands and so intuitively I was expecting the 'gaps' to only cover the 3rd and 4th ranges returned above (not the full-length + and - strand ranges). The semantics here implies to me that the * strand is being thought of as a kind of imaginary third strand and hence doesn't overlap with the + or - strands. This is contrary to the semantics of findOverlaps which does appear to consider a * strand range to overlap with + or - strand ranges: > findOverlaps(range,gaps(range)) Hits of length 2 queryLength: 1 subjectLength: 4 queryHits subjectHits <integer> <integer> 1 1 1 2 1 2 To summarise I guess I was expecting findOverlaps(range,gaps(range)) to never return any overlaps under any circumstance (that I can think of!). Do people agree the behaviour of gaps() is not quite intuitive in this case? > sessionInfo() R version 2.15.2 (2012-10-26) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] GenomicRanges_1.10.7 IRanges_1.16.6 BiocGenerics_0.4.0 loaded via a namespace (and not attached): [1] parallel_2.15.2 stats4_2.15.2 -- Alex Gutteridge
GenomicRanges GenomicRanges • 854 views
ADD COMMENT

Login before adding your answer.

Traffic: 869 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6