Hi,
I am using the findOverlaps function and find some behaviour that I was puzzled about, specifically from the maxgap parameter. According to the IRanges manual
The ‘maxgap’ parameter has special meaning with the special overlap types. For ‘start’, ‘end’, and ‘equal’, it specifies the maximum difference in the starts, ends or both, respectively. For ‘within’, it is the maximum amount by which the query may be wider than the subject.
So - from what I understand it - when I use type="within", with a maxgap of 10, then any "query" that is contained in [subject.start - 10, subject.end + 10] should be returned (or in a different reading, any query where the part outside of subject added up on both sides of the subject is not larger than 10). Do I misunderstand that? Below some code where this is not happening
granges_within_query <- GRanges(seqnames=1, ranges=IRanges(6 - (1:5), 6 + 1:5), strand="*") granges_within_subject <- GRanges(seqnames=1, ranges=IRanges(3, 9), strand="*")
> granges_within_query GRanges object with 5 ranges and 0 metadata columns: seqnames ranges strand <Rle> <IRanges> <Rle> [1] 1 [5, 7] * [2] 1 [4, 8] * [3] 1 [3, 9] * [4] 1 [2, 10] * [5] 1 [1, 11] * ------- seqinfo: 1 sequence from an unspecified genome; no seqlengths > granges_within_subject GRanges object with 1 range and 0 metadata columns: seqnames ranges strand <Rle> <IRanges> <Rle> [1] 1 [3, 9] * ------- seqinfo: 1 sequence from an unspecified genome; no seqlengths
Now I run the findOverlaps function
res_within_0 <- GenomicRanges::findOverlaps(granges_within_query, granges_within_subject, type="within", select="all", minoverlap=1, maxgap=0) res_within_1 <- GenomicRanges::findOverlaps(granges_within_query, granges_within_subject, type="within", select="all", minoverlap=1, maxgap=1) res_within_2 <- GenomicRanges::findOverlaps(granges_within_query, granges_within_subject, type="within", select="all", minoverlap=1, maxgap=2)
with results
> res_within_0 Hits object with 3 hits and 0 metadata columns: queryHits subjectHits <integer> <integer> [1] 1 1 [2] 2 1 [3] 3 1 ------- queryLength: 5 / subjectLength: 1 > res_within_1 Hits object with 1 hit and 0 metadata columns: queryHits subjectHits <integer> <integer> [1] 3 1 ------- queryLength: 5 / subjectLength: 1 > res_within_2 Hits object with 2 hits and 0 metadata columns: queryHits subjectHits <integer> <integer> [1] 2 1 [2] 3 1 ------- queryLength: 5 / subjectLength: 1
Here, res_within_0 gives the expected result - the first 3 ranges of the query. However, with maxgap=1, I would have expected that the first 3 are returned, with maxgap=2, the first 4 (if we interpret "the query may be wider than the subject" in the manual to mean wider on both sides combined rather than on each side separately ...). However, with maxgap=1, only the 3rd is returned (equal to the subject) and with maxgap=2 also the second (but why not the fourth?).
Any help would be appreciated.
Thanks
Holger