findOverlaps on GAlignments object
1
0
Entering edit mode
Arun ▴ 20
@arun-6714
Last seen 8.6 years ago
Germany

 

Consider this example:

require(GenomicRanges)
require(GenomicAlignments)

b.gr = GRanges(Rle("x"), IRanges(start=155787, end=157991), strand="+")
b.ga = GAlignments(seqnames=Rle("x"), pos=155787L, cigar="67M2761N34M", strand=strand("+"))
g.gr = GRanges(Rle("x"), IRanges(157784, 157887), strand="+")

Now, using findOverlaps:

findOverlaps(b.gr, g.gr, type="any")
# Hits of length 1
# queryLength: 1
# subjectLength: 1
#   queryHits subjectHits
#    <integer>   <integer>
#  1         1           1

findOverlaps(b.ga, g.gr, type="any")
# Hits of length 0
# queryLength: 1
# subjectLength: 1

AFAICT, findOverlaps on GAlignments object seems to automatically remove the mapped read (which is wrongly mapped here) even when specifying the overlap type to be any. The explanation for type under ?findOverlaps refers to IRanges' documentation, and that should result in an overlap (as in the case of GenomicRanges output). Is this intended? If so, where can I learn about how findOverlaps works with GAlignments object? I understand I can use summarizeOverlaps, but I'm interested in getting this clarified.

Here's the session info:

sessionInfo()

# R version 3.1.2 (2014-10-31)
# Platform: x86_64-pc-linux-gnu (64-bit)

# locale:
#  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
#  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
#  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
#  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
#  [9] LC_ADDRESS=C               LC_TELEPHONE=C
# [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

# attached base packages:
# [1] stats4    parallel  stats     graphics  grDevices utils     datasets
# [8] methods   base

# other attached packages:
# [1] GenomicAlignments_1.2.2 Rsamtools_1.18.3        Biostrings_2.34.1
# [4] XVector_0.6.0           GenomicRanges_1.18.4    GenomeInfoDb_1.2.4
# [7] IRanges_2.0.1           S4Vectors_0.4.0         BiocGenerics_0.12.1

# loaded via a namespace (and not attached):
#  [1] base64enc_0.1-2    BatchJobs_1.6      BBmisc_1.9         BiocParallel_1.0.3
#  [5] bitops_1.0-6       brew_1.0-6         checkmate_1.5.2    codetools_0.2-10
#  [9] DBI_0.3.1          digest_0.6.8       fail_1.2           foreach_1.4.2
# [13] iterators_1.0.7    RSQLite_1.0.0      sendmailR_1.2-1    stringr_0.6.2
# [17] tools_3.1.2        zlibbioc_1.12.0

Thanks, Arun.

 

[findOverlaps] [GAlignments] • 1.4k views
ADD COMMENT
1
Entering edit mode
@valerie-obenchain-4275
Last seen 3.0 years ago
United States

Hi,
Overlaps on a GAlignments object take the CIGAR into account. You can see this by coercing the GAlignments to a GRangesList:

> grglist(b.ga)
GRangesList object of length 1:
[[1]]
GRanges object with 2 ranges and 0 metadata columns:
      seqnames           ranges strand
         <Rle>        <IRanges>  <Rle>
  [1]        x [155787, 155853]      +
  [2]        x [158615, 158648]      +

-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths

Overlaps are performed on these ranges, gaps are dropped. g.gr falls in the gap and therefore does not overlap.

> g.gr
GRanges object with 1 range and 0 metadata columns:
      seqnames           ranges strand
         <Rle>        <IRanges>  <Rle>
  [1]        x [157784, 157887]      +
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths


If you change the b.ga CIGAR to all 'M's you will see an overlap.

Valerie

ADD COMMENT
0
Entering edit mode

Thank you, that clears it. 

ADD REPLY

Login before adding your answer.

Traffic: 521 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6