Entering edit mode
hi,
The new arguments 'inter.feature' and 'fragments' of summarizeOverlaps
look
very useful.
I'm wondering about the behavior for inter.feature=FALSE and
mode="IntersectionNotEmpty".
Earlier on the list, from Martin Morgan, there was the proposed
behavior:
You'd like to add 3 more modes, with inter_feature=FALSE. Reads are
sometimes
'counted twice'
|*----------------------+-----+-----------+-----------+---------------
|*|*
Mode | Row | Feature 1 | Feature 2 | Hits per read
|*|*----------------------+-----+-----------+-----------+-------------
--|*|*
Union | 5 | 1 | 0 | 1
|*|* | 6 | 1 | 1 |
2 |*|* | 7 | 1 | 1 |
2 |*|* IntersectionStrict | 5 | 1 | 0 |
1 |*|* | 6 | 1 | 0 |
1 |*|* | 7 | 1 | 1 |
2 |*|* IntersectionNotEmpty | 5 | 1 | 0 |
1 |*|* | 6 | 1 | 1 |
2 |*|* | 7 | 1 | 1 |
2 |*|*----------------------+-----+-----------+-----------+----
-----------|*
( https://stat.ethz.ch/pipermail/bioconductor/2013-April/052064.html )
For row 6 of this diagram (
http://bioconductor.org/packages/2.13/bioc/vignettes/GenomicRanges/ins
t/doc/summarizeOverlaps-modes.pdf)
and 'IntersectionNotEmpty', I don't see why Feature 2 gets a hit.
The man page has:
IntersectionNotEmpty : For this counting mode, the features are
partitioned
into unique disjoint regions. This is accomplished by disjoining the
feature ranges then removing ranges shared by more than one feature.
The
result is a group of non-overlapping regions each of which belong to a
single feature.
Simple and gapped reads are counted if,
1. the read or exactly 1 of the read fragments overlaps a unique
disjoint
region
2. the read or >1 read fragments overlap >1 unique disjoint region
from
the same feature
An example, where the read falls completely within Feature 1 but
partially
in Feature 2, the behavior is consistent with the table above:
> reads <-
GAlignments(seqnames="chr1",pos=1400L,cigar="500M",strand=strand("+"))
> features <- GRanges("chr1",IRanges(c(1300L,1700L),c(2000L,2400L)))
> assay(summarizeOverlaps(features,reads,mode="IntersectionNotEmpty",
inter.feature=FALSE))
[,1]
[1,] 1
[2,] 1
> disjoin(features)
GRanges with 3 ranges and 0 metadata columns:
seqnames ranges strand
<rle> <iranges> <rle>
[1] chr1 [1300, 1699] *
[2] chr1 [1700, 2000] *
[3] chr1 [2001, 2400] *
---
seqlengths:
chr1
NA
> findOverlaps(reads, disjoin(features))
Hits of length 2
queryLength: 1
subjectLength: 3
queryHits subjectHits
<integer> <integer>
1 1 1
2 1 2
So the example read overlaps the unique region of Feature 1 and the
shared
region, which supposedly has been removed. So I would expect it to
only
count to Feature 1.
> sessionInfo()
R Under development (unstable) (2013-05-14 r62742)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] C
attached base packages:
[1] parallel stats graphics grDevices utils datasets
methods
[8] base
other attached packages:
[1] GenomicRanges_1.13.11 XVector_0.0.0 IRanges_1.19.7
[4] BiocGenerics_0.7.2 BiocInstaller_1.11.1 Defaults_1.1-1
loaded via a namespace (and not attached):
[1] stats4_3.1.0
thanks,
Mike
[[alternative HTML version deleted]]