Potential oversimplification in the overlap function
0
0
Entering edit mode
@konstakukkonen-23122
Last seen 4 months ago
Finland

Hello,

I'm annotating some ATAC peaks with ChIPseeker package, which I find very helpful. I was trying to replicate the numbers seen in the Vennplots with my own code using intersect() and findOverlaps() functions from the GRanges package, and was a bit surprised to find that the numbers given by these functions and the overlap() function in ChIPseeker package didn't match.

I have 3 peaksets: peaks1, peaks2, and peaks3. To find the peaks that are shared between all samples I use

shared = Reduce(intersect, list(peaks1,peaks2,peaks3) #This is similar to what overlap() does

However, when I move along to fnd overlaps between peaks1 and peaks2 by:

peaks12 = intersect(peaks1,peaks2)[-findOverlaps(intersect(peaks1,peaks2), shared)@from]

ChIPseeker disagrees. I think this is because ChIPseeker uses simple subtraction:

length(intersect(peaks1,peaks2))-length(shared)

This calculation doesn't take into account that there may be multiple ranges within shared that overlap one element in intersect(peaks1,peaks2) leading to underestimation of the number of shared peaks between peaks1 and peaks2.

This is probably not the core function of the package and it might be even the way the author has purposely made it to work, but it lead to some confusion at least on my part. Any comments are appreciated, especially if I'm doing something wrong in here!

Cheers, -Konsta

ChIPseeker • 950 views
ADD COMMENT

Login before adding your answer.

Traffic: 419 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6