I am finding overlapping region in two chipseq samples. The number of overlapping peaks in ven diagram are different from what you get in ol1$overlappingPeaks and ol1$peaksInMergedPeaks. For specific peak the ven diagram number is same as in ol1$peaklist. But not for overlapping peaks.
In example data provided by tool, I found same number in overlapping region and in ol$peaksInMergedPeaks.
command used:
NCEtOHR1S17.bed <- toGRanges(files[3], format="BED", header=FALSE)
si8444DHTR1S16.bed <- toGRanges(files[9], format="BED", header=FALSE)
head(NCEtOHR1S17.bed)
GRanges object with 6 ranges and 1 metadata column:
seqnames ranges strand | score
<Rle> <IRanges> <Rle> | <numeric>
NC_EtOH_R1_S17_sorted.bam.bam_peak_1 chr1 [ 10023, 10215] * | 7
NC_EtOH_R1_S17_sorted.bam.bam_peak_2 chr1 [526379, 526773] * | 22
NC_EtOH_R1_S17_sorted.bam.bam_peak_3b chr1 [564391, 565477] * | 35
NC_EtOH_R1_S17_sorted.bam.bam_peak_4b chr1 [565600, 566838] * | 34
NC_EtOH_R1_S17_sorted.bam.bam_peak_5 chr1 [567182, 567748] * | 44
NC_EtOH_R1_S17_sorted.bam.bam_peak_6a chr1 [567813, 568982] * | 32
-------
seqinfo: 24 sequences from an unspecified genome; no seqlengths
head(si8444DHTR1S16)
GRanges object with 6 ranges and 1 metadata column:
seqnames ranges strand | score
<Rle> <IRanges> <Rle> | <numeric>
si844_4_DHT_R1_S16_sorted.bam.bam_peak_1 chr1 [ 9969, 10370] * | 10
si844_4_DHT_R1_S16_sorted.bam.bam_peak_2c chr1 [564398, 565440] * | 38
si844_4_DHT_R1_S16_sorted.bam.bam_peak_3 chr1 [565535, 565905] * | 25
si844_4_DHT_R1_S16_sorted.bam.bam_peak_4a chr1 [566068, 566824] * | 37
si844_4_DHT_R1_S16_sorted.bam.bam_peak_5b chr1 [567219, 567773] * | 61
si844_4_DHT_R1_S16_sorted.bam.bam_peak_6a chr1 [567841, 568626] * | 24
-------
seqinfo: 24 sequences from an unspecified genome; no seqlengths
ol1 <-findOverlapsOfPeaks(NCEtOHR1S17.bed, si8441DHTR1S18)
ol1 <- addMetadata(ol1, colNames="score", FUN=mean)
makeVennDiagram(ol1)
$vennCounts
NCEtOHR1S17.bed si8444DHTR1S16 Counts
[1,] 0 0 0
[2,] 0 1 1382
[3,] 1 0 7566
[4,] 1 1 8023
attr(,"class")
[1] "VennCounts"
> nrow(as.data.frame(ol1$overlappingPeaks))
[1] 8270
> nrow(as.data.frame(ol1$mergedPeaks))
[1] 7974
Thank you for replay.
I got your point . but whatever number is common, I want to extract id of those common peaks
Below is the summary of my data.
summary(ol$peaklist)
Length Class Mode
si8444DHTR1S16 803 GRanges S4
si8441DHTR1S18 2188 GRanges S4
si8441DHTR1S18///si8444DHTR1S16 552 GRanges S4
NCEtOHR1S17 4561 GRanges S4
NCEtOHR1S17///si8444DHTR1S16 584 GRanges S4
NCEtOHR1S17///si8441DHTR1S18 2973 GRanges S4
NCEtOHR1S17///si8441DHTR1S18///si8444DHTR1S16 7342 GRanges S4
1. I want to extract Id of peaks which are contributing to this number 7342 . Can you please tell me how i should process
2. In above summary, common number is 7342 . but in Venn diagram the common number is different(7415). Can you please tell me why I am getting different numbers for overlap
for details of Image please follow below line
https://drive.google.com/open?id=0B5CeCVgbLqijajBVZjlMWnNMVGM
Hi,
I will show some sample codes to answer your question. Hope this can help you to understand what happened for the Venn-diagram.
For question 1:
There is a column in the metadata of each element in peaklist, called peakNames, which is a CharacterList. The CharacterList is a list of the contributing peaks ids with prefix, eg. peaks1__peakname1, peaksi__peaknamej. Users can access the original peak name by split those characters.
More easier way to access the original peaks is by using the renames peaks.
For question 2:
There is a argument called connectedPeaks. In the documentation, we described the argument connectedPeaks as If multiple peaks involved in overlapping in several groups, set it to "merge" will count it as 1, while set it to "min" will count it as the minimal involved peaks in any group of connected/overlapped peaks. The default is “min”. By default, the program will select the minimal number from each peaklist involved in one merged peak. So the number will be no less than the number in the peaklist. If user set connectedPeaks to merge, the number will be exactly same as the number in the peaklist. It is not easy to understand above description. I provide a simple example for you to understand what is difference between them:
I paste the output here:
Let me know if you still have any question.