ChIPpeakAnno :findOverlapsOfPeaks: number for overlap peak in venn diagram is different than output of ol1$overlappingPeaks
1
0
Entering edit mode
@sudhirjadhao2009-12770
Last seen 7.4 years ago

I am finding overlapping region in two chipseq samples. The number of overlapping peaks in ven diagram are different from what you get in  ol1$overlappingPeaks and ol1$peaksInMergedPeaks. For specific peak the ven diagram number is same as in ol1$peaklist. But not for overlapping peaks.

In example data provided by tool, I found same number in overlapping region and in ol$peaksInMergedPeaks.

command used:

NCEtOHR1S17.bed <- toGRanges(files[3], format="BED", header=FALSE)

 

si8444DHTR1S16.bed <- toGRanges(files[9], format="BED", header=FALSE) 

 

 head(NCEtOHR1S17.bed)
GRanges object with 6 ranges and 1 metadata column:
                                        seqnames           ranges strand |     score
                                           <Rle>        <IRanges>  <Rle> | <numeric>
   NC_EtOH_R1_S17_sorted.bam.bam_peak_1     chr1 [ 10023,  10215]      * |         7
   NC_EtOH_R1_S17_sorted.bam.bam_peak_2     chr1 [526379, 526773]      * |        22
  NC_EtOH_R1_S17_sorted.bam.bam_peak_3b     chr1 [564391, 565477]      * |        35
  NC_EtOH_R1_S17_sorted.bam.bam_peak_4b     chr1 [565600, 566838]      * |        34
   NC_EtOH_R1_S17_sorted.bam.bam_peak_5     chr1 [567182, 567748]      * |        44
  NC_EtOH_R1_S17_sorted.bam.bam_peak_6a     chr1 [567813, 568982]      * |        32
  -------
  seqinfo: 24 sequences from an unspecified genome; no seqlengths

 

head(si8444DHTR1S16)
GRanges object with 6 ranges and 1 metadata column:
                                            seqnames           ranges strand |     score
                                               <Rle>        <IRanges>  <Rle> | <numeric>
   si844_4_DHT_R1_S16_sorted.bam.bam_peak_1     chr1 [  9969,  10370]      * |        10
  si844_4_DHT_R1_S16_sorted.bam.bam_peak_2c     chr1 [564398, 565440]      * |        38
   si844_4_DHT_R1_S16_sorted.bam.bam_peak_3     chr1 [565535, 565905]      * |        25
  si844_4_DHT_R1_S16_sorted.bam.bam_peak_4a     chr1 [566068, 566824]      * |        37
  si844_4_DHT_R1_S16_sorted.bam.bam_peak_5b     chr1 [567219, 567773]      * |        61
  si844_4_DHT_R1_S16_sorted.bam.bam_peak_6a     chr1 [567841, 568626]      * |        24
  -------
  seqinfo: 24 sequences from an unspecified genome; no seqlengths

 

ol1 <-findOverlapsOfPeaks(NCEtOHR1S17.bed, si8441DHTR1S18)

ol1 <- addMetadata(ol1, colNames="score", FUN=mean)

makeVennDiagram(ol1)

$vennCounts
     NCEtOHR1S17.bed si8444DHTR1S16 Counts
[1,]               0              0      0
[2,]               0              1   1382
[3,]               1              0   7566
[4,]               1              1   8023
attr(,"class")
[1] "VennCounts"

> nrow(as.data.frame(ol1$overlappingPeaks))
[1] 8270
> nrow(as.data.frame(ol1$mergedPeaks))
[1] 7974
 

 

 

 

 

 

chippeakanno • 2.8k views
ADD COMMENT
0
Entering edit mode
Ou, Jianhong ★ 1.3k
@ou-jianhong-4539
Last seen 22 days ago
United States

Hi,

This question has been asked many times. Please refer the answer in C: Curious behaviour of ChIPpeakAnno

You can have a try to turn on the argument connectedPeaks="keepAll" to see how the numbers change.

Basically, the merged peaks may be different from the original peaks because they are merged from two or more peaks since they have overlaps. peaksInMergedPeaks will tell you the original peaks before merging.  peaklist indicates the peaks for each category. overlappingPeaks are merged results for pair overlapping peaks. 

Let me know if there are any question.

ADD COMMENT
0
Entering edit mode

Thank you for replay.

I got your point . but whatever number is common,  I want to extract id of those common peaks

Below is the summary of my data.

 summary(ol$peaklist)
                                              Length Class   Mode
si8444DHTR1S16                                 803   GRanges S4  
si8441DHTR1S18                                2188   GRanges S4  
si8441DHTR1S18///si8444DHTR1S16                552   GRanges S4  
NCEtOHR1S17                                   4561   GRanges S4  
NCEtOHR1S17///si8444DHTR1S16                   584   GRanges S4  
NCEtOHR1S17///si8441DHTR1S18                  2973   GRanges S4  
NCEtOHR1S17///si8441DHTR1S18///si8444DHTR1S16 7342   GRanges S4  

 

1. I want to extract  Id of  peaks which are  contributing to this number 7342 . Can you please tell me how i should process

2. In above summary, common number is 7342 . but in Venn diagram the common number is different(7415). Can you please tell me why I am getting  different numbers for overlap

for details of Image please follow below line

https://drive.google.com/open?id=0B5CeCVgbLqijajBVZjlMWnNMVGM

 

 

 

 

ADD REPLY
0
Entering edit mode

Hi,

I will show some sample codes to answer your question. Hope this can help you to understand what happened for the Venn-diagram.

For question 1:

There is a column in the metadata of each element in peaklist, called peakNames, which is a CharacterList. The CharacterList is a list of the contributing peaks ids with prefix, eg. peaks1__peakname1, peaksi__peaknamej. Users can access the original peak name by split those characters.

library(ChIPpeakAnno)
bed <- system.file("extdata", "MACS_output.bed", package="ChIPpeakAnno")
gr1 <- toGRanges(bed, format="BED", header=FALSE)
gff <- system.file("extdata", "GFF_peaks.gff", package="ChIPpeakAnno")
gr2 <- toGRanges(gff, format="GFF", header=FALSE, skip=3)
ol <- findOverlapsOfPeaks(gr1, gr2)
peakNames <- ol$peaklist[['gr1///gr2']]$peakNames
library(reshape2)
peakNames1 <- melt(peakNames, value.name="merged.peak.id")
peakNames1 <- cbind(peakNames1[, 1], do.call(rbind, strsplit(as.character(peakNames1[, 3]), "__")))
colnames(peakNames1) <- c("merged.peak.id", "group", "peakName")
head(peakNames1)
gr1.subset <- gr1[peakNames1[peakNames1[, "group"] %in% "gr1", "peakName"]]
gr2.subset <- gr2[peakNames1[peakNames1[, "group"] %in% "gr2", "peakName"]]

More easier way to access the original peaks is by using the renames peaks.

all.peaks <- ol$all.peaks
gr1.renamed <- all.peaks$gr1
gr2.renamed <- all.peaks$gr2
peakNames <- melt(ol$peaklist[['gr1///gr2']]$peakNames, value.name="merged.peak.id")
gr1.sub <- gr1.renamed[peakNames[grepl("^gr1", peakNames[, 3]), 3]]
gr2.sub <- gr2.renamed[peakNames[grepl("^gr2", peakNames[, 3]), 3]]

For question 2:

There is a argument called connectedPeaks. In the documentation, we described the argument connectedPeaks as If multiple peaks involved in overlapping in several groups, set it to "merge" will count it as 1, while set it to "min" will count it as the minimal involved peaks in any group of connected/overlapped peaks. The default is “min”. By default, the program will select the minimal number from each peaklist involved in one merged peak. So the number will be no less than the number in the peaklist. If user set connectedPeaks to merge, the number will be exactly same as the number in the peaklist. It is not easy to understand above description. I provide a simple example for you to understand what is difference between them:

p1 <- GRanges("1", IRanges(c(1, 4, 7), width=2))
p2 <- GRanges("1", IRanges(c(2, 5), width=3))
ol_min <- findOverlapsOfPeaks(p1, p2, connectedPeaks="min") ## the counts will be the minimal peaks involved in that group of connected peaks, so you get 2.
ol_merge <- findOverlapsOfPeaks(p1, p2, connectedPeaks="merge") ## the counts will be 1 for each group of connected peaks
ol_min$venn_cnt
ol_merge$venn_cnt

I paste the output here:

> ol_min$venn_cnt
     p1 p2 Counts
[1,]  0  0      0
[2,]  0  1      0
[3,]  1  0      0
[4,]  1  1      2
attr(,"class")
[1] "VennCounts"
> ol_merge$venn_cnt
     p1 p2 Counts
[1,]  0  0      0
[2,]  0  1      0
[3,]  1  0      0
[4,]  1  1      1
attr(,"class")
[1] "VennCounts"

 

Let me know if you still have any question.

ADD REPLY

Login before adding your answer.

Traffic: 707 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6