ChIPpeakAnno - slightly different results between makeVennDiagram and findOverlappingPeaks


Julie Zhu ★ 4.3k

@julie-zhu-3596

Last seen 13 months ago

United States

Antonio,

In this case, 19091 peaks in peaks2 overlap with 19957 peaks in peaks1, i.e., some peaks in peaks2 overlap with multiple peaks in peaks1. To be conservative, makeVennDiagram shows 19091 peaks instead of 19957 in the intersection.

If you want to present the data in a table consistent with the Venn diagram, one approach is to switch the positions of peaks1 and peaks2 in the findOverlappingPeaks call:

    re <- findOverlappingPeaks(RangedData(replicate2), RangedData(replicate1),
                               minoverlap = 100, select = "first",
                               NameOfPeaks1 = "Replicate2", NameOfPeaks2 = "Replicate1")

Alternatively, you could filter the MergedPeaks you have already obtained so that it includes one record for each peak in peaks1 or peaks2.

Best regards,
Julie

On 10/1/12 2:55 PM, "Ou, Jianhong" <jianhong.ou@umassmed.edu> wrote:

Hi Antonio,

I am sorry I did not write the help file clearly. The overlap counts regularly confuse people. Thank you for your dataset; I will use these data as a training dataset for makeVennDiagram.

Yes, your understanding is correct. See the code below.

    > library(ChIPpeakAnno)
    > load('~/Documents/bioconductor/makeVennDiagram/replicate1.RData')
    > load('~/Documents/bioconductor/makeVennDiagram/replicate2.RData')
    > ls()
    [1] "replicate1" "replicate2"
    > head(replicate2)
    GRanges with 6 ranges and 5 elementMetadata cols:
                  seqnames           ranges strand |     count     score        FE       fdr    summit
                     <Rle>        <IRanges>  <Rle> | <integer> <numeric> <numeric> <numeric> <integer>
      chr1:713280     chr1 [712960, 713502]      * |        40    270.11     30.04      0.03    713280
      chr1:713986     chr1 [713836, 714823]      * |       161   1599.27     72.56         0    713986
      chr1:762195     chr1 [761851, 762311]      * |        52    452.96      53.8         0    762195
      chr1:840657     chr1 [838917, 842620]      * |       345   1995.67     57.91         0    840657
      chr1:857025     chr1 [855403, 857374]      * |       149    463.02     11.19         0    857025
      chr1:860228     chr1 [857564, 862591]      * |       998      3100     73.26         0    860228
      ---
      seqlengths:
       chr1 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 ...  chr3  chr4  chr5  chr6  chr7  chr8  chr9  chrM  chrX  chrY
         NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA ...    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
    > makeVennDiagram(RangedDataList(RangedData(replicate1), RangedData(replicate2)), NameOfPeaks=c("TF1", "TF2"),
    +     totalTest=50000, useFeature=FALSE, minoverlap = 100, select= "first", main="test",
    +     main.fontface = "bold",
    +     col = "transparent",
    +     fill = c("cornflowerblue", "green"),
    +     alpha = 0.50,
    +     #label.col = c("orange", "white", "darkorchid4", "white", "white", "white", "white", "white", "darkblue", "white", "white", "white", "white", "darkgreen", "white"),
    +     #cat.col = c("darkblue", "darkgreen", "orange", "darkorchid4"
    + )
    $p.value
    [1] 0

    $vennCounts
         TF1 TF2 Counts
    [1,]   0   0  21578
    [2,]   0   1   5631
    [3,]   1   0   3700
    [4,]   1   1  19091
    attr(,"class")
    [1] "VennCounts"

    > length(replicate2)
    [1] 24722
    > length(replicate1)
    [1] 22791
    >
    > names(re)
    [1] "OverlappingPeaks"   "MergedPeaks"        "Peaks1withOverlaps" "Peaks2withOverlaps"
    > dim(re$MergedPeaks)
    [1] 19957     0
    > dim(re$Peaks1withOverlaps)
    [1] 19957     1
    > dim(re$Peaks2withOverlaps)
    [1] 19091     1

Yours sincerely,
Jianhong Ou
jianhong.ou@umassmed.edu

On Oct 1, 2012, at 2:00 PM, António Miguel de Jesus Domingues wrote:

Hi Jianhong,

I am attaching the data as RData, together with the Venn diagram that I generated (along with the code).

Just to clarify, as it seems that my message was not very clear: the $MergedPeaks from findOverlappingPeaks and from makeVennDiagram actually contain the same number of peaks. The problem is that the Venn diagram itself shows a smaller number of peaks as overlapping both datasets.

I have the feeling that I am missing something silly, but I've read the paper and the manual and still could not find an explanation.

Best,
António

On 1 October 2012 18:08, Ou, Jianhong <jianhong.ou@umassmed.edu> wrote:

Hi Antonio,

> I believe the difference is because some of peaks 2 overlap more than peaks
> in peaks1.

Yes, this is the reason why the merged peaks from findOverlappingPeaks differ from the makeVennDiagram results. As you know, some peaks in peaks2 may overlap more than one peak in peaks1, and vice versa. From findOverlappingPeaks you get MergedPeaks (the overlapping peaks of peaks1 and peaks2 merged together), Peaks1withOverlaps and Peaks2withOverlaps. makeVennDiagram selects the smaller of Peaks1withOverlaps and Peaks2withOverlaps. Both of them will be no smaller than MergedPeaks, because they do not merge small overlapping peaks into a bigger peak. A more complicated case arises when multiple peaks in peaks1 merge with multiple peaks in peaks2 into one big peak, which matters when we want a Venn diagram for three or more groups.

I would appreciate it if you could send me your data as a training dataset for developing a new version of makeVennDiagram.

Yours sincerely,
Jianhong Ou
jianhong.ou@umassmed.edu

On Oct 1, 2012, at 11:14 AM, António Miguel de Jesus Domingues wrote:

My apologies Jianhong, I forgot to attach the session info. I am using ChIPpeakAnno_2.5.12.

One extra piece of information: the example from the vignette works as it should, but that might simply be because the overlaps there are more straightforward, that is, no peak in peaks1 overlaps more than one peak in peaks2 and vice versa.

    > sessionInfo()
    R version 2.15.1 (2012-06-22)
    Platform: x86_64-pc-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_US.UTF-8
     [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=C                 LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] grid      grDevices datasets  graphics  utils     stats     methods
    [8] base

    other attached packages:
     [1] ChIPpeakAnno_2.5.12                 limma_3.12.3
     [3] org.Hs.eg.db_2.7.1                  GO.db_2.7.1
     [5] RSQLite_0.11.2                      DBI_0.2-5
     [7] AnnotationDbi_1.18.4                BSgenome.Ecoli.NCBI.20080805_1.3.17
     [9] BSgenome_1.24.0                     GenomicRanges_1.8.13
    [11] Biostrings_2.24.1                   IRanges_1.14.4
    [13] multtest_2.12.0                     biomaRt_2.12.0
    [15] VennDiagram_1.5.1                   ggplot2_0.9.2.1
    [17] Biobase_2.16.0                      BiocGenerics_0.2.0

    loaded via a namespace (and not attached):
     [1] amap_0.8-7         colorspace_1.1-1   dichromat_1.2-4    DiffBind_1.2.4
     [5] digest_0.5.2       edgeR_2.6.12       gdata_2.12.0       gplots_2.11.0
     [9] gtable_0.1.1       gtools_2.7.0       labeling_0.1       MASS_7.3-21
    [13] memoise_0.1        munsell_0.4        plyr_1.7.1         proto_0.3-9.2
    [17] RColorBrewer_1.0-5 RCurl_1.91-1       reshape2_1.2.1     scales_0.2.2
    [21] splines_2.15.1     stats4_2.15.1      stringr_0.6.1      survival_2.36-14
    [25] tools_2.15.1       XML_3.9-4          zlibbioc_1.2.0

On 1 October 2012 16:54, Ou, Jianhong <jianhong.ou@umassmed.edu> wrote:

Hi Antonio,

May I know which version of ChIPpeakAnno you are using?

Yours sincerely,
Jianhong Ou
jianhong.ou@umassmed.edu

On Oct 1, 2012, at 10:36 AM, António Miguel de Jesus Domingues wrote:

> I've been trying to generate a set of high-confidence peaks that are common
> to my ChIP-seq replicates using ChIPpeakAnno. The issue I'm having is
> matching the number of overlapping peaks seen on the Venn diagram resulting
> from:
>
>     makeVennDiagram(RangedDataList(peaks1,peaks2), NameOfPeaks=c("TF1","TF2"),
>         totalTest=(Npeaks1 + Npeaks2), useFeature=FALSE, minoverlap = 100,
>         select= "first")
>
> and the number of peaks ($MergedPeaks) from:
>
>     findOverlappingPeaks(peaks1, peaks2, minoverlap = 100, select= "first",
>         NameOfPeaks1="TF1", NameOfPeaks2="TF2")
>
> I believe the difference arises because some peaks in peaks2 overlap more
> than one peak in peaks1. Comparing peaks2 vs peaks1 does not solve the
> problem, and select= "first" is already being used. Also, the $MergedPeaks
> data output by makeVennDiagram does not match the number of overlaps:
>
>     $MergedPeaks
>     RangedData with 18650 rows and 0 value columns across 24 spaces
>
>     [1] 19039
>     [1] 21061
>     $p.value
>     [1] 0
>
>     $vennCounts
>          Replicate1 Replicate2 Counts
>     [1,]          0          0  17300
>     [2,]          0          1   3761
>     [3,]          1          0   1739
>     [4,]          1          1  17300
>     attr(,"class")
>     [1] "VennCounts"
>
> I would like to understand where this difference arises so that I
> ultimately have consistent results in visual and table format.
>
> Cheers,
> António
>
> --
> António Miguel de Jesus Domingues, PhD
> Neugebauer group
> Max Planck Institute of Molecular Cell Biology and Genetics, Dresden
> Pfotenhauerstrasse 108
> 01307 Dresden
> Germany
>
> e-mail: domingue@mpi-cbg.de
> tel. +49 351 210 2481
> The Unbearable Lightness of Molecular Biology
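For readers who want to see the counting difference discussed in this thread on something small, here is a minimal sketch in R. It assumes the GenomicRanges package and uses findOverlaps directly rather than findOverlappingPeaks, so it illustrates the idea only and is not ChIPpeakAnno's internal implementation. The coordinates are made up for illustration; the arithmetic at the end uses the numbers reported above.

```r
library(GenomicRanges)

# Toy data (hypothetical coordinates): one wide peak in peaks2
# spans two separate peaks in peaks1.
peaks1 <- GRanges("chr1", IRanges(start = c(100, 400, 2000),
                                  end   = c(300, 600, 2200)))
peaks2 <- GRanges("chr1", IRanges(start = c(150, 5000),
                                  end   = c(550, 5200)))

# Overlaps of at least 100 bp, mirroring minoverlap = 100 in the thread.
hits <- findOverlaps(peaks1, peaks2, minoverlap = 100L)

n1 <- length(unique(queryHits(hits)))    # peaks1 with at least one overlap
n2 <- length(unique(subjectHits(hits)))  # peaks2 with at least one overlap

# The conservative intersection count described in the thread is the
# smaller of the two numbers.
venn_intersection <- min(n1, n2)

# Keeping one record per peak in peaks2, in the spirit of the
# "filter the merged peaks" suggestion.
one_per_peak2 <- hits[!duplicated(subjectHits(hits))]

# Arithmetic check against the vennCounts reported in the thread.
stopifnot(3700 + 19091 == 22791)                  # TF1 only + intersection = length(replicate1)
stopifnot(5631 + 19091 == 24722)                  # TF2 only + intersection = length(replicate2)
stopifnot(21578 + 5631 + 3700 + 19091 == 50000)   # all cells sum to totalTest
```

In this toy case two peaks in peaks1 overlap a single peak in peaks2, which is the same pattern as 19957 versus 19091 in the real data: the two per-set counts differ, and the Venn diagram reports the smaller one.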

Genetics GO ChIPpeakAnno • 1.4k views

12.2 years ago Julie Zhu ★ 4.3k
