Antonio,
In this case, 19091 peaks in peaks2 overlap with 19957 peaks in
peaks1, i.e., some peaks in peaks2 overlap with multiple peaks in
peaks1. To be conservative, makeVennDiagram shows 19091 peaks
instead of 19957 in the intersection. If you want to present the
data in a table consistent with the Venn diagram, one approach is
to switch the positions of peaks1 and peaks2 in the
findOverlappingPeaks call:
re <- findOverlappingPeaks(RangedData(replicate2),
    RangedData(replicate1), minoverlap = 100, select = "first",
    NameOfPeaks1 = "Replicate2", NameOfPeaks2 = "Replicate1")
Alternatively, you could filter the MergedPeaks you have already
obtained so that it keeps only one record for each peak in peaks1 or
in peaks2.
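To make the counting concrete, here is a minimal toy sketch in plain Python. It is not ChIPpeakAnno code: the intervals, the helper names (`overlap_len`, `overlapping_pairs`) and the 50 bp threshold are made up for illustration. It shows how one peak in peaks2 that overlaps two peaks in peaks1 makes the two "with overlaps" counts differ, and how keeping one record per peaks2 entry brings the table back in line with the smaller count.

```python
# Toy illustration only: hypothetical intervals, not real ChIP-seq peaks.
# Intervals are closed (start, end) pairs on a single chromosome.

def overlap_len(a, b):
    """Number of shared bases between two closed intervals (<= 0 if none)."""
    return min(a[1], b[1]) - max(a[0], b[0]) + 1

def overlapping_pairs(p1, p2, min_overlap=1):
    """All (i, j) index pairs where p1[i] and p2[j] share >= min_overlap bases."""
    return [(i, j)
            for i, a in enumerate(p1)
            for j, b in enumerate(p2)
            if overlap_len(a, b) >= min_overlap]

peaks1 = [(100, 200), (250, 350), (1000, 1100), (2000, 2100)]
peaks2 = [(150, 300), (1050, 1150), (3000, 3100)]

pairs = overlapping_pairs(peaks1, peaks2, min_overlap=50)

# Distinct peaks on each side that take part in at least one overlap.
n_peaks1_with_overlaps = len({i for i, _ in pairs})
n_peaks2_with_overlaps = len({j for _, j in pairs})

# The conservative choice described above: report the smaller count.
venn_intersection = min(n_peaks1_with_overlaps, n_peaks2_with_overlaps)

# Filter the pair table down to one record per peaks2 entry (first hit wins).
first_hit_per_peak2 = {}
for i, j in pairs:
    first_hit_per_peak2.setdefault(j, i)

print("overlapping pairs:", pairs)
print("peaks1 with overlaps:", n_peaks1_with_overlaps)
print("peaks2 with overlaps:", n_peaks2_with_overlaps)
print("conservative intersection:", venn_intersection)
print("records after filtering:", len(first_hit_per_peak2))
```

In this toy case the peak (150, 300) in peaks2 overlaps both (100, 200) and (250, 350) in peaks1, so the two counts differ by one, and the filtered table has as many rows as the conservative intersection.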
Best regards,
Julie
On 10/1/12 2:55 PM, "Ou, Jianhong" <jianhong.ou@umassmed.edu> wrote:
Hi Antonio,
I am sorry I did not write the help file clearly. The overlap
counts always confuse people. Thank you for your dataset. I will
use these data as a training dataset for makeVennDiagram.
Yes, your understanding is correct. See the following code.
> library(ChIPpeakAnno)
> load('~/Documents/bioconductor/makeVennDiagram/replicate1.RData')
> load('~/Documents/bioconductor/makeVennDiagram/replicate2.RData')
> ls()
[1] "replicate1" "replicate2"
> head(replicate2)
GRanges with 6 ranges and 5 elementMetadata cols:
              seqnames           ranges strand |     count     score        FE       fdr    summit
                 <Rle>        <IRanges>  <Rle> | <integer> <numeric> <numeric> <numeric> <integer>
  chr1:713280     chr1 [712960, 713502]      * |        40    270.11     30.04      0.03    713280
  chr1:713986     chr1 [713836, 714823]      * |       161   1599.27     72.56         0    713986
  chr1:762195     chr1 [761851, 762311]      * |        52    452.96      53.8         0    762195
  chr1:840657     chr1 [838917, 842620]      * |       345   1995.67     57.91         0    840657
  chr1:857025     chr1 [855403, 857374]      * |       149    463.02     11.19         0    857025
  chr1:860228     chr1 [857564, 862591]      * |       998      3100     73.26         0    860228
  ---
  seqlengths:
    chr1 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 ... chr3 chr4 chr5 chr6 chr7 chr8 chr9 chrM chrX chrY
      NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA ...   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA
> makeVennDiagram(RangedDataList(RangedData(replicate1), RangedData(replicate2)),
+ NameOfPeaks=c("TF1", "TF2"),
+ totalTest=50000, useFeature=FALSE, minoverlap = 100, select= "first", main="test",
+ main.fontface = "bold",
+ col = "transparent",
+ fill = c("cornflowerblue", "green"),
+ alpha = 0.50,
+ #label.col = c("orange", "white", "darkorchid4", "white", "white", "white", "white", "white", "darkblue", "white", "white", "white", "white", "darkgreen", "white"),
+ #cat.col = c("darkblue", "darkgreen", "orange", "darkorchid4"
+ )
$p.value
[1] 0
$vennCounts
TF1 TF2 Counts
[1,] 0 0 21578
[2,] 0 1 5631
[3,] 1 0 3700
[4,] 1 1 19091
attr(,"class")
[1] "VennCounts"
> length(replicate2)
[1] 24722
> length(replicate1)
[1] 22791
>
> names(re)
[1] "OverlappingPeaks" "MergedPeaks" "Peaks1withOverlaps" "Peaks2withOverlaps"
> dim(re$MergedPeaks)
[1] 19957 0
> dim(re$Peaks1withOverlaps)
[1] 19957 1
> dim(re$Peaks2withOverlaps)
[1] 19091 1
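As a quick sanity check on the numbers in this transcript, the arithmetic can be sketched in a few lines of Python. All values are copied from the output above; the variable names are mine, and the reading of the TF1 and TF2 columns as replicate1 and replicate2 follows the order of the arguments in the call.

```python
# Values copied from the console output above; names are illustrative only.
venn_counts = {
    (0, 0): 21578,  # in neither set
    (0, 1): 5631,   # TF2 (replicate2) only
    (1, 0): 3700,   # TF1 (replicate1) only
    (1, 1): 19091,  # shown as shared
}
len_replicate1 = 22791
len_replicate2 = 24722
total_test = 50000
peaks1_with_overlaps = 19957
peaks2_with_overlaps = 19091

# Each replicate is split into "only" plus "shared".
tf1_total = venn_counts[(1, 0)] + venn_counts[(1, 1)]
tf2_total = venn_counts[(0, 1)] + venn_counts[(1, 1)]

# The shared count equals the smaller of the two "with overlaps" counts.
shared = min(peaks1_with_overlaps, peaks2_with_overlaps)

# Records that appear in the 19957-row tables but not in the diagram.
extra_records = peaks1_with_overlaps - peaks2_with_overlaps

print("TF1 total:", tf1_total, "vs length(replicate1):", len_replicate1)
print("TF2 total:", tf2_total, "vs length(replicate2):", len_replicate2)
print("all four cells:", sum(venn_counts.values()), "vs totalTest:", total_test)
print("shared:", shared, "extra records:", extra_records)
```

The four cells add up to totalTest, and each replicate's two cells add up to its length, so the diagram is internally consistent once 19091 is taken as the intersection. The 866 extra records are the difference between 19957 and 19091.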
Yours sincerely,
Jianhong Ou
jianhong.ou@umassmed.edu
On Oct 1, 2012, at 2:00 PM, António Miguel de Jesus Domingues wrote:
Hi Jianhong,
I am attaching the data as RData files, along with the Venn diagram
that I generated (and the code). Just to clarify, as it seems
that my message was not very clear:
findOverlappingPeaks and makeVennDiagram actually return the same
number of peaks in $MergedPeaks. The problem is that the Venn
diagram itself shows a smaller number of peaks as overlapping
both datasets.
I have the feeling that this is something silly I am missing but I've
read the paper and the manual and still could not find an explanation.
Best,
António
On 1 October 2012 18:08, Ou, Jianhong <jianhong.ou@umassmed.edu>
wrote:
Hi Antonio,
> I believe the difference is because some peaks in peaks2 overlap more
> than one peak in peaks1.
Yes, this is why the merged peaks from findOverlappingPeaks differ
from the results of makeVennDiagram. As you know, some peaks in
peaks2 may overlap more than one peak in peaks1 and vice versa.
findOverlappingPeaks returns MergedPeaks (the overlapping peaks of
peaks1 and peaks2 merged together), Peaks1withOverlaps and
Peaks2withOverlaps. makeVennDiagram selects the smaller of
Peaks1withOverlaps and Peaks2withOverlaps. Both of them will be
no less than MergedPeaks because they do not merge the small
overlapping peaks into a bigger peak. A more complicated case is
when multiple peaks in peaks1 merge with multiple peaks in peaks2 into
one big peak and we want a Venn diagram for three or more groups.
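The relationship between the three counts can be sketched with a toy example in plain Python. This is not how ChIPpeakAnno is implemented: the intervals are invented, the helper `merge_regions` is hypothetical, and any overlap of at least 1 bp counts (rather than minoverlap = 100). It merges all overlapping peaks from both sets into bigger regions and compares the number of merged regions containing peaks from both sets with the two "with overlaps" counts.

```python
# Toy illustration only: hypothetical closed intervals on one chromosome.

def overlaps(a, b):
    """True if two closed intervals share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]

def merge_regions(p1, p2):
    """Merge all overlapping intervals from both sets.

    Returns a list of (start, end, sources), where sources is the set of
    input sets (1 and/or 2) that contributed a peak to the merged region.
    """
    tagged = sorted([(s, e, 1) for s, e in p1] + [(s, e, 2) for s, e in p2])
    merged = []
    for start, end, source in tagged:
        if merged and start <= merged[-1][1]:
            # Extends the current region: grow it and record the source.
            last_start, last_end, sources = merged[-1]
            merged[-1] = (last_start, max(last_end, end), sources | {source})
        else:
            merged.append((start, end, {source}))
    return merged

peaks1 = [(100, 200), (250, 350), (1000, 1100), (1150, 1250)]
peaks2 = [(150, 300), (320, 400), (1050, 1200), (3000, 3100)]

# Peaks on each side that overlap at least one peak on the other side.
n_peaks1_with_overlaps = sum(any(overlaps(a, b) for b in peaks2) for a in peaks1)
n_peaks2_with_overlaps = sum(any(overlaps(b, a) for a in peaks1) for b in peaks2)

# Merged regions that contain peaks from both sets.
shared_regions = [r for r in merge_regions(peaks1, peaks2) if r[2] == {1, 2}]

print("peaks1 with overlaps:", n_peaks1_with_overlaps)
print("peaks2 with overlaps:", n_peaks2_with_overlaps)
print("merged regions shared by both:", len(shared_regions))
```

Each shared merged region contains at least one overlapping peak from each set, so its count cannot exceed either "with overlaps" count. In this toy case two peaks from each set chain into a single region, which is the many-to-many situation described above.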
I would appreciate it if you could send me your data as a training
dataset for developing a new version of makeVennDiagram.
Yours sincerely,
Jianhong Ou
jianhong.ou@umassmed.edu
On Oct 1, 2012, at 11:14 AM, António Miguel de Jesus Domingues wrote:
My apologies Jianhong,
I forgot to attach the session info. I am using ChIPpeakAnno_2.5.12.
One extra piece of information: with the example from the vignette
it works as it should, but that might simply be because the overlaps
are more straightforward, that is, no peak in peaks1 overlaps with
more than one peak in peaks2 and vice versa.
sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] grid grDevices datasets graphics utils stats methods
[8] base
other attached packages:
[1] ChIPpeakAnno_2.5.12 limma_3.12.3
[3] org.Hs.eg.db_2.7.1 GO.db_2.7.1
[5] RSQLite_0.11.2 DBI_0.2-5
[7] AnnotationDbi_1.18.4 BSgenome.Ecoli.NCBI.20080805_1.3.17
[9] BSgenome_1.24.0 GenomicRanges_1.8.13
[11] Biostrings_2.24.1 IRanges_1.14.4
[13] multtest_2.12.0 biomaRt_2.12.0
[15] VennDiagram_1.5.1 ggplot2_0.9.2.1
[17] Biobase_2.16.0 BiocGenerics_0.2.0
loaded via a namespace (and not attached):
[1] amap_0.8-7 colorspace_1.1-1 dichromat_1.2-4 DiffBind_1.2.4
[5] digest_0.5.2 edgeR_2.6.12 gdata_2.12.0 gplots_2.11.0
[9] gtable_0.1.1 gtools_2.7.0 labeling_0.1 MASS_7.3-21
[13] memoise_0.1 munsell_0.4 plyr_1.7.1 proto_0.3-9.2
[17] RColorBrewer_1.0-5 RCurl_1.91-1 reshape2_1.2.1 scales_0.2.2
[21] splines_2.15.1 stats4_2.15.1 stringr_0.6.1 survival_2.36-14
[25] tools_2.15.1 XML_3.9-4 zlibbioc_1.2.0
On 1 October 2012 16:54, Ou, Jianhong <jianhong.ou@umassmed.edu>
wrote:
Hi Antonio,
May I know which version of ChIPpeakAnno you are using?
Yours sincerely,
Jianhong Ou
jianhong.ou@umassmed.edu
On Oct 1, 2012, at 10:36 AM, António Miguel de Jesus Domingues wrote:
> I've been trying to generate a set of high-confidence peaks that are common
> to my ChIP-seq replicates using ChIPpeakAnno. The issue I'm having is
> matching the number of overlapping peaks seen on the Venn diagram resulting
> from:
> makeVennDiagram(RangedDataList(peaks1,peaks2), NameOfPeaks=c("TF1","TF2"),
> totalTest=(Npeaks1 + Npeaks2), useFeature=FALSE, minoverlap = 100,
> select= "first")
>
> and the number of peaks ($MergedPeaks) from:
> findOverlappingPeaks(peaks1, peaks2, minoverlap = 100, select= "first",
> NameOfPeaks1="TF1", NameOfPeaks2="TF2")
>
> I believe the difference is because some peaks in peaks2 overlap more
> than one peak in peaks1. Comparing peaks2 vs peaks1 does not solve the
> problem, and select= "first" is already being used. Also, the $MergedPeaks
> data that is output by makeVennDiagram does not match the number of
> overlaps:
> $MergedPeaks
> RangedData with 18650 rows and 0 value columns across 24 spaces
>
> [1] 19039
> [1] 21061
> $p.value
> [1] 0
>
> $vennCounts
> Replicate1 Replicate2 Counts
> [1,] 0 0 17300
> [2,] 0 1 3761
> [3,] 1 0 1739
> [4,] 1 1 17300
> attr(,"class")
> [1] "VennCounts"
>
>
> I would like to understand where this difference comes from so that I
> ultimately have consistent results in visual and table format.
>
> Cheers,
> António
>
>
> --
> --
> António Miguel de Jesus Domingues, PhD
> Neugebauer group
> Max Planck Institute of Molecular Cell Biology and Genetics, Dresden
> Pfotenhauerstrasse 108
> 01307 Dresden
> Germany
>
> e-mail: domingue@mpi-cbg.de
> tel. +49 351 210 2481
> The Unbearable Lightness of Molecular Biology
>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor@r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor