Question

ChipPeakAnno / GRanges isolate peaks that do NOT overlap

0

Entering edit mode

José LÓPEZ ▴ 110

@jose-lopez-5444

Last seen 5.8 years ago

Dear Julie, I am using findOverlappingPeaks and makeVennDiagram to compare 2 set of peaks (A and B). I get overlapping peaks of A with B ($Peaks1withOverlaps) (9992 see below). My question is to how can I get those peaks specific to A (13268) and those specific to B (2640)? I also tried with subsetByOverlaps from (GenomicRanges package) and get the intersect between A and B (9992 as with ChiPpeakAnno functions). Then, to identify peaks that do not overlap tried to isolate those A regions that were not present in the overlap with B, but get an error (see below)... > noOlap = peaks_AcH3_Ranged[!peaks_A_Ranged %in% peaks_B_Ranged] Error in peaks_AcH3_Ranged[!peaks_A_Ranged %in% peaks_B_Ranged] : selecting spaces: invalid subscript type Thank you in advance for your suggestions, Jose R version 2.15.0 (2012-03-30) Copyright (C) 2012 The R Foundation for Statistical Computing ISBN 3-900051-07-0 Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under certain conditions. Type 'license()' or 'licence()' for distribution details. Natural language support but running in an English locale R is a collaborative project with many contributors. Type 'contributors()' for more information and 'citation()' on how to cite R or R packages in publications. Type 'demo()' for some demos, 'help()' for on-line help, or 'help.start()' for an HTML browser interface to help. Type 'q()' to quit R. [R.app GUI 1.51 (6148) x86_64-apple-darwin9.8.0] [History restored from /Users/joselopez/.Rapp.history] #load ChiPpeakAnno > library (ChIPpeakAnno) Loading required package: gplots Loading required package: gtools Loading required package: gdata gdata: read.xls support for 'XLS' (Excel 97-2004) files ENABLED. gdata: Unable to load perl libaries needed by read.xls() gdata: to support 'XLSX' (Excel 2007+) files. gdata: Run the function 'installXLSXsupport()' gdata: to automatically download and install the perl gdata: libaries needed to support Excel XLS and XLSX formats. Attaching package: gdata The following object(s) are masked from package:stats: nobs The following object(s) are masked from package:utils: object.size Loading required package: caTools Loading required package: bitops Loading required package: grid Loading required package: KernSmooth KernSmooth 2.23 loaded Copyright M. P. Wand 1997-2009 Loading required package: MASS Attaching package: gplots The following object(s) are masked from package:stats: lowess Loading required package: BiocGenerics Attaching package: BiocGenerics The following object(s) are masked from package:gdata: combine The following object(s) are masked from package:stats: xtabs The following object(s) are masked from package:base: anyDuplicated, cbind, colnames, duplicated, eval, Filter, Find, get, intersect, lapply, Map, mapply, mget, order, paste, pmax, pmax.int, pmin, pmin.int, Position, rbind, Reduce, rep.int, rownames, sapply, setdiff, table, tapply, union, unique Loading required package: biomaRt Loading required package: multtest Loading required package: Biobase Welcome to Bioconductor Vignettes contain introductory material; view with 'browseVignettes()'. To cite Bioconductor, see 'citation("Biobase")', and for packages 'citation("pkgname")'. Attaching package: multtest The following object(s) are masked from package:gplots: wapply Loading required package: IRanges Attaching package: IRanges The following object(s) are masked from package:gplots: space The following object(s) are masked from package:caTools: runmean The following object(s) are masked from package:gdata: trim Loading required package: Biostrings Loading required package: BSgenome Loading required package: GenomicRanges Loading required package: BSgenome.Ecoli.NCBI.20080805 Loading required package: GO.db Loading required package: AnnotationDbi Attaching package: AnnotationDbi The following object(s) are masked from package:MASS: select Loading required package: DBI Loading required package: org.Hs.eg.db Loading required package: limma Warning messages: 1: package gtools was built under R version 2.15.1 2: package gdata was built under R version 2.15.1 3: package KernSmooth was built under R version 2.15.1 4: package MASS was built under R version 2.15.1 5: package IRanges was built under R version 2.15.1 6: package GenomicRanges was built under R version 2.15.1 7: replacing previous import space when loading IRanges #Generate GRanges objects > peaks_A = read.table("data_A.txt") > dim(peaks_A) [1] 23260 16 > peaks_A_Ranged = BED2RangedData(peaks_A,header=F) > peaks_A_Ranged RangedData with 23260 rows and 2 value columns across 21 spaces space ranges | strand score <factor> <iranges> | <numeric> <numeric> 00001 1 [3658000, 3663999] | 1 1 00002 1 [4481600, 4483799] | 1 1 00003 1 [4486200, 4488199] | 1 1 00004 1 [4561200, 4562399] | 1 1 00005 1 [4774000, 4776599] | 1 1 00006 1 [4796800, 4799399] | 1 1 00007 1 [4846400, 4849799] | 1 1 00008 1 [5007000, 5013799] | 1 1 00009 1 [5072800, 5075999] | 1 1 ... ... ... ... ... ... 23252 X [165110000, 165112999] | 1 1 23253 X [165474200, 165475399] | 1 1 23254 X [165755800, 165759199] | 1 1 23255 X [166317200, 166320199] | 1 1 23256 X [166416000, 166421999] | 1 1 23257 Y [ 234000, 234799] | 1 1 23258 Y [ 346400, 347799] | 1 1 23259 Y [ 581200, 582799] | 1 1 23260 Y [ 621400, 623599] | 1 1 > peaks_B = read.table("data_B.txt") > dim(peaks_B) [1] 12632 16 > peaks_B_Ranged = BED2RangedData(peaks_B,header=F) > peaks_B_Ranged RangedData with 12632 rows and 2 value columns across 20 spaces space ranges | strand score <factor> <iranges> | <numeric> <numeric> 00001 1 [3660400, 3662799] | 1 1 00002 1 [4775200, 4777199] | 1 1 00003 1 [4797000, 4798399] | 1 1 00004 1 [6204200, 6205799] | 1 1 00005 1 [7076400, 7080599] | 1 1 00006 1 [9288400, 9290999] | 1 1 00007 1 [9538200, 9541999] | 1 1 00008 1 [9764400, 9766399] | 1 1 00009 1 [9998600, 9999999] | 1 1 ... ... ... ... ... ... 12624 X [149932200, 149934799] | 1 1 12625 X [151695800, 151697199] | 1 1 12626 X [152057200, 152059599] | 1 1 12627 X [154255600, 154257999] | 1 1 12628 X [159173200, 159175599] | 1 1 12629 X [160707200, 160709599] | 1 1 12630 X [162676400, 162677599] | 1 1 12631 X [166356400, 166384999] | 1 1 12632 X [166416000, 166421999] | 1 1 #find overlaps A,B. > t1 = findOverlappingPeaks (peaks_A_Ranged, peaks_B_Ranged, maxgap = 0, NameOfPeaks1 = "A", NameOfPeaks2 = "B", select = "all") There were 20 warnings (use warnings() to see them) > overlappingPeaks = as.data.frame(t1$Peaks1withOverlaps) > head(overlappingPeaks) space start end width names strand 1 1 3658000 3663999 6000 00001 + 2 1 4774000 4776599 2600 00005 + 3 1 4796800 4799399 2600 00006 + 4 1 6202800 6207199 4400 00010 + 5 1 7076200 7081599 5400 00014 + 6 1 9285400 9290999 5600 00017 + > dim(overlappingPeaks) [1] 9992 6 > write.table(as.data.frame(t1$Peaks1withOverlaps), file="overlappingPeaks.txt", sep="\t", row.names=F) #perform venn diagram > makeVennDiagram(RangedDataList(peaks_A_Ranged, peaks_B_Ranged), NameOfPeaks=c("A","B"), maxgap = 0, totalTest = 3E4, cex =1, counts.col='red', useFeature=F) $p.value [1] 1.455914e-08 $vennCounts A B Counts [1,] 0 0 4100 [2,] 0 1 2640 [3,] 1 0 13268 [4,] 1 1 9992 attr(,"class") [1] "VennCounts" Warning message: In findOverlappingPeaks(Peaks[[1]], Peaks[[2]], NameOfPeaks1 = NameOfPeaks[1], : Please use select instead of multiple! > dev.copy2eps(file='VennDiagram.eps') > sessionInfo() R version 2.15.0 (2012-03-30) Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] grid stats graphics grDevices utils datasets methods base other attached packages: [1] ChIPpeakAnno_2.4.0 limma_3.12.1 org.Hs.eg.db_2.7.1 [4] GO.db_2.7.1 RSQLite_0.11.1 DBI_0.2-5 [7] AnnotationDbi_1.18.1 BSgenome.Ecoli.NCBI. 20080805_1.3.17 BSgenome_1.24.0 [10] GenomicRanges_1.8.9 Biostrings_2.24.1 IRanges_1.14.4 [13] multtest_2.12.0 Biobase_2.16.0 biomaRt_2.12.0 [16] BiocGenerics_0.2.0 gplots_2.11.0 MASS_7.3-19 [19] KernSmooth_2.23-8 caTools_1.13 bitops_1.0-4.1 [22] gdata_2.11.0 gtools_2.7.0 loaded via a namespace (and not attached): [1] RCurl_1.91-1 splines_2.15.0 stats4_2.15.0 survival_2.36-14 XML_3.9-4 [[alternative HTML version deleted]]

GO GUI BSgenome BSgenome GO GUI BSgenome BSgenome • 2.2k views

ADD COMMENT • link updated 12.6 years ago by Julie Zhu ★ 4.3k • written 12.6 years ago by José LÓPEZ ▴ 110

score 0 · Answer 1 · 2012-08-30

Jose, Here is an example to get the non-overlapping peaks. peaks1 = RangedData(IRanges(start=c(1543200,1557200,1563000,1569800,167889600), end=c(1555199,1560599,1565199,1573799,167893599),names=c("p1","p2","p3 ","p4","p5")), strand=as.integer(1),space=c(6,6,6,6,5)) peaks2 = RangedData(IRanges(start=c(1549800,1554400,1565000,1569400,167888600, 1000), end=c(1550599,1560799,1565399,1571199,167888999,1200),names=c("f1","f2 ","f3","f4","f5", "f6")), strand=as.integer(1),space=c(6,6,6,6,5,8)) t1 =findOverlappingPeaks(peaks1, peaks2, maxgap=1000, NameOfPeaks1="TF1", NameOfPeaks2="TF2", select="all") t1$Peaks1withOverlaps t1$Peaks2withOverlaps peaks1[!rownames(peaks1) %in% rownames(t1$Peaks1withOverlaps),] peaks2[!rownames(peaks2) %in% rownames(t1$Peaks2withOverlaps),] Best regards, Julie On 8/30/12 12:40 PM, "José LÓPEZ" <jose.lopez@umh.es> wrote: Dear Julie, I am using findOverlappingPeaks and makeVennDiagram to compare 2 set of peaks (A and B). I get overlapping peaks of A with B ($Peaks1withOverlaps) (9992 see below). My question is to how can I get those peaks specific to A (13268) and those specific to B (2640)? I also tried with subsetByOverlaps from (GenomicRanges package) and get the intersect between A and B (9992 as with ChiPpeakAnno functions). Then, to identify peaks that do not overlap tried to isolate those A regions that were not present in the overlap with B, but get an error (see below)... > noOlap = peaks_AcH3_Ranged[!peaks_A_Ranged %in% peaks_B_Ranged] Error in peaks_AcH3_Ranged[!peaks_A_Ranged %in% peaks_B_Ranged] : selecting spaces: invalid subscript type Thank you in advance for your suggestions, Jose R version 2.15.0 (2012-03-30) Copyright (C) 2012 The R Foundation for Statistical Computing ISBN 3-900051-07-0 Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under certain conditions. Type 'license()' or 'licence()' for distribution details. Natural language support but running in an English locale R is a collaborative project with many contributors. Type 'contributors()' for more information and 'citation()' on how to cite R or R packages in publications. Type 'demo()' for some demos, 'help()' for on-line help, or 'help.start()' for an HTML browser interface to help. Type 'q()' to quit R. [R.app GUI 1.51 (6148) x86_64-apple-darwin9.8.0] [History restored from /Users/joselopez/.Rapp.history] #load ChiPpeakAnno > library (ChIPpeakAnno) Loading required package: gplots Loading required package: gtools Loading required package: gdata gdata: read.xls support for 'XLS' (Excel 97-2004) files ENABLED. gdata: Unable to load perl libaries needed by read.xls() gdata: to support 'XLSX' (Excel 2007+) files. gdata: Run the function 'installXLSXsupport()' gdata: to automatically download and install the perl gdata: libaries needed to support Excel XLS and XLSX formats. Attaching package: gdata The following object(s) are masked from package:stats: nobs The following object(s) are masked from package:utils: object.size Loading required package: caTools Loading required package: bitops Loading required package: grid Loading required package: KernSmooth KernSmooth 2.23 loaded Copyright M. P. Wand 1997-2009 Loading required package: MASS Attaching package: gplots The following object(s) are masked from package:stats: lowess Loading required package: BiocGenerics Attaching package: BiocGenerics The following object(s) are masked from package:gdata: combine The following object(s) are masked from package:stats: xtabs The following object(s) are masked from package:base: anyDuplicated, cbind, colnames, duplicated, eval, Filter, Find, get, intersect, lapply, Map, mapply, mget, order, paste, pmax, pmax.int, pmin, pmin.int, Position, rbind, Reduce, rep.int, rownames, sapply, setdiff, table, tapply, union, unique Loading required package: biomaRt Loading required package: multtest Loading required package: Biobase Welcome to Bioconductor Vignettes contain introductory material; view with 'browseVignettes()'. To cite Bioconductor, see 'citation("Biobase")', and for packages 'citation("pkgname")'. Attaching package: multtest The following object(s) are masked from package:gplots: wapply Loading required package: IRanges Attaching package: IRanges The following object(s) are masked from package:gplots: space The following object(s) are masked from package:caTools: runmean The following object(s) are masked from package:gdata: trim Loading required package: Biostrings Loading required package: BSgenome Loading required package: GenomicRanges Loading required package: BSgenome.Ecoli.NCBI.20080805 Loading required package: GO.db Loading required package: AnnotationDbi Attaching package: AnnotationDbi The following object(s) are masked from package:MASS: select Loading required package: DBI Loading required package: org.Hs.eg.db Loading required package: limma Warning messages: 1: package gtools was built under R version 2.15.1 2: package gdata was built under R version 2.15.1 3: package KernSmooth was built under R version 2.15.1 4: package MASS was built under R version 2.15.1 5: package IRanges was built under R version 2.15.1 6: package GenomicRanges was built under R version 2.15.1 7: replacing previous import space when loading IRanges #Generate GRanges objects > peaks_A = read.table("data_A.txt") > dim(peaks_A) [1] 23260 16 > peaks_A_Ranged = BED2RangedData(peaks_A,header=F) > peaks_A_Ranged RangedData with 23260 rows and 2 value columns across 21 spaces space ranges | strand score <factor> <iranges> | <numeric> <numeric> 00001 1 [3658000, 3663999] | 1 1 00002 1 [4481600, 4483799] | 1 1 00003 1 [4486200, 4488199] | 1 1 00004 1 [4561200, 4562399] | 1 1 00005 1 [4774000, 4776599] | 1 1 00006 1 [4796800, 4799399] | 1 1 00007 1 [4846400, 4849799] | 1 1 00008 1 [5007000, 5013799] | 1 1 00009 1 [5072800, 5075999] | 1 1 ... ... ... ... ... ... 23252 X [165110000, 165112999] | 1 1 23253 X [165474200, 165475399] | 1 1 23254 X [165755800, 165759199] | 1 1 23255 X [166317200, 166320199] | 1 1 23256 X [166416000, 166421999] | 1 1 23257 Y [ 234000, 234799] | 1 1 23258 Y [ 346400, 347799] | 1 1 23259 Y [ 581200, 582799] | 1 1 23260 Y [ 621400, 623599] | 1 1 > peaks_B = read.table("data_B.txt") > dim(peaks_B) [1] 12632 16 > peaks_B_Ranged = BED2RangedData(peaks_B,header=F) > peaks_B_Ranged RangedData with 12632 rows and 2 value columns across 20 spaces space ranges | strand score <factor> <iranges> | <numeric> <numeric> 00001 1 [3660400, 3662799] | 1 1 00002 1 [4775200, 4777199] | 1 1 00003 1 [4797000, 4798399] | 1 1 00004 1 [6204200, 6205799] | 1 1 00005 1 [7076400, 7080599] | 1 1 00006 1 [9288400, 9290999] | 1 1 00007 1 [9538200, 9541999] | 1 1 00008 1 [9764400, 9766399] | 1 1 00009 1 [9998600, 9999999] | 1 1 ... ... ... ... ... ... 12624 X [149932200, 149934799] | 1 1 12625 X [151695800, 151697199] | 1 1 12626 X [152057200, 152059599] | 1 1 12627 X [154255600, 154257999] | 1 1 12628 X [159173200, 159175599] | 1 1 12629 X [160707200, 160709599] | 1 1 12630 X [162676400, 162677599] | 1 1 12631 X [166356400, 166384999] | 1 1 12632 X [166416000, 166421999] | 1 1 #find overlaps A,B. > t1 = findOverlappingPeaks (peaks_A_Ranged, peaks_B_Ranged, maxgap = 0, NameOfPeaks1 = "A", NameOfPeaks2 = "B", select = "all") There were 20 warnings (use warnings() to see them) > overlappingPeaks = as.data.frame(t1$Peaks1withOverlaps) > head(overlappingPeaks) space start end width names strand 1 1 3658000 3663999 6000 00001 + 2 1 4774000 4776599 2600 00005 + 3 1 4796800 4799399 2600 00006 + 4 1 6202800 6207199 4400 00010 + 5 1 7076200 7081599 5400 00014 + 6 1 9285400 9290999 5600 00017 + > dim(overlappingPeaks) [1] 9992 6 > write.table(as.data.frame(t1$Peaks1withOverlaps), file="overlappingPeaks.txt", sep="\t", row.names=F) #perform venn diagram > makeVennDiagram(RangedDataList(peaks_A_Ranged, peaks_B_Ranged), NameOfPeaks=c("A","B"), maxgap = 0, totalTest = 3E4, cex =1, counts.col='red', useFeature=F) $p.value [1] 1.455914e-08 $vennCounts A B Counts [1,] 0 0 4100 [2,] 0 1 2640 [3,] 1 0 13268 [4,] 1 1 9992 attr(,"class") [1] "VennCounts" Warning message: In findOverlappingPeaks(Peaks[[1]], Peaks[[2]], NameOfPeaks1 = NameOfPeaks[1], : Please use select instead of multiple! > dev.copy2eps(file='VennDiagram.eps') > sessionInfo() R version 2.15.0 (2012-03-30) Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] grid stats graphics grDevices utils datasets methods base other attached packages: [1] ChIPpeakAnno_2.4.0 limma_3.12.1 org.Hs.eg.db_2.7.1 [4] GO.db_2.7.1 RSQLite_0.11.1 DBI_0.2-5 [7] AnnotationDbi_1.18.1 BSgenome.Ecoli.NCBI.20080805_1.3.17 BSgenome_1.24.0 [10] GenomicRanges_1.8.9 Biostrings_2.24.1 IRanges_1.14.4 [13] multtest_2.12.0 Biobase_2.16.0 biomaRt_2.12.0 [16] BiocGenerics_0.2.0 gplots_2.11.0 MASS_7.3-19 [19] KernSmooth_2.23-8 caTools_1.13 bitops_1.0-4.1 [22] gdata_2.11.0 gtools_2.7.0 loaded via a namespace (and not attached): [1] RCurl_1.91-1 splines_2.15.0 stats4_2.15.0 survival_2.36-14 XML_3.9-4 [[alternative HTML version deleted]]