Question regarding ChIPpeakAnno use

Julie Zhu ★ 4.3k

@julie-zhu-3596

United States

Dear Xianyong,

You need to run the following code without specifying an index range for your peaks. ma[1:6,] means only the first 6 peaks (rows).

annotatedPeak = annotatePeakInBatch(ma, AnnotationData=TSS.human.NCBI36)

Hope this resolves your issue.

Best regards,
Julie

On 3/16/12 10:24 AM, "Ma, Xian-Yong" <xian-yong.ma at yale.edu> wrote:
> Hi, Dear Dr. Zhu:
>
> I am Xianyong Ma from Yale Medical School. I am working on a ChIP-sequencing project and am trying to use ChIPpeakAnno to analyze my data; it is a wonderful tool for mapping binding sites for my purpose. Now I have trouble using it: I don't understand the meaning of the following statement:
>
> annotatedPeak = annotatePeakInBatch(ma[1:6,], AnnotationData=TSS.human.NCBI36)
>
> When I change [1:6] to [1:5000], I only get peaks from chromosomes 1, 10 and 11. I think I should get binding sites from all chromosomes, since I checked them by another method.
>
> Thanks very much for your help!
>
> Sincerely,
>
> Xianyong Ma
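Why ma[1:5000,] returned only a few chromosomes: in R, ma[1:6,] selects rows 1 to 6 of the table, not peaks on chromosomes 1 to 6. Judging from Jianhong's transcript further down (row 5000 is on chromosome 10, row 20000 on 18, rows around 21971 on 2, the last rows on Y), the BED file appears to be sorted with chromosome names compared as text (chr1, chr10, chr11, ..., chr19, chr2, ...), so the first 5000 rows only reach the chromosomes that sort first. The base-R sketch below reproduces the effect on a small, hypothetical peak table; it does not call ChIPpeakAnno.

```r
# Hypothetical toy peak table: 4 peaks on each of 5 chromosomes.
chroms <- c("chr1", "chr2", "chr10", "chr11", "chrX")
peaks <- data.frame(
  chrom      = rep(chroms, each = 4),
  chromStart = rep(c(1000, 2000, 3000, 4000), times = length(chroms)),
  stringsAsFactors = FALSE
)
peaks$chromEnd <- peaks$chromStart + 150

# Sort chromosome names as plain text (radix sort uses C-locale order),
# so "chr10" and "chr11" come before "chr2".
peaks <- peaks[order(peaks$chrom, peaks$chromStart, method = "radix"), ]
rownames(peaks) <- NULL

# peaks[1:8, ] means "rows 1 to 8", not "chromosomes 1 to 8".
first_rows <- peaks[1:8, ]
print(unique(first_rows$chrom))

# Passing the whole table keeps every chromosome.
print(table(peaks$chrom))
```

Here the first 8 rows cover only chr1 and chr10, the same pattern as the 1, 10 and 11 that Xianyong saw with 5000 rows.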

Answer 1 · 2012-03-19

Hi Jianhong,

Thank you very much. I checked the file you sent me and went back to ChIPpeakAnno following Julie's code. I found that the output contains only part of the rows: 7142 rows are shown, and all peaks after that are missing from the output:

7141 425497 NearestStart
7142 426447 NearestStart
[ reached getOption("max.print") -- omitted 34386 rows ]]

That is 7142 shown + 34386 omitted = 41528 rows, the same number as in the output you showed me. I don't know why I only get partial output. Can I also get results for peaks in my exons, introns or other regions? I followed Julie's code as follows.

To show the peaks in exons:
>annotatedPeak = annotatePeakInBatch(ma, AnnotationData=Exon.human.NCBI36)

or to show the peaks in introns:
>annotatedPeak = annotatePeakInBatch(ma, AnnotationData=Intron.human.NCBI36)

Is this right?

Thanks again for your help!
Xianyong

On Mar 19, 2012, at 11:10 AM, Ou, Jianhong wrote:

Hi Xian-Yong,

I opened your BED file and found that there are 41528 rows in the file. If you only annotate the first 5000 rows, you only get the annotation for the first 5000 rows. So please try Julie's code again. I also copied my output here and hope this will help you.

> setwd("/Users/jianhongou/Documents/Julie")
> library(ChIPpeakAnno)
Loading required package: biomaRt
Loading required package: multtest
Loading required package: Biobase

Welcome to Bioconductor

Vignettes contain introductory material. To view, type 'browseVignettes()'. To cite Bioconductor, see 'citation("Biobase")', and for packages 'citation("pkgname")'.
Loading required package: IRanges

Attaching package: IRanges

The following object(s) are masked from package:Biobase: updateObject
The following object(s) are masked from package:base: cbind, eval, intersect, Map, mapply, order, paste, pmax, pmax.int, pmin, pmin.int, rbind, rep.int, setdiff, table, union

Loading required package: Biostrings
Loading required package: BSgenome
Loading required package: GenomicRanges
Loading required package: BSgenome.Ecoli.NCBI.20080805
Loading required package: GO.db
Loading required package: AnnotationDbi
Loading required package: DBI
Loading required package: org.Hs.eg.db
Loading required package: limma
Loading required package: gplots
Loading required package: gtools
Loading required package: gdata
gdata: read.xls support for 'XLS' (Excel 97-2004) files ENABLED.
gdata: read.xls support for 'XLSX' (Excel 2007+) files ENABLED.

Attaching package: gdata

The following object(s) are masked from package:IRanges: trim
The following object(s) are masked from package:Biobase: combine
The following object(s) are masked from package:stats: nobs
The following object(s) are masked from package:utils: object.size

Loading required package: caTools
Loading required package: bitops

Attaching package: caTools

The following object(s) are masked from package:IRanges: runmean

Loading required package: grid
Loading required package: KernSmooth
KernSmooth 2.23 loaded
Copyright M. P. Wand 1997-2009

Attaching package: gplots

The following object(s) are masked from package:IRanges: space
The following object(s) are masked from package:multtest: wapply
The following object(s) are masked from package:stats: lowess

Warning messages:
1: package AnnotationDbi was built under R version 2.14.2
2: package limma was built under R version 2.14.2
3: replacing previous import space when loading IRanges
> ?annotatePeakInBatch
starting httpd help server ... done
> data(TSS.human.NCBI36)
> ma<-read.delim("ma2.bed",header=F)
> head(ma)
V1 V2 V3
1 chr1 9949 10500
2 chr1 114849 115000
3 chr1 115749 115850
4 chr1 117649 117750
5 chr1 123749 123850
6 chr1 124249 124400
> colnames(ma)<-c("chrom","chromStart","chromEnd")
> ma.rd<-BED2RangedData(ma)
> head(ma.rd)
RangedData with 6 rows and 2 value columns across 25 spaces
space ranges | strand score
<factor> <IRanges> | <numeric> <numeric>
00001 1 [ 9949, 10500] | 1 1
00002 1 [114849, 115000] | 1 1
00003 1 [115749, 115850] | 1 1
00004 1 [117649, 117750] | 1 1
00005 1 [123749, 123850] | 1 1
00006 1 [124249, 124400] | 1 1
> tail(ma.rd)
RangedData with 6 rows and 2 value columns across 25 spaces
space ranges | strand score
<factor> <IRanges> | <numeric> <numeric>
41523 Y [58996299, 58997400] | 1 1
41524 Y [58997549, 58997800] | 1 1
41525 Y [59005049, 59005250] | 1 1
41526 Y [59020749, 59020850] | 1 1
41527 Y [59024149, 59024300] | 1 1
41528 Y [59027749, 59027900] | 1 1
> tail(ma)
chrom chromStart chromEnd
41523 chrY 58996299 58997400
41524 chrY 58997549 58997800
41525 chrY 59005049 59005250
41526 chrY 59020749 59020850
41527 chrY 59024149 59024300
41528 chrY 59027749 59027900
> annotatedPeak<-annotatePeakInBatch(ma.rd[1:5000,],AnnotationData=TSS.human.NCBI36)
> tail(annotatedPeak)
RangedData with 6 rows and 9 value columns across 2 spaces
space ranges | peak strand feature start_position end_position insideFeature distancetoFeature shortestDistance
<factor> <IRanges> | <character> <character> <character> <numeric> <numeric> <character> <numeric> <numeric>
04982 ENSG00000165606 10 [50319257, 50319808] | 04982 - ENSG00000165606 50242243 50273992 upstream -45265 45265
04983 ENSG00000165606 10 [50328407, 50328608] | 04983 - ENSG00000165606 50242243 50273992 upstream -54415 54415
04984 ENSG00000032514 10 [50373707, 50373808] | 04984 - ENSG00000032514 50336715 50417078 inside 43371 36992
04987 ENSG00000209936 10 [50593507, 50593808] | 04987 - ENSG00000209936 50581041 50581425 upstream -12082 12082
04999 ENSG00000219927 10 [51192307, 51192408] | 04999 - ENSG00000219927 51202108 51202578 downstream 10271 9700
05000 ENSG00000197612 10 [51389007, 51389258] | 05000 - ENSG00000197612 51398379 51398687 downstream 9680 9121
fromOverlappingOrNearest
<character>
04982 ENSG00000165606 NearestStart
04983 ENSG00000165606 NearestStart
04984 ENSG00000032514 NearestStart
04987 ENSG00000209936 NearestStart
04999 ENSG00000219927 NearestStart
05000 ENSG00000197612 NearestStart
> ma.rd[5000,]
RangedData with 1 row and 2 value columns across 25 spaces
space ranges | strand score
<factor> <IRanges> | <numeric> <numeric>
05000 10 [51389007, 51389258] | 1 1
> ma.rd[20000,]
RangedData with 1 row and 2 value columns across 25 spaces
space ranges | strand score
<factor> <IRanges> | <numeric> <numeric>
20000 18 [72846414, 72846715] | 1 1
> annotatedPeak<-annotatePeakInBatch(ma.rd[20000:22000,],AnnotationData=TSS.human.NCBI36)
> tail(annotatedPeak)
RangedData with 6 rows and 9 value columns across 3 spaces
space ranges | peak strand feature start_position end_position insideFeature distancetoFeature shortestDistance
<factor> <IRanges> | <character> <character> <character> <numeric> <numeric> <character> <numeric> <numeric>
21971 ENSG00000207350 2 [88480587, 88481138] | 21971 - ENSG00000207350 88495532 88495638 downstream 15051 14394
21973 ENSG00000172071 2 [88679537, 88679688] | 21973 - ENSG00000172071 88637376 88708209 inside 28672 28521
21974 ENSG00000172071 2 [88739987, 88740138] | 21974 - ENSG00000172071 88637376 88708209 upstream -31778 31778
21975 ENSG00000211592 2 [88906287, 88906388] | 21975 - ENSG00000211592 88937989 88938311 downstream 32024 31601
21977 ENSG00000220770 2 [89219237, 89219338] | 21977 - ENSG00000220770 89215573 89215855 upstream -3382 3382
21978 ENSG00000211619 2 [89462437, 89462638] | 21978 - ENSG00000211619 89410986 89411308 upstream -51129 51129
fromOverlappingOrNearest
<character>
21971 ENSG00000207350 NearestStart
21973 ENSG00000172071 NearestStart
21974 ENSG00000172071 NearestStart
21975 ENSG00000211592 NearestStart
21977 ENSG00000220770 NearestStart
21978 ENSG00000211619 NearestStart
> ma.rd[21971:21978,]
RangedData with 8 rows and 2 value columns across 25 spaces
space ranges | strand score
<factor> <IRanges> | <numeric> <numeric>
21971 2 [88480587, 88481138] | 1 1
21972 2 [88603637, 88603888] | 1 1
21973 2 [88679537, 88679688] | 1 1
21974 2 [88739987, 88740138] | 1 1
21975 2 [88906287, 88906388] | 1 1
21976 2 [88959637, 88959738] | 1 1
21977 2 [89219237, 89219338] | 1 1
21978 2 [89462437, 89462638] | 1 1
> annotatedPeak<-annotatePeakInBatch(ma.rd,AnnotationData=TSS.human.NCBI36)
> tail(annotatedPeak)
RangedData with 6 rows and 9 value columns across 25 spaces
space ranges | peak strand feature start_position end_position insideFeature distancetoFeature shortestDistance
<factor> <IRanges> | <character> <character> <character> <numeric> <numeric> <character> <numeric> <numeric>
41488 ENSG00000219871 Y [28812949, 28813100] | 41488 - ENSG00000219871 27150386 27190187 upstream -1622762 1622762
41489 ENSG00000219871 Y [28813749, 28814400] | 41489 - ENSG00000219871 27150386 27190187 upstream -1623562 1623562
41490 ENSG00000219871 Y [28815249, 28815500] | 41490 - ENSG00000219871 27150386 27190187 upstream -1625062 1625062
41491 ENSG00000219871 Y [28816399, 28816900] | 41491 - ENSG00000219871 27150386 27190187 upstream -1626212 1626212
41492 ENSG00000219871 Y [28817599, 28818450] | 41492 - ENSG00000219871 27150386 27190187 upstream -1627412 1627412
41493 ENSG00000219871 Y [28818799, 28819050] | 41493 - ENSG00000219871 27150386 27190187 upstream -1628612 1628612
fromOverlappingOrNearest
<character>
41488 ENSG00000219871 NearestStart
41489 ENSG00000219871 NearestStart
41490 ENSG00000219871 NearestStart
41491 ENSG00000219871 NearestStart
41492 ENSG00000219871 NearestStart
41493 ENSG00000219871 NearestStart
> write.csv(as.data.frame(annotatedPeak),"ma.annotatedPeak.031912.csv",row.names=F)
> sessionInfo()
R version 2.14.1 (2011-12-22)
Platform: i386-apple-darwin9.8.0/i386 (32-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] grid stats graphics grDevices utils datasets methods base

other attached packages:
[1] ChIPpeakAnno_2.2.0 gplots_2.10.1 KernSmooth_2.23-7 caTools_1.12
[5] bitops_1.0-4.1 gdata_2.8.2 gtools_2.6.2 limma_3.10.3
[9] org.Hs.eg.db_2.6.4 GO.db_2.6.1 RSQLite_0.11.1 DBI_0.2-5
[13] AnnotationDbi_1.16.19 BSgenome.Ecoli.NCBI.20080805_1.3.17 BSgenome_1.22.0 GenomicRanges_1.6.7
[17] Biostrings_2.22.0 IRanges_1.12.6 multtest_2.10.0 Biobase_2.14.0
[21] biomaRt_2.10.0

loaded via a namespace (and not attached):
[1] MASS_7.3-17 RCurl_1.91-1 splines_2.14.1 survival_2.36-12 tools_2.14.1 XML_3.9-4
>

Yours sincerely,
Jianhong Ou
jianhong.ou@umassmed.edu

On Mar 19, 2012, at 10:44 AM, Zhu, Lihua (Julie) wrote:

Dear Xian-Yong,

Jianhong will help you resolve the issues. Could you please send us your sessionInfo() output? Thanks!

Best regards,
Julie

On 3/19/12 10:05 AM, "Ma, Xian-Yong" <xian-yong.ma@yale.edu> wrote:

Dear Julie,

When I tried your code to analyze my ChIP data, I still got peaks only from chromosomes 1, 10 and 11, although when I checked the data I found peaks distributed across all chromosomes. Another question: when I tried to search exons or introns with this program, I couldn't get any results. I attach one of the BED files from my ChIP data; would you please help me figure out what is wrong in how I use your software? I have been struggling with this issue for several days and really need the correct code to analyze these data.

annotatedPeak = annotatePeakInBatch(ma, AnnotationData=TSS.human.NCBI36) for TSS peak analysis?
annotatedPeak = annotatePeakInBatch(ma, AnnotationData=exon.human.NCBI36) for exon peak analysis?
I appreciate your help.

Best regards,
Xianyong

On Mar 16, 2012, at 1:47 PM, Zhu, Lihua (Julie) wrote:

Xianyong,

Thanks for letting me know! Good luck!

Best regards,
Julie

On 3/16/12 1:27 PM, "Ma, Xian-Yong" <xian-yong.ma@yale.edu> wrote:

Dear Julie,

I just followed your suggestion and got similar results. I will check my dataset and use your code again; I hope the problem is in my dataset.

Best regards,
Xianyong

Attachment: ma.annotatedpeak.031912.csv.zip
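Jianhong's transcript reads the BED file with read.delim(header=F), names the first three columns, and uses head() and tail() to confirm that the rows run from chr1 to chrY. Counting peaks per chromosome is a quicker way to confirm that the input itself covers every chromosome before any annotation is run. The sketch below is base R only; it writes a few coordinates copied from the transcript into a temporary file so it runs on its own, and bed_path is a placeholder for your real file (ma2.bed in the transcript).

```r
# Write a tiny 3-column BED file (coordinates copied from the transcript)
# so the example is self-contained; replace bed_path with your own file.
bed_path <- tempfile(fileext = ".bed")
writeLines(c(
  "chr1\t9949\t10500",
  "chr1\t114849\t115000",
  "chr10\t50319257\t50319808",
  "chr2\t88480587\t88481138",
  "chrY\t59027749\t59027900"
), bed_path)

# Same reading step as in the transcript: a BED file has no header line.
ma <- read.delim(bed_path, header = FALSE, stringsAsFactors = FALSE)
colnames(ma) <- c("chrom", "chromStart", "chromEnd")

# Peaks per chromosome: every chromosome present in the file shows up here.
peaks_per_chrom <- table(ma$chrom)
print(peaks_per_chrom)
print(nrow(ma))
```

If this table lists every chromosome but the annotation does not, the loss is happening in a later step (such as row subsetting or truncated printing), not in the input file.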

Answer 2 · 2012-03-19

Hi Xian-Yong,

I think you already have the answer to why you only got partial output:

[ reached getOption("max.print") -- omitted 34386 rows ]]

Try max.print.o<-options(max.print=99999) and then output the data.

Could you tell me how you made Exon.human.NCBI36 and Intron.human.NCBI36? If possible, could you share the dataset with me?

Yours sincerely,
Jianhong Ou
jianhong.ou@umassmed.edu
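A note on the max.print suggestion, with the arithmetic spelled out. R's documented default for max.print is already 99999, and print.data.frame() shows at most max.print %/% ncol rows. With 14 columns (the only column count that fits the reported numbers), 99999 %/% 14 = 7142 rows are shown and 41528 - 7142 = 34386 are omitted, exactly the figures Xianyong reported. Setting max.print to 99999 therefore leaves the output unchanged; a limit of at least 41528 * 14 = 581392, or writing the table to a file as the transcript does with write.csv(), avoids the truncation. The base-R sketch below checks this with a hypothetical all-zero table of the same shape and does not need ChIPpeakAnno.

```r
# Hypothetical stand-in for as.data.frame(annotatedPeak): 41528 rows x 14 columns.
n_rows <- 41528
n_cols <- 14
annot <- as.data.frame(matrix(0L, nrow = n_rows, ncol = n_cols))

# print.data.frame shows at most max.print %/% ncol rows.
default_max <- 99999                 # R's documented default for max.print
shown   <- default_max %/% n_cols    # rows printed before truncation
omitted <- n_rows - shown            # rows reported as "omitted"
cat("shown:", shown, " omitted:", omitted, "\n")

# options() returns the previous value(s), so they can be restored afterwards;
# this is what storing the result in max.print.o allows.
old_opts <- options(max.print = n_rows * n_cols)
# print(annot)  # would now print every row (very long output)
options(old_opts)

# Writing to a file avoids the print limit altogether.
out_path <- tempfile(fileext = ".csv")
write.csv(annot, out_path, row.names = FALSE)
reread <- read.csv(out_path)
cat("rows written:", nrow(reread), "\n")
```

For tens of thousands of peaks, opening the CSV in a spreadsheet is usually more practical than scrolling console output.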

Answer 3 · 2012-03-19

Hi Xianyong, You can follow my code step by step. If you still can not get the same results, please send me your sessionInfo() and history file. About how to map the peaks into exon and intron, you need to prepare the annotation dataset in RangedData format by yourself for ChIPpeakAnno. There is a tool getAnnotation, please try ?getAnnotation to get the help file and follow the examples in the help documentation. Good Luck. Yours sincerely, Jianhong Ou jianhong.ou@umassmed.edu<mailto:jianhong.ou@umassmed.edu> On Mar 19, 2012, at 3:10 PM, Ma, Xian-Yong wrote: Hi, Jianhong: Thanks very much for your email, I opened the output data file that you generated, but I can't generate by using the code as following: >max.print.o<-options(max.print=99999) >annotatedPeak = annotatePeakInBatch(ma, AnnotationData=TSS.human.NCBI36) >as.data.frame(annotatedPeak) if you use different code to generate the data? Regarding the Exon or intron analysis, I think the NCBI database or ExonIntron Database (EID) has this type of function, I read some paper use the software (CisGenome) can do this analysis, if your software can do this? because I am not very familiar with bioinformatics , I am working on the tumor molecular biology area. the paper described this software is "www.pnas.org/cgi/doi/10.1073/pna s.1110931108<http: www.pnas.org="" cgi="" doi="" 10.1073="" pnas.1110931108="">" Best wishes, Xianyong On Mar 19, 2012, at 12:56 PM, Ou, Jianhong wrote: Hi Xian-Yong, I think you already get your answers why you only got partial data output. [ reached getOption("max.print") -- omitted 34386 rows ]] try max.print.o<-options(max.print=99999) and then output the data. Could you tell me how did you make Exon.human.NCBI36 and Intron.human.NCBI36? If possible, could you share the dataset to me? 
Yours sincerely, Jianhong Ou jianhong.ou@umassmed.edu<mailto:jianhong.ou@umassmed.edu> On Mar 19, 2012, at 12:37 PM, Ma, Xian-Yong wrote: Hi, Jianhong: Thank you very much and I just checked the file you sent to me, and returned to the ChIPPeakAnno program followed Julie's code, I found the output only part of the row, the showed data is 7142, and after this number, all of the peak was deleted from output dataset: 7141 425497 NearestStart 7142 426447 NearestStart [ reached getOption("max.print") -- omitted 34386 rows ]] output 7142 +omitted 34386= 41528 rows you showed to me, I don't know why only got partial data output? if I can get the results for my exon, intron or other regions peaks? I just followed Julie's code as following: to show the peaks from exons: >annotatedPeak = annotatePeakInBatch(ma, AnnotationData=Exon.human.NCBI36) or to show the peaks from intron as following: >annotatedPeak = annotatePeakInBatch(ma, AnnotationData=Intron.human.NCBI36) is right? Thanks again for your nice help! Xianyong On Mar 19, 2012, at 11:10 AM, Ou, Jianhong wrote: Hi Xian-Yong, I opened you bed file and found that there are 41528 rows in the file. If you only annotate the first 5000 rows, you should get the annotation for the first 5000 rows. So please try Julie's code again. I also copied my output here and hope this will help you. > setwd("/Users/jianhongou/Documents/Julie") > library(ChIPpeakAnno) Loading required package: biomaRt Loading required package: multtest Loading required package: Biobase Welcome to Bioconductor Vignettes contain introductory material. To view, type 'browseVignettes()'. To cite Bioconductor, see 'citation("Biobase")' and for packages 'citation("pkgname")'. 
Loading required package: IRanges Attaching package: IRanges The following object(s) are masked from package:Biobase: updateObject The following object(s) are masked from package:base: cbind, eval, intersect, Map, mapply, order, paste, pmax, pmax.int, pmin, pmin.int, rbind, rep.int, setdiff, table, union Loading required package: Biostrings Loading required package: BSgenome Loading required package: GenomicRanges Loading required package: BSgenome.Ecoli.NCBI.20080805 Loading required package: GO.db Loading required package: AnnotationDbi Loading required package: DBI Loading required package: org.Hs.eg.db Loading required package: limma Loading required package: gplots Loading required package: gtools Loading required package: gdata gdata: read.xls support for 'XLS' (Excel 97-2004) files ENABLED. gdata: read.xls support for 'XLSX' (Excel 2007+) files ENABLED. Attaching package: gdata The following object(s) are masked from package:IRanges: trim The following object(s) are masked from package:Biobase: combine The following object(s) are masked from package:stats: nobs The following object(s) are masked from package:utils: object.size Loading required package: caTools Loading required package: bitops Attaching package: caTools The following object(s) are masked from package:IRanges: runmean Loading required package: grid Loading required package: KernSmooth KernSmooth 2.23 loaded Copyright M. P. Wand 1997-2009 Attaching package: gplots The following object(s) are masked from package:IRanges: space The following object(s) are masked from package:multtest: wapply The following object(s) are masked from package:stats: lowess Warning messages: 1: package AnnotationDbi was built under R version 2.14.2 2: package limma was built under R version 2.14.2 3: replacing previous import space when loading IRanges > ?annotatePeakInBatch starting httpd help server ... 
done
> data(TSS.human.NCBI36)
> ma<-read.delim("ma2.bed",header=F)
> head(ma)
    V1     V2     V3
1 chr1   9949  10500
2 chr1 114849 115000
3 chr1 115749 115850
4 chr1 117649 117750
5 chr1 123749 123850
6 chr1 124249 124400
> colnames(ma)<-c("chrom","chromStart","chromEnd")
> ma.rd<-BED2RangedData(ma)
> head(ma.rd)
RangedData with 6 rows and 2 value columns across 25 spaces
         space           ranges |    strand     score
      <factor>        <IRanges> | <numeric> <numeric>
00001        1 [  9949,  10500] |         1         1
00002        1 [114849, 115000] |         1         1
00003        1 [115749, 115850] |         1         1
00004        1 [117649, 117750] |         1         1
00005        1 [123749, 123850] |         1         1
00006        1 [124249, 124400] |         1         1
> tail(ma.rd)
RangedData with 6 rows and 2 value columns across 25 spaces
         space               ranges |    strand     score
      <factor>            <IRanges> | <numeric> <numeric>
41523        Y [58996299, 58997400] |         1         1
41524        Y [58997549, 58997800] |         1         1
41525        Y [59005049, 59005250] |         1         1
41526        Y [59020749, 59020850] |         1         1
41527        Y [59024149, 59024300] |         1         1
41528        Y [59027749, 59027900] |         1         1
> tail(ma)
      chrom chromStart chromEnd
41523  chrY   58996299 58997400
41524  chrY   58997549 58997800
41525  chrY   59005049 59005250
41526  chrY   59020749 59020850
41527  chrY   59024149 59024300
41528  chrY   59027749 59027900
> annotatedPeak<-annotatePeakInBatch(ma.rd[1:5000,],AnnotationData=TSS.human.NCBI36)
> tail(annotatedPeak)
RangedData with 6 rows and 9 value columns across 2 spaces
                         space               ranges |        peak      strand         feature start_position end_position insideFeature distancetoFeature shortestDistance
                      <factor>            <IRanges> | <character> <character>     <character>      <numeric>    <numeric>   <character>         <numeric>        <numeric>
04982 ENSG00000165606       10 [50319257, 50319808] |       04982           - ENSG00000165606       50242243     50273992      upstream            -45265            45265
04983 ENSG00000165606       10 [50328407, 50328608] |       04983           - ENSG00000165606       50242243     50273992      upstream            -54415            54415
04984 ENSG00000032514       10 [50373707, 50373808] |       04984           - ENSG00000032514       50336715     50417078        inside             43371            36992
04987 ENSG00000209936       10 [50593507, 50593808] |       04987           - ENSG00000209936       50581041     50581425      upstream            -12082            12082
04999 ENSG00000219927       10 [51192307, 51192408] |       04999           - ENSG00000219927       51202108     51202578    downstream             10271             9700
05000 ENSG00000197612       10 [51389007, 51389258] |       05000           - ENSG00000197612       51398379     51398687    downstream              9680             9121
                      fromOverlappingOrNearest
                                   <character>
04982 ENSG00000165606             NearestStart
04983 ENSG00000165606             NearestStart
04984 ENSG00000032514             NearestStart
04987 ENSG00000209936             NearestStart
04999 ENSG00000219927             NearestStart
05000 ENSG00000197612             NearestStart
> ma.rd[5000,]
RangedData with 1 row and 2 value columns across 25 spaces
         space               ranges |    strand     score
      <factor>            <IRanges> | <numeric> <numeric>
05000       10 [51389007, 51389258] |         1         1
> ma.rd[20000,]
RangedData with 1 row and 2 value columns across 25 spaces
         space               ranges |    strand     score
      <factor>            <IRanges> | <numeric> <numeric>
20000       18 [72846414, 72846715] |         1         1
> annotatedPeak<-annotatePeakInBatch(ma.rd[20000:22000,],AnnotationData=TSS.human.NCBI36)
> tail(annotatedPeak)
RangedData with 6 rows and 9 value columns across 3 spaces
                         space               ranges |        peak      strand         feature start_position end_position insideFeature distancetoFeature shortestDistance
                      <factor>            <IRanges> | <character> <character>     <character>      <numeric>    <numeric>   <character>         <numeric>        <numeric>
21971 ENSG00000207350        2 [88480587, 88481138] |       21971           - ENSG00000207350       88495532     88495638    downstream             15051            14394
21973 ENSG00000172071        2 [88679537, 88679688] |       21973           - ENSG00000172071       88637376     88708209        inside             28672            28521
21974 ENSG00000172071        2 [88739987, 88740138] |       21974           - ENSG00000172071       88637376     88708209      upstream            -31778            31778
21975 ENSG00000211592        2 [88906287, 88906388] |       21975           - ENSG00000211592       88937989     88938311    downstream             32024            31601
21977 ENSG00000220770        2 [89219237, 89219338] |       21977           - ENSG00000220770       89215573     89215855      upstream             -3382             3382
21978 ENSG00000211619        2 [89462437, 89462638] |       21978           - ENSG00000211619       89410986     89411308      upstream            -51129            51129
                      fromOverlappingOrNearest
                                   <character>
21971 ENSG00000207350             NearestStart
21973 ENSG00000172071             NearestStart
21974 ENSG00000172071             NearestStart
21975 ENSG00000211592             NearestStart
21977 ENSG00000220770             NearestStart
21978 ENSG00000211619             NearestStart
> ma.rd[21971:21978,]
RangedData with 8 rows and 2 value columns across 25 spaces
         space               ranges |    strand     score
      <factor>            <IRanges> | <numeric> <numeric>
21971        2 [88480587, 88481138] |         1         1
21972        2 [88603637, 88603888] |         1         1
21973        2 [88679537, 88679688] |         1         1
21974        2 [88739987, 88740138] |         1         1
21975        2 [88906287, 88906388] |         1         1
21976        2 [88959637, 88959738] |         1         1
21977        2 [89219237, 89219338] |         1         1
21978        2 [89462437, 89462638] |         1         1
> annotatedPeak<-annotatePeakInBatch(ma.rd,AnnotationData=TSS.human.NCBI36)
> tail(annotatedPeak)
RangedData with 6 rows and 9 value columns across 25 spaces
                         space               ranges |        peak      strand         feature start_position end_position insideFeature distancetoFeature shortestDistance
                      <factor>            <IRanges> | <character> <character>     <character>      <numeric>    <numeric>   <character>         <numeric>        <numeric>
41488 ENSG00000219871        Y [28812949, 28813100] |       41488           - ENSG00000219871       27150386     27190187      upstream          -1622762          1622762
41489 ENSG00000219871        Y [28813749, 28814400] |       41489           - ENSG00000219871       27150386     27190187      upstream          -1623562          1623562
41490 ENSG00000219871        Y [28815249, 28815500] |       41490           - ENSG00000219871       27150386     27190187      upstream          -1625062          1625062
41491 ENSG00000219871        Y [28816399, 28816900] |       41491           - ENSG00000219871       27150386     27190187      upstream          -1626212          1626212
41492 ENSG00000219871        Y [28817599, 28818450] |       41492           - ENSG00000219871       27150386     27190187      upstream          -1627412          1627412
41493 ENSG00000219871        Y [28818799, 28819050] |       41493           - ENSG00000219871       27150386     27190187      upstream          -1628612          1628612
                      fromOverlappingOrNearest
                                   <character>
41488 ENSG00000219871             NearestStart
41489 ENSG00000219871             NearestStart
41490 ENSG00000219871             NearestStart
41491 ENSG00000219871             NearestStart
41492 ENSG00000219871             NearestStart
41493 ENSG00000219871             NearestStart
> write.csv(as.data.frame(annotatedPeak),"ma.annotatedPeak.031912.csv",row.names=F)
> sessionInfo()
R version 2.14.1 (2011-12-22)
Platform: i386-apple-darwin9.8.0/i386 (32-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] grid stats graphics grDevices utils datasets methods base

other attached packages:
 [1] ChIPpeakAnno_2.2.0 gplots_2.10.1 KernSmooth_2.23-7 caTools_1.12
 [5] bitops_1.0-4.1 gdata_2.8.2 gtools_2.6.2 limma_3.10.3
 [9] org.Hs.eg.db_2.6.4 GO.db_2.6.1 RSQLite_0.11.1 DBI_0.2-5
[13] AnnotationDbi_1.16.19 BSgenome.Ecoli.NCBI.20080805_1.3.17 BSgenome_1.22.0 GenomicRanges_1.6.7
[17] Biostrings_2.22.0 IRanges_1.12.6 multtest_2.10.0 Biobase_2.14.0
[21] biomaRt_2.10.0

loaded via a namespace (and not attached):
[1] MASS_7.3-17 RCurl_1.91-1 splines_2.14.1 survival_2.36-12 tools_2.14.1 XML_3.9-4
>

Yours sincerely,
Jianhong Ou
jianhong.ou@umassmed.edu

On Mar 19, 2012, at 10:44 AM, Zhu, Lihua (Julie) wrote:

Dear Xian-Yong,

Jianhong will help you resolve the issues. Could you please send us the sessionInfo() output? Thanks!

Best regards,
Julie

On 3/19/12 10:05 AM, "Ma, Xian-Yong" <xian-yong.ma@yale.edu> wrote:

Dear Julie:

When I tried your code on my ChIP data, I still got peaks only from chromosomes 1, 10 and 11, although when I checked the data again I found peaks on all chromosomes. Another question: when I tried to search exons or introns with this program, I could not get any results. I have attached one of my BED-format files of my ChIP data; would you please help me figure out what's wrong in the way I use your software? I have been struggling with this issue for several days and really need the correct code to analyze these data. Is

annotatedPeak = annotatePeakInBatch(ma, AnnotationData=TSS.human.NCBI36)

correct for TSS peak analysis, and

annotatedPeak = annotatePeakInBatch(ma, AnnotationData=exon.human.NCBI36)

for exon peak analysis?
I appreciate your help.

Best regards,
Xianyong

On Mar 16, 2012, at 1:47 PM, Zhu, Lihua (Julie) wrote:

Xianyong,

Thanks for letting me know! Good luck!

Best regards,
Julie

On 3/16/12 1:27 PM, "Ma, Xian-Yong" <xian-yong.ma@yale.edu> wrote:

Dear Julie:

I just followed your suggestion and got similar results. I will check my dataset and use your code again; I hope the problem is in my dataset.

Best regards,
Xianyong

On Mar 16, 2012, at 11:33 AM, Zhu, Lihua (Julie) wrote:

Dear Xianyong,

You would need to use the following code without specifying the index range for your peaks. ma[1:6,] means the first 6 peaks.

annotatedPeak = annotatePeakInBatch(ma, AnnotationData=TSS.human.NCBI36)

Hope this resolves your issue.

Best regards,
Julie

On 3/16/12 10:24 AM, "Ma, Xian-Yong" <xian-yong.ma@yale.edu> wrote:

Hi, Dear Dr. Zhu:

I am Xianyong Ma from Yale Medical School. I am working on a ChIP-Sequencing project and am trying to use ChIPpeakAnno to analyze my data; it is a wonderful tool for mapping binding sites for my purpose. Now I have trouble using it: I don't understand the meaning of the following statement:

annotatedPeak = annotatePeakInBatch(ma[1:6,], AnnotationData=TSS.human.NCBI36)

When I change [1:6] to [1:5000], I only get peaks from chromosomes 1, 10 and 11. I think I should get binding sites on all chromosomes, since I checked them by another method.

Thanks very much for your help!

Sincerely,
Xianyong Ma

[Attachment: ma.annotatedPeak.031912.csv.zip]
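Why annotating ma[1:5000,] returned peaks from only chromosomes 1, 10 and 11 can be reproduced in base R without ChIPpeakAnno. The sketch below is a minimal illustration under one assumption: the BED file is sorted with chromosome names compared as text (chr1, chr10, chr11, ..., chr2, ..., chrX, chrY). That ordering is consistent with Jianhong's output above, where row 5000 is on chromosome 10, row 20000 on 18, rows 21971 to 21978 on 2 and the last rows on Y. The toy data frame `peaks` and its coordinates are invented. `ma[1:6,]` is ordinary row indexing: it keeps the first six rows, whichever chromosomes they happen to be on.

```r
# Hypothetical toy peaks (invented coordinates), ordered with chromosome
# names sorted as text: chr1, chr10, chr11, ..., chr2, ..., chrX.
peaks <- data.frame(
  chrom      = c("chr1", "chr1", "chr10", "chr10", "chr11", "chr2", "chr2", "chrX"),
  chromStart = c(9949, 114849, 500, 9000, 1200, 3000, 7000, 4000),
  chromEnd   = c(10500, 115000, 700, 9200, 1400, 3100, 7300, 4200),
  stringsAsFactors = FALSE
)

# Row indexing: peaks[1:5, ] keeps the first five rows and all columns,
# just as ma[1:6, ] keeps the first six peaks. It does not sample
# across chromosomes.
first5 <- peaks[1:5, ]
print(unique(first5$chrom))   # "chr1" "chr10" "chr11"

# Leaving out the index passes every row, so every chromosome is present.
print(table(peaks$chrom))
```

Tabulating peaks per chromosome with `table()` before and after annotation is a cheap way to confirm that the input covers every chromosome you expect.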
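The "[ reached getOption("max.print") -- omitted 34386 rows ]" line in Xianyong's 12:37 PM message reflects a limit on how much R prints to the console, not a loss of peaks; the object still holds every row. `max.print` counts printed entries (cells), so a data frame shows `max.print %/% ncol` rows; with the default limit of 99999 entries, a 14-column data frame, for instance, prints 99999 %/% 14 = 7142 rows. The minimal base-R sketch below uses a dummy 50-row table `res` (a placeholder, not real ChIPpeakAnno output) and an artificially small limit to show the effect, then writes the full table to a file with `write.csv()`, as Jianhong did above.

```r
# Dummy stand-in for an annotated-peak table: 50 rows, 2 columns.
res <- data.frame(peak = sprintf("%05d", 1:50),
                  fromOverlappingOrNearest = "NearestStart",
                  stringsAsFactors = FALSE)

# max.print limits the number of entries printed, not rows: for a data
# frame, print() shows max.print %/% ncol rows (here 20 %/% 2 = 10) and
# reports the remaining 40 rows as omitted.
old <- options(max.print = 20)
print(res)
options(old)  # restore the previous limit

# The object itself was never truncated.
print(nrow(res))  # 50

# Writing to a file keeps every row.
out <- tempfile(fileext = ".csv")
write.csv(res, out, row.names = FALSE)
print(nrow(read.csv(out)))  # 50
```

Raising the limit with `options(max.print = ...)` also works for console viewing, but for tens of thousands of peaks a CSV opened in a spreadsheet is usually easier to inspect.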