Hi Xianyong,
You can follow my code step by step. If you still cannot get the same
results, please send me your sessionInfo() output and your history file.
To map the peaks to exons and introns, you need to prepare the
annotation dataset yourself, in RangedData format, for ChIPpeakAnno.
The getAnnotation function can help with this; please try
?getAnnotation
to open the help file, and follow the examples in the help
documentation.
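As a rough sketch of that preparation (not a tested recipe), assuming
the biomaRt and ChIPpeakAnno versions used in this thread and a working
network connection to Ensembl, exon annotation could be fetched as
below. The names Exon.human and annotatedExon are only illustrative,
and ma.rd stands for your peaks after BED2RangedData. The default mart
serves the current genome assembly, which is probably not NCBI36, so
the assembly must be matched to the one used for peak calling. As far
as I know the featureType values listed in ?getAnnotation do not
include introns, so intron ranges would have to be derived separately.

```r
library(biomaRt)
library(ChIPpeakAnno)

# Connect to the Ensembl human gene dataset (needs network access).
# ASSUMPTION: this is whatever assembly Ensembl currently serves.
mart <- useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")

# Fetch exon coordinates in the format annotatePeakInBatch expects.
Exon.human <- getAnnotation(mart, featureType = "Exon")

# Annotate all peaks against the exon set (ma.rd is hypothetical here).
annotatedExon <- annotatePeakInBatch(ma.rd, AnnotationData = Exon.human)
head(as.data.frame(annotatedExon))
```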
Good Luck.
Yours sincerely,
Jianhong Ou
jianhong.ou@umassmed.edu
On Mar 19, 2012, at 3:10 PM, Ma, Xian-Yong wrote:
Hi, Jianhong:
Thanks very much for your email. I opened the output data file that
you generated, but I cannot reproduce it using the following code:
>max.print.o<-options(max.print=99999)
>annotatedPeak = annotatePeakInBatch(ma,
AnnotationData=TSS.human.NCBI36)
>as.data.frame(annotatedPeak)
Did you use different code to generate the data?
Regarding the exon or intron analysis, I think the NCBI database or
the Exon-Intron Database (EID) provides this type of function.
I have read a paper that used the software CisGenome for this
analysis; can your software do this as well? I am not very familiar
with bioinformatics, as I work in tumor molecular biology.
The paper describing this software is
www.pnas.org/cgi/doi/10.1073/pnas.1110931108
Best wishes,
Xianyong
On Mar 19, 2012, at 12:56 PM, Ou, Jianhong wrote:
Hi Xian-Yong,
I think you already have the answer to why you only got partial
output:
[ reached getOption("max.print") -- omitted 34386 rows ]]
try
max.print.o<-options(max.print=99999)
and then output the data.
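To show that max.print only limits what is printed and not what is
stored, here is a small base-R sketch. The data frame peaks is made up
and stands in for as.data.frame(annotatedPeak); the file name is also
just an example. Note that 99999 is the documented default of
max.print, and the limit counts entries rather than rows, so printing
all of a 41528-row, 14-column table would need a value of at least
41528 x 14 = 581392. Checking nrow() or writing a file avoids the limit
altogether.

```r
# Made-up stand-in for as.data.frame(annotatedPeak): 41528 rows.
peaks <- data.frame(peak = seq_len(41528),
                    start = seq_len(41528) * 100L,
                    source = "NearestStart")

# Raise the printing cap above rows x columns; 'old' keeps the old value.
old <- options(max.print = 1e6)

# The object always holds every row, whatever max.print is.
n_rows <- nrow(peaks)

# Writing to a file is not affected by max.print at all.
out_file <- file.path(tempdir(), "peaks.csv")
write.csv(peaks, out_file, row.names = FALSE)
n_written <- nrow(read.csv(out_file))

# Restore the previous printing limit.
options(old)
```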
Could you tell me how you made Exon.human.NCBI36 and
Intron.human.NCBI36? If possible, could you share the datasets with me?
Yours sincerely,
Jianhong Ou
jianhong.ou@umassmed.edu
On Mar 19, 2012, at 12:37 PM, Ma, Xian-Yong wrote:
Hi, Jianhong:
Thank you very much. I just checked the file you sent me and went
back to ChIPpeakAnno, following Julie's code. I found that only part
of the rows are output: 7142 rows are shown, and all peaks after that
are missing from the output:
7141 425497 NearestStart
7142 426447 NearestStart
[ reached getOption("max.print") -- omitted 34386 rows ]]
The 7142 rows shown plus the 34386 omitted make up the 41528 rows you
showed me. I don't know why I only got partial output.
Can I get the results for peaks in my exon, intron or other regions? I
just followed Julie's code, as follows.
To show the peaks in exons:
>annotatedPeak = annotatePeakInBatch(ma,
AnnotationData=Exon.human.NCBI36)
or, to show the peaks in introns:
>annotatedPeak = annotatePeakInBatch(ma,
AnnotationData=Intron.human.NCBI36)
Is this right?
Thanks again for your nice help!
Xianyong
On Mar 19, 2012, at 11:10 AM, Ou, Jianhong wrote:
Hi Xian-Yong,
I opened your bed file and found that there are 41528 rows in the file.
If you only annotate the first 5000 rows, you should get the
annotation for the first 5000 rows. So please try Julie's code again.
I also copied my output here and hope this will help you.
> setwd("/Users/jianhongou/Documents/Julie")
> library(ChIPpeakAnno)
Loading required package: biomaRt
Loading required package: multtest
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material. To view, type
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")' and for packages 'citation("pkgname")'.
Loading required package: IRanges
Attaching package: IRanges
The following object(s) are masked from package:Biobase:
updateObject
The following object(s) are masked from package:base:
cbind, eval, intersect, Map, mapply, order, paste, pmax, pmax.int,
pmin, pmin.int, rbind, rep.int, setdiff, table, union
Loading required package: Biostrings
Loading required package: BSgenome
Loading required package: GenomicRanges
Loading required package: BSgenome.Ecoli.NCBI.20080805
Loading required package: GO.db
Loading required package: AnnotationDbi
Loading required package: DBI
Loading required package: org.Hs.eg.db
Loading required package: limma
Loading required package: gplots
Loading required package: gtools
Loading required package: gdata
gdata: read.xls support for 'XLS' (Excel 97-2004) files ENABLED.
gdata: read.xls support for 'XLSX' (Excel 2007+) files ENABLED.
Attaching package: gdata
The following object(s) are masked from package:IRanges:
trim
The following object(s) are masked from package:Biobase:
combine
The following object(s) are masked from package:stats:
nobs
The following object(s) are masked from package:utils:
object.size
Loading required package: caTools
Loading required package: bitops
Attaching package: caTools
The following object(s) are masked from package:IRanges:
runmean
Loading required package: grid
Loading required package: KernSmooth
KernSmooth 2.23 loaded
Copyright M. P. Wand 1997-2009
Attaching package: gplots
The following object(s) are masked from package:IRanges:
space
The following object(s) are masked from package:multtest:
wapply
The following object(s) are masked from package:stats:
lowess
Warning messages:
1: package AnnotationDbi was built under R version 2.14.2
2: package limma was built under R version 2.14.2
3: replacing previous import space when loading IRanges
> ?annotatePeakInBatch
starting httpd help server ... done
> data(TSS.human.NCBI36)
> ma<-read.delim("ma2.bed",header=F)
> head(ma)
V1 V2 V3
1 chr1 9949 10500
2 chr1 114849 115000
3 chr1 115749 115850
4 chr1 117649 117750
5 chr1 123749 123850
6 chr1 124249 124400
> colnames(ma)<-c("chrom","chromStart","chromEnd")
> ma.rd<-BED2RangedData(ma)
> head(ma.rd)
RangedData with 6 rows and 2 value columns across 25 spaces
space ranges | strand score
<factor> <iranges> | <numeric> <numeric>
00001 1 [ 9949, 10500] | 1 1
00002 1 [114849, 115000] | 1 1
00003 1 [115749, 115850] | 1 1
00004 1 [117649, 117750] | 1 1
00005 1 [123749, 123850] | 1 1
00006 1 [124249, 124400] | 1 1
> tail(ma.rd)
RangedData with 6 rows and 2 value columns across 25 spaces
space ranges | strand score
<factor> <iranges> | <numeric> <numeric>
41523 Y [58996299, 58997400] | 1 1
41524 Y [58997549, 58997800] | 1 1
41525 Y [59005049, 59005250] | 1 1
41526 Y [59020749, 59020850] | 1 1
41527 Y [59024149, 59024300] | 1 1
41528 Y [59027749, 59027900] | 1 1
> tail(ma)
chrom chromStart chromEnd
41523 chrY 58996299 58997400
41524 chrY 58997549 58997800
41525 chrY 59005049 59005250
41526 chrY 59020749 59020850
41527 chrY 59024149 59024300
41528 chrY 59027749 59027900
> annotatedPeak<-annotatePeakInBatch(ma.rd[1:5000,],AnnotationData=TSS.human.NCBI36)
> tail(annotatedPeak)
RangedData with 6 rows and 9 value columns across 2 spaces
                         space               ranges |        peak      strand         feature start_position end_position insideFeature distancetoFeature shortestDistance fromOverlappingOrNearest
                      <factor>            <iranges> | <character> <character>     <character>      <numeric>    <numeric>   <character>         <numeric>        <numeric>              <character>
04982 ENSG00000165606       10 [50319257, 50319808] |       04982           - ENSG00000165606       50242243     50273992      upstream            -45265            45265             NearestStart
04983 ENSG00000165606       10 [50328407, 50328608] |       04983           - ENSG00000165606       50242243     50273992      upstream            -54415            54415             NearestStart
04984 ENSG00000032514       10 [50373707, 50373808] |       04984           - ENSG00000032514       50336715     50417078        inside             43371            36992             NearestStart
04987 ENSG00000209936       10 [50593507, 50593808] |       04987           - ENSG00000209936       50581041     50581425      upstream            -12082            12082             NearestStart
04999 ENSG00000219927       10 [51192307, 51192408] |       04999           - ENSG00000219927       51202108     51202578    downstream             10271             9700             NearestStart
05000 ENSG00000197612       10 [51389007, 51389258] |       05000           - ENSG00000197612       51398379     51398687    downstream              9680             9121             NearestStart
> ma.rd[5000,]
RangedData with 1 row and 2 value columns across 25 spaces
space ranges | strand score
<factor> <iranges> | <numeric> <numeric>
05000 10 [51389007, 51389258] | 1 1
> ma.rd[20000,]
RangedData with 1 row and 2 value columns across 25 spaces
space ranges | strand score
<factor> <iranges> | <numeric> <numeric>
20000 18 [72846414, 72846715] | 1 1
> annotatedPeak<-annotatePeakInBatch(ma.rd[20000:22000,],AnnotationData=TSS.human.NCBI36)
> tail(annotatedPeak)
RangedData with 6 rows and 9 value columns across 3 spaces
                         space               ranges |        peak      strand         feature start_position end_position insideFeature distancetoFeature shortestDistance fromOverlappingOrNearest
                      <factor>            <iranges> | <character> <character>     <character>      <numeric>    <numeric>   <character>         <numeric>        <numeric>              <character>
21971 ENSG00000207350        2 [88480587, 88481138] |       21971           - ENSG00000207350       88495532     88495638    downstream             15051            14394             NearestStart
21973 ENSG00000172071        2 [88679537, 88679688] |       21973           - ENSG00000172071       88637376     88708209        inside             28672            28521             NearestStart
21974 ENSG00000172071        2 [88739987, 88740138] |       21974           - ENSG00000172071       88637376     88708209      upstream            -31778            31778             NearestStart
21975 ENSG00000211592        2 [88906287, 88906388] |       21975           - ENSG00000211592       88937989     88938311    downstream             32024            31601             NearestStart
21977 ENSG00000220770        2 [89219237, 89219338] |       21977           - ENSG00000220770       89215573     89215855      upstream             -3382             3382             NearestStart
21978 ENSG00000211619        2 [89462437, 89462638] |       21978           - ENSG00000211619       89410986     89411308      upstream            -51129            51129             NearestStart
> ma.rd[21971:21978,]
RangedData with 8 rows and 2 value columns across 25 spaces
space ranges | strand score
<factor> <iranges> | <numeric> <numeric>
21971 2 [88480587, 88481138] | 1 1
21972 2 [88603637, 88603888] | 1 1
21973 2 [88679537, 88679688] | 1 1
21974 2 [88739987, 88740138] | 1 1
21975 2 [88906287, 88906388] | 1 1
21976 2 [88959637, 88959738] | 1 1
21977 2 [89219237, 89219338] | 1 1
21978 2 [89462437, 89462638] | 1 1
> annotatedPeak<-annotatePeakInBatch(ma.rd,AnnotationData=TSS.human.NCBI36)
> tail(annotatedPeak)
RangedData with 6 rows and 9 value columns across 25 spaces
                         space               ranges |        peak      strand         feature start_position end_position insideFeature distancetoFeature shortestDistance fromOverlappingOrNearest
                      <factor>            <iranges> | <character> <character>     <character>      <numeric>    <numeric>   <character>         <numeric>        <numeric>              <character>
41488 ENSG00000219871        Y [28812949, 28813100] |       41488           - ENSG00000219871       27150386     27190187      upstream          -1622762          1622762             NearestStart
41489 ENSG00000219871        Y [28813749, 28814400] |       41489           - ENSG00000219871       27150386     27190187      upstream          -1623562          1623562             NearestStart
41490 ENSG00000219871        Y [28815249, 28815500] |       41490           - ENSG00000219871       27150386     27190187      upstream          -1625062          1625062             NearestStart
41491 ENSG00000219871        Y [28816399, 28816900] |       41491           - ENSG00000219871       27150386     27190187      upstream          -1626212          1626212             NearestStart
41492 ENSG00000219871        Y [28817599, 28818450] |       41492           - ENSG00000219871       27150386     27190187      upstream          -1627412          1627412             NearestStart
41493 ENSG00000219871        Y [28818799, 28819050] |       41493           - ENSG00000219871       27150386     27190187      upstream          -1628612          1628612             NearestStart
> write.csv(as.data.frame(annotatedPeak),"ma.annotatedPeak.031912.csv",row.names=F)
> sessionInfo()
R version 2.14.1 (2011-12-22)
Platform: i386-apple-darwin9.8.0/i386 (32-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] grid stats graphics grDevices utils datasets
methods base
other attached packages:
[1] ChIPpeakAnno_2.2.0 gplots_2.10.1
KernSmooth_2.23-7 caTools_1.12
[5] bitops_1.0-4.1 gdata_2.8.2
gtools_2.6.2 limma_3.10.3
[9] org.Hs.eg.db_2.6.4 GO.db_2.6.1
RSQLite_0.11.1 DBI_0.2-5
[13] AnnotationDbi_1.16.19
BSgenome.Ecoli.NCBI.20080805_1.3.17 BSgenome_1.22.0
GenomicRanges_1.6.7
[17] Biostrings_2.22.0 IRanges_1.12.6
multtest_2.10.0 Biobase_2.14.0
[21] biomaRt_2.10.0
loaded via a namespace (and not attached):
[1] MASS_7.3-17 RCurl_1.91-1 splines_2.14.1
survival_2.36-12 tools_2.14.1 XML_3.9-4
>
Yours sincerely,
Jianhong Ou
jianhong.ou@umassmed.edu
On Mar 19, 2012, at 10:44 AM, Zhu, Lihua (Julie) wrote:
Dear Xian-Yong,
Jianhong will help you to resolve the issues. Could you please send us
the sessionInfo() output? Thanks!
Best regards,
Julie
On 3/19/12 10:05 AM, "Ma, Xian-Yong" <xian-yong.ma@yale.edu> wrote:
Dear Julie:
When I tried your code to analyze my ChIP data, I still got peaks only
from chromosomes 1, 10 and 11. I checked the data again and found that
the peaks are distributed across all chromosomes.
Another question: when I tried to search for exons or introns with
this program, I could not get any results. I have attached one of the
BED files from my ChIP data. Would you please help me figure out what
goes wrong when I use your software? I have been struggling with this
issue for several days, and I really need the correct code to analyze
these data.
annotatedPeak = annotatePeakInBatch(ma,
AnnotationData=TSS.human.NCBI36) for TSS peak analysis?
annotatedPeak = annotatePeakInBatch(ma,
AnnotationData=exon.human.NCBI36) for exon peak analysis?
I appreciate your nice help,
Best regards,
Xianyong
On Mar 16, 2012, at 1:47 PM, Zhu, Lihua (Julie) wrote:
Xianyong,
Thanks for letting me know! Good luck!
Best regards,
Julie
On 3/16/12 1:27 PM, "Ma, Xian-Yong" <xian-yong.ma@yale.edu> wrote:
Dear Julie:
I just followed your suggestion and got similar results. I will check
my dataset and use your code again; I hope the problem is in my
dataset.
Best regards,
Xianyong
On Mar 16, 2012, at 11:33 AM, Zhu, Lihua (Julie) wrote:
Dear Xianyong,
You would need to use the following code without specifying the index
range for your peaks. ma[1:6,] means the first 6 peaks.
annotatedPeak = annotatePeakInBatch(ma,
AnnotationData=TSS.human.NCBI36)
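To illustrate the indexing, here is a small base-R sketch with made-up
peaks (the data frame toy and its values are assumptions, not your
data). Rows are selected by position, so when peaks are ordered by
chromosome name as text (1, 10, 11, ..., 2, ...), the first rows cover
only the first few chromosomes in that ordering.

```r
# Made-up peak table: 3 peaks on each of chromosomes 1, 2, 10 and 11.
toy <- data.frame(chrom = rep(c("1", "2", "10", "11"), each = 3),
                  start = rep(c(100L, 200L, 300L), times = 4),
                  stringsAsFactors = FALSE)

# Order by chromosome name as text: "1" < "10" < "11" < "2".
toy <- toy[order(toy$chrom, toy$start), ]

# toy[1:6, ] keeps only the first 6 rows; leaving the part after the
# comma empty means "all columns".
first6 <- toy[1:6, ]
chroms_first6 <- unique(first6$chrom)   # chromosomes seen in the subset

# Using the whole object keeps peaks from every chromosome.
chroms_all <- unique(toy$chrom)
```

Here chroms_first6 holds only "1" and "10", while chroms_all holds all
four chromosomes, which is the same effect as a [1:5000,] subset
returning peaks from just a few chromosomes.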
Hope this resolves your issue.
Best regards,
Julie
On 3/16/12 10:24 AM, "Ma, Xian-Yong" <xian-yong.ma@yale.edu> wrote:
Dear Dr. Zhu:
I am Xianyong Ma from Yale Medical School. I am working on a
ChIP-sequencing project and am trying to use ChIPpeakAnno to analyze
my data. It is a wonderful tool for mapping binding sites, but I have
run into trouble using it. I don't understand the meaning of the
following statement:
annotatedPeak = annotatePeakInBatch(ma[1:6,],
AnnotationData=TSS.human.NCBI36),
When I change [1:6] to [1:5000], I only get peaks from chromosomes 1,
10 and 11. I think I should get binding sites on all chromosomes,
since I checked them by another method.
Thanks very much for your nice help!
sincerely,
Xianyong Ma
<ma.annotatedpeak.031912.csv.zip>