Hi
Thanks for working on this but I am still getting the same result even
with 2.13.1. The annotatePeakInBatch is not assigning peaks to
features correctly when genes are located on the negative strand.
Below is an example that produces an incorrect result when using
annotatePeakInBatch. I think the problem is located in the output=
"nearestStart" because "overlapping" produces the correct result.
Specifically the second line of the annPeak output should be assigning
peak1 to overlapEnd instead of overlapStart because the gene feature
is located on the negative strand.
Matt
> library(ChIPpeakAnno)
>
>
> peak<-RangedData(IRanges(start=as.numeric(as.character(c(1650,2806,8
361))), end=as.numeric(as.character(c(1860,3006,8591)))),
space=c("Chr01","Chr01","Chr01"),strand=as.integer(1))
> row.names(peak)<-c("peak1","peak2","peak3")
>
>
> #load subset of data from Populus v210 gff3 file
> genome<-as.data.frame(rbind(c("Chr01","phytozome8_0",
"gene",1660,2502, ".", "-", ".","Potri.001G000100"),c("Chr01",
"phytozome8_0", "gene",2906,6646, ".", "-",
".","Potri.001G000200"),c("Chr01", "phytozome8_0", "gene",8391,8860,
".", "+", ".","Potri.001G000300")))
>
>
> GENOME<-GFF2RangedData(genome)
Warning message:
In DataFrame(...) : NAs introduced by coercion
> row.names(GENOME)<-genome[,9]
>
> annPeak = annotatePeakInBatch(peak, AnnotationData=GENOME,
PeakLocForDistance ="middle",output="both")
Warning messages:
1: In mapply(FUN = f, ..., SIMPLIFY = FALSE) : NAs introduced by
coercion
2: In mapply(FUN = f, ..., SIMPLIFY = FALSE) : NAs introduced by
coercion
>
> View(annPeak)
space
start
end
width
names
peak
strand
feature
start_position
end_position
insideFeature
distancetoFeature
shortestDistance
Chr01
8361
8591
231
peak3 Potri.001G000300
peak3
+
Potri.001G000300
8391
8860
overlapStart
85
30
Chr01
1650
1860
211
peak1 Potri.001G000100
peak1
-
Potri.001G000100
1660
2502
overlapStart
747
10
Chr01
2806
3006
201
peak2 Potri.001G000100
peak2
-
Potri.001G000100
1660
2502
downstream
-404
304
Chr01
2806
3006
201
peak2 Potri.001G000200
peak2
-
Potri.001G000200
2906
6646
overlapEnd
3740
100
>
> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=C LC_COLLATE=C
[5] LC_MONETARY=C LC_MESSAGES=C
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=C LC_IDENTIFICATION=C
attached base packages:
[1] grid parallel stats graphics grDevices utils
[7] datasets methods base
other attached packages:
[1] ChIPpeakAnno_2.13.1 Biostrings_2.32.1
[3] XVector_0.4.0 IRanges_1.22.10
[5] biomaRt_2.20.0 VennDiagram_1.6.7
[7] GOSemSim_1.22.0 Rcpp_0.11.2
[9] BiocInstaller_1.14.2 edgeR_3.6.7
[11] limma_3.20.8 hash_2.2.6
[13] GSEABase_1.26.0 annotate_1.42.1
[15] GOstats_2.30.0 graph_1.42.0
[17] Category_2.30.0 GO.db_2.14.0
[19] RSQLite_0.11.4 DBI_0.2-7
[21] Matrix_1.1-4 AnnotationDbi_1.26.0
[23] GenomeInfoDb_1.0.2 Biobase_2.24.0
[25] BiocGenerics_0.10.0
loaded via a namespace (and not attached):
[1] AnnotationForge_1.6.1 BBmisc_1.7
[3] BSgenome_1.32.0 BatchJobs_1.3
[5] BiocParallel_0.6.1 GenomicAlignments_1.0.5
[7] GenomicFeatures_1.16.2 GenomicRanges_1.16.4
[9] MASS_7.3-33 RBGL_1.40.0
[11] RCurl_1.95-4.3 Rsamtools_1.16.1
[13] XML_3.98-1.1 bitops_1.0-6
[15] brew_1.0-6 checkmate_1.2
[17] codetools_0.2-8 digest_0.6.4
[19] fail_1.2 foreach_1.4.2
[21] genefilter_1.46.1 iterators_1.0.7
[23] lattice_0.20-29 multtest_2.20.0
[25] rtracklayer_1.24.2 sendmailR_1.1-2
[27] splines_3.1.0 stats4_3.1.0
[29] stringr_0.6.2 survival_2.37-7
[31] tools_3.1.0 xtable_1.7-3
[33] zlibbioc_1.10.0
>
From: Zhu, Lihua (Julie) [mailto:Julie.Zhu@umassmed.edu]
Sent: Wednesday, August 13, 2014 2:47 PM
To: Zinkgraf, Matthew S -FS
Cc: bioconductor at r-project.org
Subject: Re: ChIPpeakAnno 2.12.2 bug
Matt,
Many thanks for helping us identifying the bug introduced in 2.12.2!
The bug has been fixed in version 2.13.1. Please let me know if you
have any issues.
For future correspondence, could you please keep the discussion in the
bioconductor list so others can contribute or benefit? Thanks!
Best regards,
Julie
On 8/13/14 1:18 PM, "Zinkgraf, Matthew S -FS" <mszinkgraf at="" fs.fed.us=""> wrote:
Hello Julie
I recently updated ChIPpeakAnno from version 2.10 to 2.12.2 and I am
getting incorrect results from annotatePeakInBatch.
More specifically the output column 'insideFeature' is not taking
feature strand into account when it is assigning peaks to gene
features, the function thinks all gene features are on the positive
strand even though my genome ranged data contains strand information.
Also all other calculations from annotatePeakInBatch gives the same
results as in 2.10.
Please let me know if you would like more information about the bug I
am getting.
Thanks
Matt
---
Matthew S. Zinkgraf, PhD
Postdoctoral Researcher
USDA, Forest Service
Davis, CA 95618
530-759-1739
mszinkgraf at fs.fed.us <mailto:mszinkgraf at="" fs.fed.us="">
This electronic message contains information generated by the USDA
solely for the intended recipients. Any unauthorized interception of
this message or the use or disclosure of the information it contains
may violate the law and subject the violator to civil or criminal
penalties. If you believe you have received this message in error,
please notify the sender and delete the email immediately.
[[alternative HTML version deleted]]