Hi Amy,
Thank you very much for the feedback!
The inside feature is TRUE if the peakSummit (stored as Start in the
RangedData) is inside the nearest annotated feature, FALSE otherwise.
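As a rough illustration of that rule, here is a minimal base-R sketch. It is not the ChIPpeakAnno implementation: the helper name start_inside is hypothetical, the coordinates are copied from Amy's dummy example quoted below, each peak is paired with the feature that annotatePeakInBatch reported for it, and strand is ignored.

```r
# Coordinates copied from the dummy example in the quoted message.
peak_start    <- c(p1 = 1543200, p2 = 1557200, p3 = 1563000,
                   p4 = 1569800, p5 = 167889600)
feature_start <- c(f1 = 1549800, f2 = 1554400, f3 = 1565000,
                   f4 = 1569400, f5 = 167888600)
feature_end   <- c(f1 = 1550599, f2 = 1560799, f3 = 1565399,
                   f4 = 1571199, f5 = 167888999)

# Hypothetical helper (not part of ChIPpeakAnno): TRUE when the peak
# start lies within the feature's [start, end] range.
start_inside <- function(peak_start, feature_start, feature_end) {
  peak_start >= feature_start & peak_start <= feature_end
}

inside <- start_inside(peak_start, feature_start, feature_end)
print(inside)
```

Under these assumptions only p2 and p4 come out TRUE, which matches the insideFeature column in the output Amy posted.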
The feature you are proposing is useful for finding overlapping
fragments between query peak ranges and target ranges. The timing of
your suggestion could not be better: we plan to add a function called
FindOverlappingPeaks to the dev version, and your ideas will be
incorporated. Please let me know if you and others think that
it would be better to incorporate your ideas into annotatePeakInBatch.
Thanks again for your help making this package more useful.
I have a favor to ask of you and of others who have used or are aware
of the ChIPpeakAnno package. We submitted a paper describing this
package and just got the reviewers' comments back. One question that
came up is what this package offers that existing annotation tools
such as Cisgenome and CEAS do not. I thought it might be useful to get
feedback from the users of this package. If possible, could you please
send me your thoughts on this, especially your reasons for choosing
this package? Thanks a lot for your time and help!
Best regards,
Julie
*******************************************
Lihua Julie Zhu, Ph.D
Research Associate Professor
Program Gene Function and Expression
University of Massachusetts Medical School
364 Plantation Street, Room 613
Worcester, MA 01605
508-856-5256
http://www.umassmed.edu/pgfe/faculty/zhu.cfm
On 3/9/10 6:23 AM, "Amy Molesworth" <amy.m.molesworth@gsk.com> wrote:
Firstly, I'd like to thank the authors of the very useful package
ChIPpeakAnno. I'd like to report a behaviour in the results of the
ChIPpeakAnno annotatePeakInBatch function that other users may or may
not be aware of. I also propose improvements to compensate.
The resulting insideFeature column reports TRUE if the query peak is
contained within an annotated feature, and also reports TRUE if
it overlaps the end of an annotated feature.
I think it's worth noting that it reports FALSE if the peak overlaps
the beginning of an annotated feature, and also reports FALSE if the
peak entirely overlaps one or more annotated features.
So, perhaps the insideFeature column (or an additional new column
called overlappingFeature) could report five options:
("false","inside","overlapStart","overlapEnd","super"). I haven't
looked into how distanceToFeature should/could be
calculated for each different scenario.
Apologies if this has already been addressed, or if others do not
consider this useful.
Details with dummy example are described below.
Many thanks,
Amy.
#####
In the dummy example below, p1 is bigger than f1 and consequently p1
overlaps it in entirety. It would be nice if ChIPpeakAnno could report
this - although I accept it may overlap more than one feature,
so would need to consider how to deal with that.
In another example from below, p3 in fact overlaps the start of
f3, but is called as insideFeature = FALSE. It would be nice if
ChIPpeakAnno could report it as overlapStart.
p4 is called as insideFeature = TRUE for overlapping with f4, but it
would be nice if ChIPpeakAnno could report it as overlapEnd or
something similar.
And correctly p2 is called as insideFeature = TRUE for overlap with
f2, in this case p2 ranges are within the f2 ranges as you would
expect.
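The five categories proposed above could be sketched in base R roughly as follows. This is only an illustration, not ChIPpeakAnno code: classify_overlap is a hypothetical helper, each peak is paired with the single feature it was annotated against, "start" and "end" refer to coordinate order (strand is ignored), and a peak overlapping several features is not handled.

```r
# Hypothetical helper (not part of ChIPpeakAnno): classify how each peak
# range [p_start, p_end] relates to its paired feature [f_start, f_end].
classify_overlap <- function(p_start, p_end, f_start, f_end) {
  overlaps <- p_start <= f_end   & p_end >= f_start
  inside   <- p_start >= f_start & p_end <= f_end
  super    <- p_start <= f_start & p_end >= f_end
  ifelse(!overlaps, "false",
  ifelse(inside, "inside",
  ifelse(super, "super",
  ifelse(p_start < f_start, "overlapStart", "overlapEnd"))))
}

# Coordinates copied from the dummy example in this message.
peaks <- data.frame(
  name  = c("p1", "p2", "p3", "p4", "p5"),
  start = c(1543200, 1557200, 1563000, 1569800, 167889600),
  end   = c(1555199, 1560599, 1565199, 1573799, 167893599))
features <- data.frame(
  name  = c("f1", "f2", "f3", "f4", "f5"),
  start = c(1549800, 1554400, 1565000, 1569400, 167888600),
  end   = c(1550599, 1560799, 1565399, 1571199, 167888999))

result <- classify_overlap(peaks$start, peaks$end,
                           features$start, features$end)
names(result) <- peaks$name
print(result)
```

With these inputs the sketch labels p1 as "super", p2 as "inside", p3 as "overlapStart", p4 as "overlapEnd" and p5 as "false". Identical ranges fall under "inside" because that test is applied before "super"; that ordering is a design choice of the sketch, not something the package defines.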
library(ChIPpeakAnno)
peaks = RangedData(
  IRanges(start=c(1543200,1557200,1563000,1569800,167889600),
          end=c(1555199,1560599,1565199,1573799,167893599),
          names=c("p1","p2","p3","p4","p5")),
  strand=as.integer(1), space=c(6,6,6,6,5))
features = RangedData(
  IRanges(start=c(1549800,1554400,1565000,1569400,167888600),
          end=c(1550599,1560799,1565399,1571199,167888999),
          names=c("f1","f2","f3","f4","f5")),
  strand=as.integer(1), space=c(6,6,6,6,5))
annoPeaks = annotatePeakInBatch(peaks,AnnotationData=features)
as.data.frame(annoPeaks)
  space     start       end width names strand feature start_position
1     5 167889600 167893599  4000    p5      1      f5      167888600
2     6   1543200   1555199 12000    p1      1      f1        1549800
3     6   1557200   1560599  3400    p2      1      f2        1554400
4     6   1563000   1565199  2200    p3      1      f3        1565000
5     6   1569800   1573799  4000    p4      1      f4        1569400
  end_position insideFeature distancetoFeature
1    167888999         FALSE              1000
2      1550599         FALSE             -6600
3      1560799          TRUE              2800
4      1565399         FALSE             -2000
5      1571199          TRUE               400
> sessionInfo()
R version 2.10.0 (2009-10-26)
x86_64-unknown-linux-gnu
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_GB.UTF-8 LC_COLLATE=C
[5] LC_MONETARY=C LC_MESSAGES=C
[7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ChIPpeakAnno_1.3.0 org.Hs.eg.db_2.3.6
[3] GO.db_2.3.5 RSQLite_0.7-3
[5] DBI_0.2-4 AnnotationDbi_1.8.0
[7] BSgenome.Ecoli.NCBI.20080805_1.3.16 BSgenome_1.14.0
[9] Biostrings_2.14.2 IRanges_1.5.18
[11] multtest_2.2.0 Biobase_2.6.0
[13] biomaRt_2.3.0
loaded via a namespace (and not attached):
[1] MASS_7.3-3 RCurl_1.3-0 XML_2.6-0 splines_2.10.0
[5] survival_2.35-7
_______________________________________________
Bioc-sig-sequencing mailing list
Bioc-sig-sequencing@r-project.org
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing