Hi, I tried to annotate my ChIP-seq peak regions with the ChIPpeakAnno package using the following command. If I understand correctly, output="both" annotates each peak to the nearest features (upstream and downstream) as well as to features overlapping within the given maxgap distance (i.e. 5000 bp).
final_anno <- annotatePeakInBatch(final, AnnotationData = ucsc.mm10.knownGene, output = "both", maxgap = 5000)
When I looked at the output, the shortestDistance for some Overlapping features is > 5000 bp (marked with ** in the table below). I think that features not overlapping within 5000 bp should be given NA, but it looks like the program is searching beyond the given maxgap distance. Can anyone help me understand this, or explain how ChIPpeakAnno makes use of maxgap for overlapping features?
seqnames start end width strand peakNames peak feature start_position end_position feature_strand insideFeature distancetoFeature shortestDistance fromOverlappingOrNearest symbol
chr1 5071994 5072969 976 + c("peaks1_range__0002", "peaks1_range__0003", "peaks2_range__00005") 2 58175 4909576 5070285 - upstream -1709 1709 NearestLocation Rgs20
chr1 9772666 9773182 517 + c("peaks1_range__0011", "peaks2_range__00014") 6 73331 9747648 9791922 + inside 25018 **18740** Overlapping 1700034P13Rik
chr1 10286194 10286710 517 + c("peaks2_range__00017", "peaks1_range__0012") 7 211673 10137507 10232670 - upstream -53524 53524 NearestLocation Arfgef1
chr1 10337617 10338120 504 + c("peaks2_range__00019", "peaks1_range__0014") 9 329093 10324719 10719945 - inside 382328 12898 Overlapping Cpa6
chr1 10396872 10397407 536 + c("peaks2_range__00020", "peaks1_range__0015") 10 211673 10137507 10232670 - upstream -164202 164202 NearestLocation Arfgef1
chr1 10396872 10397407 536 + c("peaks2_range__00020", "peaks1_range__0015") 10 329093 10324719 10719945 - inside 323073 **72153** Overlapping Cpa6
chr1 16544879 16545397 519 + c("peaks1_range__0036", "peaks2_range__00054") 28 66799 16540788 16619338 - inside 74459 4091 Overlapping Ube2w
NearestLocation will ignore the maxgap parameter. If you want all annotations within 5 kb of a gene, you can filter the results after annotation. See https://support.bioconductor.org/p/60971/
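As a rough sketch of that post-annotation filter, the rows from the table above can be reduced to those whose shortestDistance is at most 5000 bp. The data frame below is a hand-copied stand-in for the annotated result, and the names `anno`, `cutoff`, and `anno_5kb` are made up for illustration; on the object actually returned by annotatePeakInBatch, the equivalent would presumably be a logical subset on its shortestDistance column.

```r
# Stand-in for the annotated peaks: values copied from the table above.
# (Hypothetical object; the real result is returned by annotatePeakInBatch.)
anno <- data.frame(
  symbol = c("Rgs20", "1700034P13Rik", "Arfgef1", "Cpa6",
             "Arfgef1", "Cpa6", "Ube2w"),
  shortestDistance = c(1709, 18740, 53524, 12898, 164202, 72153, 4091),
  fromOverlappingOrNearest = c("NearestLocation", "Overlapping",
                               "NearestLocation", "Overlapping",
                               "NearestLocation", "Overlapping",
                               "Overlapping"),
  stringsAsFactors = FALSE
)

# Keep only annotations whose shortest distance is within the cutoff.
cutoff <- 5000
anno_5kb <- anno[anno$shortestDistance <= cutoff, ]
print(anno_5kb)
```

With these seven rows, only the Rgs20 and Ube2w annotations pass the 5000 bp cutoff.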
Jianhong.
Thanks for your reply, Ou! But I am using the "both" option, not "nearestLocation". According to the manual, maxgap should be considered for this option, right?
"both" will output all the nearest features, in addition, will output any features that overlap the peak that is not the nearest features.
"both" means it will include all the results of nearestLocation and overlapping. I understand that this is a little confusing.
Yes, but coming back to my main question: if "both" considers overlapping features with maxgap = 5000, then why does it report the gene Cpa6 as "Overlapping" with a distance of 72153 (which is > 5000 bp)?
Example here:
The peak is inside the feature. Therefore, it is considered overlapping even though the distances between the peak's and the feature's starts/ends are greater than 5000.
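The arithmetic for the Cpa6 row can be sketched in base R. This is only an illustration under stated assumptions, not the package's code: `interval_gap` is a hypothetical helper that returns the number of bases separating two ranges (0 when they intersect), and the shortest distance is assumed to be the smallest absolute difference between any peak boundary and any feature boundary, which agrees with the values in the table.

```r
# Coordinates taken from the second Cpa6 row of the table.
peak    <- c(start = 10396872, end = 10397407)
feature <- c(start = 10324719, end = 10719945)

# Hypothetical helper: bases separating two ranges, 0 if they intersect.
interval_gap <- function(a, b) {
  max(0, max(a[["start"]], b[["start"]]) - min(a[["end"]], b[["end"]]) - 1)
}

# The peak lies inside the feature, so the gap is 0 and any maxgap is satisfied.
gap <- interval_gap(peak, feature)

# Smallest distance between a peak boundary and a feature boundary
# (assumed definition; it matches the shortestDistance column for this row).
shortest <- min(abs(outer(peak, feature, "-")))

print(c(gap = gap, shortestDistance = shortest))
```

Under these assumptions the gap is 0 while the boundary-to-boundary distance is 72153, so the row counts as overlapping even though its shortestDistance exceeds maxgap.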
Hope it makes sense to you.
Best regards,
Julie
Thanks, Julie, for the clarification :)