Question

ChIPpeakAnno annotatePeakInBatch output

ilaria.maurizio • 0

@15877814

Italy

Hi everyone, I am trying to understand the output of ChIPpeakAnno annotatePeakInBatch(). I have used the following code to annotate my peak list to the reference genome:

annotatedpeaks <- annotatePeakInBatch(peaks.GR, AnnotationData = annoData, output = "both", maxgap = 0, multiple = FALSE)

I don't understand how the multiple argument affects the output; the manual describes it as "at most one overlapping feature for each peak". My idea is to have my peaks annotated both to their nearest feature and to the overlapping ones. In particular, when a peak overlaps a feature that is not the nearest one, I want that feature in the output. This is because I realized that some peaks are annotated to their nearest feature even though they overlap a feature that resides on the same strand. How can I solve this problem?

Thanks

ChIPpeakAnno • 1.3k views

ADD COMMENT • link 17 months ago ilaria.maurizio • 0

score 1 · Answer 1 · 2023-10-26

James W. MacDonald 68k

@james-w-macdonald-5106

United States

The help page says that 'multiple' is kept for backwards compatibility, and that you should use 'select' instead. If you use select = "all" (the default), you will get all overlapping features returned for each peak, which seems to be what you want.

ADD COMMENT • link 17 months ago James W. MacDonald 68k

Thanks James for your kind reply! I read what you mentioned in your comment and I tried it. The point is that when a peak overlaps a feature, the tool gives me both the overlapping feature and the feature nearest to the peak. I would like an output containing only the nearest feature for peaks that do not reside within genes, and only the overlapping features for peaks that are inside genes...

ADD REPLY • link 17 months ago ilaria.maurizio • 0

Hi Ilaria,

Thank you for your great question! To achieve your specific goal, you can use the insideFeature column of the annotated output. By keeping only the rows where insideFeature is "inside", you can isolate the peaks that fall within features.
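As a minimal sketch of how such a filter could look, the base R example below works on a toy data frame standing in for as.data.frame(annotatedpeaks). The peak and gene names are made up, and the helper keep_overlap_or_nearest() is hypothetical, not part of ChIPpeakAnno. It assumes that the insideFeature values "inside", "includeFeature", "overlapStart" and "overlapEnd" all mean that the peak overlaps the feature. For each peak it keeps the overlapping rows if there are any, and the remaining (nearest) rows otherwise.

```r
# Toy stand-in for as.data.frame(annotatedpeaks); all values are made up.
anno <- data.frame(
  peak = c("p1", "p1", "p2", "p3", "p3"),
  feature = c("geneA", "geneB", "geneC", "geneD", "geneE"),
  insideFeature = c("upstream", "inside", "downstream", "overlapStart", "upstream"),
  stringsAsFactors = FALSE
)

# insideFeature values assumed to mean that peak and feature share bases.
overlap_classes <- c("inside", "includeFeature", "overlapStart", "overlapEnd")

# Hypothetical helper: for each peak, keep overlapping rows if there are any,
# otherwise keep the non-overlapping (nearest) rows.
keep_overlap_or_nearest <- function(df) {
  is_overlap <- df$insideFeature %in% overlap_classes
  peaks_with_overlap <- unique(df$peak[is_overlap])
  df[is_overlap | !(df$peak %in% peaks_with_overlap), , drop = FALSE]
}

filtered <- keep_overlap_or_nearest(anno)
print(filtered)
```

Here p1 and p3 keep only their overlapping features (geneB and geneD), while p2, which overlaps nothing, keeps its nearest feature (geneC).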

The insideFeature column can take the following values:

  "upstream": the peak lies upstream of the feature.
  "includeFeature": the peak contains the feature entirely.
  "overlapStart": the peak overlaps the start of the feature.
  "inside" (as mentioned earlier): the peak lies entirely within the feature.
  "overlapEnd": the peak overlaps the end of the feature.
  "downstream": the peak lies downstream of the feature.

Hope this fits your needs.

Best regards,

Julie

ADD REPLY • link 17 months ago Julie Zhu ★ 4.3k

score 0 · Answer 2 · 2023-10-26

I read your concern again:

I would like an output containing only the nearest feature for peaks that do not reside within genes, and only the overlapping features for peaks that are inside genes...

It seems you would like to assign only one type of feature to each peak (either "nearest" or "overlapping", with "overlapping" preferred when the nearest feature does not overlap the peak). As you mentioned, if you set output = "both" and select = "all", the tool assigns both "overlapping" and "nearest" features to peaks. To obtain what you want, I suggest three steps: first, annotate peaks to the overlapping features; second, annotate the peaks that don't have overlapping features to the nearest features; last, concatenate the two. Below is some example code.

library(ChIPpeakAnno)  # provides annotatePeakInBatch(), annoGR() and the myPeakList example data
library(ensembldb)
library(EnsDb.Hsapiens.v75)
data(myPeakList)
annoData <- annoGR(EnsDb.Hsapiens.v75)

# Step 1: annotate peaks to overlapping features. select = "first" keeps one feature per peak;
# with select = "all", multiple features can be assigned to a single peak.
anno_overlapping <- annotatePeakInBatch(myPeakList, AnnotationData = annoData, 
                                        output = "overlapping", select = "first")
anno_overlapping_non_na <- anno_overlapping[!is.na(anno_overlapping$feature)]

# Step2: annotate peaks that are without overlapping features to nearest features
myPeakList_non_overlapping <- myPeakList[!(names(myPeakList) %in% anno_overlapping_non_na$peak)]  
anno_nearest <- annotatePeakInBatch(myPeakList_non_overlapping, 
                                    AnnotationData = annoData, 
                                    output = "nearestLocation", select = "first")

# Step3: concatenate the two
anno_final <- c(anno_overlapping_non_na, anno_nearest)

The above code assigns either an "overlapping" or a "nearest" feature to each peak, and if the "overlapping" feature is not the "nearest" one, only the "overlapping" one is reported. Hope this is what you want.
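A quick sanity check on the combined result might look like the following sketch. It uses toy character vectors in place of names(myPeakList) and anno_final$peak, and the helper each_peak_once() is hypothetical, introduced only for illustration. It tests that every input peak appears exactly once in the final annotation.

```r
# Toy stand-ins; in practice these would be names(myPeakList) and anno_final$peak.
all_peaks <- c("p1", "p2", "p3", "p4")
annotated_peaks <- c("p1", "p3", "p2", "p4")

# Hypothetical helper: TRUE when no peak is duplicated and both sets match.
each_peak_once <- function(input, annotated) {
  anyDuplicated(annotated) == 0 && setequal(input, annotated)
}

ok <- each_peak_once(all_peaks, annotated_peaks)
print(ok)
```

If the check returns FALSE on real data, comparing the two sets with setdiff() would show which peaks are missing or repeated.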