ChIPpeakAnno: findOverlapsOfPeaks keep GRanges metadata
5
0
Entering edit mode
da.de ▴ 30
@dade-7723
Last seen 23 months ago
Austria

Hi,

I am using the findOverlapsOfPeaks function from the ChIPpeakAnno_3.2.0 package.

Is there a possibility to keep the metadata of the GRanges objects used for the overlap? Would be nice to have it in the output (peaklist GRanges).

Or to at least keep the peak names from the input. For me it seems like findOverlapsOfPeaks is using the name of the GRanges + number for the peak names in the output peaklist.

GRanges1 peak=protA_antibody_1212
GRanges2 peak=protB_antibody_5894
names used in peaklist GRanges1///GRanges2:   GRanges1__1, GRanges2__1
would like to have: protA_antibody_1212, protB_antibody_5894

Thanks for your help!
Dagmar

chippeakanno findOverlapsOfPeaks • 4.2k views
ADD COMMENT
1
Entering edit mode
Ou, Jianhong ★ 1.3k
@ou-jianhong-4539
Last seen 5 days ago
United States

Hi Hari,

Please update your R into 3.2.0 or above.

findOverlapsOfPeaks function is introduced in ChIPpeakAnno v3.2.0 (happened to be same as R version) or above.
ADD COMMENT
0
Entering edit mode
Guangchuang Yu ★ 1.2k
@guangchuang-yu-5419
Last seen 10 weeks ago
China/Guangzhou/Southern Medical Univer…

You may interested in ChIPseeker, http://bioinformatics.oxfordjournals.org/cgi/content/abstract/btv145

ADD COMMENT
0
Entering edit mode
Ou, Jianhong ★ 1.3k
@ou-jianhong-4539
Last seen 5 days ago
United States

Hi Dagmar,

It is very interesting. It should be designed to show the peak names in the output of peaklist. If there are duplicated names or NA names, it will convert to peak number. I am not sure I fully understand your case. If you got peak names in GRanges1 and GRanges2, but not got peak names in overlapping peaks, this should be a bug. If this is the case, could you kindly send me your data and the code to repeat that? Thank you.

 

> library(ChIPpeakAnno)
> packageVersion("ChIPpeakAnno")
[1] ‘3.2.0’
> bed <- system.file("extdata", "MACS_output.bed", package="ChIPpeakAnno")
> gr1 <- toGRanges(bed, format="BED", header=FALSE)
> gff <- system.file("extdata", "GFF_peaks.gff", package="ChIPpeakAnno")
> gr2 <- toGRanges(gff, format="GFF", header=FALSE, skip=3)
> head(gr1)
GRanges object with 6 ranges and 1 metadata column:
              seqnames           ranges strand |     score
                 <Rle>        <IRanges>  <Rle> | <numeric>
  MACS_peak_1     chr1 [ 28341,  29610]      * |    160.81
  MACS_peak_2     chr1 [ 90821,  91234]      * |    133.12
  MACS_peak_3     chr1 [134974, 135538]      * |    138.99
  MACS_peak_4     chr1 [136331, 137068]      * |    106.17
  MACS_peak_5     chr1 [137277, 137847]      * |     124.9
  MACS_peak_6     chr1 [326732, 327221]      * |    190.74
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
> head(gr2)
GRanges object with 6 ranges and 4 metadata columns:
           seqnames           ranges strand |   source     score    frame     group
              <Rle>        <IRanges>  <Rle> | <factor> <integer> <factor>  <factor>
  region_0     chr1 [713893, 714747]      + |  bed2gff         0        . region_0;
  region_1     chr1 [715023, 715578]      + |  bed2gff         0        . region_1;
  region_2     chr1 [724851, 725445]      + |  bed2gff         0        . region_2;
  region_3     chr1 [839467, 840090]      + |  bed2gff         0        . region_3;
  region_4     chr1 [856361, 856999]      + |  bed2gff         0        . region_4;
  region_5     chr1 [859315, 859903]      + |  bed2gff         0        . region_5;
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
> head(peaklist[[3]])
GRanges object with 6 ranges and 1 metadata column:
      seqnames           ranges strand |                                     peakNames
         <Rle>        <IRanges>  <Rle> |                               <CharacterList>
  [1]     chr1 [713791, 715578]      * | gr1__MACS_peak_13,gr2__region_0,gr2__region_1
  [2]     chr1 [724851, 727191]      * |               gr2__region_2,gr1__MACS_peak_14
  [3]     chr1 [839467, 840090]      * |               gr1__MACS_peak_16,gr2__region_3
  [4]     chr1 [856361, 856999]      * |               gr1__MACS_peak_17,gr2__region_4
  [5]     chr1 [859315, 860144]      * |               gr2__region_5,gr1__MACS_peak_18
  [6]     chr1 [870970, 871568]      * |               gr2__region_7,gr1__MACS_peak_19
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

> ol <- findOverlapsOfPeaks(gr1, gr2)
> peaklist <- ol$peaklist
> names(peaklist)
[1] "gr2"       "gr1"       "gr1///gr2"
> head(peaklist[[1]])
GRanges object with 6 ranges and 1 metadata column:
      seqnames             ranges strand |       peakNames
         <Rle>          <IRanges>  <Rle> | <CharacterList>
  [1]     chr1 [ 860248,  860833]      + |   gr2__region_6
  [2]     chr1 [ 905647,  906230]      + |  gr2__region_12
  [3]     chr1 [ 908528,  909096]      + |  gr2__region_13
  [4]     chr1 [ 918145,  918733]      + |  gr2__region_16
  [5]     chr1 [ 986902,  987370]      + |  gr2__region_23
  [6]     chr1 [1004125, 1004714]      + |  gr2__region_26
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

ADD COMMENT
0
Entering edit mode

Hi,
thanks for your reply. I found out why it did not keep the names. I only had them as a metadata column gr1$names not as names(gr1).
I changed this and now it works.
However, any suggestions how to keep the metadata?

ADD REPLY
0
Entering edit mode

Hi,

To keep metadata is difficult. When I develop the package, I should consider about the metadata format. If all the peaks keep same structure of metadata, it is much easy. However, not every time it will work like that. When then peaks contain different metadata, I could not suppose the same column name indicates same info. Even more, if the metadata is contain complex data such as list, it will be time consuming to merge them.

Here, once you got the peakNames as CharacterList, you can extract the metadata as you want. My codes may be

peakNames <- peaklist[[3]]$peakNames
peakNames <- lapply(peakNames, function(.ele) as.data.frame(do.call(rbind, strsplit(.ele, "__"))))
names(peakNames) <- paste("ol", formatC(1:length(peakNames), flag="0", width=nchar(length(peakNames))), sep="")
peakNamesGP <- do.call(rbind, peakNames)
gp1 <- peakNamesGP[peakNamesGP$V1=="gr1",]
gp1.grs <- gr1[gp1$V2]
gp1.gps <- gsub("\\.\\d+$", "", rownames(gp1))
 
ADD REPLY
0
Entering edit mode

Hi,

Regarding on your data table, in particular, the score column, how can I get correct pvalue format on it, and add it as new slot on GRanges objects? Thanks

Best regards

Jurat

 

ADD REPLY
0
Entering edit mode

Maybe you can have a try the following code to understand how to add original metadata into the overlapping peaks:

library(ChIPpeakAnno)
path <- system.file("extdata", package="ChIPpeakAnno")
files <- dir(path, "broadPeak")
data <- sapply(file.path(path, files), toGRanges, format="broadPeak")
names(data) <- gsub(".broadPeak", "", files)
ol <- findOverlapsOfPeaks(data)
olpeaks <- ol$peaklist[[length(ol$peaklist)]]
data.GR <- unlist(GRangesList(data), use.names=TRUE)
olpeaks$score <- sapply(olpeaks$peakNames, function(.ele) data.GR[gsub("__", ".", .ele)]$score)
head(olpeaks$score)
ADD REPLY
0
Entering edit mode
Hari Easwaran ▴ 240
@hari-easwaran-3510
Last seen 9.6 years ago
United States

I am facing a weird problem with ChIPpeakAnno. I am  to able to fine the function findOverlapsOfPeaks. I get the following:

> findOverlapsOfPeaks
Error: object 'findOverlapsOfPeaks' not found

However, other functions like annotatePeakInBatch can be found.

Any idea what could be wrong. Thanks for your help.

 

> R.Version()
$platform
[1] "x86_64-apple-darwin10.8.0"

$arch
[1] "x86_64"

$os
[1] "darwin10.8.0"

$system
[1] "x86_64, darwin10.8.0"

$status
[1] ""

$major
[1] "3"

$minor
[1] "1.3"

$year
[1] "2015"

$month
[1] "03"

$day
[1] "09"

$`svn rev`
[1] "67962"

$language
[1] "R"

$version.string
[1] "R version 3.1.3 (2015-03-09)"

$nickname
[1] "Smooth Sidewalk"
ADD COMMENT
0
Entering edit mode
Hari Easwaran ▴ 240
@hari-easwaran-3510
Last seen 9.6 years ago
United States

Hi Jianhong, Thanks for your message. It works now. Earlier I had a problem with R3.2 in that it could not find BiocParallel, and I could not install it. Had to restart R and reinstall Bioconductor, ChIPpeakAnno, BiocParallel.... it works now.

Thanks.

 

ADD COMMENT

Login before adding your answer.

Traffic: 712 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6