Entering edit mode
Dear all,
I have a list of peaks from ChIPseq experiments. Now I am trying to
find to over which genes these peaks overlap (and extract the gene
name).
I'm sure this should be pretty easy, but I am just starting with
bioconductor, so some concepts are still vague.
Here's what I did so far:
# Not run because it is installed (as well as other packages)
# source("http://bioconductor.org/biocLite.R")
# biocLite("TxDb.Dmelanogaster.UCSC.dm3.ensGene")
# Load the Dmelanogaster genome
library(TxDb.Dmelanogaster.UCSC.dm3.ensGene, quietly = TRUE)
txdb <- TxDb.Dmelanogaster.UCSC.dm3.ensGene
ee <- exonsBy(txdb, "gene")
# Load an subset of my peaks for the sake of the example
finalPeaks <- rbind(c("chr3R", "2788500", "2842850", "2815675",
"54350"), (c("chr3R", "12484350", "12661350", "12572850", "177000")))
rownames(finalPeaks) <- c("Peak 1", "Peak 2")
colnames(finalPeaks) <- c("chr", "start", "end", "center", "length")
# Create a GRanges object with the file I have
# finalPeaks is a matrix with rows being individual peaks and columns
<- c()
GRfinalPeaks <- GRanges(finalPeaks[,1], IRanges(start =
as.numeric(finalPeaks[,2]), end = as.numeric(finalPeaks[,3])))
I'm stuck from here. What I'd like is to get, for each peak, the
overlapping genes.
For example, an output that would be:
Peak1: GeneA, GeneB, GeneC, etc
Peak2: GeneD
Also, I don't know how simple it is because (as specified in the
output example) one peak can overlap several genes or none at all..
Thanks for any help,
Patrick
[[alternative HTML version deleted]]