Question

ChIP peak annotation: how to get all genes that overlap each peak?

94133 (USA, Stanford)

How do I get all genes that overlap each peak listed in one comma-separated column, so that each peak stays in a single row?
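A minimal base R sketch of the overlap test itself, assuming peaks and genes are plain data frames with hypothetical columns `chr`, `start`, `end` (plus `symbol` for genes). The coordinates of the peak and of Hoxb9, Hoxb3 and Hoxb2 are taken from the output shown further down; `peakB`, `geneX` and `geneY` are made up for illustration. Two closed intervals on the same chromosome overlap when each one starts before the other ends:

```r
# Hypothetical toy inputs; column names chr/start/end/symbol are assumptions.
peaks <- data.frame(
  peak  = c("X07696", "peakB"),
  chr   = c("chr11", "chr11"),
  start = c(96268247, 1500),
  end   = c(96356723, 1800)
)
genes <- data.frame(
  symbol = c("Hoxb9", "Hoxb3", "Hoxb2", "geneX", "geneY"),
  chr    = c("chr11", "chr11", "chr11", "chr11", "chr2"),
  start  = c(96271330, 96323126, 96351632, 1000, 96300000),
  end    = c(96276593, 96347930, 96354014, 2000, 96300500)
)

# For each peak, paste the symbols of all genes on the same chromosome whose
# interval overlaps the peak: gene start <= peak end and gene end >= peak start.
genes_overlapping_peaks <- function(peaks, genes, sep = ", ") {
  peaks$symbol <- vapply(seq_len(nrow(peaks)), function(i) {
    hit <- genes$chr == peaks$chr[i] &
      genes$start <= peaks$end[i] &
      genes$end >= peaks$start[i]
    paste(genes$symbol[hit], collapse = sep)  # "" when nothing overlaps
  }, character(1))
  peaks
}

res <- genes_overlapping_peaks(peaks, genes)
```

This loops over peaks, so it only illustrates the logic; on real data the usual route is `GenomicRanges::findOverlaps()` on GRanges objects. To keep only genes lying entirely inside a peak, the condition would become `genes$start >= peaks$start[i] & genes$end <= peaks$end[i]`.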

I tried ChIPseeker:

library(ChIPseeker)
library(TxDb.Mmusculus.UCSC.mm10.knownGene)

peak_anno <- annotatePeak(peaks, level = "gene",
                          addFlankGeneInfo = FALSE,
                          assignGenomicAnnotation = TRUE,
                          TxDb = TxDb.Mmusculus.UCSC.mm10.knownGene,
                          annoDb = "org.Mm.eg.db",
                          ignoreOverlap = FALSE,
                          overlap = "all")

Result:

It doesn't report all genes within the peak. If I change to addFlankGeneInfo = TRUE, I get some of the genes, but the flanking region doesn't match the exact peak size, so it isn't useful in this case.

I tried ChIPpeakAnno:

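library(ChIPpeakAnno)
library(TxDb.Mmusculus.UCSC.mm10.knownGene)
library(org.Mm.eg.db)

# Gene ranges used as the annotation source
ucsc.mm10.knownGene <- genes(TxDb.Mmusculus.UCSC.mm10.knownGene)
# merge_peaks: the peak set in BED format (not shown in the post)
gr_peaks <- toGRanges(merge_peaks, format = "BED", header = FALSE, feature = "gene")
peaks_anno <- annotatePeakInBatch(gr_peaks,
                                  AnnotationData = ucsc.mm10.knownGene,
                                  output = "both",
                                  select = "all")
# Add gene symbols for the Entrez IDs in the "feature" column
peaks_anno <- addGeneIDs(annotatedPeak = peaks_anno,
                         orgAnn = "org.Mm.eg.db",
                         feature_id_type = "entrez_id",
                         IDs2Add = "symbol")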

Result:

It returns all genes in the peak, but as separate rows (one row per peak-gene pair) instead of a list in one column.

seqnames start end width strand feature start_position end_position feature_strand insideFeature distancetoFeature shortestDistance fromOverlappingOrNearest symbol
X07696.103889 chr11 96268247 96356723 88477 X07696 103889 96351632 96354014 + includeFeature -83385 2709 NearestLocation Hoxb2
X07696.15410 chr11 96268247 96356723 88477 X07696 15410 96323126 96347930 + includeFeature -54879 8793 NearestLocation Hoxb3
X07696.15412 chr11 96268247 96356723 88477 X07696 15412 96318267 96321638 + includeFeature -50020 35085 NearestLocation Hoxb4
X07696.15413 chr11 96268247 96356723 88477 X07696 15413 96303512 96306121 + includeFeature -35265 35265 NearestLocation Hoxb5
X07696.15414 chr11 96268247 96356723 88477 X07696 15414 96299171 96301569 + includeFeature -30924 30924 NearestLocation Hoxb6
X07696.15415 chr11 96268247 96356723 88477 X07696 15415 96286646 96290163 + includeFeature -18399 18399 NearestLocation Hoxb7
X07696.15416 chr11 96268247 96356723 88477 X07696 15416 96281905 96285325 + includeFeature -13658 13658 NearestLocation Hoxb8
X07696.15417 chr11 96268247 96356723 88477 X07696 15417 96271330 96276593 + includeFeature -3083 3083 NearestLocation Hoxb9

I want something like:

seqnames start end width strand feature start_position end_position feature_strand insideFeature distancetoFeature shortestDistance fromOverlappingOrNearest symbol
X07696.103889 chr11 96268247 96356723 88477 X07696 103889 96351632 96354014 + includeFeature -83385 2709 NearestLocation Hoxb2, Hoxb3, Hoxb4, Hoxb5, Hoxb6, Hoxb7, Hoxb8, Hoxb9
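The collapse from the long table to one row per peak can be sketched in base R with `aggregate()`, assuming the annotation has already been converted with `as.data.frame()` and carries a `peak` column (the column the `condenseMatrixByColnames(..., "peak")` call in the replies relies on). The data frame here is a hypothetical, trimmed stand-in for that output; `peakB` and `geneX` are made up:

```r
# Hypothetical stand-in for as.data.frame(peaks_anno): one row per
# peak-gene pair, trimmed to the columns used here.
anno <- data.frame(
  peak     = c("X07696", "X07696", "X07696", "peakB"),
  seqnames = c("chr11", "chr11", "chr11", "chr11"),
  start    = c(96268247, 96268247, 96268247, 1500),
  end      = c(96356723, 96356723, 96356723, 1800),
  symbol   = c("Hoxb2", "Hoxb3", "Hoxb4", "geneX")
)

# Group rows by the peak-identifying columns and paste each group's unique
# symbols into one comma-separated string (row order within a group is kept).
collapse_symbols <- function(anno,
                             key = c("peak", "seqnames", "start", "end"),
                             sep = ", ") {
  aggregate(anno["symbol"], by = anno[key],
            FUN = function(s) paste(unique(s), collapse = sep))
}

res <- collapse_symbols(anno)
```

The result is a data frame with one row per peak; if ranges are needed again, `GenomicRanges::makeGRangesFromDataFrame(res, keep.extra.columns = TRUE)` rebuilds a GRanges that keeps the `peak` and `symbol` columns as metadata.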

Tags: ChIP-seq, chippeakanno, chipseeker

Asked 6.7 years ago by 94133

Have you tried ?condenseMatrixByColnames?

For example:

library(ChIPpeakAnno)

res <- condenseMatrixByColnames(as.matrix(peaks_anno), "peak")

Reply by Ou, Jianhong, 6.7 years ago

Thanks for your reply!

I tried that but got:

Error in as.vector(x) : no method for coercing this S4 class to a vector

Suggestions?

I made a workaround with dplyr, but I have to convert from class "GRanges" to "data.frame" to do that.

Reply by 94133, 6.7 years ago

Try this:

res <- condenseMatrixByColnames(as.matrix(as.data.frame(peaks_anno)), "peak")

Reply by Ou, Jianhong, 6.7 years ago

Thanks! Is there a way to keep res as a "GRanges" object?

Reply by 94133, 6.7 years ago

No. After merging, you can create a GRanges object from the new dataset.

Reply by Ou, Jianhong, 6.7 years ago

Great, thanks for your quick reply!

Reply by 94133, 6.7 years ago