How do I get all genes that overlap a peak as a comma-separated column within a single row for the peak?
I tried ChIPseeker:
annotatePeak(peaks, level = "gene",
             addFlankGeneInfo = FALSE,
             assignGenomicAnnotation = TRUE,
             TxDb = TxDb.Mmusculus.UCSC.mm10.knownGene,
             annoDb = "org.Mm.eg.db",
             ignoreOverlap = FALSE,
             overlap = "all")
Result:
It doesn't report all genes within the peak. If I change addFlankGeneInfo to TRUE, I get some of the genes, but it doesn't cover the exact peak size, so it's not useful in this case.
I tried ChIPpeakAnno:
ucsc.mm10.knownGene <- genes(TxDb.Mmusculus.UCSC.mm10.knownGene)
gr_peaks <- toGRanges(merge_peaks, format="BED", header = FALSE, feature = "gene")
peaks_anno <- annotatePeakInBatch(gr_peaks,
                                  AnnotationData = ucsc.mm10.knownGene,
                                  output = "both",
                                  select = "all")
peaks_anno <- addGeneIDs(annotatedPeak = peaks_anno,
                         orgAnn = "org.Mm.eg.db",
                         feature_id_type = "entrez_id",
                         IDs2Add = "symbol")
Result:
It returns all genes in the peaks, but each gene is in a new row instead of all of them being listed in one column.
seqnames start end width strand feature start_position end_position feature_strand insideFeature distancetoFeature shortestDistance fromOverlappingOrNearest symbol
X07696.103889 chr11 96268247 96356723 88477 X07696 103889 96351632 96354014 + includeFeature -83385 2709 NearestLocation Hoxb2
X07696.15410 chr11 96268247 96356723 88477 X07696 15410 96323126 96347930 + includeFeature -54879 8793 NearestLocation Hoxb3
X07696.15412 chr11 96268247 96356723 88477 X07696 15412 96318267 96321638 + includeFeature -50020 35085 NearestLocation Hoxb4
X07696.15413 chr11 96268247 96356723 88477 X07696 15413 96303512 96306121 + includeFeature -35265 35265 NearestLocation Hoxb5
X07696.15414 chr11 96268247 96356723 88477 X07696 15414 96299171 96301569 + includeFeature -30924 30924 NearestLocation Hoxb6
X07696.15415 chr11 96268247 96356723 88477 X07696 15415 96286646 96290163 + includeFeature -18399 18399 NearestLocation Hoxb7
X07696.15416 chr11 96268247 96356723 88477 X07696 15416 96281905 96285325 + includeFeature -13658 13658 NearestLocation Hoxb8
X07696.15417 chr11 96268247 96356723 88477 X07696 15417 96271330 96276593 + includeFeature -3083 3083 NearestLocation Hoxb9
I want something like:
seqnames start end width strand feature start_position end_position feature_strand insideFeature distancetoFeature shortestDistance fromOverlappingOrNearest symbol
X07696.103889 chr11 96268247 96356723 88477 X07696 103889 96351632 96354014 + includeFeature -83385 2709 NearestLocation Hoxb2, Hoxb3, Hoxb4, Hoxb5, Hoxb6, Hoxb7, Hoxb8, Hoxb9
Did you try ?condenseMatrixByColnames
For example:
library(ChIPpeakAnno)
res <- condenseMatrixByColnames(as.matrix(peaks_anno), "peak")
Thanks for your reply!
I tried that but get:
Error in as.vector(x) : no method for coercing this S4 class to a vector
Suggestions?
I made a workaround with dplyr, but I have to convert from class "GRanges" to "data.frame" to do that.
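The dplyr workaround itself is not shown in the thread. As a rough sketch of the same idea, the collapse can be done in base R with aggregate() once the annotation has been converted to a data.frame. The toy table below reuses coordinates and symbols from the example output above; the name anno_df, the three-gene subset, and the "*" strand are assumptions for illustration only.

```r
# Toy stand-in for as.data.frame(peaks_anno): one row per peak/gene pair.
# Values are copied from the example above; strand "*" is an assumption.
anno_df <- data.frame(
  seqnames = "chr11",
  start    = 96268247,
  end      = 96356723,
  strand   = "*",
  peak     = "X07696",
  symbol   = c("Hoxb2", "Hoxb3", "Hoxb4"),
  stringsAsFactors = FALSE
)

# Collapse to one row per peak, joining the gene symbols with ", ".
collapsed <- aggregate(
  symbol ~ seqnames + start + end + strand + peak,
  data = anno_df,
  FUN  = function(x) paste(unique(x), collapse = ", ")
)

print(collapsed)
```

With the formula interface, aggregate() drops rows that have NA in any of the listed columns, so peaks without a gene symbol would need separate handling. The dplyr version would presumably be a group_by() on the peak columns followed by summarise() with the same paste(..., collapse = ", ").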
Try this:
res <- condenseMatrixByColnames(as.matrix(as.data.frame(peaks_anno)), "peak")
Thanks! Is there a way to keep res as "GRanges" object?
No. After merging, you can create a new GRanges object from the resulting dataset.
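One way to do that rebuild, sketched under assumptions: makeGRangesFromDataFrame() from the GenomicRanges package turns a data.frame with seqnames/start/end/strand columns back into a GRanges. The table condensed_df below is a hypothetical stand-in for the merged result (three symbols from the example above, strand "*" assumed), not the actual output of condenseMatrixByColnames.

```r
library(GenomicRanges)

# Hypothetical condensed table: one row per peak, symbols already joined.
# Strand "*" and the three-gene subset are assumptions for illustration.
condensed_df <- data.frame(
  seqnames = "chr11",
  start    = 96268247,
  end      = 96356723,
  strand   = "*",
  peak     = "X07696",
  symbol   = "Hoxb2, Hoxb3, Hoxb4",
  stringsAsFactors = FALSE
)

# Rebuild a GRanges; the extra columns (peak, symbol) are kept as metadata.
gr_condensed <- makeGRangesFromDataFrame(condensed_df,
                                         keep.extra.columns = TRUE)
gr_condensed
```

If the merged result is a character matrix, it is probably safest to convert it with as.data.frame() and turn start and end back into integers with as.integer() before this step.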
Great, thanks for your quick reply!