Recently I noticed that the org.Hs.eg.db database annotates specific genes (e.g. CASC15, ACTR3BP5, STAG3L3) with ENSEMBL IDs as "NA". This could be a limitation of the database.
I tried EnsDb.Hsapiens.v86, which could determine the correct ENSEMBL IDs for these features.
library(EnsDb.Hsapiens.v86)
edb <- EnsDb.Hsapiens.v86
transcripts(edb, filter = GeneNameFilter("CASC15"))
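For a gene-level view of the same lookup, something like the following may help. It is only a sketch: it swaps transcripts() for genes() from ensembldb and passes all three symbols from the question in one GeneNameFilter, so the gene_id column can be inspected directly. The variable names symbols and gene_tbl are made up for illustration.

```r
library(EnsDb.Hsapiens.v86)

edb <- EnsDb.Hsapiens.v86

# Gene symbols mentioned in the question
symbols <- c("CASC15", "ACTR3BP5", "STAG3L3")

# Gene-level query; return only the ID and the symbol as a data.frame
gene_tbl <- genes(edb,
                  filter = GeneNameFilter(symbols),
                  columns = c("gene_id", "gene_name"),
                  return.type = "data.frame")

print(gene_tbl)
```

A symbol can be attached to more than one Ensembl gene, so gene_tbl may contain more rows than there are symbols.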
I was able to use the same database in ChIPseeker without any issues, but somehow it does not capture the ENSEMBL ID, which is available in the column named gene_id.
library(ChIPseeker)
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
Anno <- annotatePeak(bedfile, tssRegion=c(-3000, 3000), TxDb=TxDb.Hsapiens.UCSC.hg38.knownGene, annoDb=EnsDb.Hsapiens.v86)
This fetched the annotation columns "geneId", "transcriptId", "distanceToTSS", "SYMBOL", "GENENAME".
How can I modify the ChIPseeker code to get the column "gene_id" from the EnsDb.Hsapiens.v86 database?
Thanks
annotatePeak is hard-coded to look for geneId, not gene_id - see here: https://github.com/YuLab-SMU/ChIPseeker/blob/master/R/annotatePeak.R#L178
You may want to create an issue on the GitHub page, and/or literally edit the column name of the object EnsDb.Hsapiens.v86.
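
Another possible workaround, which leaves ChIPseeker untouched, is to add the Ensembl gene ID after annotation by looking up the SYMBOL column in the EnsDb. The sketch below is not part of ChIPseeker: add_ensembl_ids and the output column name ensembl_gene_id are hypothetical names chosen for illustration, and it assumes the annotation table has a SYMBOL column as described in the question.

```r
library(EnsDb.Hsapiens.v86)

# Hypothetical helper: append an Ensembl gene ID column to an annotation
# data.frame that already has a SYMBOL column.
add_ensembl_ids <- function(anno_df, edb = EnsDb.Hsapiens.v86) {
  symbols <- unique(as.character(anno_df$SYMBOL))
  symbols <- symbols[!is.na(symbols)]

  # Look up Ensembl gene IDs (GENEID) by gene symbol in the EnsDb
  sym2ens <- AnnotationDbi::select(edb,
                                   keys = symbols,
                                   keytype = "SYMBOL",
                                   columns = c("SYMBOL", "GENEID"))

  # A symbol can map to several Ensembl genes; keep the first hit only
  sym2ens <- sym2ens[!duplicated(sym2ens$SYMBOL), ]

  # Rows whose symbol is NA or not found get NA
  anno_df$ensembl_gene_id <- sym2ens$GENEID[match(anno_df$SYMBOL, sym2ens$SYMBOL)]
  anno_df
}

# Usage with the object from the question:
# anno_df <- add_ensembl_ids(as.data.frame(Anno))
```

Keeping only the first hit per symbol is a simplification; if ambiguous symbols matter for your analysis, inspect the full lookup table before collapsing it.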