Marco Blanchette
I am working on a project involving Schizosaccharomyces pombe as a
source for genomic analysis and love to use the ReportingTools
HTML-producing wrappers. However, I am struggling because there is no
AnnotationDbi package available for this organism. I decided to
finally take the plunge and see if I could build one myself using
AnnotationForge, and was quite excited to find that I could perhaps
make one simply by using makeOrgPackageFromNCBI(). Most likely
something went wrong, and I suspect a bug somewhere in the pipeline. I
have not dug deeper than trying to build the package and use it,
hoping that someone closer to the code could shed some light. Here are
the steps I took:
> library(AnnotationForge)
> makeOrgPackageFromNCBI(version = "0.1",
author = "Marco Blanchette <mab at stowers.org>",
maintainer = "Marco Blanchette <mab at stowers.org>",
outputDir = ".",
tax_id = "4896",
genus = "Schizosaccharomyces",
species = "pombe")
This step succeeded with only a warning:
Warning message:
In .makeSimpleTable(ug, table = "unigene", con) :
no values found for table unigene in this data chunk.
I didn't think this was critical enough to raise any red flag, so I
then proceeded with the installation, which went smoothly:
> library(devtools)
> install('org.Spombe.eg.db')
> library('org.Spombe.eg.db')
Then I tried to use it with ReportingTools publish(), but it failed
with an error related to the Entrez IDs, for which I had a conversion
table from biomaRt. I dug a bit deeper and found that none of the
genes I was querying were in the database, and finally realized that
there were only 38 entries in the org.Spombe.eg.db database I had just
created and installed... Check this out:
> keytypes(org.Spombe.eg.db)
[1] "ENTREZID" "ACCNUM" "ALIAS" "CHR" "PMID" "REFSEQ"
[7] "SYMBOL" "UNIGENE" "GENENAME" "GO" "EVIDENCE" "ONTOLOGY"
Looking good! However:
> length(keys(org.Spombe.eg.db,'ENTREZID'))
[1] 38
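To see whether every table in the forged package is nearly empty or only the gene table, one could count rows directly in the underlying SQLite file. This is a minimal sketch, not something I ran in the session above: `table_row_counts()` is a hypothetical helper, and it assumes the DBI package is available and that the forged org.Spombe.eg.db package is installed.

```r
# Hypothetical helper (not part of AnnotationForge): count the rows in
# every table reachable through a DBI connection.
table_row_counts <- function(con) {
  tables <- DBI::dbListTables(con)
  counts <- vapply(tables, function(tbl) {
    sql <- sprintf('SELECT COUNT(*) AS n FROM "%s"', tbl)
    as.numeric(DBI::dbGetQuery(con, sql)$n)
  }, numeric(1))
  data.frame(table = tables, rows = unname(counts),
             row.names = NULL, stringsAsFactors = FALSE)
}

# Assumption: the forged package is installed. The block is skipped otherwise.
if (requireNamespace("DBI", quietly = TRUE) &&
    requireNamespace("org.Spombe.eg.db", quietly = TRUE)) {
  orgdb <- org.Spombe.eg.db::org.Spombe.eg.db
  # dbconn() returns the SQLite connection behind the OrgDb object
  print(table_row_counts(AnnotationDbi::dbconn(orgdb)))
}
```

If only some tables are short, that would point at a particular NCBI source file rather than at the whole build.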
Can someone close enough to the code shed some light as to whether
there is a bug in AnnotationForge or whether it is the NCBI database
that is not conforming to what is expected? For instance, biomaRt has
5117 Entrez IDs:
> library(biomaRt)
> mart <- useMart("fungi_mart_18","spombe_eg_gene")
> ensembl2entrez <- getBM(c('ensembl_gene_id','entrezgene'),mart=mart)
> sum(!is.na(ensembl2entrez$entrezgene))
[1] 5117
The IDs I tested on the NCBI website return the correct genes.
However, only 10 of the AnnotationForge Entrez IDs (out of a measly 38
IDs) are found in biomaRt:
> sum(keys(org.Spombe.eg.db,'ENTREZID') %in%
ensembl2entrez$entrezgene)
[1] 10
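The overlap count alone does not show which IDs fall on each side, so a slightly fuller comparison may help narrow things down. The sketch below uses a hypothetical helper, `summarise_id_overlap()`, and toy vectors standing in for keys(org.Spombe.eg.db,'ENTREZID') and ensembl2entrez$entrezgene. It converts both inputs to character and drops NA values first, since the two sources may not return IDs as the same type.

```r
# Hypothetical helper: compare two vectors of Entrez IDs after
# normalising them to unique, non-missing character strings.
summarise_id_overlap <- function(orgdb_ids, mart_ids) {
  clean <- function(x) {
    x <- trimws(as.character(x))
    unique(x[!is.na(x) & nzchar(x)])
  }
  a <- clean(orgdb_ids)
  b <- clean(mart_ids)
  list(
    n_orgdb    = length(a),
    n_mart     = length(b),
    shared     = intersect(a, b),  # IDs present in both sources
    only_orgdb = setdiff(a, b),    # IDs only in the forged package
    only_mart  = setdiff(b, a)     # IDs only in the biomaRt table
  )
}

# Toy data (made up for illustration, not real S. pombe IDs)
res <- summarise_id_overlap(c("101", "102", "103"),
                            c(102L, 103L, 104L, NA))
str(res)
```

With the real vectors, the `only_orgdb` entries would be the ones to look up on the NCBI website, to check which organism they actually belong to.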
Again, I would appreciate any comments or suggestions as to whether
this is a bug, something I did wrong, or a misalignment between the
NCBI S. pombe annotation and what is expected by AnnotationForge.
Thanks
--
Marco Blanchette, Ph.D.
Assistant Investigator
Stowers Institute for Medical Research
1000 East 50th St.
Kansas City, MO 64110
Tel: 816-926-4071
Cell: 816-726-8419
Fax: 816-926-2018
Marc, I find this is still the state of affairs and wonder if you have any suggested alternative approaches.
For example, there are only 36 rows of gene_info in the org.Scerevisiae.eg.db, forged as follows:
FWIW: my specific aim now is to have an orgDB for sacCer that will interoperate with clusterProfiler, as per: https://support.bioconductor.org/p/118647/ - if you have any insight there it would be most welcome.