Hi,
It's likely I'm missing something obvious or being dense here, but it appears that there are no GO annotations attached to human mitochondrial genes in org.Hs.eg.db v3.4.2. This is not the case for mouse mitochondrial genes in org.Mm.eg.db or rat mitochondrial genes in org.Rn.eg.db, nor does it appear to be true of the underlying "lite" Gene Ontology database (ftp://ftp.geneontology.org/pub/go/godatabase/archive/latest-lite/), on which I understand the GO mappings in org.Hs.eg.db are based. Is this correct?
The code below hopefully illustrates the issue. I'm using Bioconductor v3.6 (also tried with v3.5).
Thanks in advance,
Owen Dando
library(dplyr)
library(org.Hs.eg.db)
library(org.Mm.eg.db)
library(org.Rn.eg.db)
library(magrittr)
# Return the number of genes on chromosome 'chr'
number_of_genes_on_chromosome <- function(chr, gene_id_to_chromosome) {
  gene_id_to_chromosome %>%
    toTable() %>%
    filter(chromosome == chr) %>%
    nrow()
}
# Return a table counting the number of distinct GO terms for each gene
gene_id_to_number_of_terms <- function(gene_id_to_go_term) {
  gene_id_to_go_term %>%
    toTable() %>%
    distinct(gene_id, go_id) %>%
    group_by(gene_id) %>%
    summarise(count = n())
}
# Return the number of genes on chromosome 'chr' annotated with at least one GO term
number_of_genes_on_chromosome_with_annotation <- function(
    chr, gene_id_to_go_term, gene_id_to_chromosome) {
  gene_id_to_chromosome %>%
    toTable() %>%
    # Genes without any GO term get count = NA after the join
    left_join(gene_id_to_number_of_terms(gene_id_to_go_term), by = "gene_id") %>%
    filter(chromosome == chr & !is.na(count)) %>%
    nrow()
}
# Return the percentage of genes on chromosome 'chr' annotated with at least one GO term
percentage_of_genes_with_annotations <- function(
    chr, gene_id_to_go_term, gene_id_to_chromosome) {
  100 *
    number_of_genes_on_chromosome_with_annotation(chr, gene_id_to_go_term, gene_id_to_chromosome) /
    number_of_genes_on_chromosome(chr, gene_id_to_chromosome)
}
# Then for human mitochondrial genes there are none with GO annotations...
percentage_of_genes_with_annotations("MT", org.Hs.egGO2ALLEGS, org.Hs.egCHR)
> 0
# But for mouse and rat mitochondrial genes, at least some have annotations...
percentage_of_genes_with_annotations("MT", org.Mm.egGO2ALLEGS, org.Mm.egCHR)
> 100
percentage_of_genes_with_annotations("MT", org.Rn.egGO2ALLEGS, org.Rn.egCHR)
> 35.13514
# Print session info
sessionInfo()
> R version 3.4.2 (2017-09-28)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Ubuntu 16.04.3 LTS
> Matrix products: default
> BLAS: /usr/lib/libblas/libblas.so.3.6.0
> LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
> locale:
> [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C LC_TIME=en_GB.UTF-8
> [4] LC_COLLATE=en_GB.UTF-8 LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
> [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C LC_ADDRESS=C
> [10] LC_TELEPHONE=C LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
> attached base packages:
> [1] parallel stats4 stats graphics grDevices utils datasets methods base
> other attached packages:
> [1] bindrcpp_0.2 magrittr_1.5 org.Rn.eg.db_3.4.2 org.Mm.eg.db_3.4.2
> [5] org.Hs.eg.db_3.4.2 AnnotationDbi_1.40.0 IRanges_2.12.0 S4Vectors_0.16.0
> [9] Biobase_2.38.0 BiocGenerics_0.24.0 dplyr_0.7.4
> loaded via a namespace (and not attached):
> [1] Rcpp_0.12.13 bindr_0.1 bit_1.1-12 R6_2.2.2 rlang_0.1.2
> [6] blob_1.1.0 tools_3.4.2 DBI_0.7 bit64_0.9-7 assertthat_0.2.0
> [11] digest_0.6.12 tibble_1.3.4 memoise_1.1.0 glue_1.2.0 RSQLite_2.0
> [16] compiler_3.4.2 pkgconfig_2.0.1
Hi James - many thanks for the quick response and explanation. I am attempting to follow up with NCBI as to why these mappings aren't present in gene2go.gz.
Just in case anyone else hits this issue: after following up with NCBI, this was indeed confirmed as a bug in the production of the underlying gene2go.gz data. Apparently this has now been resolved, so the corrected data will presumably percolate up into the next version of org.Hs.eg.db.
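For anyone wanting to check whether their installed org.Hs.eg.db already carries the corrected data, here is a minimal sketch of a more direct test than the percentage calculation in my original post. It assumes a handful of human mitochondrial gene symbols (MT-ND1, MT-CO1, MT-ATP6, MT-CYB) are valid SYMBOL keys in your version of the package, and the helper name go_annotation_status is my own invention rather than anything provided by AnnotationDbi. The lookup only runs if org.Hs.eg.db is installed.

```r
# Summarise, per gene, whether at least one GO term is attached.
# 'annotations' is a data frame with columns SYMBOL and GO, i.e. the shape
# returned by AnnotationDbi::select(); genes with no GO term have GO = NA.
# (go_annotation_status is a hypothetical helper, not part of any package.)
go_annotation_status <- function(annotations) {
  has_go <- tapply(!is.na(annotations$GO), annotations$SYMBOL, any)
  data.frame(symbol = names(has_go),
             has_go = as.vector(has_go),
             stringsAsFactors = FALSE)
}

# A few human mitochondrial gene symbols (an illustrative, non-exhaustive
# selection; assumed to be valid SYMBOL keys in the installed package).
mt_symbols <- c("MT-ND1", "MT-CO1", "MT-ATP6", "MT-CYB")

# Only query the annotation package if it is actually installed.
if (requireNamespace("org.Hs.eg.db", quietly = TRUE)) {
  annotations <- AnnotationDbi::select(
    org.Hs.eg.db::org.Hs.eg.db,
    keys = mt_symbols,
    keytype = "SYMBOL",
    columns = "GO")
  print(go_annotation_status(annotations))
}
```

If has_go is FALSE for every gene listed, the installed package was presumably built from the affected gene2go.gz data; if it is TRUE, the fix has probably reached your version.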