Presence of GO annotations for human mitochondrial genes in org.Hs.eg.db?
Owen Dando

Hi, 

It's likely I'm missing something obvious or being dense here, but it appears that there are no GO annotations attached to human mitochondrial genes in org.Hs.eg.db v3.4.2. This is not the case for mouse mitochondrial genes in org.Mm.eg.db or for rat mitochondrial genes in org.Rn.eg.db, nor does it appear to be true for the underlying "lite" Gene Ontology database (ftp://ftp.geneontology.org/pub/go/godatabase/archive/latest-lite/), on which I understand the GO mappings in org.Hs.eg.db are based. Is this correct?

The code below hopefully illustrates the issue. I'm using Bioconductor v3.6 (also tried with v3.5). 

Thanks in advance,

Owen Dando


library(dplyr)
library(org.Hs.eg.db)
library(org.Mm.eg.db)
library(org.Rn.eg.db)
library(magrittr)

# Return the number of genes on chromosome 'chr'
number_of_genes_on_chromosome <- function(chr, gene_id_to_chromosome) {
  gene_id_to_chromosome %>%
    toTable %>% 
    filter(chromosome == chr) %>% 
    nrow
}

# Return a table counting the number of GO terms for each gene
gene_id_to_number_of_terms <- function(gene_id_to_go_term) {
  # Count distinct GO terms per gene (the result is returned directly)
  gene_id_to_go_term %>% 
    toTable() %>% 
    distinct(gene_id, go_id) %>% 
    group_by(gene_id) %>% 
    summarise(count = n())
}

# Return the number of genes on chromosome 'chr' annotated with at least one GO term
number_of_genes_on_chromosome_with_annotation <- function(
  chr, gene_id_to_go_term, gene_id_to_chromosome) {

  gene_id_to_chromosome %>% 
    toTable %>% 
    left_join(gene_id_to_go_term %>% gene_id_to_number_of_terms(), by = "gene_id") %>% 
    filter(chromosome == chr & !is.na(count)) %>% 
    nrow  
}

# Return the percentage of genes on chromosome 'chr' annotated with at least one GO term
percentage_of_genes_with_annotations <- function(chr, gene_id_to_go_term, gene_id_to_chromosome) {
  100 * 
    number_of_genes_on_chromosome_with_annotation(chr, gene_id_to_go_term, gene_id_to_chromosome) / 
    number_of_genes_on_chromosome(chr, gene_id_to_chromosome)
}

# Then for human mitochondrial genes there are none with GO annotations...

percentage_of_genes_with_annotations("MT", org.Hs.egGO2ALLEGS, org.Hs.egCHR)

> 0

# But for mouse and rat mitochondrial genes, at least some have annotations...
percentage_of_genes_with_annotations("MT", org.Mm.egGO2ALLEGS, org.Mm.egCHR)

> 100

percentage_of_genes_with_annotations("MT", org.Rn.egGO2ALLEGS, org.Rn.egCHR)

> 35.13514

# Print session info
sessionInfo()

> R version 3.4.2 (2017-09-28)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Ubuntu 16.04.3 LTS

> Matrix products: default
> BLAS: /usr/lib/libblas/libblas.so.3.6.0
> LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

> locale:
> [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8       
> [4] LC_COLLATE=en_GB.UTF-8     LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
> [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
> [10] LC_TELEPHONE=C             LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

> attached base packages:
> [1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

> other attached packages:
> [1] bindrcpp_0.2         magrittr_1.5         org.Rn.eg.db_3.4.2   org.Mm.eg.db_3.4.2  
> [5] org.Hs.eg.db_3.4.2   AnnotationDbi_1.40.0 IRanges_2.12.0       S4Vectors_0.16.0    
> [9] Biobase_2.38.0       BiocGenerics_0.24.0  dplyr_0.7.4         

> loaded via a namespace (and not attached):
> [1] Rcpp_0.12.13     bindr_0.1        bit_1.1-12       R6_2.2.2         rlang_0.1.2     
> [6] blob_1.1.0       tools_3.4.2      DBI_0.7          bit64_0.9-7      assertthat_0.2.0
> [11] digest_0.6.12    tibble_1.3.4     memoise_1.1.0    glue_1.2.0       RSQLite_2.0     
> [16] compiler_3.4.2   pkgconfig_2.0.1
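For anyone who wants to double-check the percentage without dplyr, the same calculation can be sketched in base R. This is a minimal sketch, assuming the inputs are plain data frames shaped like toTable() output: one with columns gene_id and chromosome (as from org.Hs.egCHR) and one with columns gene_id and go_id (as from org.Hs.egGO2ALLEGS). The helper name percent_annotated and the toy tables are hypothetical; the IDs in them are made up for illustration only.

```r
# Hypothetical helper: percentage of genes on chromosome `chr` that carry at
# least one GO annotation. Inputs are assumed to look like toTable() output:
#   chr_table: columns gene_id, chromosome (e.g. toTable(org.Hs.egCHR))
#   go_table:  columns gene_id, go_id      (e.g. toTable(org.Hs.egGO2ALLEGS))
percent_annotated <- function(chr, chr_table, go_table) {
  # Distinct genes located on the requested chromosome (which() drops NAs)
  genes_on_chr <- unique(as.character(
    chr_table$gene_id[which(chr_table$chromosome == chr)]))
  # No genes on that chromosome: the percentage is undefined
  if (length(genes_on_chr) == 0) return(NA_real_)
  # TRUE for each gene that appears at least once in the GO table
  annotated <- genes_on_chr %in% as.character(go_table$gene_id)
  100 * mean(annotated)
}

# Toy tables with made-up IDs, purely for illustration
chr_toy <- data.frame(gene_id = c("1", "2", "3", "4"),
                      chromosome = c("MT", "MT", "MT", "1"),
                      stringsAsFactors = FALSE)
go_toy <- data.frame(gene_id = c("1", "1", "4"),
                     go_id = c("GO:0005739", "GO:0005634", "GO:0005634"),
                     stringsAsFactors = FALSE)

percent_annotated("MT", chr_toy, go_toy)  # 1 of the 3 toy MT genes is annotated
```

Against the real objects this would be called as percent_annotated("MT", toTable(org.Hs.egCHR), toTable(org.Hs.egGO2ALLEGS)). It should agree with the dplyr version above as long as each gene_id appears at most once per chromosome in the CHR table, since this sketch counts distinct genes while the original counts rows.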

Tags: org.hs.eg.db, gene ontology, mitochondria
@james-w-macdonald-5106

The org.Hs.eg.db package is simply a repackaging of what we can get from NCBI; in this case, the gene2go.gz file from their FTP site. We use the mappings in that file to map Entrez Gene IDs to GO terms, so we can look in the gene2go file to see what they provide:

> library(org.Hs.eg.db)
> z <- unlist(as.list(org.Hs.egCHR))
> egids <- names(z)[z %in% "MT"]
> library(org.Mm.eg.db)
> zz <- unlist(as.list(org.Mm.egCHR))
> megids <- names(zz)[zz %in% "MT"]
> length(system(paste("awk '{if($2 ~", paste0("/", paste0("^", egids, "$", collapse = "|"), "/"), ") print $0}' gene2go"), intern = TRUE))
[1] 0
> length(system(paste("awk '{if($2 ~", paste0("/", paste0("^", megids, "$", collapse = "|"), "/"), ") print $0}' gene2go"), intern = TRUE))
[1] 271

So NCBI isn't providing us with any mappings of Entrez Gene IDs to GO terms for human mitochondrial genes, but they are for mouse (and presumably rat).
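The same check can also be done inside R instead of shelling out to awk. This is a minimal sketch, assuming a local copy of NCBI's gene2go file with its usual tab-separated layout and a header line beginning with #tax_id; the helper names read_gene2go and count_rows_for_genes are hypothetical, and the toy rows below use made-up tax and gene IDs.

```r
# Hypothetical helper: read a gene2go-format table into a data frame.
# read.delim() splits on tabs and has comment.char = "", so the "#tax_id"
# header line is used as column names; quote = "" because GO term names can
# contain apostrophes. Extra arguments (a file path, or text =) pass through.
read_gene2go <- function(...) {
  read.delim(..., quote = "", check.names = FALSE, colClasses = "character")
}

# Hypothetical helper: number of gene2go rows (GO mappings) for the given
# Entrez Gene IDs, matched exactly against the GeneID column
count_rows_for_genes <- function(gene2go, gene_ids) {
  sum(gene2go$GeneID %in% as.character(gene_ids))
}

# Toy gene2go-style rows with made-up IDs, purely for illustration
toy <- c(
  "#tax_id\tGeneID\tGO_ID\tEvidence\tQualifier\tGO_term\tPubMed\tCategory",
  "99999\t1001\tGO:0005739\tIDA\t-\tmitochondrion\t-\tComponent",
  "99999\t1001\tGO:0005634\tIDA\t-\tnucleus\t-\tComponent",
  "99999\t1003\tGO:0005634\tIDA\t-\tnucleus\t-\tComponent"
)
g2g <- read_gene2go(text = toy)
count_rows_for_genes(g2g, c(1001, 1002))  # both matching rows belong to gene 1001
```

With the real file, something like count_rows_for_genes(read_gene2go("gene2go.gz"), egids) should mirror the awk call above, since read.delim() can read a gzip-compressed file directly. The trade-off is memory: this loads the whole (large) file into R, whereas awk streams through it line by line.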


Hi James - many thanks for the quick response and explanation. I am attempting to follow up with NCBI as to why these mappings aren't present in gene2go.gz.


Just in case anyone else hits this issue: I followed up with NCBI, and they confirmed that this was a bug in the production of the underlying gene2go.gz data. It has apparently now been resolved, so the corrected data should make its way into the next version of org.Hs.eg.db.
