Question

How to choose an annotation file in the AnnotationHub web server

15958021290 ▴ 10

@15958021290-21573

Last seen 5.3 years ago

Hey, guys. I ran into a problem choosing an annotation file in the AnnotationHub web server. I want to do a GO enrichment analysis of Oryza sativa. I found the 3 latest (2019-10-29) annotation files in the AnnotationHub web server at https://annotationhub.bioconductor.org/species/Oryza%20sativa .

AH75915: 35257 EntrezID genes, 346 unique GO terms

AH75916: 35574 EntrezID genes, 346 unique GO terms

AH75917: 35257 EntrezID genes, 345 unique GO terms

These are the same files you get when you use AnnotationHub in R (3.6.1). I downloaded all 3 annotation files and checked the total number of EntrezID genes and the total number of unique GO terms in each. The 3 files actually differ in these two parameters. But when I use the same gene list to do the GO enrichment, the results are mostly the same. I don't know which criterion I should base my choice on. Is a higher gene count better, or is a higher number of unique GO terms better? I hope someone can help me. Thanks in advance!

annotation AnnotationHub • 1.5k views

5.3 years ago 15958021290 ▴ 10

May I ask what code you used to determine the differences?
For instance when I grab the ENTREZID column I get 35257 for all three:

library(AnnotationHub)
ah = AnnotationHub()  # assumed setup; not shown in the original post
one = ah[["AH75915"]]
two = ah[["AH75916"]]
three = ah[["AH75917"]]

> length(keys(one, keytype="ENTREZID"))
[1] 35257
> length(keys(two, keytype="ENTREZID"))
[1] 35257
> length(keys(three, keytype="ENTREZID"))
[1] 35257

I'll investigate the code that generated the files to see whether they were generated differently, but first, please provide the code you used to discover the differences.
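
For comparison, here is a minimal sketch of one way the gene and GO-term counts could be computed from a table shaped like the output of `select()`. This is not the code the original poster used (that was never shown). The helper name `count_unique` and the toy table are hypothetical, and the sketch assumes the table has `ENTREZID` and `GO` columns.

```r
# Count distinct genes and distinct GO terms in a table shaped like a
# select() result. `tab` is assumed to have ENTREZID and GO columns;
# GO may be NA for genes without any annotation.
count_unique <- function(tab) {
  c(genes    = length(unique(tab$ENTREZID)),
    go_terms = length(unique(tab$GO[!is.na(tab$GO)])))
}

# Hypothetical toy table, only to show the expected input shape
toy <- data.frame(
  ENTREZID = c("1", "1", "2", "3"),
  GO       = c("GO:0000001", "GO:0000002", "GO:0000001", NA)
)
count_unique(toy)

# With a real OrgDb object such as `one` (needs AnnotationHub and network):
# tab <- AnnotationDbi::select(one, keys = keys(one, keytype = "ENTREZID"),
#                              columns = "GO", keytype = "ENTREZID")
# count_unique(tab)
```

Running the same helper on all three objects would make it easy to see whether the counts really differ, and it gives a concrete snippet to post back in the thread.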

5.3 years ago shepherl 4.1k

Looks pretty identical to me:

> d.f <- do.call(rbind, lapply(dbListTables(dbconn(one)), 
        function(x) sapply(c(one, two, three),
           function(y) dbGetQuery(dbconn(y), paste0("select count(*) from ", x, ";")))))
> rownames(d.f) <- dbListTables(dbconn(one))
> colnames(d.f) <- c("one","two","three")
> d.f
             one    two    three 
accessions   208558 208558 208558
alias        50651  50651  50651 
chromosomes  35091  35091  35091 
entrez_genes 35257  35257  35257 
gene_info    35257  35257  35257 
genes        35257  35257  35257 
go           7995   7995   7995  
go_all       83477  83477  83477 
go_bp        622    622    622   
go_bp_all    11248  11248  11248 
go_cc        6607   6607   6607  
go_cc_all    66906  66906  66906 
go_mf        766    766    766   
go_mf_all    5323   5323   5323  
map_counts   0      0      0     
map_metadata 0      0      0     
metadata     8      8      8     
pubmed       20462  20462  20462 
refseq       96029  96029  96029

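There could be some differences there, but I can't imagine the row counts would be identical for every table if the contents of those tables were different. However, comparing the second column of each table directly shows that only the metadata table differs: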

> sapply(dbListTables(dbconn(one)), function(x) all.equal(dbGetQuery(dbconn(one), paste("select * from", x))[,2], dbGetQuery(dbconn(two), paste("select * from", x))[,2]))
           accessions                 alias           chromosomes 
               "TRUE"                "TRUE"                "TRUE" 
         entrez_genes             gene_info                 genes 
               "TRUE"                "TRUE"                "TRUE" 
                   go                go_all                 go_bp 
               "TRUE"                "TRUE"                "TRUE" 
            go_bp_all                 go_cc             go_cc_all 
               "TRUE"                "TRUE"                "TRUE" 
                go_mf             go_mf_all            map_counts 
               "TRUE"                "TRUE"                "TRUE" 
         map_metadata              metadata                pubmed 
               "TRUE" "2 string mismatches"                "TRUE" 
               refseq 
               "TRUE" 
> sapply(dbListTables(dbconn(one)), function(x) all.equal(dbGetQuery(dbconn(one), paste("select * from", x))[,2], dbGetQuery(dbconn(three), paste("select * from", x))[,2]))
           accessions                 alias           chromosomes 
               "TRUE"                "TRUE"                "TRUE" 
         entrez_genes             gene_info                 genes 
               "TRUE"                "TRUE"                "TRUE" 
                   go                go_all                 go_bp 
               "TRUE"                "TRUE"                "TRUE" 
            go_bp_all                 go_cc             go_cc_all 
               "TRUE"                "TRUE"                "TRUE" 
                go_mf             go_mf_all            map_counts 
               "TRUE"                "TRUE"                "TRUE" 
         map_metadata              metadata                pubmed 
               "TRUE" "2 string mismatches"                "TRUE" 
               refseq 
               "TRUE" 
>
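
To see which metadata entries account for the "2 string mismatches", the two tables can be compared row by row. The following is a minimal base R sketch; the helper name `diff_rows` and the toy data frames are hypothetical stand-ins, and the real inputs would be the results of the `dbGetQuery()` calls shown above.

```r
# Return the row indices where two equally shaped data frames differ,
# together with the differing rows from each.
diff_rows <- function(x, y) {
  stopifnot(identical(dim(x), dim(y)))
  differs <- unname(rowSums(as.matrix(x) != as.matrix(y), na.rm = TRUE) > 0)
  list(rows   = which(differs),
       first  = x[differs, , drop = FALSE],
       second = y[differs, , drop = FALSE])
}

# Hypothetical toy tables standing in for two metadata tables
m1 <- data.frame(name = c("k1", "k2", "k3", "k4"),
                 value = c("a", "b", "c", "d"))
m2 <- data.frame(name = c("k1", "k2", "k3", "k4"),
                 value = c("a", "x", "c", "y"))

all.equal(m1[, 2], m2[, 2])  # "2 string mismatches"
diff_rows(m1, m2)$rows       # 2 4

# With the real databases (needs the objects created earlier in the thread):
# diff_rows(dbGetQuery(dbconn(one), "select * from metadata"),
#           dbGetQuery(dbconn(two), "select * from metadata"))
```

If the differing rows turn out to be descriptive fields only, that would suggest the three files carry the same annotation data and differ just in their labelling.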

5.3 years ago James W. MacDonald 68k