Hello everyone,
this is my first post and my first time using the AnnotationDbi package together with GO.db.
I am trying to retrieve the 'Ontology' category and description for a list of GO IDs. I imported a text file full of Gene Ontology IDs and processed them with the select function
and the GO.db database. The output of this command is very curious: it returns a correct Ontology and Description for the first 1,953 IDs and then gives NA for all the rest. I found this thread (https://support.bioconductor.org/p/69790/), but I think my problem is different.
This is the code:
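library(GO.db)  # also attaches AnnotationDbi, which provides select()
# Read the GO IDs from a tab-separated text file
yy <- scan('/home/text.txt', character(), sep='\t')
# Look up the term and ontology for each GO ID
result <- select(GO.db, keys=yy, columns=c("TERM", "ONTOLOGY"), keytype="GOID")
result <- data.frame(result)
colnames(result) <- c('GO', 'TERM', 'ONTOLOGY')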
Any suggestions?
Thanks in advance.
EDIT:
This is the output:
GO          TERM                                    ONTOLOGY
GO:0000012  single strand break repair              BP
GO:0000016  lactase activity                        MF
GO:0000026  alpha-1,2-mannosyltransferase activity  MF
GO:0000028  ribosomal small subunit assembly        BP
GO:0000062  fatty-acyl-CoA binding                  MF
GO:0000076  DNA replication checkpoint              BP
GO:0000082  G1/S transition of mitotic cell cycle   BP
GO:0000086  G2/M transition of mitotic cell cycle   BP
GO:0000109  nucleotide-excision repair complex      CC
GO:0110152  NA                                      NA
GO:0140326  NA                                      NA
GO:0140359  NA                                      NA
GO:0150099  NA                                      NA
GO:0000003  NA                                      NA
GO:0000009  NA                                      NA
GO:0000027  NA                                      NA
GO:0000032  NA                                      NA
GO:0000038  NA                                      NA
> sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.4 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
locale:
[1] LC_CTYPE=it_IT.UTF-8 LC_NUMERIC=C LC_TIME=it_IT.UTF-8 LC_COLLATE=it_IT.UTF-8 LC_MONETARY=it_IT.UTF-8
[6] LC_MESSAGES=it_IT.UTF-8 LC_PAPER=it_IT.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=it_IT.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] stringr_1.4.0 ggplot2_3.3.0 GO.db_3.10.0 AnnotationDbi_1.48.0 IRanges_2.20.2 S4Vectors_0.24.3 Biobase_2.46.0
[8] BiocGenerics_0.32.0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.4 compiler_3.6.3 pillar_1.4.3 tools_3.6.3 digest_0.6.25 bit_1.1-15.2 RSQLite_2.2.0 memoise_1.1.0 lifecycle_0.2.0
[10] tibble_3.0.0 gtable_0.3.0 pkgconfig_2.0.3 rlang_0.4.5 DBI_1.1.0 cli_2.0.2 rstudioapi_0.11 withr_2.1.2 vctrs_0.2.4
[19] bit64_0.9-7 grid_3.6.3 tidyselect_1.0.0 glue_1.3.2 R6_2.4.1 fansi_0.4.1 purrr_0.3.3 blob_1.2.1 magrittr_1.5
[28] scales_1.1.0 ellipsis_0.3.0 assertthat_0.2.1 colorspace_1.4-1 stringi_1.4.6 munsell_0.5.0 crayon_1.3.4
Dear James, I edited my question to add what you asked for. As you can see, all my packages are up to date.
I tried the dput function on my GO IDs (as you kindly suggested), but I still have the same problem. Oddly, roughly the first 1,900 GO IDs are processed correctly, but the rest are not.
I also tried splitting the list of GO IDs into two separate lists, but the IDs that had <NA> values for the GO term and GO ontology were unchanged. Is it possible that the database has some limitation, or that my GO IDs do not match what is in this database? When I paste the same entries into QuickGO I get a 'normal' result.
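
One way to narrow this down would be to check whether the unmatched IDs differ from the database keys only by invisible characters, such as trailing spaces or carriage returns left over from reading the file. The following is only a minimal sketch in base R, not a confirmed diagnosis: known_ids is a hypothetical stand-in for keys(GO.db, keytype = "GOID"), and the yy values are made up for illustration rather than taken from my file.

```r
# Hypothetical stand-in for the real key set; with the package loaded this
# would be: known_ids <- keys(GO.db, keytype = "GOID")
known_ids <- c("GO:0000003", "GO:0000012", "GO:0000016")

# Made-up input imitating what scan() might return if some entries carry
# a trailing space or a Windows carriage return (assumption, not my real data)
yy <- c("GO:0000012", "GO:0000016", "GO:0000003 ", "GO:0000012\r")

# IDs that fail an exact match against the key set
unmatched <- yy[!yy %in% known_ids]

# Show the unmatched IDs with hidden characters escaped, plus their lengths;
# a well-formed GO ID has exactly 10 characters
writeLines(encodeString(unmatched, quote = '"'))
print(nchar(unmatched))

# Strip leading/trailing whitespace (including \r) and drop duplicates
yy_clean <- unique(trimws(yy))

# Anything still listed here is genuinely absent from the key set
still_unmatched <- setdiff(yy_clean, known_ids)
print(still_unmatched)
```

If still_unmatched comes back empty after cleaning, the NA rows would be explained by the input formatting; if IDs remain, they are missing from the key set being compared against, which would point to a difference between my list and this version of the database.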