# Invalid keytype: GOALL. Please use the keytypes method to see a listing of valid arguments.
@618f7cbf

Hi everyone! I'm trying to solve an issue with 'makeOrgPackage' so that I can use the gseGO function of the clusterProfiler package. Any help will be much appreciated.

I need to run a GSEA of GO terms for my RNA-seq expression study in Quercus suber. First, I looked for an available OrgDb from NCBI, and there is one, but sadly it doesn't include any GO annotations. Second, I prepared GO annotation files to build another OrgDb with makeOrgPackage, using these columns: GID, CHROMOSOME, START, END, STRAND, GOALL, plus GO, ONTOLOGY and EVIDENCE. However, it seems that the GOALL column, which is the one that allows you to perform the analysis, cannot be integrated by this tool, as was reported before in: Use of clusterProfiler : Error in testForValidKeytype(x, keytype)

So, do you know any other way to build a new OrgDb, or to extend the existing one with the GO terms I already have? Thanks,

Nuri

library(AnnotationHub)
# Is Quercus suber already in the hub database?
# Load the whole AnnotationHub
hub <- AnnotationHub()
query(hub, c("suber", "orgdb"))
# AnnotationHub with 1 record
QS2 <- hub[["AH114342"]]
keytypes(QS2)
# [1] "ACCNUM"   "ALIAS"    "ENTREZID" "GENENAME" "GID"
# [6] "PMID"     "REFSEQ"   "SYMBOL"
# no GO annotations

library(AnnotationDbi)
AnnotationDbi::keytypes(orgdb)
AnnotationDbi::columns(orgdb)

library(AnnotationForge)
# Read the tab-separated annotation tables
gene_info <- read.delim("gene_info.tsv")
go <- read.delim("go.tsv")
goall <- read.delim("goall.tsv")
makeOrgPackage(
  gene_info = gene_info,
  go = go,
  goall = goall,
  tax_id = "58331",   # Taxonomy ID for Quercus suber
  genus = "Quercus",
  species = "suber",
  version = "0.99.0",
  outputDir = "."
)
# Invalid keytype: GOALL. Please use the keytypes method to see a listing of valid arguments.
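
One possible route with makeOrgPackage, sketched here under assumptions rather than tested on this dataset: as far as the AnnotationForge documentation describes it, the GO table named via the goTable argument should contain exactly the columns GID, GO and EVIDENCE, and the GOALL mappings are then derived by the function itself instead of being supplied as a separate table. The helper names, the toy gene IDs and the maintainer/author strings below are hypothetical placeholders.

```r
# Reduce a GO annotation table to the columns expected for goTable
# (assumed here: GID, GO, EVIDENCE) and drop duplicated rows.
prepare_go_table <- function(go) {
  required <- c("GID", "GO", "EVIDENCE")
  missing_cols <- setdiff(required, colnames(go))
  if (length(missing_cols) > 0) {
    stop("GO table is missing column(s): ",
         paste(missing_cols, collapse = ", "))
  }
  go <- unique(go[, required])
  go[] <- lapply(go, as.character)
  rownames(go) <- NULL
  go
}

# Toy stand-in for go.tsv; the gene IDs are made up.
go_raw <- data.frame(
  GID      = c("gene1", "gene1", "gene2"),
  GO       = c("GO:0008150", "GO:0008150", "GO:0003674"),
  ONTOLOGY = c("BP", "BP", "MF"),
  EVIDENCE = c("IEA", "IEA", "IEA")
)
go_tab <- prepare_go_table(go_raw)
print(go_tab)

# Wrapper around the real call; defined but not run here because it
# needs AnnotationForge and the full annotation tables.
build_orgdb <- function(gene_info, go_tab, out_dir = ".") {
  AnnotationForge::makeOrgPackage(
    gene_info  = gene_info,
    go         = go_tab,
    version    = "0.99.0",
    maintainer = "Your Name <you@example.org>",  # placeholder
    author     = "Your Name <you@example.org>",  # placeholder
    outputDir  = out_dir,
    tax_id     = "58331",
    genus      = "Quercus",
    species    = "suber",
    goTable    = "go"  # name of the argument that holds the GO table
  )
}
```

After installing the resulting package, keytypes() on it should show whether GOALL is present before anything is passed to gseGO.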

sessionInfo( )
R version 4.4.0 (2024-04-24)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 22.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0 
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8      
 [2] LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8      
 [8] LC_NAME=C                 
 [9] LC_ADDRESS=C              
[10] LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8
[12] LC_IDENTIFICATION=C       

time zone: Europe/Madrid
tzcode source: system (glibc)

attached base packages:
[1] stats4    stats    
[3] graphics  grDevices
[5] utils     datasets 
[7] methods   base     

other attached packages:
 [1] tidyr_1.3.1            
 [2] dplyr_1.1.4            
 [3] biomaRt_2.60.1         
 [4] org.Qsuber.eg.db_0.99.0
 [5] AnnotationForge_1.46.0 
 [6] ggridges_0.5.6         
 [7] AnnotationDbi_1.66.0   
 [8] IRanges_2.38.1         
 [9] S4Vectors_0.42.1       
[10] Biobase_2.64.0         
[11] clusterProfiler_4.12.3 
[12] AnnotationHub_3.12.0   
[13] BiocFileCache_2.12.0   
[14] dbplyr_2.5.0           
[15] BiocGenerics_0.50.0    
[16] BiocManager_1.30.23

@james-w-macdonald-5106

You can make your own. It takes a while because it downloads all the annotation files from NCBI, but the resulting package has all the GO data.

> library(AnnotationForge)

> makeOrgPackageFromNCBI("0.0.1", "me <me@mine.org>", "me", ".", "58331", "Quercus","suber")
preparing data from NCBI ...
starting download for 
[1] gene2pubmed.gz
[2] gene2accession.gz
[3] gene2refseq.gz
[4] gene_info.gz
[5] gene2go.gz
getting data for gene2pubmed.gz
extracting data for our organism from : gene2pubmed
getting data for gene2accession.gz
extracting data for our organism from : gene2accession
getting data for gene2refseq.gz
extracting data for our organism from : gene2refseq
getting data for gene_info.gz
extracting data for our organism from : gene_info
getting data for gene2go.gz
extracting data for our organism from : gene2go
processing gene2pubmed
processing gene_info: chromosomes
processing gene_info: description
processing alias data
processing refseq data
processing accession data
processing GO data
making the OrgDb package ...
Populating genes table:
genes table filled
Populating pubmed table:
pubmed table filled
Populating chromosomes table:
chromosomes table filled
Populating gene_info table:
gene_info table filled
Populating entrez_genes table:
entrez_genes table filled
Populating alias table:
alias table filled
Populating refseq table:
refseq table filled
Populating accessions table:
accessions table filled
Populating go table:
go table filled
table metadata filled

'select()' returned many:1 mapping between keys and columns
Dropping GO IDs that are too new for the current GO.db
Populating go table:
go table filled
Populating go_bp table:
go_bp table filled
Populating go_cc table:
go_cc table filled
Populating go_mf table:
go_mf table filled
'select()' returned many:1 mapping between keys and columns
Populating go_bp_all table:
go_bp_all table filled
Populating go_cc_all table:
go_cc_all table filled
Populating go_mf_all table:
go_mf_all table filled
Populating go_all table:
go_all table filled
Creating package in ./org.Qsuber.eg.db 
Now deleting temporary database file
complete!
[1] "org.Qsuber.eg.sqlite"

> install.packages("org.Qsuber.eg.db", repos = NULL, type = "source")
Installing package into 'C:/Users/jmacdon/AppData/Local/R/win-library/4.4'
(as 'lib' is unspecified)
* installing *source* package 'org.Qsuber.eg.db' ...
** using staged installation
** R
** inst
** byte-compile and prepare package for lazy loading
Warning messages:
1: package 'IRanges' was built under R version 4.4.1 
2: package 'S4Vectors' was built under R version 4.4.1 
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded from temporary location
Warning: package 'IRanges' was built under R version 4.4.1
Warning: package 'S4Vectors' was built under R version 4.4.1
** testing if installed package can be loaded from final location
Warning: package 'IRanges' was built under R version 4.4.1
Warning: package 'S4Vectors' was built under R version 4.4.1
** testing if installed package keeps a record of temporary installation path
* DONE (org.Qsuber.eg.db)
> library(org.Qsuber.eg.db)

Ugh. Sent the last one prematurely...

> dim(select(org.Qsuber.eg.db, keys(org.Qsuber.eg.db), "GOALL"))
'select()' returned 1:many mapping between keys and columns
[1] 715301      2
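
With GOALL available, the package can be handed to gseGO. A minimal sketch, assuming a differential expression table with Entrez gene IDs and log2 fold changes; the column names gene_id and log2FC, the helper names and the toy values are hypothetical.

```r
# Build the named, decreasingly sorted numeric vector that gseGO()
# takes as its geneList argument.
make_gene_list <- function(de, id_col = "gene_id", stat_col = "log2FC") {
  de <- de[!is.na(de[[stat_col]]), ]        # drop rows without a statistic
  de <- de[!duplicated(de[[id_col]]), ]     # keep the first row per gene ID
  stats <- de[[stat_col]]
  names(stats) <- as.character(de[[id_col]])
  sort(stats, decreasing = TRUE)
}

# Toy differential expression table; IDs and values are made up.
de <- data.frame(
  gene_id = c("1001", "1002", "1003"),
  log2FC  = c(-1.5, 2.0, 0.3)
)
gene_list <- make_gene_list(de)
print(gene_list)

# Wrapper around the real call; defined but not run here because it
# needs clusterProfiler and the installed org.Qsuber.eg.db package.
run_gsego <- function(gene_list, orgdb) {
  stopifnot("GOALL" %in% AnnotationDbi::keytypes(orgdb))
  clusterProfiler::gseGO(
    geneList = gene_list,
    OrgDb    = orgdb,
    keyType  = "ENTREZID",
    ont      = "BP"
  )
}
```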

Thanks for your help and time, James. I followed the steps you showed me, but the files did not download completely because of URL access issues. So I have been trying to download the files first and then run the function with rebuildCache = FALSE. However, it still fails: even though the downloads completed, I get an error about one of the tables. Do you have any idea what might be causing this? Also, do you know if there is a way to use gseGO with the gene2go.gz file?

Many thanks!

# In the shell:
for i in gene2pubmed.gz gene2accession.gz gene2refseq.gz gene_info.gz gene2go.gz
do
wget -c -t 0 --retry-connrefused --waitretry=30 --timeout=60 ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/$i
done
# In R:
makeOrgPackageFromNCBI("0.99.3", "me <me@gmail.com>", "me", "save/", "58331", "Quercus","suber", rebuildCache = F)
preparing data from NCBI ...
starting download for 
[1] gene2pubmed.gz
[2] gene2accession.gz
[3] gene2refseq.gz
[4] gene_info.gz
[5] gene2go.gz
getting data for gene2pubmed.gz
extracting data for our organism from : gene2pubmed
getting data for gene2accession.gz
Error: no such table: main.gene2accession
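
On the gene2go.gz question: gseGO itself takes an OrgDb, but clusterProfiler also has a generic GSEA function that accepts a user-supplied TERM2GENE table, so one option might be to build that table from gene2go directly. This is a sketch under assumptions: it uses only the direct annotations in gene2go (no propagation to ancestor terms, unlike GOALL), the helper names are hypothetical, and the toy rows stand in for the real file.

```r
# Read NCBI's gene2go file. Its header line starts with "#tax_id",
# which read.delim() turns into a syntactic name, so rename it.
read_gene2go <- function(path) {
  gene2go <- read.delim(path, stringsAsFactors = FALSE)
  colnames(gene2go)[1] <- "tax_id"
  gene2go
}

# Keep one organism and return a two-column (term, gene) table,
# the layout that GSEA() takes as TERM2GENE.
gene2go_to_term2gene <- function(gene2go, tax_id) {
  keep <- gene2go$tax_id == tax_id
  term2gene <- unique(data.frame(
    term = gene2go$GO_ID[keep],
    gene = as.character(gene2go$GeneID[keep]),
    stringsAsFactors = FALSE
  ))
  rownames(term2gene) <- NULL
  term2gene
}

# Toy rows standing in for gene2go; gene IDs are made up.
toy <- data.frame(
  tax_id = c(58331, 58331, 58331, 9606),
  GeneID = c(1001, 1001, 1002, 7157),
  GO_ID  = c("GO:0008150", "GO:0008150", "GO:0003674", "GO:0005634"),
  stringsAsFactors = FALSE
)
term2gene <- gene2go_to_term2gene(toy, tax_id = 58331)
print(term2gene)

# Wrapper around the real call; defined but not run here because it
# needs clusterProfiler. gene_list is a named numeric vector of
# statistics, sorted in decreasing order, with Entrez IDs as names.
run_gsea <- function(gene_list, term2gene) {
  clusterProfiler::GSEA(geneList = gene_list, TERM2GENE = term2gene)
}
```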

There are two steps involved in this process. First, all the files are downloaded and all the data are put into an omnibus SQLite DB called 'NCBI.sqlite'. Second, the data are parsed from that DB and put into a smaller organism-specific DB that then goes in the package.

If you already have a (good, complete) NCBI.sqlite DB, then you can say rebuildCache = FALSE, which means 'skip the first step and just parse the data from my NCBI.sqlite DB'. But if the NCBI.sqlite DB isn't good or complete (yours isn't complete - the error says you are missing the gene2accession table) you will get an error. In that situation you should delete the NCBI.sqlite file, and then re-run with rebuildCache = TRUE (the default), which will re-generate the NCBI.sqlite DB using the files you downloaded.
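
The recovery described above could look roughly like this. It is a sketch, not a tested recipe: remove_stale_cache and rebuild_orgdb are hypothetical helper names, the maintainer/author strings are placeholders, and it assumes NCBI.sqlite sits in the current working directory.

```r
# Delete an incomplete cache so the next run rebuilds it from scratch.
# Returns TRUE if a file was removed, FALSE if there was nothing to remove.
remove_stale_cache <- function(cache = "NCBI.sqlite") {
  if (file.exists(cache)) file.remove(cache) else FALSE
}

# Wrapper around the real call; defined but not run here because it
# needs AnnotationForge and the NCBI files.
rebuild_orgdb <- function(out_dir = ".") {
  remove_stale_cache("NCBI.sqlite")
  AnnotationForge::makeOrgPackageFromNCBI(
    version      = "0.99.3",
    maintainer   = "me <me@example.org>",  # placeholder
    author       = "me",                   # placeholder
    outputDir    = out_dir,
    tax_id       = "58331",
    genus        = "Quercus",
    species      = "suber",
    rebuildCache = TRUE  # regenerate NCBI.sqlite instead of reusing it
  )
}
```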
