Question

Use of clusterProfiler : Error in testForValidKeytype(x, keytype)

Paul • 0

Hi, I don't know if you have already encountered this kind of problem with clusterProfiler, but it seems that it doesn't recognize the keytype I indicate.

My gene identifiers are GIDs, so I pass keyType="GID" as an argument, but I get this error:

> GOenrich <- enrichGO(gene=GenesVector,
+                      OrgDb="org.Cgigas.eg.db",
+                      keyType="GID",
+                      ont="BP",
+                      pAdjustMethod="BH",
+                      qvalueCutoff="0.05",
+                      universe=GenesBackground,
+                      readable=FALSE,
+                      pool=FALSE)
Error in testForValidKeytype(x, keytype) : 
  Invalid keytype: GOALL. Please use the keytypes method to see a listing of valid arguments.

When I display the list of possible arguments, I get the following result:

> keytypes(org.Cgigas.eg.db)
[1] "EVIDENCE" "GID"      "GO"

However, the same error is displayed for keyType="GO" or keyType="EVIDENCE", even though these arguments are not relevant to my study.

If you are using clusterProfiler and have any advice, I am listening.

To be precise: I generated my OrgDb using the makeOrgPackage() function from AnnotationForge.

clusterProfiler r • 3.4k views

ADD COMMENT • link updated 3.0 years ago by Guido Hooiveld ★ 4.1k • written 3.0 years ago by Paul • 0

For now it would be helpful if you could show the result of:

GenesVector[1:10]
GenesBackground[1:15]

... in order to get a feel for what your gene IDs look like.

Also, are you working with data from an oyster? Crassostrea gigas, taxonomy ID: 29159? https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=29159

ADD REPLY • link 3.0 years ago Guido Hooiveld ★ 4.1k

Also cross-posted... https://stackoverflow.com/questions/71921737/use-of-clusterprofiler-error-in-testforvalidkeytypex-keytype

ADD REPLY • link 3.0 years ago Guido Hooiveld ★ 4.1k

Hi, thanks for your interest. Here is what the GenesVector and GenesBackground objects contain: respectively, my genes of interest (annotated with GO terms) and the genes in the C. gigas genome (annotated with GO terms):

> GenesVector[1:10]
 [1] "G12920" "G29792" "G2895"  "G24808" "G15670" "G16256" "G29838" "G19667" "G19668" "G539"  
> GenesBackground[1:15]
 [1] "G10"    "G1000"  "G10002" "G10003" "G10007" "G1001"  "G10018" "G1002"  "G10023" "G10026" "G10027" "G10029" "G10043" "G10046" "G10049"

I am indeed working on C. gigas. I saw that an OrgDb built from the v9 assembly already exists; however, I work with the Roslin assembly.

ADD REPLY • link 3.0 years ago Paul • 0

Aha, I see.

I am no expert at all on oysters, but apparently there are multiple genome assemblies for this organism: https://www.ncbi.nlm.nih.gov/assembly/?term=txid29159

I deduce that you used cgigas_uk_roslin_v1 for your work (and not oyster_v9), but it seems that this Roslin assembly has not been annotated (at least, not by NCBI). Therefore NCBI has no annotation info available, and as a consequence the AnnotationHub cannot be used... And considering the answer of James below (lack of GOALL column), it seems that 'self-made' OrgDbs unfortunately cannot be used with clusterProfiler.

Yet, since you have gene-to-GO mapping info, you may want to perform GO overrepresentation analysis using the generic enricher() function from clusterProfiler. enricher() does not require an OrgDb, but rather a simple TERM2GENE and TERM2NAME mapping file. In this context you may want to check this previous post of mine (specifically 'option 2' and links): clusterProfiler-GO enrichment Error, and Chapter 12 of the 'clusterProfiler-book' (here).
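
To make the enricher() route a bit more concrete, here is a minimal sketch. The toy gene2go table, the two GO IDs in it, and the object names term2gene, genes_of_interest and background are all made up for illustration; in practice gene2go would be the gene-to-GO table you already fed to makeOrgPackage(), and the last two would be your GenesVector and GenesBackground. The permissive cutoffs are there only because the toy data are tiny. The enricher() call is skipped if clusterProfiler is not installed.

```r
# Toy gene-to-GO table (illustrative values only). The column names GID and GO
# mirror the keytypes reported for the self-made OrgDb.
gene2go <- data.frame(
  GID = c("G12920", "G12920", "G29792", "G2895", "G10", "G1000"),
  GO  = c("GO:0006412", "GO:0008152", "GO:0006412", "GO:0006412",
          "GO:0008152", "GO:0008152"),
  stringsAsFactors = FALSE
)

# enricher() expects TERM2GENE with the term in column 1 and the gene in column 2.
term2gene <- unique(gene2go[, c("GO", "GID")])

genes_of_interest <- c("G12920", "G29792", "G2895")  # stands in for GenesVector
background        <- unique(gene2go$GID)             # stands in for GenesBackground

if (requireNamespace("clusterProfiler", quietly = TRUE)) {
  res <- clusterProfiler::enricher(
    gene          = genes_of_interest,
    universe      = background,
    TERM2GENE     = term2gene,
    pAdjustMethod = "BH",
    pvalueCutoff  = 1,  # permissive only because the toy data are tiny
    qvalueCutoff  = 1,
    minGSSize     = 1
  )
  print(as.data.frame(res))
}
```

Two caveats. First, without a TERM2NAME table the results will only show GO IDs; a two-column data frame (GO ID, term name) can be passed as TERM2NAME, and the names can be looked up in GO.db if it is installed. Second, a table like this contains only the direct annotations, so ancestor GO terms are not tested unless you propagate the annotations yourself; clusterProfiler has a buildGOmap() function for this, but please check its documentation for the expected input.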

ADD REPLY • link 3.0 years ago Guido Hooiveld ★ 4.1k

score 1 · Answer 1 · 2022-04-19

Annotation packages generated using makeOrgPackage won't have the GOALL column, because it's a simplified pipeline that cannot handle the complexities of getting all the GO annotation. However, there are two C. gigas OrgDb packages on AnnotationHub that you may be able to use.

> library(AnnotationHub)
> hub <- AnnotationHub()
  |======================================================================| 100%

snapshotDate(): 2021-10-20

> query(hub, c("gigas","orgdb"))
AnnotationHub with 3 records
# snapshotDate(): 2021-10-20
# $dataprovider: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/
# $species: Ostrea gigas, Crassostrea gigas, Colletes gigas
# $rdataclass: OrgDb
# additional mcols(): taxonomyid, genome, description,
#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   rdatapath, sourceurl, sourcetype 
# retrieve records with, e.g., 'object[["AH96137"]]' 

            title                          
  AH96137 | org.Crassostrea_gigas.eg.sqlite
  AH96138 | org.Ostrea_gigas.eg.sqlite     
  AH97683 | org.Colletes_gigas.eg.sqlite   
## get both C. gigas OrgDbs

> orgdb1 <- hub[["AH96137"]]
downloading 1 resources
retrieving 1 resource
  |======================================================================| 100%

loading from cache
Loading required package: AnnotationDbi
Loading required package: stats4
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.


Attaching package: 'Biobase'

The following object is masked from 'package:AnnotationHub':

    cache

Loading required package: IRanges
Loading required package: S4Vectors

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    expand.grid, I, unname


Attaching package: 'IRanges'

The following object is masked from 'package:grDevices':

    windows

> columns(orgdb1)
 [1] "ACCNUM"      "ALIAS"       "CHR"         "ENTREZID"    "EVIDENCE"   
 [6] "EVIDENCEALL" "GENENAME"    "GID"         "GO"          "GOALL"      
[11] "ONTOLOGY"    "ONTOLOGYALL" "PMID"        "REFSEQ"      "SYMBOL"     

> orgdb2 <- hub[["AH97683"]]
downloading 1 resources
retrieving 1 resource
  |======================================================================| 100%

loading from cache
> columns(orgdb2)
 [1] "ACCNUM"      "ALIAS"       "CHR"         "ENTREZID"    "EVIDENCE"   
 [6] "EVIDENCEALL" "GENENAME"    "GID"         "GO"          "GOALL"      
[11] "ONTOLOGY"    "ONTOLOGYALL" "REFSEQ"      "SYMBOL"
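
As a quick sanity check before calling enrichGO(), you could test whether an OrgDb exposes GOALL at all. The error message above names GOALL even though keyType="GID" was passed, which suggests enrichGO() looks that keytype up regardless. The sketch below uses a hypothetical helper, has_goall(), and works on plain character vectors copied from the outputs in this thread, so it runs without any Bioconductor package; with a live object you would pass keytypes(orgdb) or columns(orgdb) instead.

```r
# Hypothetical helper (not part of any package): TRUE if "GOALL" is among
# the keytypes/columns an OrgDb reports.
has_goall <- function(available) {
  "GOALL" %in% available
}

# keytypes() output of the self-made org.Cgigas.eg.db, copied from the question.
self_made <- c("EVIDENCE", "GID", "GO")

# columns() output of the AnnotationHub OrgDb AH96137, copied from the answer.
hub_orgdb <- c("ACCNUM", "ALIAS", "CHR", "ENTREZID", "EVIDENCE",
               "EVIDENCEALL", "GENENAME", "GID", "GO", "GOALL",
               "ONTOLOGY", "ONTOLOGYALL", "PMID", "REFSEQ", "SYMBOL")

has_goall(self_made)  # FALSE: enrichGO() is expected to fail as in the question
has_goall(hub_orgdb)  # TRUE
```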