Problem with buildGOmap from clusterProfiler
@dcc43578
Germany

Hi everyone,

I am new to R and want to use clusterProfiler for GO term enrichment analysis. I am working with bacteria, so I need to build my own GO mapping, and I tried to do that with the buildGOmap() function. I got my gene-to-GO mapping from PseudoCAP: I downloaded it and selected only the GO accession number and the KEGG ID.

When I feed this to the buildGOmap function, I get the error:

Error in names(object) <- nm : 'names' attribute [15883] must be the same length as the vector [0]

I have been unable to fix this error, even after a couple of days of reading and trying. Has anyone had a similar problem? What am I doing wrong?

Thanks!

> Pa_GO <- read.csv("~/Experiments/20230724-Biofilm-Assay082-Proteomics/AnalysisR/gene_ontology_csv.csv")
> Pa_GOterms <- Pa_GO[c(5,1)]
> head(Pa_GOterms)
   Accession Locus.Tag
1 GO:0005524    PA0001
2 GO:0006270    PA0001
3 GO:0006275    PA0001
4 GO:0016887    PA0001
5 GO:0016887    PA0001
6 GO:0006260    PA0001
> Pa_GOMap <- buildGOmap(Pa_GOterms)
Error in names(object) <- nm : 
  'names' attribute [15883] must be the same length as the vector [0]
> sessionInfo()
R version 4.3.2 (2023-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22631)

Matrix products: default


locale:
[1] LC_COLLATE=English_Germany.utf8  LC_CTYPE=English_Germany.utf8    LC_MONETARY=English_Germany.utf8
[4] LC_NUMERIC=C                     LC_TIME=English_Germany.utf8    

time zone: Europe/Berlin
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] enrichplot_1.22.0      readxl_1.4.3           clusterProfiler_4.10.0 scales_1.3.0           ggforce_0.4.1         
[6] ggplot2_3.4.4         

loaded via a namespace (and not attached):
  [1] RColorBrewer_1.1-3            rstudioapi_0.15.0             jsonlite_1.8.8                magrittr_2.0.3               
  [5] farver_2.1.1                  fs_1.6.3                      zlibbioc_1.48.0               vctrs_0.6.5                  
  [9] memoise_2.0.1                 RCurl_1.98-1.14               ggtree_3.10.0                 htmltools_0.5.7              
 [13] AnnotationHub_3.10.0          curl_5.2.0                    cellranger_1.1.0              gridGraphics_0.5-1           
 [17] plyr_1.8.9                    cachem_1.0.8                  igraph_2.0.1.1                mime_0.12                    
 [21] lifecycle_1.0.4               pkgconfig_2.0.3               Matrix_1.6-1.1                R6_2.5.1                     
 [25] fastmap_1.1.1                 gson_0.1.0                    GenomeInfoDbData_1.2.11       shiny_1.8.0                  
 [29] digest_0.6.34                 aplot_0.2.2                   colorspace_2.1-0              patchwork_1.2.0              
 [33] AnnotationDbi_1.64.1          S4Vectors_0.40.2              RSQLite_2.3.5                 filelock_1.0.3               
 [37] fansi_1.0.6                   httr_1.4.7                    polyclip_1.10-6               compiler_4.3.2               
 [41] remotes_2.4.2.1               bit64_4.0.5                   withr_3.0.0                   BiocParallel_1.36.0          
 [45] viridis_0.6.5                 DBI_1.2.1                     MASS_7.3-60                   rappdirs_0.3.3               
 [49] HDO.db_0.99.1                 tools_4.3.2                   ape_5.7-1                     scatterpie_0.2.1             
 [53] interactiveDisplayBase_1.40.0 httpuv_1.6.14                 glue_1.7.0                    promises_1.2.1               
 [57] nlme_3.1-163                  GOSemSim_2.28.1               gridtext_0.1.5                grid_4.3.2                   
 [61] shadowtext_0.1.3              reshape2_1.4.4                fgsea_1.28.0                  generics_0.1.3               
 [65] gtable_0.3.4                  tidyr_1.3.1                   data.table_1.15.0             tidygraph_1.3.1              
 [69] xml2_1.3.6                    utf8_1.2.4                    XVector_0.42.0                BiocGenerics_0.48.1          
 [73] ggrepel_0.9.5                 BiocVersion_3.18.1            pillar_1.9.0                  stringr_1.5.1                
 [77] yulab.utils_0.1.4             later_1.3.2                   splines_4.3.2                 dplyr_1.1.4                  
 [81] ggtext_0.1.2                  tweenr_2.0.2                  BiocFileCache_2.10.1          treeio_1.26.0                
 [85] lattice_0.21-9                bit_4.0.5                     tidyselect_1.2.0              GO.db_3.18.0                 
 [89] Biostrings_2.70.2             gridExtra_2.3                 IRanges_2.36.0                stats4_4.3.2                 
 [93] graphlayouts_1.1.0            Biobase_2.62.0                stringi_1.8.3                 lazyeval_0.2.2               
 [97] ggfun_0.1.4                   yaml_2.3.8                    codetools_0.2-19              ggraph_2.1.0                 
[101] tibble_3.2.1                  qvalue_2.34.0                 BiocManager_1.30.22           ggplotify_0.1.2              
[105] cli_3.6.2                     xtable_1.8-4                  munsell_0.5.0                 Rcpp_1.0.12                  
[109] GenomeInfoDb_1.38.5           dbplyr_2.4.0                  png_0.1-8                     parallel_4.3.2               
[113] ellipsis_0.3.2                blob_1.2.4                    DOSE_3.28.2                   bitops_1.0-7                 
[117] viridisLite_0.4.2             tidytree_0.4.6                purrr_1.0.2                   crayon_1.5.2                 
[121] rlang_1.1.3                   cowplot_1.1.3                 fastmatch_1.1-4               KEGGREST_1.42.0
clusterProfiler • 842 views
Guido Hooiveld ★ 4.1k
@guido-hooiveld-2020
Wageningen University, Wageningen, the …

I agree that it is not very clearly stated on the help page of buildGOmap, but you should reverse the order of your input columns, since (from ?buildGOmap):

Details: provided by a data.frame of GENE (column 1), GO (column 2) and ONTOLOGY (optional) that describes GO direct annotation, this function will add indirect GO annotation of genes.

Moreover, the column with GO IDs must be named GO (and not Accession, as it is now). Unfortunately, this name is (currently) hard-coded in the function.

Thus:

> library(clusterProfiler)
> Pa_GO <- read.csv("gene_ontology_csv.csv")
> 
> ## first gene, then GOID
> Pa_GOterms <- Pa_GO[c(1,5)]
> 
> ## check
> dim(Pa_GOterms)
[1] 15883     2
> head(Pa_GOterms)
  Locus.Tag  Accession
1    PA0001 GO:0005524
2    PA0001 GO:0006270
3    PA0001 GO:0006275
4    PA0001 GO:0016887
5    PA0001 GO:0016887
6    PA0001 GO:0006260
> tail(Pa_GOterms)
      Locus.Tag  Accession
15878    PA5569 GO:0008033
15879    PA5569 GO:0001682
15880    PA5569 GO:0004526
15881    PA5570 GO:0003735
15882    PA5570 GO:0005840
15883    PA5570 GO:0006412
> 
> ## rename 2nd column
> colnames(Pa_GOterms)[2] <- c("GO")
> 
> Pa_GOMap <- buildGOmap(Pa_GOterms)
> 
> ## check; note list is longer and 'tail' shows additional GO IDs.
> dim(Pa_GOMap)
[1] 119221      2
> head(Pa_GOMap)
  Locus.Tag         GO
1    PA0001 GO:0005524
2    PA0001 GO:0006270
3    PA0001 GO:0006275
4    PA0001 GO:0016887
5    PA0001 GO:0016887
6    PA0001 GO:0006260
> tail(Pa_GOMap)
       Locus.Tag         GO
183988    PA5570 GO:0044249
183989    PA5570 GO:0044271
183990    PA5570 GO:0071704
183991    PA5570 GO:1901564
183992    PA5570 GO:1901566
183993    PA5570 GO:1901576
> 
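
The preparation steps above (gene column first, GO column second, second column renamed to GO) can be sketched as a small base R helper. This is only an illustration: `prepare_gomap_input` is a hypothetical name, not part of clusterProfiler, and the de-duplication and GO ID format check are optional extras that I am assuming are harmless for this kind of input.

```r
# Hypothetical helper (not part of clusterProfiler): reshape a data frame
# into the layout expected by the buildGOmap() version discussed here,
# i.e. gene IDs in column 1 and GO IDs in column 2, with column 2 named "GO".
prepare_gomap_input <- function(df, gene_col, go_col) {
  stopifnot(is.data.frame(df),
            gene_col %in% colnames(df),
            go_col %in% colnames(df))

  # Select by name, gene first, so the original column order does not matter.
  out <- df[, c(gene_col, go_col)]
  colnames(out)[2] <- "GO"

  # Optional extras (assumptions): keep only well-formed GO IDs
  # ("GO:" followed by seven digits) and drop duplicated gene/GO pairs.
  out <- out[grepl("^GO:[0-9]{7}$", out$GO), , drop = FALSE]
  out <- unique(out)
  rownames(out) <- NULL
  out
}

# Toy data in the same (GO first, gene second) layout as the original post.
toy <- data.frame(
  Accession = c("GO:0005524", "GO:0016887", "GO:0016887", "not_a_go_id"),
  Locus.Tag = c("PA0001", "PA0001", "PA0001", "PA0002")
)
prepared <- prepare_gomap_input(toy, gene_col = "Locus.Tag", go_col = "Accession")
print(prepared)
```

The resulting `prepared` data frame could then be passed to buildGOmap() in place of the manually reordered and renamed Pa_GOterms.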

EDIT (a few hours later): I reported this issue on the GOSemSim GitHub page, and the maintainer, Guangchuang Yu, promptly addressed it; thanks! See: https://github.com/YuLab-SMU/GOSemSim/issues/47

In other words, if you update to the GitHub version of GOSemSim (currently v2.29.1.1), there is no longer any need to change the column name, and the input (order of columns) aligns with the TERM2GENE convention that is also used by the generic enricher and GSEA functions of clusterProfiler. See the GitHub page for more details.


Thank you so much for your help and for contacting Guangchuang Yu! It solved the issue with the buildGOmap function.

Curiously, when I fed the result to the enricher function, it wasn't able to map the genes to the IDs. It only worked after I swapped the columns in Pa_GOMap, making it a true "TERM2GENE" ("GO" in column 1 and "Gene ID" in column 2).
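
For anyone landing here later, the reordering I mean looks roughly like the sketch below. The small data frame is a toy stand-in for my real Pa_GOMap, I am assuming the column names Locus.Tag and GO from the output shown above, and `term2gene` and `my_genes` are just illustrative names.

```r
# Toy stand-in for the data frame returned by buildGOmap()
# (gene IDs in column 1, GO IDs in column 2).
Pa_GOMap <- data.frame(
  Locus.Tag = c("PA0001", "PA0001", "PA5570"),
  GO        = c("GO:0005524", "GO:0006270", "GO:0003735")
)

# TERM2GENE layout: term (GO ID) in column 1, gene in column 2.
# Selecting by name avoids depending on the current column order.
term2gene <- unique(Pa_GOMap[, c("GO", "Locus.Tag")])
colnames(term2gene) <- c("term", "gene")  # illustrative labels only
rownames(term2gene) <- NULL

print(term2gene)

# Assumed downstream call; `my_genes` would be a character vector of locus tags:
# res <- clusterProfiler::enricher(gene = my_genes, TERM2GENE = term2gene)
```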

Thanks again!
