Using the GOSemSim package, I am trying to generate a matrix containing the semantic similarity between every pair of gene identifiers in yeast, like this (minimal, reproducible example):
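library("GOSemSim")
library("org.Sc.sgd.db")  # attached explicitly so that keys() and the OrgDb object are available
# GO annotation data for yeast (Biological Process), with information content
yeastGO <- godata("org.Sc.sgd.db", keytype = "ENSEMBL", ont = "BP", computeIC = TRUE)
# all ENSEMBL gene identifiers known to the annotation package
genes <- keys(org.Sc.sgd.db, keytype = "ENSEMBL")
my_sim_matrix <- mgeneSim(genes,
                          semData = yeastGO,
                          measure = "Resnik",
                          combine = "max",
                          verbose = FALSE)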
However, I run into this error:
Error in .checkKeys(value, Lkeys(x), x@ifnotfound) : value for "GO:0000059" not found
I have tried various things and read around, but I cannot figure out what is going on. genes is a character vector that I got straight out of org.Sc.sgd.db, so I would assume all genes and GO terms are properly matched. Any help would be very much appreciated.
P.S.:
> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.3 LTS

Matrix products: default
BLAS: /usr/lib/atlas-base/atlas/libblas.so.3.0
LAPACK: /usr/lib/atlas-base/atlas/liblapack.so.3.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] org.Sc.sgd.db_3.5.0  AnnotationDbi_1.40.0 IRanges_2.12.0
[4] S4Vectors_0.16.0     Biobase_2.38.0       BiocGenerics_0.24.0
[7] GOSemSim_2.4.1

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.15    GO.db_3.5.0     digest_0.6.15   DBI_0.7
 [5] RSQLite_2.0     pillar_1.1.0    rlang_0.1.6     blob_1.1.0
 [9] tools_3.4.3     bit64_0.9-7     bit_1.1-12      compiler_3.4.3
[13] pkgconfig_2.0.1 memoise_1.1.0   tibble_1.4.2
Hi Guangchuang, great that the original author is chiming in as well. I do not know how to update to 2.5.1, though (I am on Bioconductor version 3.6, R version 3.4.3).
Another question: I was wondering about the exact way you are computing Resnik. In my understanding, Resnik similarity in the original 1996, 1999 papers is only concerned with IS-A taxonomies. Gene Ontology contains many different types of relationships, including "is a", "part of", "has part" and many more. Are the similarity values I get from GOSemSim for Resnik computed only with respect to IS-A relationships? Or are all relationships treated equally, i.e. is everything treated as if it were an IS-A relationship? I could not find a statement on this in the GOSemSim paper or vignette. Could I kindly ask you for a comment regarding the treatment of different relationship types?
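For reference, the textbook form of Resnik similarity that the question refers to can be written as follows. The notation is an assumption made here for illustration: p(c) is the probability that an annotation falls on term c or one of its descendants, and S(c_1, c_2) is the set of ancestors shared by the two terms. Which edge types count when building S(c_1, c_2) is exactly the open point.

```latex
\mathrm{IC}(c) = -\log p(c),
\qquad
\mathrm{sim}_{\mathrm{Resnik}}(c_1, c_2) = \max_{c \,\in\, S(c_1, c_2)} \mathrm{IC}(c)
```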
You can use
devtools::install_github("guangchuangyu/GOSemSim")
to install 2.5.1.
In GOSemSim, all relationships are treated equally for IC-based methods.
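To make concrete what treating all relationships equally means for an IC-based measure, here is a toy sketch in base R. It is not GOSemSim's actual code: the ontology, the term names A to E, the probabilities, and the helper functions ancestors() and resnik() are all made up for illustration. It compares Resnik similarity when every edge type is followed with the value obtained when only is_a edges are followed.

```r
# Toy ontology: each row is a child -> parent edge with a relationship type.
# All terms, edges and probabilities are hypothetical.
edges <- data.frame(
  child  = c("B", "C", "D", "D", "E", "E"),
  parent = c("A", "A", "B", "C", "C", "A"),
  type   = c("is_a", "is_a", "is_a", "part_of", "part_of", "is_a"),
  stringsAsFactors = FALSE
)

# Hypothetical annotation probabilities p(term); information content is -log(p)
p  <- c(A = 1.0, B = 0.4, C = 0.5, D = 0.1, E = 0.2)
ic <- -log(p)

# Return a term together with all of its ancestors,
# following only edges whose type is listed in `types`
ancestors <- function(term, types = unique(edges$type)) {
  seen <- character(0)
  todo <- term
  while (length(todo) > 0) {
    current <- todo[1]
    todo <- todo[-1]
    if (current %in% seen) next
    seen <- c(seen, current)
    parents <- edges$parent[edges$child == current & edges$type %in% types]
    todo <- c(todo, parents)
  }
  seen
}

# Resnik similarity: IC of the most informative common ancestor
resnik <- function(t1, t2, types = unique(edges$type)) {
  common <- intersect(ancestors(t1, types), ancestors(t2, types))
  if (length(common) == 0) return(NA_real_)
  max(ic[common])
}

sim_all <- resnik("D", "E")                  # every edge type treated alike
sim_isa <- resnik("D", "E", types = "is_a")  # is_a edges only
print(c(all_edges = sim_all, is_a_only = sim_isa))
```

In this toy graph, D and E share the ancestor C only through part_of edges. Following all edges makes C the most informative common ancestor, while restricting to is_a leaves only the root A, so the two settings give different similarity values.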