Gene ontology testing with goana
gives me an error for a specific coefficient of interest, but not the others in my data. For instance, setting the argument coef = 1
gives me the following error:
Error in BiasedUrn::pWNCHypergeo(S[i, 1L + j], S[i, "N"], M2[i], nde[j], :
Invalid value for odds
However, for all other coefficients (i.e. 2:6), it works as intended. For example, I followed a demonstration similar to the posted guide:
# Goana does not work with specific contrast
# Packages needed
if (!require("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("limma")
library(limma)
# Load and read in example MArrayLM object from wd()
# https://github.com/nztao/goana_issue_example
fit.example <- readRDS("goana_issue_example.rds")
# Gene ontology testing with goana
# Example: Coefficient of interest [1] gives error, but [2:6] do not
go <- c()
go <- goana(fit.example,
# Specify contrast coefficient for applying goana function
coef = 1,
# Specify Entrez Gene ID location
geneid = fit.example[["genes"]][["ENTREZID"]],
# Use human annot. and adjust analysis for gene length and abundance
species = "Hs", trend = T)
# Session info
sessionInfo()
I assume it's an issue with my data, but I am at a loss at where to diagnose and how to fix it. I tried applying checks such as
any(is.na(fit.example$coefficients))
,any(is.infinite(fit.example$coefficients))
,- and
any(is.nan(fit.example$coefficients))
,
but they all return FALSE
. Any suggestions?
Thank you!
Here is my sessioninfo:
> sessionInfo()
R version 4.2.0 (2022-04-22 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.utf8
[2] LC_CTYPE=English_United States.utf8
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] limma_3.53.4 BiocManager_1.30.18
loaded via a namespace (and not attached):
[1] Rcpp_1.0.9 XVector_0.37.0 compiler_4.2.0
[4] GenomeInfoDb_1.33.3 zlibbioc_1.43.0 bitops_1.0-7
[7] tools_4.2.0 digest_0.6.29 bit_4.0.4
[10] RSQLite_2.2.15 evaluate_0.15 memoise_2.0.1
[13] pkgconfig_2.0.3 png_0.1-7 rlang_1.0.4
[16] DBI_1.1.3 cli_3.3.0 rstudioapi_0.13
[19] yaml_2.3.5 xfun_0.31 fastmap_1.1.0
[22] GenomeInfoDbData_1.2.8 httr_1.4.3 knitr_1.39
[25] Biostrings_2.65.1 vctrs_0.4.1 S4Vectors_0.35.1
[28] IRanges_2.31.0 stats4_4.2.0 bit64_4.0.5
[31] Biobase_2.57.1 R6_2.5.1 AnnotationDbi_1.59.1
[34] BiasedUrn_1.07 rmarkdown_2.14 org.Hs.eg.db_3.15.0
[37] GO.db_3.15.0 blob_1.2.3 htmltools_0.5.3
[40] BiocGenerics_0.43.0 KEGGREST_1.37.3 RCurl_1.98-1.7
[43] cachem_1.0.6 crayon_1.5.1
Thanks so much Gordon! For future best practices, is it recommended to remove all Entrez ID's with
NA
values prior to runninggoana
? I neglected to remove them originally due to thoseNA's
having HUGO gene symbols and/or ENSEMBL IDs.You may as well remove NA Entrez IDs before running goana(), even though goana handles them ok in almost circumstances. The SYMBOL entries in your fit object corresponding to NA Entrez IDs are not HUGO gene symbols but seem to be a mix of locus IDs, clone IDs and other things, so filtering them out is no loss.
A more important issue with your code however is that you are trying to estimate an abundance trend with
trend=TRUE
even when you have hardly any DE genes. You should only usetrend=TRUE
when you have a lot of DE genes because it is only then that a trend can be estimated reliably.