limma::goana - "Invalid value for odds"
1
0
Entering edit mode
neilzhao ▴ 30
@0f8d7a2e
Last seen 3 months ago
United States

Gene ontology testing with goana gives me an error for a specific coefficient of interest, but not the others in my data. For instance, setting the argument coef = 1 gives me the following error:

Error in BiasedUrn::pWNCHypergeo(S[i, 1L + j], S[i, "N"], M2[i], nde[j],  : 
  Invalid value for odds

However, for all other coefficients (i.e. 2:6), it works as intended. For example, I followed a demonstration similar to the posted guide:

# Goana does not work with specific contrast
# Packages needed
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("limma")

library(limma)

# Load and read in example MArrayLM object from wd()
# https://github.com/nztao/goana_issue_example
fit.example <- readRDS("goana_issue_example.rds")

# Gene ontology testing with goana
# Example: Coefficient of interest [1] gives error, but [2:6] do not
go <- c()
go <- goana(fit.example, 
#   Specify contrast coefficient for applying goana function
                        coef = 1,
#   Specify Entrez Gene ID location
                        geneid = fit.example[["genes"]][["ENTREZID"]],
#   Use human annot. and adjust analysis for gene length and abundance
                        species = "Hs", trend = T)

# Session info
sessionInfo()

I assume it's an issue with my data, but I am at a loss at where to diagnose and how to fix it. I tried applying checks such as

  1. any(is.na(fit.example$coefficients)),
  2. any(is.infinite(fit.example$coefficients)),
  3. and any(is.nan(fit.example$coefficients)),

but they all return FALSE. Any suggestions?

Thank you!

Here is my sessioninfo:

> sessionInfo()
R version 4.2.0 (2022-04-22 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] limma_3.53.4        BiocManager_1.30.18

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.9             XVector_0.37.0         compiler_4.2.0        
 [4] GenomeInfoDb_1.33.3    zlibbioc_1.43.0        bitops_1.0-7          
 [7] tools_4.2.0            digest_0.6.29          bit_4.0.4             
[10] RSQLite_2.2.15         evaluate_0.15          memoise_2.0.1         
[13] pkgconfig_2.0.3        png_0.1-7              rlang_1.0.4           
[16] DBI_1.1.3              cli_3.3.0              rstudioapi_0.13       
[19] yaml_2.3.5             xfun_0.31              fastmap_1.1.0         
[22] GenomeInfoDbData_1.2.8 httr_1.4.3             knitr_1.39            
[25] Biostrings_2.65.1      vctrs_0.4.1            S4Vectors_0.35.1      
[28] IRanges_2.31.0         stats4_4.2.0           bit64_4.0.5           
[31] Biobase_2.57.1         R6_2.5.1               AnnotationDbi_1.59.1  
[34] BiasedUrn_1.07         rmarkdown_2.14         org.Hs.eg.db_3.15.0   
[37] GO.db_3.15.0           blob_1.2.3             htmltools_0.5.3       
[40] BiocGenerics_0.43.0    KEGGREST_1.37.3        RCurl_1.98-1.7        
[43] cachem_1.0.6           crayon_1.5.1
limma • 1.0k views
ADD COMMENT
1
Entering edit mode
@gordon-smyth
Last seen 1 hour ago
WEHI, Melbourne, Australia

The problem is that you have NA GeneIDs:

> i <- is.na(fit.example$genes$ENTREZID)
> table(i)
i
FALSE  TRUE 
14986  3850

goana will work fine if you stop trying to adjust for trend:

> g <- goana(fit.example, coef=1, geneid="ENTREZID", trend=FALSE)

or if you remove the genes that have NA Entrez IDs:

> g <- goana(fit.example[!i,], coef=1, geneid="ENTREZID", trend=TRUE)

But a more basic problem is that is only one DE gene for your first contrast, so there is no meaningful GO analysis to be done anyway. The error arises for the first contrast because there is only one DE gene for that contrast and that gene has an NA Entrez ID.

Note also that trend=TRUE adjusts only for abundance and not for gene length.

ADD COMMENT
0
Entering edit mode

Thanks so much Gordon! For future best practices, is it recommended to remove all Entrez ID's with NA values prior to running goana? I neglected to remove them originally due to those NA's having HUGO gene symbols and/or ENSEMBL IDs.

ADD REPLY
1
Entering edit mode

You may as well remove NA Entrez IDs before running goana(), even though goana handles them ok in almost circumstances. The SYMBOL entries in your fit object corresponding to NA Entrez IDs are not HUGO gene symbols but seem to be a mix of locus IDs, clone IDs and other things, so filtering them out is no loss.

A more important issue with your code however is that you are trying to estimate an abundance trend with trend=TRUE even when you have hardly any DE genes. You should only use trend=TRUE when you have a lot of DE genes because it is only then that a trend can be estimated reliably.

ADD REPLY

Login before adding your answer.

Traffic: 866 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6