Hi,
I'm running piano's runGSA on a list of 9881 genes (with directional fold-changes) and 13244 GO BP gene sets, and it takes ~30 min to complete. I'm using the default geneSetStat option, and all other arguments are at their default values:
Final gene/gene-set association: 9881 genes and 13244 gene-sets
Details:
Calculating gene set statistics from 9881 out of 9881 gene-level statistics
Using all 9881 gene-level statistics for significance estimation
Removed 0 genes from GSC due to lack of matching gene statistics
Removed 0 gene sets containing no genes after gene removal
Removed additionally 0 gene sets not matching the size limits
Loaded additional information for 0 gene sets
Gene statistic type: F-like
Method: mean
Gene-set statistic name: mean
Significance: Gene sampling
Adjustment: fdr
Gene set size limit: (1,Inf)
Permutations: 10000
Total run time: 29.75 min
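For context on the "Significance: Gene sampling" and "Permutations: 10000" lines above, here is a minimal Python sketch of what a gene-sampling significance test with a mean gene-set statistic generally involves. It is not piano's implementation: the function name gene_sampling_pvalue, the toy data, and the add-one correction in the p-value are assumptions made for illustration only.

```python
import random
from statistics import mean


def gene_sampling_pvalue(stats, gene_set, n_perm=1000, seed=0):
    """Empirical p-value for the mean statistic of one gene set.

    stats    : dict mapping gene id -> gene-level statistic (toy input)
    gene_set : list of gene ids that belong to the set
    The null distribution is built by drawing random gene sets of the
    same size from all genes (gene sampling). The add-one correction
    below is a choice of this sketch, not something taken from piano.
    """
    rng = random.Random(seed)
    genes = list(stats)
    size = len(gene_set)
    observed = mean(stats[g] for g in gene_set)

    hits = 0
    for _ in range(n_perm):
        # Draw a random gene set of the same size, without replacement.
        sampled = rng.sample(genes, size)
        if mean(stats[g] for g in sampled) >= observed:
            hits += 1

    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical toy data: 100 genes whose statistic equals their index.
    toy_stats = {f"g{i}": float(i) for i in range(100)}
    top_set = [f"g{i}" for i in range(95, 100)]
    print(gene_sampling_pvalue(toy_stats, top_set, n_perm=200))
```

In a naive loop like this, every gene set needs its own n_perm resamplings, so 13244 gene sets at 10000 permutations would mean roughly 132 million random draws. A real implementation may organize the work differently (for example, by sharing null distributions between sets of equal size), but the cost still grows with the number of permutations, which an over-representation test on a ranked list avoids.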
In contrast, if I upload this gene list to the GOrilla GO enrichment analysis website at http://cbl-gorilla.cs.technion.ac.il/, it takes a couple of seconds. And the order of magnitude of the p-values is not smaller.
Also, I'm not sure why all pDistinctDirUp and pDistinctDirDown values are NAs.
> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.12.1 (Sierra)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] grid parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] snpEnrichment_1.7.0 BiocInstaller_1.24.0 dplyr_0.5.0 piano_1.14.5 Gviz_1.18.1 GenomicRanges_1.26.1 GenomeInfoDb_1.10.2 IRanges_2.8.1
[9] S4Vectors_0.12.0 BiocGenerics_0.20.0
loaded via a namespace (and not attached):
[1] bitops_1.0-6 matrixStats_0.51.0 RColorBrewer_1.1-2 httr_1.2.1 data.tree_0.6.2 tools_3.3.1
[7] R6_2.2.0 rpart_4.1-10 KernSmooth_2.23-15 Hmisc_4.0-2 DBI_0.5-1 lazyeval_0.2.0
[13] colorspace_1.2-7 nnet_7.3-12 gridExtra_2.2.1 chron_2.3-47 Biobase_2.34.0 htmlTable_1.7
[19] influenceR_0.1.0 slam_0.1-40 rtracklayer_1.34.1 caTools_1.17.1 scales_0.4.1 relations_0.6-6
[25] stringr_1.1.0 digest_0.6.10 Rsamtools_1.26.1 foreign_0.8-67 XVector_0.14.0 base64enc_0.1-3
[31] dichromat_2.0-0 htmltools_0.3.5 ensembldb_1.6.2 limma_3.30.2 BSgenome_1.42.0 htmlwidgets_0.8
[37] rstudioapi_0.6 RSQLite_1.0.0 shiny_0.14.2 visNetwork_1.0.3 jsonlite_1.1 BiocParallel_1.8.1
[43] gtools_3.5.0 acepack_1.4.1 rgexf_0.15.3 VariantAnnotation_1.20.2 RCurl_1.95-4.8 magrittr_1.5
[49] Formula_1.2-1 Matrix_1.2-7.1 Rcpp_0.12.7 munsell_0.4.3 viridis_0.3.4 stringi_1.1.2
[55] yaml_2.1.14 SummarizedExperiment_1.4.0 zlibbioc_1.20.0 gplots_3.0.1 plyr_1.8.4 AnnotationHub_2.6.4
[61] gdata_2.17.0 snpStats_1.24.0 lattice_0.20-34 Biostrings_2.42.0 splines_3.3.1 GenomicFeatures_1.26.2
[67] knitr_1.15.1 fgsea_1.0.1 igraph_1.0.1 marray_1.52.0 biomaRt_2.30.0 fastmatch_1.0-4
[73] XML_3.98-1.5 biovizBase_1.22.0 latticeExtra_0.6-28 data.table_1.9.6 httpuv_1.3.3 gtable_0.2.0
[79] assertthat_0.1 ggplot2_2.2.1 mime_0.5 xtable_1.8-2 survival_2.40-1 tibble_1.2
[85] GenomicAlignments_1.10.0 AnnotationDbi_1.36.0 sets_1.0-16 cluster_2.0.5 Rook_1.1-1 DiagrammeR_0.9.0
[91] brew_1.0-6 interactiveDisplayBase_1.12.0
Hi, could you clarify this part: "pMixedDirUp is anti-correlated with pMixedDirUp. I'm guessing the p-value is really 1-pMixedDirUp. This is not true for pMixedDirDown. Is this a bug?" Is there a typo in one of the pMixedDirUp? I guess you mean something else?
Could you also clarify what input you are using? The run output indicates that your gene-level statistics are in the range [0,Inf] (are they maybe ranks?) but you also mention directional fold-changes, so I am not sure...
Sorry about the lack of clarity. I dropped the part about the anti-correlation between statMixedDirUp and pMixedDirUp. My question is only about the run-time, which I guess is not solvable.