rubi
Hi,
Does fgsea reorder the stats argument? What I'm currently doing is ordering the effects (i.e. log fold-changes of treatment vs. control) by their p-values and passing that to fgsea. But the fgsea code suggests that it re-ranks stats, so the p-value ranking is lost. Is that the case? And if so, is there any way to disable it?
> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

locale:
 [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
 [1] grid parallel stats4 stats graphics grDevices utils datasets methods base

other attached packages:
 [1] setEnrichmentTests_0.0.0.9000 org.Ss.eg.db_3.4.0 org.Sc.sgd.db_3.4.0 org.Rn.eg.db_3.4.0 org.Pt.eg.db_3.4.0 org.Mmu.eg.db_3.4.0
 [7] org.Mm.eg.db_3.4.0 org.Hs.eg.db_3.4.0 org.Cf.eg.db_3.4.0 org.Ce.eg.db_3.4.0 ggdendro_0.1-20 dendextend_1.5.2
[13] fastcluster_1.1.22 cluster_2.0.5 tidyr_0.6.3 GOSemSim_2.0.1 matrixStats_0.51.0 doParallel_1.0.10
[19] iterators_1.0.8 foreach_1.4.3 snpEnrichment_1.7.0 piano_1.14.0 topGO_2.26.0 SparseM_1.72
[25] GO.db_3.4.0 AnnotationDbi_1.36.0 Biobase_2.34.0 fgsea_1.0.2 Rcpp_0.12.11.1 graph_1.50.0
[31] gageData_2.12.0 gage_2.24.0 pryr_0.1.2 scales_0.4.1 stringi_1.1.5 zoo_1.7-13
[37] biomaRt_2.30.0 gplots_3.0.1 reshape2_1.4.2 plotrix_3.6-3 Hmisc_3.17-4 Formula_1.2-1
[43] survival_2.40-1 lattice_0.20-34 data.table_1.9.6 annotationData_0.1.0 dplyr_0.5.0 plyr_1.8.4
[49] magrittr_1.5 gtable_0.2.0 gridExtra_2.2.1 plotly_4.7.0 ggplot2_2.2.1.9000 kableExtra_0.2.1
[55] knitr_1.16 rtracklayer_1.34.1 GenomicRanges_1.26.2 GenomeInfoDb_1.10.0 IRanges_2.8.1 S4Vectors_0.12.1
[61] BiocGenerics_0.20.0 yaml_2.1.14 doBy_4.5-15

loaded via a namespace (and not attached):
 [1] colorspace_1.3-2 class_7.3-14 modeltools_0.2-21 mclust_5.2 rprojroot_1.2 XVector_0.14.0
 [7] flexmix_2.3-13 mvtnorm_1.0-5 xml2_1.1.1 codetools_0.2-15 splines_3.3.2 snpStats_1.24.0
[13] robustbase_0.92-6 jsonlite_1.4 Rsamtools_1.26.1 kernlab_0.9-25 png_0.1-7 httr_1.2.1
[19] backports_1.0.5 assertthat_0.2.0 Matrix_1.2-7.1 lazyeval_0.2.0 limma_3.30.2 acepack_1.4.1
[25] htmltools_0.3.6 tools_3.3.2 igraph_1.0.1 fastmatch_1.0-4 slam_0.1-40 trimcluster_0.1-2
[31] Biostrings_2.42.1 gdata_2.17.0 fpc_2.1-10 stringr_1.2.0 rvest_0.3.2 gtools_3.5.0
[37] XML_3.98-1.4 DEoptimR_1.0-6 zlibbioc_1.20.0 MASS_7.3-45 relations_0.6-6 SummarizedExperiment_1.2.3
[43] RColorBrewer_1.1-2 sets_1.0-16 rpart_4.1-10 latticeExtra_0.6-28 RSQLite_1.0.0 caTools_1.17.1
[49] BiocParallel_1.8.1 chron_2.3-47 rlang_0.1.1 prabclus_2.2-6 bitops_1.0-6 evaluate_0.10
[55] purrr_0.2.2.2 GenomicAlignments_1.8.4 htmlwidgets_0.8 R6_2.2.0 DBI_0.5-1 whisker_0.3-2
[61] foreign_0.8-67 KEGGREST_1.14.0 RCurl_1.95-4.8 nnet_7.3-12 tibble_1.3.3 KernSmooth_2.23-15
[67] rmarkdown_1.6 viridis_0.4.0 marray_1.52.0 diptest_0.75-7 digest_0.6.12 munsell_0.4.3
[73] viridisLite_0.2.0
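A small base-R sketch of the concern raised in the question, assuming (as the question suggests) that fgsea ranks genes by the values in stats rather than by their position in the vector. The gene names and numbers are made up for illustration.

```r
# Hypothetical log fold-changes and p-values for five genes
logFC <- c(g1 = 0.4, g2 = -2.1, g3 = 1.5, g4 = 0.9, g5 = -0.2)
pval  <- c(g1 = 1e-8, g2 = 0.03, g3 = 0.2, g4 = 1e-5, g5 = 0.5)

# Order the fold-changes by p-value (most significant first), as in the question
statsByP <- logFC[order(pval)]
names(statsByP)
# [1] "g1" "g4" "g2" "g3" "g5"

# Anything that ranks by the values themselves recovers the fold-change order,
# so the p-value order of the input vector is discarded
rankedByValue <- sort(statsByP, decreasing = TRUE)
names(rankedByValue)
# [1] "g3" "g4" "g1" "g5" "g2"
```

In other words, the position of an element in the vector carries no information once ranking is done by value; the p-value information has to be encoded in the values themselves.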
Can you provide an example where it's useful?
Hi @assaron,
I think it's a matter of preference what you want your enrichment analysis to pick up on. Sorting only by effect size ignores the error of the estimate (which can often be large in gene expression data), whereas sorting by p-value does not, so I'd prefer sorting first by p-value and then by effect size. It sounds to me like a small but useful addition to fgsea.
Sorry, I still don't understand. You can only sort something by one value, how can you sort first by p-value and then by effect size?
I usually rank (and sort) genes by the test statistic from the DE test (DESeq2 or limma); I know other people sort by -log(p-value) * sign(log2FC). Both variants work fine and account for significance. Aren't they working for you?
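A sketch of the signed-significance variant on a made-up results table. The column names log2FC and pvalue are assumptions for illustration, not necessarily the names a given DE package uses. The minus sign matters: -log10(p) is positive, so significantly up-regulated genes end up at the top of the ranking and significantly down-regulated genes at the bottom.

```r
# Hypothetical DE results table (column names are illustrative)
res <- data.frame(
  gene   = c("g1", "g2", "g3", "g4"),
  log2FC = c(1.2, -0.8, 0.3, -2.0),
  pvalue = c(1e-6, 1e-3, 0.4, 1e-10)
)

# Signed significance: large positive = significantly up,
# large negative = significantly down, near zero = not significant
score <- -log10(res$pvalue) * sign(res$log2FC)
names(score) <- res$gene

# Named numeric vector usable as a gene-level ranking statistic
ranked <- sort(score, decreasing = TRUE)
names(ranked)
# [1] "g1" "g3" "g2" "g4"
```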
Sorry for the late response. I'm using a Bayesian differential expression tool (MMDIFF), which provides an estimate of the effect size (e.g. ln(fold-change) between treatment and control) and the posterior probability that the effect size is different from 0. Unlike frequentist approaches, there is no test statistic here. The ranking I'm referring to is to rank first by posterior probability in descending order (think of this as 1 - p-value for the sake of this discussion) and then by the absolute value of the effect size. The reason is that many strongly differentially expressed genes all have a posterior probability of 1, so the effect size is needed to break those ties.
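In base R, a two-key ranking like this can be written with order(), which uses later keys to break ties in earlier ones. A minimal sketch with made-up values; the names posterior and effect are hypothetical stand-ins for the MMDIFF output.

```r
# Hypothetical posterior probabilities and ln(fold-change) estimates
posterior <- c(g1 = 1, g2 = 1, g3 = 0.95, g4 = 1, g5 = 0.60)
effect    <- c(g1 = 0.5, g2 = -3.0, g3 = 2.2, g4 = 1.1, g5 = -0.1)

# Rank first by posterior (descending), then by |effect| (descending)
ord <- order(-posterior, -abs(effect))
lexOrder <- names(posterior)[ord]
lexOrder
# [1] "g2" "g4" "g1" "g3" "g5"
```

Since fgsea takes a named numeric vector of gene-level stats rather than an ordering, this order still has to be expressed through the values themselves.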
I think supporting that in fgsea would allow for more general usage.
I guess you can aggregate your values into a new "statistic" that is ordered the way you want, e.g. stat = p + ifelse(p == 1, abs(effect) * 1e-3, 0).
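A sketch that checks the suggested aggregate against the two-key order, using made-up data; p stands for the posterior probability and effect for the ln(fold-change). Genes with p == 1 get 1 + 1e-3 * |effect|, which is greater than 1 and therefore above every gene with p < 1, while the |effect| term orders them among themselves. Ties among genes with p < 1 are not broken by this formula.

```r
# Hypothetical posterior probabilities and effect sizes
p      <- c(g1 = 1, g2 = 1, g3 = 0.95, g4 = 1, g5 = 0.60)
effect <- c(g1 = 0.5, g2 = -3.0, g3 = 2.2, g4 = 1.1, g5 = -0.1)

# Aggregate statistic from the reply: a small |effect|-based bonus,
# applied only where p == 1, breaks the ties at the top
stat <- p + ifelse(p == 1, abs(effect) * 1e-3, 0)

byStat    <- names(sort(stat, decreasing = TRUE))
byTwoKeys <- names(p)[order(-p, -abs(effect))]
identical(byStat, byTwoKeys)
# [1] TRUE
```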
Additionally, the GSEA method was designed for statistics that can take both positive and negative values, so I suggest multiplying the values by the sign of the effect size, if that's appropriate in your case.
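A sketch of that suggestion on the same kind of made-up values: multiplying the aggregate by sign(effect) puts confidently up-regulated genes at the top and confidently down-regulated genes at the bottom. The variable names are hypothetical.

```r
# Hypothetical posterior probabilities and effect sizes
p      <- c(g1 = 1, g2 = 1, g3 = 0.95, g4 = 1, g5 = 0.60)
effect <- c(g1 = 0.5, g2 = -3.0, g3 = 2.2, g4 = 1.1, g5 = -0.1)

# Unsigned aggregate (posterior, with |effect| breaking ties at p == 1)
stat <- p + ifelse(p == 1, abs(effect) * 1e-3, 0)

# Signed version: direction from the effect size, magnitude from the aggregate
signedStat <- stat * sign(effect)
signedRanked <- sort(signedStat, decreasing = TRUE)
names(signedRanked)
# [1] "g4" "g1" "g3" "g5" "g2"
```

The named vector signedStat is what would be passed as the stats argument. As I understand the fgsea documentation, the gene-level stats are also used as weights in the enrichment score (raised to the power gseaParam), so these posterior-based magnitudes affect the result, not only the order.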
Yep, that's exactly what I'm doing.