Does fgsea order the stats argument?
rubi ▴ 110
@rubi-6462
Last seen 6.4 years ago

Hi,

 

Does fgsea order the stats argument? What I'm currently doing is ordering the effects (i.e. log fold-changes of treatment vs. control) by their p-values and passing that to fgsea. But the fgsea code suggests it re-ranks stats, so the p-value ranking is lost. Is that the case? And if so, is there any way to disable it?

 

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
 [1] grid      parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] setEnrichmentTests_0.0.0.9000 org.Ss.eg.db_3.4.0            org.Sc.sgd.db_3.4.0           org.Rn.eg.db_3.4.0            org.Pt.eg.db_3.4.0            org.Mmu.eg.db_3.4.0          
 [7] org.Mm.eg.db_3.4.0            org.Hs.eg.db_3.4.0            org.Cf.eg.db_3.4.0            org.Ce.eg.db_3.4.0            ggdendro_0.1-20               dendextend_1.5.2             
[13] fastcluster_1.1.22            cluster_2.0.5                 tidyr_0.6.3                   GOSemSim_2.0.1                matrixStats_0.51.0            doParallel_1.0.10            
[19] iterators_1.0.8               foreach_1.4.3                 snpEnrichment_1.7.0           piano_1.14.0                  topGO_2.26.0                  SparseM_1.72                 
[25] GO.db_3.4.0                   AnnotationDbi_1.36.0          Biobase_2.34.0                fgsea_1.0.2                   Rcpp_0.12.11.1                graph_1.50.0                 
[31] gageData_2.12.0               gage_2.24.0                   pryr_0.1.2                    scales_0.4.1                  stringi_1.1.5                 zoo_1.7-13                   
[37] biomaRt_2.30.0                gplots_3.0.1                  reshape2_1.4.2                plotrix_3.6-3                 Hmisc_3.17-4                  Formula_1.2-1                
[43] survival_2.40-1               lattice_0.20-34               data.table_1.9.6              annotationData_0.1.0          dplyr_0.5.0                   plyr_1.8.4                   
[49] magrittr_1.5                  gtable_0.2.0                  gridExtra_2.2.1               plotly_4.7.0                  ggplot2_2.2.1.9000            kableExtra_0.2.1             
[55] knitr_1.16                    rtracklayer_1.34.1            GenomicRanges_1.26.2          GenomeInfoDb_1.10.0           IRanges_2.8.1                 S4Vectors_0.12.1             
[61] BiocGenerics_0.20.0           yaml_2.1.14                   doBy_4.5-15                  

loaded via a namespace (and not attached):
 [1] colorspace_1.3-2           class_7.3-14               modeltools_0.2-21          mclust_5.2                 rprojroot_1.2              XVector_0.14.0            
 [7] flexmix_2.3-13             mvtnorm_1.0-5              xml2_1.1.1                 codetools_0.2-15           splines_3.3.2              snpStats_1.24.0           
[13] robustbase_0.92-6          jsonlite_1.4               Rsamtools_1.26.1           kernlab_0.9-25             png_0.1-7                  httr_1.2.1                
[19] backports_1.0.5            assertthat_0.2.0           Matrix_1.2-7.1             lazyeval_0.2.0             limma_3.30.2               acepack_1.4.1             
[25] htmltools_0.3.6            tools_3.3.2                igraph_1.0.1               fastmatch_1.0-4            slam_0.1-40                trimcluster_0.1-2         
[31] Biostrings_2.42.1          gdata_2.17.0               fpc_2.1-10                 stringr_1.2.0              rvest_0.3.2                gtools_3.5.0              
[37] XML_3.98-1.4               DEoptimR_1.0-6             zlibbioc_1.20.0            MASS_7.3-45                relations_0.6-6            SummarizedExperiment_1.2.3
[43] RColorBrewer_1.1-2         sets_1.0-16                rpart_4.1-10               latticeExtra_0.6-28        RSQLite_1.0.0              caTools_1.17.1            
[49] BiocParallel_1.8.1         chron_2.3-47               rlang_0.1.1                prabclus_2.2-6             bitops_1.0-6               evaluate_0.10             
[55] purrr_0.2.2.2              GenomicAlignments_1.8.4    htmlwidgets_0.8            R6_2.2.0                   DBI_0.5-1                  whisker_0.3-2             
[61] foreign_0.8-67             KEGGREST_1.14.0            RCurl_1.95-4.8             nnet_7.3-12                tibble_1.3.3               KernSmooth_2.23-15        
[67] rmarkdown_1.6              viridis_0.4.0              marray_1.52.0              diptest_0.75-7             digest_0.6.12              munsell_0.4.3             
[73] viridisLite_0.2.0         
alserg ▴ 280
@assaron
Last seen 4 months ago
St Louis, MO

Yes, it does sort the stats argument. Sorting the stats values is inherent to pre-ranked GSEA. Is disabling that really what you want to do?
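To illustrate why a user-chosen ordering cannot survive a sort by value, here is a minimal base-R sketch. It is not fgsea's internal code; the gene names and values are made up for illustration only.

```r
# Hypothetical gene-level statistics, named by gene (illustrative values).
stats <- c(geneA = 0.5, geneB = -2.0, geneC = 3.1, geneD = 1.2)

# The same values supplied in a different, user-chosen order.
custom_order <- stats[c("geneD", "geneA", "geneC", "geneB")]

# Once both vectors are sorted by value, the original order no longer matters.
sorted_1 <- sort(stats, decreasing = TRUE)
sorted_2 <- sort(custom_order, decreasing = TRUE)

same_ranking <- identical(sorted_1, sorted_2)
print(same_ranking)
```

So the only way to influence the ranking is through the values themselves, not through the order in which they are passed.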

rubi ▴ 110
@rubi-6462
Last seen 6.4 years ago

Yes. At least I think it would be helpful to have an argument that allows specifying whether stats should be sorted or not.


Can you provide an example where it's useful?


Hi @assaron,

 

I think it's a matter of preference what you want your enrichment analysis to pick up on. Sorting only by effect size ignores the error of the estimate (which can often be large in gene expression data), whereas sorting by p-value does not, so I'd prefer sorting first by p-value and then by effect size. This sounds to me like a small but useful addition to fgsea.


Sorry, I still don't understand. You can only sort something by one value, how can you sort first by p-value and then by effect size?

I usually rank (and sort) genes by the statistic from the DE test (DESeq2 or limma); I know other people sort by -log10(p-value) * sign(log2FC). Both variants work fine and account for significance. Aren't they working for you?
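For reference, the signed p-value ranking can be sketched in base R as follows, assuming the usual convention of taking the negative log10 of the p-value so that more significant genes get larger magnitudes. The data frame `de` and its column names are hypothetical stand-ins for a DE results table.

```r
# Hypothetical DE results; column names are assumptions, not a fixed format.
de <- data.frame(
  gene   = c("a", "b", "c", "d"),
  log2FC = c(2.1, -1.8, 0.3, -0.2),
  pvalue = c(1e-6, 1e-4, 0.2, 0.7),
  stringsAsFactors = FALSE
)

# Magnitude comes from significance, direction from the fold-change sign.
rank_metric <- -log10(de$pvalue) * sign(de$log2FC)
names(rank_metric) <- de$gene

# Strongly up-regulated genes end up first, strongly down-regulated ones last.
ranked <- sort(rank_metric, decreasing = TRUE)
print(names(ranked))
```

A named numeric vector like `rank_metric` is the kind of input the stats argument expects.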


Sorry for the late response. I'm using a Bayesian differential expression tool (MMDIFF), which provides an estimate of the effect size (e.g. ln(fold-change) between treatment and control) and the posterior probability that the estimated effect size is different from 0. Unlike in the frequentist approaches, there's no test statistic here. The ranking I'm referring to is to rank first by posterior probability in descending order (think of this as 1 - p-value for the sake of this discussion) and then by the absolute value of the effect size. The reason is that some strongly differentially expressed genes will all have a posterior of 1.

I think allowing that in fgsea would allow for more general usage.


I guess you can aggregate your values to get a new "statistic" that will be ordered as you want, e.g. stat = p + ifelse(p == 1, abs(effect) * 1e-3, 0), where p is the posterior probability.

Additionally, GSEA methods were designed for statistics that can have both positive and negative values, so I suggest multiplying the values by the sign of the effect size, if that's appropriate in your case.
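A minimal sketch of this aggregated statistic in base R, using made-up posterior probabilities and effect sizes. The vectors `post` and `effect` are hypothetical, and the 1e-3 factor assumes the tie-breaking term stays smaller than the gaps between distinct posterior values, so it may need adjusting for real data.

```r
# Hypothetical toy data: posterior probabilities and ln fold-changes per gene.
post   <- c(g1 = 1,   g2 = 1,    g3 = 0.9, g4 = 1,   g5 = 0.5,  g6 = 0.7)
effect <- c(g1 = 2.0, g2 = -3.5, g3 = 1.2, g4 = 0.4, g5 = -0.1, g6 = -2.2)

# Genes tied at posterior == 1 get a small bonus from |effect| to break the tie.
# Assumption: abs(effect) * 1e-3 is small relative to gaps between posteriors.
stat <- post + ifelse(post == 1, abs(effect) * 1e-3, 0)

# The resulting order matches "posterior first, then |effect|".
by_stat    <- names(sort(stat, decreasing = TRUE))
by_two_key <- names(post)[order(-post, -abs(effect))]
print(by_stat)

# Multiply by the sign of the effect so the statistic takes both signs.
signed_stat <- stat * sign(effect)
print(sort(signed_stat, decreasing = TRUE))
```

The named vector `signed_stat` could then be passed as the stats argument of fgsea together with a named list of gene sets; the sorting fgsea applies will reproduce the intended two-level ranking within each direction.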


Yep, that's exactly what I'm doing.
