Why is the GSEA output not the same (but inverted) when the input ranking is the same but inverted?
andromeda • 0

Hi! I am not sure whether I misunderstand how GSEA works or have an error in my GSEA analysis. I am using clusterProfiler (the gseKEGG function). Input data:

  • output from DESeq2 for "control vs mutant" and "mutant vs control". I only set the contrast in both directions; the input to DESeq2 was the same.
  • genes ranked by log2FoldChange. The sign of the log2FC is of course inverted between "control vs mutant" and "mutant vs control", but its absolute value is the same, so genes at the top of the ranked list in "control vs mutant" are at the bottom of the list in "mutant vs control", i.e. the ranking is in exactly the reverse order.

E.g. "control vs mutant":

  1. Gene A log2FC 3
  2. Gene B log2FC 0.5
  3. Gene C log2FC -4

E.g. "mutant vs control":

  1. Gene C log2FC 4
  2. Gene B log2FC -0.5
  3. Gene A log2FC -3
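
To make the expectation concrete, here is a minimal base R sketch using the toy values from the first list. It only checks that negating the log2FC values and re-sorting gives the exact reverse ranking; it assumes there are no tied values and does not involve GSEA itself. The object names are made up for illustration.

```r
# Toy log2 fold changes for "control vs mutant" (values from the first example list)
control_vs_mutant <- c(GeneA = 3, GeneB = 0.5, GeneC = -4)

# Flipping the contrast negates every log2FC
mutant_vs_control <- -control_vs_mutant

# Rank both lists in decreasing order, as done before calling GSEA
ranked_cm <- sort(control_vs_mutant, decreasing = TRUE)
ranked_mc <- sort(mutant_vs_control, decreasing = TRUE)

# Without ties, one ranking is exactly the reverse of the other,
# with every value negated
is_mirror <- identical(names(ranked_mc), rev(names(ranked_cm))) &&
  isTRUE(all.equal(unname(ranked_mc), -rev(unname(ranked_cm))))
print(is_mirror)
```

If the same check is run on the real ranked vectors and does not return TRUE, the two GSEA inputs are not mirror images in the first place.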

Why are the enriched gene sets in "control vs mutant" and "mutant vs control" not the same (of course with the opposite sign of the enrichment score)? I get different significantly enriched pathways for "control vs mutant" and "mutant vs control".

Results: "control vs mutant" gives me 1 enriched pathway.

"mutant vs control" gives me 5 enriched pathways.

I would expect both to show, e.g., the same 5 enriched pathways, just with the opposite sign of the enrichment score. Why is this not the case?
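
One way to test this expectation directly would be to rerun both analyses with `pvalueCutoff = 1`, so that every tested pathway is reported, and then compare the two tables pathway by pathway. The sketch below uses hypothetical toy tables in place of real results. It assumes the result tables (for example from `as.data.frame()` on each gseKEGG result) contain `ID`, `NES` and `p.adjust` columns; the values and the `.cm`/`.mc` suffixes are invented for illustration.

```r
# Hypothetical result tables standing in for the two GSEA runs;
# the column names ID, NES and p.adjust are an assumption
res_cm <- data.frame(ID = c("ko00010", "ko00020", "ko00030"),
                     NES = c(1.8, -1.2, 0.9),
                     p.adjust = c(0.01, 0.20, 0.60))
res_mc <- data.frame(ID = c("ko00010", "ko00020", "ko00030"),
                     NES = c(-1.7, 1.3, -0.9),
                     p.adjust = c(0.04, 0.03, 0.55))

# Join the two runs by pathway ID
both <- merge(res_cm, res_mc, by = "ID", suffixes = c(".cm", ".mc"))

# TRUE where the enrichment direction flips between the two contrasts
both$sign_flipped <- sign(both$NES.cm) == -sign(both$NES.mc)

# TRUE where both runs make the same call at the 0.05 cutoff
both$same_call <- (both$p.adjust.cm < 0.05) == (both$p.adjust.mc < 0.05)

print(both)
```

A table like this would show whether the scores really differ, or whether they are close to mirrored and only the adjusted p-values fall on different sides of 0.05.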

Thanks in advance for your help!


set.seed(1234)

## liste is some output of DESeq2
## (the same code was used for "control vs mutant" and "mutant vs control")

# Remove genes without an annotated KEGG ID.
# Note: na.omit() drops every row that has an NA in any column.
liste2 <- na.omit(liste)
original_gene_list <- liste2$log2FoldChange
names(original_gene_list) <- liste2$KEGG.ID

# Rank genes by log2FC, highest first
gene_list <- sort(original_gene_list, decreasing = TRUE)

# Keep the first (highest log2FC) entry per KEGG ID;
# tried with and without removing duplicates
gene_list_d <- gene_list[!duplicated(names(gene_list))]

# geneList is the full argument name ("gene" only worked via partial matching)
kk <- gseKEGG(geneList = gene_list_d,
              organism = "ko",
              keyType = "kegg",
              pvalueCutoff = 0.05,
              pAdjustMethod = "BH",
              seed = TRUE)
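
One thing worth checking in the code above is the duplicate removal. Because the vector is sorted in decreasing order first, `!duplicated()` keeps the entry with the highest log2FC for each KEGG ID, and after flipping the contrast that is a different gene. The base R sketch below illustrates this with hypothetical toy values; `dedup_sorted` and `dedup_abs` are made-up helper names. Whether this explains the difference in the real data is not certain, since the analysis was also tried without removing duplicates.

```r
# Hypothetical toy data: two genes map to the same KEGG ID "K00001"
lfc <- c(K00001 = 3, K00002 = 0.5, K00001 = -4)

# Same steps as in the code above: sort decreasing, keep first entry per ID
dedup_sorted <- function(x) {
  x <- sort(x, decreasing = TRUE)
  x[!duplicated(names(x))]
}

forward  <- dedup_sorted(lfc)   # keeps the K00001 entry with log2FC 3
reversed <- dedup_sorted(-lfc)  # keeps the K00001 entry with log2FC 4

# FALSE: the two deduplicated inputs are not mirror images
mirror_sorted <- isTRUE(all.equal(sort(unname(forward)),
                                  sort(unname(-reversed))))

# A sign-independent alternative: keep the entry with the largest |log2FC|
dedup_abs <- function(x) {
  x <- x[order(abs(x), decreasing = TRUE)]
  x <- x[!duplicated(names(x))]
  sort(x, decreasing = TRUE)
}

forward_abs  <- dedup_abs(lfc)
reversed_abs <- dedup_abs(-lfc)

# TRUE here: both directions keep the same gene per KEGG ID
mirror_abs <- identical(names(reversed_abs), rev(names(forward_abs))) &&
  isTRUE(all.equal(unname(reversed_abs), -rev(unname(forward_abs))))

print(c(mirror_sorted = mirror_sorted, mirror_abs = mirror_abs))
```

Selecting by absolute value is only one possible rule and can still be ambiguous when two entries for the same ID have equal absolute values; averaging per ID would be another sign-symmetric option.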


sessionInfo()
R version 4.3.2 (2023-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] clusterProfiler_4.11.0.002  DESeq2_1.42.1               SummarizedExperiment_1.32.0 Biobase_2.62.0              MatrixGenerics_1.14.0       matrixStats_1.2.0          
 [7] GenomicRanges_1.54.1        GenomeInfoDb_1.38.8         IRanges_2.36.0              S4Vectors_0.40.2            BiocGenerics_0.48.1        

loaded via a namespace (and not attached):
  [1] DBI_1.2.2               bitops_1.0-7            gson_0.1.0              shadowtext_0.1.3        gridExtra_2.3           rlang_1.1.3             magrittr_2.0.3         
  [8] DOSE_3.28.2             compiler_4.3.2          RSQLite_2.3.6           png_0.1-8               vctrs_0.6.5             reshape2_1.4.4          stringr_1.5.1          
 [15] pkgconfig_2.0.3         crayon_1.5.2            fastmap_1.1.1           XVector_0.42.0          ggraph_2.2.1            utf8_1.2.4              HDO.db_0.99.1          
 [22] enrichplot_1.22.0       purrr_1.0.2             bit_4.0.5               zlibbioc_1.48.2         cachem_1.0.8            aplot_0.2.2             jsonlite_1.8.8         
 [29] blob_1.2.4              DelayedArray_0.28.0     BiocParallel_1.36.0     tweenr_2.0.3            parallel_4.3.2          R6_2.5.1                stringi_1.8.3          
 [36] RColorBrewer_1.1-3      GOSemSim_2.28.1         Rcpp_1.0.12             snow_0.4-4              Matrix_1.6-1.1          splines_4.3.2           igraph_2.0.3           
 [43] tidyselect_1.2.1        qvalue_2.34.0           rstudioapi_0.16.0       abind_1.4-5             viridis_0.6.5           codetools_0.2-19        lattice_0.21-9         
 [50] tibble_3.2.1            plyr_1.8.9              treeio_1.26.0           withr_3.0.0             KEGGREST_1.42.0         gridGraphics_0.5-1      scatterpie_0.2.2       
 [57] polyclip_1.10-6         Biostrings_2.70.3       ggtree_3.10.1           pillar_1.9.0            ggfun_0.1.4             generics_0.1.3          RCurl_1.98-1.14        
 [64] ggplot2_3.5.0           tidytree_0.4.6          munsell_0.5.1           scales_1.3.0            glue_1.7.0              lazyeval_0.2.2          tools_4.3.2            
 [71] data.table_1.15.4       fgsea_1.28.0            locfit_1.5-9.9          fs_1.6.3                graphlayouts_1.1.1      fastmatch_1.1-4         tidygraph_1.3.1        
 [78] cowplot_1.1.3           grid_4.3.2              ape_5.7-1               tidyr_1.3.1             AnnotationDbi_1.64.1    colorspace_2.1-0        nlme_3.1-163           
 [85] patchwork_1.2.0         GenomeInfoDbData_1.2.11 ggforce_0.4.2           cli_3.6.2               fansi_1.0.6             S4Arrays_1.2.1          viridisLite_0.4.2      
 [92] dplyr_1.1.4             gtable_0.3.4            yulab.utils_0.1.4       digest_0.6.35           ggplotify_0.1.2         SparseArray_1.2.4       ggrepel_0.9.5          
 [99] farver_2.1.1            memoise_2.0.1           lifecycle_1.0.4         httr_1.4.7              GO.db_3.18.0            bit64_4.0.5             MASS_7.3-60
clusterProfiler gseKEGG