Question

GSEA with a certain gene list returns no results with the human genome, but many results with the mouse genome.

sawa • 0

Hello. Today, I am trying to perform a GSEA on a gene list derived from human samples. The code and the result are shown below.

> HuSplCD103nEqCD103pEq_GSEA <- gseGO(geneList = HuSplCD103nEqCD103pEqRanking, 
+                                     OrgDb = "org.Hs.eg.db", ont = "BP", keyType = "SYMBOL", pAdjustMethod="none")
using 'fgsea' for GSEA analysis, please cite Korotkevich et al (2019).

preparing geneSet collections...
GSEA analysis...
no term enriched under specific pvalueCutoff...

I got nothing. Therefore, I also tried org.Mm.eg.db, because I recently performed a similar analysis on mouse datasets and got a result with more than 1000 categories.

> HuSplCD103nEqCD103pEq_GSEA <- gseGO(geneList = HuSplCD103nEqCD103pEqRanking, 
+                                     OrgDb = "org.Mm.eg.db", ont = "BP", keyType = "SYMBOL", pAdjustMethod="none")
using 'fgsea' for GSEA analysis, please cite Korotkevich et al (2019).

preparing geneSet collections...
GSEA analysis...
leading edge analysis...
done...
There were 23 warnings (use warnings() to see them)

The result contained 817 categories. What caused this difference? Is there simply far less gene set information for human genes than for mouse genes? I am now tempted to use org.Mm.eg.db for the human gene list, but isn't it inappropriate to use org.Mm.eg.db with a human gene list?

> sessionInfo()
R version 4.4.0 (2024-04-24 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 22631)

Matrix products: default


locale:
[1] LC_COLLATE=Japanese_Japan.utf8  LC_CTYPE=Japanese_Japan.utf8    LC_MONETARY=Japanese_Japan.utf8
[4] LC_NUMERIC=C                    LC_TIME=Japanese_Japan.utf8    

time zone: Asia/Tokyo
tzcode source: internal

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] BiocManager_1.30.23         org.Hs.eg.db_3.19.1         tibble_3.2.1                shiny_1.8.1.1              
 [5] ggsignif_0.6.4              data.table_1.15.4           loupeR_1.1.0                SingleR_2.6.0              
 [9] dplyr_1.1.4                 Seurat_5.1.0                SeuratObject_5.0.2          sp_2.1-4                   
[13] org.Rn.eg.db_3.19.1         ggplot2_3.5.1               enrichplot_1.24.0           openxlsx_4.2.5.2           
[17] clusterProfiler_4.12.0      limma_3.60.2                org.Mm.eg.db_3.19.1         AnnotationDbi_1.66.0       
[21] readr_2.1.5                 SingleCellExperiment_1.26.0 SummarizedExperiment_1.34.0 Biobase_2.64.0             
[25] GenomicRanges_1.56.0        GenomeInfoDb_1.40.0         IRanges_2.38.0              S4Vectors_0.42.0           
[29] BiocGenerics_0.50.0         MatrixGenerics_1.16.0       matrixStats_1.3.0          

loaded via a namespace (and not attached):
  [1] fs_1.6.4                  spatstat.sparse_3.0-3     HDO.db_0.99.1             httr_1.4.7               
  [5] RColorBrewer_1.1-3        tools_4.4.0               sctransform_0.4.1         utf8_1.2.4               
  [9] R6_2.5.1                  lazyeval_0.2.2            uwot_0.2.2                withr_3.0.0              
 [13] gridExtra_2.3             progressr_0.14.0          textshaping_0.3.7         cli_3.6.2                
 [17] spatstat.explore_3.2-7    fastDummies_1.7.3         scatterpie_0.2.3          labeling_0.4.3           
 [21] sass_0.4.9                spatstat.data_3.0-4       ggridges_0.5.6            pbapply_1.7-2            
 [25] systemfonts_1.1.0         yulab.utils_0.1.4         gson_0.1.0                DOSE_3.30.1              
 [29] parallelly_1.37.1         rstudioapi_0.16.0         RSQLite_2.3.6             generics_0.1.3           
 [33] gridGraphics_0.5-1        ica_1.0-3                 spatstat.random_3.2-3     vroom_1.6.5              
 [37] zip_2.3.1                 GO.db_3.19.1              Matrix_1.7-0              fansi_1.0.6              
 [41] abind_1.4-5               lifecycle_1.0.4           qvalue_2.36.0             SparseArray_1.4.3        
 [45] Rtsne_0.17                grid_4.4.0                blob_1.2.4                promises_1.3.0           
 [49] crayon_1.5.2              miniUI_0.1.1.1            lattice_0.22-6            beachmat_2.20.0          
 [53] cowplot_1.1.3             KEGGREST_1.44.0           pillar_1.9.0              fgsea_1.30.0             
 [57] future.apply_1.11.2       codetools_0.2-20          fastmatch_1.1-4           leiden_0.4.3.1           
 [61] glue_1.7.0                ggfun_0.1.5               vctrs_0.6.5               png_0.1-8                
 [65] treeio_1.28.0             spam_2.10-0               gtable_0.3.5              cachem_1.1.0             
 [69] S4Arrays_1.4.0            mime_0.12                 tidygraph_1.3.1           survival_3.5-8           
 [73] statmod_1.5.0             fitdistrplus_1.1-11       ROCR_1.0-11               nlme_3.1-164             
 [77] ggtree_3.12.0             bit64_4.0.5               RcppAnnoy_0.0.22          bslib_0.7.0              
 [81] irlba_2.3.5.1             KernSmooth_2.23-22        colorspace_2.1-0          DBI_1.2.3                
 [85] DESeq2_1.44.0             tidyselect_1.2.1          bit_4.0.5                 compiler_4.4.0           
 [89] hdf5r_1.3.10              DelayedArray_0.30.1       plotly_4.10.4             shadowtext_0.1.3         
 [93] scales_1.3.0              lmtest_0.9-40             stringr_1.5.1             digest_0.6.35            
 [97] goftest_1.2-3             spatstat.utils_3.0-4      XVector_0.44.0            htmltools_0.5.8.1        
[101] pkgconfig_2.0.3           sparseMatrixStats_1.16.0  fastmap_1.2.0             rlang_1.1.3              
[105] htmlwidgets_1.6.4         UCSC.utils_1.0.0          DelayedMatrixStats_1.26.0 farver_2.1.2             
[109] jquerylib_0.1.4           zoo_1.8-12                jsonlite_1.8.8            BiocParallel_1.38.0      
[113] GOSemSim_2.30.0           BiocSingular_1.20.0       magrittr_2.0.3            GenomeInfoDbData_1.2.12  
[117] ggplotify_0.1.2           dotCall64_1.1-1           patchwork_1.2.0           munsell_0.5.1            
[121] Rcpp_1.0.12               ape_5.8                   ggnewscale_0.4.10         viridis_0.6.5            
[125] reticulate_1.36.1         stringi_1.8.4             ggraph_2.2.1              zlibbioc_1.50.0          
[129] MASS_7.3-60.2             plyr_1.8.9                parallel_4.4.0            listenv_0.9.1            
[133] ggrepel_0.9.5             deldir_2.0-4              Biostrings_2.72.0         graphlayouts_1.1.1       
[137] splines_4.4.0             tensor_1.5                hms_1.1.3                 locfit_1.5-9.9           
[141] igraph_2.0.3              spatstat.geom_3.2-9       RcppHNSW_0.6.0            reshape2_1.4.4           
[145] ScaledMatrix_1.12.0       tzdb_0.4.0                tweenr_2.0.3              httpuv_1.6.15            
[149] RANN_2.6.1                tidyr_1.3.1               purrr_1.0.2               polyclip_1.10-6          
[153] future_1.33.2             scattermore_1.2           ggforce_0.4.2             rsvd_1.0.5               
[157] xtable_1.8-4              RSpectra_0.16-1           tidytree_0.4.6            later_1.3.2              
[161] ragg_1.3.2                viridisLite_0.4.2         snow_0.4-4                aplot_0.2.2              
[165] memoise_2.0.1             cluster_2.1.6             globals_0.16.3

clusterProfiler • 647 views

score 1 · Answer 1 · 2024-06-13

James W. MacDonald 67k

Ideally you would use something better than gene symbols for this. I realize that biologists like gene symbols, but they are really terrible, not being unique or constrained in any way. At least things like NCBI or Ensembl Gene IDs are assigned by a central authority that tries for uniqueness and identifiability.

That said, do note that human gene symbols are all caps, whereas mouse symbols have only the first letter capitalized. It's highly likely that your gene symbols follow the latter convention, which is why they map to the mouse annotation and not to the human one.
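A quick way to check which convention a ranking follows is to look at the capitalization of its names. The following is only a rough sketch in base R: the toy vector `ranking` stands in for the real ranked list (`HuSplCD103nEqCD103pEqRanking` in the question), and its values are made up for illustration. Upper-casing is a heuristic, not an orthology mapping, and some human symbols (for example the `C1orf...` family) legitimately contain lowercase letters.

```r
# Toy ranked list (hypothetical values); names follow the mouse-style convention.
ranking <- c(Itgae = 2.1, Cd69 = 1.4, Tp53 = -0.3, Gzmb = -1.8)

symbols <- names(ranking)

# Fraction of symbols that are already fully upper case (human-style).
frac_upper <- mean(symbols == toupper(symbols))

# If the data are really human but were written in mouse-style capitalization,
# upper-casing the names is a simple first fix before using org.Hs.eg.db.
ranking_human <- ranking
names(ranking_human) <- toupper(symbols)

print(frac_upper)
print(names(ranking_human))
```

If `frac_upper` is close to 0, the symbols are unlikely to match the human annotation as they stand.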

Thank you so much! I did not know the rule that human gene symbols are written in upper case and mouse gene symbols with only the first letter capitalized. When I saw such differences in papers, I thought researchers had chosen them arbitrarily. I also converted the dataset being analyzed to mouse format. Now, I have a reasonable GSEA result.

And next time, I will use ENTREZID primarily.
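For reference, switching a symbol-named ranking to Entrez IDs could look roughly like the sketch below. The helper `symbol_to_entrez_ranking` is a hypothetical name introduced here, not part of clusterProfiler; it expects a SYMBOL/ENTREZID table shaped like the one `clusterProfiler::bitr()` returns. The toy ranking values are made up, and the small mapping table is written by hand so the example runs without Bioconductor.

```r
# Hypothetical helper: re-key a ranked, symbol-named vector by Entrez ID.
# `map` is a data.frame with character columns SYMBOL and ENTREZID.
symbol_to_entrez_ranking <- function(ranking, map) {
  map <- map[map$SYMBOL %in% names(ranking), ]
  # Drop symbols that map to more than one Entrez ID (ambiguous).
  ambiguous <- map$SYMBOL[duplicated(map$SYMBOL)]
  map <- map[!map$SYMBOL %in% ambiguous, ]
  out <- ranking[map$SYMBOL]
  names(out) <- map$ENTREZID
  # Keep the first entry if two symbols reach the same Entrez ID.
  out <- out[!duplicated(names(out))]
  # gseGO() expects the gene list sorted in decreasing order.
  sort(out, decreasing = TRUE)
}

# Toy inputs for illustration; unmapped symbols are silently dropped.
ranking <- c(EGFR = 1.5, TP53 = -0.7, BRCA1 = 0.2, NOTAGENE = 3.0)
map <- data.frame(SYMBOL   = c("TP53", "BRCA1", "EGFR"),
                  ENTREZID = c("7157", "672", "1956"),
                  stringsAsFactors = FALSE)
entrez_ranking <- symbol_to_entrez_ranking(ranking, map)
print(entrez_ranking)

# With the Bioconductor packages installed, the real mapping would come from:
# map <- clusterProfiler::bitr(names(ranking), fromType = "SYMBOL",
#                              toType = "ENTREZID", OrgDb = "org.Hs.eg.db")
# res <- clusterProfiler::gseGO(geneList = entrez_ranking, OrgDb = "org.Hs.eg.db",
#                               ont = "BP", keyType = "ENTREZID")
```

It is worth checking how many symbols fail to map before running the enrichment, since a very low mapping rate usually points to the wrong organism or naming convention.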

ADD REPLY • link 5 months ago sawa • 0