Hello. Today, I am trying to perform a GSEA on a gene list derived from human samples. The code and the result are below.
> HuSplCD103nEqCD103pEq_GSEA <- gseGO(geneList = HuSplCD103nEqCD103pEqRanking,
+ OrgDb = "org.Hs.eg.db", ont = "BP", keyType = "SYMBOL", pAdjustMethod="none")
using 'fgsea' for GSEA analysis, please cite Korotkevich et al (2019).
preparing geneSet collections...
GSEA analysis...
no term enriched under specific pvalueCutoff...
I got nothing. I therefore also tried org.Mm.eg.db, because I recently performed a similar analysis on mouse datasets and got a result with more than 1000 categories.
> HuSplCD103nEqCD103pEq_GSEA <- gseGO(geneList = HuSplCD103nEqCD103pEqRanking,
+ OrgDb = "org.Mm.eg.db", ont = "BP", keyType = "SYMBOL", pAdjustMethod="none")
using 'fgsea' for GSEA analysis, please cite Korotkevich et al (2019).
preparing geneSet collections...
GSEA analysis...
leading edge analysis...
done...
There were 23 warnings (use warnings() to see them)
The result contained 817 categories. What caused this difference? Is there simply far less GO annotation available for human genes than for mouse genes? I am now tempted to use org.Mm.eg.db with the human gene list, but isn't it inappropriate to use org.Mm.eg.db with a human gene list?
> sessionInfo()
R version 4.4.0 (2024-04-24 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 22631)
Matrix products: default
locale:
[1] LC_COLLATE=Japanese_Japan.utf8 LC_CTYPE=Japanese_Japan.utf8 LC_MONETARY=Japanese_Japan.utf8
[4] LC_NUMERIC=C LC_TIME=Japanese_Japan.utf8
time zone: Asia/Tokyo
tzcode source: internal
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] BiocManager_1.30.23 org.Hs.eg.db_3.19.1 tibble_3.2.1 shiny_1.8.1.1
[5] ggsignif_0.6.4 data.table_1.15.4 loupeR_1.1.0 SingleR_2.6.0
[9] dplyr_1.1.4 Seurat_5.1.0 SeuratObject_5.0.2 sp_2.1-4
[13] org.Rn.eg.db_3.19.1 ggplot2_3.5.1 enrichplot_1.24.0 openxlsx_4.2.5.2
[17] clusterProfiler_4.12.0 limma_3.60.2 org.Mm.eg.db_3.19.1 AnnotationDbi_1.66.0
[21] readr_2.1.5 SingleCellExperiment_1.26.0 SummarizedExperiment_1.34.0 Biobase_2.64.0
[25] GenomicRanges_1.56.0 GenomeInfoDb_1.40.0 IRanges_2.38.0 S4Vectors_0.42.0
[29] BiocGenerics_0.50.0 MatrixGenerics_1.16.0 matrixStats_1.3.0
loaded via a namespace (and not attached):
[1] fs_1.6.4 spatstat.sparse_3.0-3 HDO.db_0.99.1 httr_1.4.7
[5] RColorBrewer_1.1-3 tools_4.4.0 sctransform_0.4.1 utf8_1.2.4
[9] R6_2.5.1 lazyeval_0.2.2 uwot_0.2.2 withr_3.0.0
[13] gridExtra_2.3 progressr_0.14.0 textshaping_0.3.7 cli_3.6.2
[17] spatstat.explore_3.2-7 fastDummies_1.7.3 scatterpie_0.2.3 labeling_0.4.3
[21] sass_0.4.9 spatstat.data_3.0-4 ggridges_0.5.6 pbapply_1.7-2
[25] systemfonts_1.1.0 yulab.utils_0.1.4 gson_0.1.0 DOSE_3.30.1
[29] parallelly_1.37.1 rstudioapi_0.16.0 RSQLite_2.3.6 generics_0.1.3
[33] gridGraphics_0.5-1 ica_1.0-3 spatstat.random_3.2-3 vroom_1.6.5
[37] zip_2.3.1 GO.db_3.19.1 Matrix_1.7-0 fansi_1.0.6
[41] abind_1.4-5 lifecycle_1.0.4 qvalue_2.36.0 SparseArray_1.4.3
[45] Rtsne_0.17 grid_4.4.0 blob_1.2.4 promises_1.3.0
[49] crayon_1.5.2 miniUI_0.1.1.1 lattice_0.22-6 beachmat_2.20.0
[53] cowplot_1.1.3 KEGGREST_1.44.0 pillar_1.9.0 fgsea_1.30.0
[57] future.apply_1.11.2 codetools_0.2-20 fastmatch_1.1-4 leiden_0.4.3.1
[61] glue_1.7.0 ggfun_0.1.5 vctrs_0.6.5 png_0.1-8
[65] treeio_1.28.0 spam_2.10-0 gtable_0.3.5 cachem_1.1.0
[69] S4Arrays_1.4.0 mime_0.12 tidygraph_1.3.1 survival_3.5-8
[73] statmod_1.5.0 fitdistrplus_1.1-11 ROCR_1.0-11 nlme_3.1-164
[77] ggtree_3.12.0 bit64_4.0.5 RcppAnnoy_0.0.22 bslib_0.7.0
[81] irlba_2.3.5.1 KernSmooth_2.23-22 colorspace_2.1-0 DBI_1.2.3
[85] DESeq2_1.44.0 tidyselect_1.2.1 bit_4.0.5 compiler_4.4.0
[89] hdf5r_1.3.10 DelayedArray_0.30.1 plotly_4.10.4 shadowtext_0.1.3
[93] scales_1.3.0 lmtest_0.9-40 stringr_1.5.1 digest_0.6.35
[97] goftest_1.2-3 spatstat.utils_3.0-4 XVector_0.44.0 htmltools_0.5.8.1
[101] pkgconfig_2.0.3 sparseMatrixStats_1.16.0 fastmap_1.2.0 rlang_1.1.3
[105] htmlwidgets_1.6.4 UCSC.utils_1.0.0 DelayedMatrixStats_1.26.0 farver_2.1.2
[109] jquerylib_0.1.4 zoo_1.8-12 jsonlite_1.8.8 BiocParallel_1.38.0
[113] GOSemSim_2.30.0 BiocSingular_1.20.0 magrittr_2.0.3 GenomeInfoDbData_1.2.12
[117] ggplotify_0.1.2 dotCall64_1.1-1 patchwork_1.2.0 munsell_0.5.1
[121] Rcpp_1.0.12 ape_5.8 ggnewscale_0.4.10 viridis_0.6.5
[125] reticulate_1.36.1 stringi_1.8.4 ggraph_2.2.1 zlibbioc_1.50.0
[129] MASS_7.3-60.2 plyr_1.8.9 parallel_4.4.0 listenv_0.9.1
[133] ggrepel_0.9.5 deldir_2.0-4 Biostrings_2.72.0 graphlayouts_1.1.1
[137] splines_4.4.0 tensor_1.5 hms_1.1.3 locfit_1.5-9.9
[141] igraph_2.0.3 spatstat.geom_3.2-9 RcppHNSW_0.6.0 reshape2_1.4.4
[145] ScaledMatrix_1.12.0 tzdb_0.4.0 tweenr_2.0.3 httpuv_1.6.15
[149] RANN_2.6.1 tidyr_1.3.1 purrr_1.0.2 polyclip_1.10-6
[153] future_1.33.2 scattermore_1.2 ggforce_0.4.2 rsvd_1.0.5
[157] xtable_1.8-4 RSpectra_0.16-1 tidytree_0.4.6 later_1.3.2
[161] ragg_1.3.2 viridisLite_0.4.2 snow_0.4-4 aplot_0.2.2
[165] memoise_2.0.1 cluster_2.1.6 globals_0.16.3
Thank you so much! I did not know the convention that human gene symbols are written in all upper case, while mouse gene symbols have only the first letter capitalized. When I saw such differences in papers, I thought they were chosen arbitrarily by the researchers. I also converted the dataset being analyzed to mouse format. Now I have a reasonable GSEA result.
Next time, I will use ENTREZID as the primary key type.
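For anyone who hits the same "no term enriched" message: a quick check like the one below would probably have revealed the problem before I ran gseGO. This is only a minimal sketch, assuming the cause is a case mismatch between the names of the ranked vector and the SYMBOL keys of the OrgDb. The helper symbol_match_rate and the toy vectors are hypothetical names made up for illustration, not part of clusterProfiler.

```r
# Hypothetical helper (not a clusterProfiler function): fraction of the
# ranked gene names that are found among an annotation package's SYMBOL keys.
symbol_match_rate <- function(ranking, known_symbols) {
  mean(names(ranking) %in% known_symbols)
}

# Toy stand-ins, not real data: a ranking whose names use mouse-style case.
toy_ranking <- c(Itgae = 2.1, Cd69 = 1.4, Gzmb = -0.8, Klrg1 = -1.9)
toy_human_symbols <- c("ITGAE", "CD69", "GZMB", "KLRG1")
toy_mouse_symbols <- c("Itgae", "Cd69", "Gzmb", "Klrg1")

human_rate <- symbol_match_rate(toy_ranking, toy_human_symbols)  # 0
mouse_rate <- symbol_match_rate(toy_ranking, toy_mouse_symbols)  # 1

# With the real objects from this post (assumes org.Hs.eg.db is installed):
# library(org.Hs.eg.db)
# symbol_match_rate(HuSplCD103nEqCD103pEqRanking,
#                   keys(org.Hs.eg.db, keytype = "SYMBOL"))
```

A match rate near 0 against org.Hs.eg.db would mean that almost none of the names are recognized as human symbols, so no gene set can be tested regardless of the p-value cutoff.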