Question

Question about gseGO() in the clusterProfiler package


sherinesaber • 0


Saudi Arabia

Hello, I would appreciate your help. I am asking about the input for gseGO().

I am working with single-cell data. I have already found the differentially expressed genes between diseased and healthy astrocytes using Seurat's FindMarkers(). Now I want to do pathway enrichment.

Do I use the whole gene list ranked by decreasing avg_log2FC, or do I keep only genes with an adjusted p-value < 0.05? Before submitting my question I found a similar one (gseGO input list), where the answer was to use all genes. I want to make sure this is still the answer. And if I have to use all of the genes, how should I rank them? Is a gene list ranked by decreasing avg_log2FC sufficient? Thank you.

clusterProfiler • 943 views

updated 7 months ago by Guido Hooiveld ★ 4.1k • written 8 months ago by sherinesaber • 0

score 0 · Answer 1 · 2024-09-02

Note that I don't have any experience with single-cell data or Seurat, so I cannot comment on the specifics of this type of data!

Yes, for a GSEA-type analysis you should use all genes as input. Use the function gseGO for GO category-based gene set enrichment analysis, gseKEGG for KEGG-based gene sets, or the generic function GSEA.
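As a rough sketch of what "all genes as input" could look like in practice: gseGO expects a named numeric vector sorted in decreasing order. The example below ranks by avg_log2FC, which is the ordering proposed in the question and only one possible ranking metric. The toy `markers` data frame, the helper names `make_ranked_list` and `run_gsego`, the human annotation package org.Hs.eg.db, and the use of gene symbols as identifiers are all assumptions made for illustration. Also note that FindMarkers() may pre-filter genes by default (see its `logfc.threshold` and `min.pct` arguments), so its output is not necessarily the full gene list.

```r
# Toy stand-in for a Seurat FindMarkers() result (hypothetical values).
# Row names are gene symbols; column names follow FindMarkers() output.
markers <- data.frame(
  p_val      = c(1e-8, 0.20, 1e-4, 0.60, 1e-6),
  avg_log2FC = c(2.1, -0.3, -1.7, 0.1, 0.9),
  p_val_adj  = c(1e-5, 1.00, 0.01, 1.00, 1e-3),
  row.names  = c("GFAP", "AQP4", "SLC1A2", "ALDH1L1", "VIM")
)

# Build the ranked input: a named numeric vector, sorted decreasingly.
# Ranking by avg_log2FC is an assumption taken from the question.
make_ranked_list <- function(markers) {
  ranks <- markers$avg_log2FC
  names(ranks) <- rownames(markers)
  ranks <- ranks[!is.na(ranks)]      # drop genes without a fold change
  sort(ranks, decreasing = TRUE)
}

gene_list <- make_ranked_list(markers)

# Hypothetical wrapper around gseGO; defined here but not called, because
# it needs clusterProfiler, an OrgDb package, and a realistic gene list.
# OrgDb and keyType are assumptions (human data, gene symbols).
run_gsego <- function(gene_list) {
  clusterProfiler::gseGO(
    geneList     = gene_list,
    OrgDb        = org.Hs.eg.db::org.Hs.eg.db,
    keyType      = "SYMBOL",
    ont          = "BP",
    pvalueCutoff = 0.05
  )
}
```

With a real FindMarkers() result, `run_gsego(make_ranked_list(markers))` would run the analysis. No p-value filtering is applied to the input, since the method uses the ranking of all genes.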

If you are interested in which gene sets are enriched in a subset of the genes you measured, e.g. those with p < 0.05, then you should perform a so-called over-representation analysis (ORA) using the function enrichGO (or enrichKEGG, or the generic function enricher).
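For comparison, an ORA might be set up roughly as follows. This is only a sketch: the toy `markers` data frame, the 0.05 cutoff on `p_val_adj`, the wrapper name `run_enrichgo`, the human annotation package org.Hs.eg.db, and the choice of all tested genes as the background (`universe`) are assumptions, not something stated in the thread.

```r
# Toy stand-in for a Seurat FindMarkers() result (hypothetical values).
markers <- data.frame(
  p_val      = c(1e-8, 0.20, 1e-4, 0.60, 1e-6),
  avg_log2FC = c(2.1, -0.3, -1.7, 0.1, 0.9),
  p_val_adj  = c(1e-5, 1.00, 0.01, 1.00, 1e-3),
  row.names  = c("GFAP", "AQP4", "SLC1A2", "ALDH1L1", "VIM")
)

# ORA input: the subset of genes passing the significance cutoff.
keep      <- !is.na(markers$p_val_adj) & markers$p_val_adj < 0.05
sig_genes <- rownames(markers)[keep]

# Background: every gene that was tested (an assumption about the design).
universe <- rownames(markers)

# Hypothetical wrapper around enrichGO; defined but not called, because it
# needs clusterProfiler and an OrgDb package to be installed.
run_enrichgo <- function(sig_genes, universe) {
  clusterProfiler::enrichGO(
    gene         = sig_genes,
    universe     = universe,
    OrgDb        = org.Hs.eg.db::org.Hs.eg.db,
    keyType      = "SYMBOL",
    ont          = "BP",
    pvalueCutoff = 0.05
  )
}
```

Unlike the GSEA input, the gene set passed here is unranked: only membership in the significant subset matters.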

For the differences between the two approaches, see e.g.: https://yulab-smu.top/biomedical-knowledge-mining-book/enrichment-overview.html