Hello everyone,
I am a beginner in bioinformatics and am working on my master's thesis. I am performing differential gene expression (DGE) analysis on an RNA-seq dataset and have around 500 genes left after filtering. I also ran a GSEA on these 500 genes and got around 78 enriched gene sets. However, I cannot find most of the 500 genes in these 78 gene sets: only 10-15 of them appear in the gene sets, together with their ontology descriptions.
My examiner asked me what happened to the remaining genes, and how I can get ontology information about them from GSEA.
I apologize if I have made a mistake or asked a stupid question.
Best regards,
Lawrence
Without seeing your code, it is impossible to give specific advice...
Having said this: it sounds to me as if you are using the wrong test to find the relevant gene sets/pathways/GO categories. If you have selected a subset of genes based on, e.g., an FDR cutoff, you should use a so-called over-representation analysis (ORA), not a gene set enrichment analysis (GSEA).
An ORA thus takes as input only the subset of analyzed genes that you selected because they passed, e.g., a significance cutoff. It then tests which gene sets (pathways or GO categories) are over-represented in this list of (in your case 500) genes.
In contrast, GSEA expects as input all genes in your dataset, ranked on, e.g., the signed log p-value, and the algorithm then identifies which gene sets are enriched at the top (and bottom) of the ranked list. You may interpret these as gene sets that are activated and suppressed, respectively, by your treatment compared to the reference.
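To make the ORA idea concrete, here is a minimal sketch of a one-sided hypergeometric test for a single gene set, which is the kind of test clusterProfiler's ORA functions are based on. This is not clusterProfiler's code: the function names `hypergeom_sf` and `ora_pvalue` and the toy gene IDs are hypothetical, and no multiple-testing correction is applied. Note that the background (universe) should be all genes you tested in the DGE analysis, not the whole genome.

```python
from math import comb


def hypergeom_sf(k: int, M: int, n: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population M, n successes, N draws)."""
    upper = min(n, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / comb(M, N)


def ora_pvalue(selected, gene_set, universe):
    """One-sided over-representation p-value for ONE gene set (hypothetical helper).

    selected: genes passing your cutoff (e.g. the ~500 DE genes)
    gene_set: genes annotated to one pathway / GO term
    universe: all genes tested in the DGE analysis (the background)
    """
    universe = set(universe)
    selected = set(selected) & universe   # only genes in the background count
    gene_set = set(gene_set) & universe
    k = len(selected & gene_set)          # overlap between your list and the set
    return k, hypergeom_sf(k, len(universe), len(gene_set), len(selected))


# Toy example: 20 tested genes, 5 selected, a 4-gene set sharing 3 of them.
universe = [f"g{i}" for i in range(20)]
selected = ["g0", "g1", "g2", "g3", "g4"]
gene_set = {"g0", "g1", "g2", "g10"}
k, p = ora_pvalue(selected, gene_set, universe)
print(k, round(p, 4))  # → 3 0.032
```

In a real analysis this test is repeated for every gene set and the p-values are then adjusted (e.g. Benjamini-Hochberg).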
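The "enriched at the top (and bottom)" idea can be sketched with the weighted running-sum statistic used in the original GSEA method: walk down the full ranked list, step up (weighted by |score|^p) when a gene is in the set and step down by a constant amount when it is not. The enrichment score (ES) is the largest deviation from zero. This is a simplified sketch under assumptions: `enrichment_score` and the toy genes are hypothetical, and it omits the permutation test and the normalization (NES) that real GSEA implementations perform.

```python
def enrichment_score(ranked, gene_set, p=1.0):
    """Return (ES, running-sum curve) for one gene set (hypothetical helper).

    ranked: list of (gene, score) for ALL genes, sorted by score, descending
    gene_set: set of genes annotated to one pathway / GO term
    p: weight exponent on |score| (p = 0 gives an unweighted walk)
    """
    hits = [gene in gene_set for gene, _ in ranked]
    n_miss = len(ranked) - sum(hits)
    if sum(hits) == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list but not cover all of it")
    # Normalizing constant: total weight of the genes that are in the set.
    norm = sum(abs(score) ** p for (_, score), hit in zip(ranked, hits) if hit)
    running, best, curve = 0.0, 0.0, []
    for (_, score), hit in zip(ranked, hits):
        running += abs(score) ** p / norm if hit else -1.0 / n_miss
        curve.append(running)
        if abs(running) > abs(best):  # keep the maximum deviation from zero
            best = running
    return best, curve


ranked = [("A", 3.0), ("B", 2.0), ("C", 1.0), ("D", -1.0), ("E", -2.0), ("F", -3.0)]
es_top, _ = enrichment_score(ranked, {"A", "B"})
es_bottom, _ = enrichment_score(ranked, {"E", "F"})
print(round(es_top, 3), round(es_bottom, 3))  # → 1.0 -1.0
```

A positive ES corresponds to a set concentrated at the top of the list ("activated"), a negative ES to one concentrated at the bottom ("suppressed"). This is also why GSEA needs all genes as input: the down-steps come from the genes outside the set.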
That you 'lose' genes may be because some genes are not annotated to any gene set, and such genes, of course, cannot be included in the analysis. In addition, if you used clusterProfiler for GSEA, only the so-called leading edge genes (aka core-enriched genes) are returned. Leading edge genes are basically the genes that 'make' a gene set significantly enriched. Check e.g. here. Please also check my post here for more info on the types of gene set tests clusterProfiler can perform: what the test method for enrichGO in clusterProfiler?
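Both reasons for 'losing' genes can be illustrated with a small sketch. The leading edge of a gene set is commonly defined as the set members ranked at or before the peak of the running sum (for a positive ES) or at or after it (for a negative ES). Genes not annotated to any gene set never enter the analysis at all. The helper names `leading_edge` and `unannotated` and the toy data are hypothetical; this is a sketch of the concept, not how clusterProfiler computes its `core_enrichment` column.

```python
def leading_edge(ranked, gene_set, p=1.0):
    """Set members that drive the enrichment score (hypothetical helper).

    ranked: list of (gene, score) for ALL genes, sorted by score, descending
    """
    genes = [gene for gene, _ in ranked]
    hits = [gene in gene_set for gene in genes]
    n_miss = len(genes) - sum(hits)
    norm = sum(abs(score) ** p for (_, score), hit in zip(ranked, hits) if hit)
    running, curve = 0.0, []
    for (_, score), hit in zip(ranked, hits):
        running += abs(score) ** p / norm if hit else -1.0 / n_miss
        curve.append(running)
    # Position where the running sum deviates most from zero (first one on ties).
    peak = max(range(len(curve)), key=lambda i: abs(curve[i]))
    # Positive ES: genes up to the peak; negative ES: genes from the peak onward.
    core = genes[: peak + 1] if curve[peak] >= 0 else genes[peak:]
    return [gene for gene in core if gene in gene_set]


def unannotated(genes, gene_sets):
    """Genes from your list that are not in ANY gene set (hypothetical helper)."""
    annotated = set().union(*gene_sets.values())
    return [gene for gene in genes if gene not in annotated]


ranked = [("A", 3.0), ("B", 2.0), ("C", 1.0), ("D", -0.5),
          ("E", -1.0), ("F", -2.0), ("G", -3.0)]
gene_sets = {"set1": {"A", "B", "F"}, "set2": {"C", "D"}}
print(leading_edge(ranked, gene_sets["set1"]))          # → ['A', 'B']
print(unannotated(["A", "B", "X", "Y"], gene_sets))     # → ['X', 'Y']
```

Here F belongs to set1 but is not reported as core-enriched, and X and Y cannot appear in any result because no gene set contains them.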