DGE genes are not available in GSEA
Lawrence • 0

Hello everyone,

I am a beginner in bioinformatics and am working on my master's thesis. I am performing a differential gene expression (DGE) analysis on an RNA-seq dataset and obtained around 500 genes after filtering. I then ran GSEA on these 500 genes and got around 78 enriched gene sets. However, I cannot find all 500 genes in these 78 gene sets: only 10-15 genes appear in them, along with their ontological descriptions.

My examiner asked me what happened to the remaining genes, and how I can get ontological information about them from GSEA.

I apologize if I have made a mistake or asked a stupid question.

Best regards, Lawrence

dge gsea • 1.2k views

Since you did not provide any code, it is impossible to give specific advice...

Having said this: to me it sounds like you are using the wrong test to find the relevant gene sets/pathways/GO categories. If you have selected a subset of genes based on, e.g., an FDR cutoff, you should use a so-called overrepresentation analysis (ORA), not a gene set enrichment analysis (GSEA).

An ORA thus takes as input a subset of the genes you analyzed, namely those you selected because they passed, e.g., a significance cutoff. You then test which gene sets (pathways or GO categories) are overrepresented in this list of (in your case 500) genes.
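To make the idea of "overrepresented" concrete, here is a minimal sketch of the one-sided hypergeometric test that ORA tools commonly use. It is written in plain Python for illustration only and is not clusterProfiler's code; the function name `ora_p_value` and the toy gene identifiers are made up for this example.

```python
from math import comb

def ora_p_value(selected, gene_set, universe):
    """One-sided hypergeometric test: P(overlap >= observed overlap).

    selected : genes that passed the cutoff (e.g. the ~500 DE genes)
    gene_set : members of one pathway / GO category
    universe : all genes that were tested in the DGE analysis
    """
    universe = set(universe)
    # Genes outside the tested universe cannot contribute to the test.
    selected = set(selected) & universe
    gene_set = set(gene_set) & universe

    N = len(universe)             # background size
    K = len(gene_set)             # gene-set members in the background
    n = len(selected)             # size of the selected list
    k = len(selected & gene_set)  # observed overlap

    # Sum the upper tail of the hypergeometric distribution.
    tail = sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1))
    return k, tail / comb(N, n)

# Toy data (hypothetical): 20 tested genes, 5 selected, one small gene set.
universe = [f"g{i}" for i in range(1, 21)]
selected = ["g1", "g2", "g3", "g4", "g5"]
gene_set = ["g1", "g2", "g3", "g10", "g30"]  # g30 was never tested, so it is dropped

overlap, p = ora_p_value(selected, gene_set, universe)
print(overlap, round(p, 4))
```

Note that the background matters: the universe should be all genes that were tested, not only the selected ones.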

In contrast, GSEA assumes that you use as input all genes in your data set, ranked on e.g. the signed log p-value. The algorithm then identifies which gene sets are enriched at the top (and bottom) of your ranked list. You may interpret these as gene sets being activated and suppressed, respectively, by your treatment compared to the reference.
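The ranked-list idea can be sketched with a simplified running-sum enrichment score. This is an illustrative Python version of the general weighted running-sum statistic, assuming a list of `(gene, score)` pairs already sorted from highest to lowest score; it omits the permutation step that real GSEA implementations use to obtain p-values, and the names `enrichment_score` and `ranked` are hypothetical.

```python
def enrichment_score(ranked, gene_set, weight=1.0):
    """Simplified GSEA-style running sum over a ranked gene list.

    ranked   : list of (gene, score), sorted by score in descending order
    gene_set : members of one gene set
    Returns (ES, index of the position where the running sum peaks).
    """
    gene_set = set(gene_set)
    in_set = [gene in gene_set for gene, _ in ranked]
    hit_weights = [abs(score) ** weight if hit else 0.0
                   for (_, score), hit in zip(ranked, in_set)]
    hit_total = sum(hit_weights)
    n_miss = in_set.count(False)
    if hit_total == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")

    running, es, peak = 0.0, 0.0, -1
    for i, (hit, w) in enumerate(zip(in_set, hit_weights)):
        # Step up at gene-set members (weighted by score), step down otherwise.
        running += w / hit_total if hit else -1.0 / n_miss
        if abs(running) > abs(es):
            es, peak = running, i
    return es, peak

# Toy ranked list (hypothetical scores, e.g. signed log p-values).
ranked = [("A", 3.0), ("B", 2.0), ("C", 1.0),
          ("D", -1.0), ("E", -2.0), ("F", -3.0)]

es_top, peak_top = enrichment_score(ranked, {"A", "B"})        # set at the top
es_bottom, peak_bottom = enrichment_score(ranked, {"E", "F"})  # set at the bottom
print(es_top, es_bottom)
```

A positive score means the set's members cluster at the top of the ranking, a negative score that they cluster at the bottom. This only works when the full ranked list is supplied, which is why a pre-filtered list of 500 genes is not suitable input.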

That you 'lose' genes may be because some genes are not annotated to any gene set, and then, of course, they cannot be included in the analysis. In addition, if you used clusterProfiler for GSEA, only the so-called leading edge genes (aka core-enriched genes) are returned. Leading edge genes are basically the genes that 'make' a gene set significantly enriched. Check e.g. here.
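To answer the question of where the remaining genes went, one option is to sort the DE genes into three groups: not annotated at all, annotated but not reported as core-enriched, and core-enriched. The sketch below does this with plain Python sets. The input structures (`annotation` and `enriched_core` as dictionaries mapping a gene-set name to a list of genes) are assumptions for illustration; you would have to build them yourself from your annotation source and your GSEA result table.

```python
def account_for_genes(de_genes, annotation, enriched_core):
    """Partition DE genes by what happened to them in the gene set analysis.

    de_genes      : the selected DE genes
    annotation    : dict, gene-set name -> all member genes (hypothetical input)
    enriched_core : dict, gene-set name -> core-enriched genes that were
                    reported for the significant sets (hypothetical input)
    """
    de = set(de_genes)
    annotated = set().union(*annotation.values())
    core = set().union(*enriched_core.values())
    return {
        # Never in any gene set, so no test could include them.
        "not_annotated": sorted(de - annotated),
        # In at least one gene set, but not among the reported core genes.
        "annotated_not_core": sorted((de & annotated) - core),
        # Reported as leading edge / core-enriched genes.
        "core_enriched": sorted(de & core),
    }

# Toy example with made-up identifiers.
de_genes = ["g1", "g2", "g3", "g4"]
annotation = {"setA": ["g1", "g2", "g9"], "setB": ["g2", "g3"]}
enriched_core = {"setA": ["g1"]}

groups = account_for_genes(de_genes, annotation, enriched_core)
for name, genes in groups.items():
    print(name, genes)
```

The sizes of the three groups add up to the number of DE genes, which gives a direct answer to "what happened to the remaining genes".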

Please check my post here to find some more info on the types of gene set tests clusterProfiler can perform: what the test method for enrichGO in clusterProfiler?
