Question

Using GAGE to Analyze Pathway Enrichment Directly from Fold Change Data

JMallory • 0

@jmallory-13488

I have been using the following tutorial by Stephen Turner and Will Bush to look at some RNA-seq data.

http://www.gettinggeneticsdone.com/2015/12/tutorial-rna-seq-differential.html

Looking into GAGE's documentation, it looks like this tutorial is using it in a somewhat non-standard way. Specifically, it looks like they are using it to conduct a GSEA-esque analysis, feeding it a vector of fold changes annotated by Entrez IDs and looking for enrichment within pathways contained in the `kegg.sets.hs` object.

Were this a standard GSEA analysis, I would order transcripts by log2 fold change prior to analysis. In this use case of GAGE, should transcripts also be rank ordered prior to analysis? Running it both ways appears to make a large difference, at least in the case of my data.

GAGE Gene Ontology rna-seq pathview • 2.7k views

ADD COMMENT • link updated 7.5 years ago by Luo Weijun ★ 1.6k • written 7.5 years ago by JMallory • 0

score 1 · Answer 1 · 2017-10-04

Luo Weijun ★ 1.6k

@luo-weijun-1783

United States

The order of genes in the data should make no difference in a GAGE analysis. You can randomly shuffle the rows in the example datasets (e.g. gse16873) and run GAGE again; the results will not change.

It is likely that multiple rows in your data correspond to the same gene ID. In that case, the order of rows would make a difference, as only the first row for a given gene ID will be mapped. GAGE assumes independence between genes/rows, so gene IDs in both the user data and the gene sets should be unique. You will need to merge the duplicate rows for the same gene ID into a single row. See the data preparation tutorial for details:

http://bioconductor.org/packages/release/bioc/vignettes/gage/inst/doc/dataPrep.pdf
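As a rough illustration of merging duplicate rows, here is a minimal base R sketch. The data frame `res`, its column names, and the choice of the mean as the per-gene summary are assumptions made for this example; the thread does not say which summary to use, and the data preparation tutorial should be consulted for the recommended approach.

```r
# Hypothetical toy data: log2 fold changes per transcript, where several
# transcripts share the same Entrez ID.
res <- data.frame(
  entrez = c("1017", "1017", "7157", "7157", "7157", "2064"),
  log2FoldChange = c(1.2, -0.4, 2.0, 1.0, 0.0, -1.5),
  stringsAsFactors = FALSE
)

# Report which Entrez IDs occur more than once.
dup_ids <- unique(res$entrez[duplicated(res$entrez)])
print(dup_ids)

# Collapse to one value per Entrez ID. The mean is an assumption here;
# another summary may suit a given analysis better.
collapsed <- aggregate(log2FoldChange ~ entrez, data = res, FUN = mean)

# Build the named fold-change vector, with one entry per gene ID.
foldchanges <- setNames(collapsed$log2FoldChange, collapsed$entrez)
stopifnot(!anyDuplicated(names(foldchanges)))
print(foldchanges)

# With gage and gageData installed, the call used in the tutorial would follow:
# library(gage); library(gageData); data(kegg.sets.hs)
# keggres <- gage(foldchanges, gsets = kegg.sets.hs, same.dir = TRUE)
```

Once the IDs are unique, shuffling `foldchanges` before the `gage()` call should no longer change the result.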

ADD COMMENT • link 7.5 years ago Luo Weijun ★ 1.6k

I see. Yes, I have multiple transcripts/splice variants of a single gene mapping to a single Entrez ID in my user data. If I am understanding correctly, this is the source of the issue. The algorithm is simply selecting the first fold change value associated with a given gene ID for use in further computations and disregarding other gene isoforms. Correct me if I am wrong and thank you for your response.
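A toy base R sketch of why first-match mapping makes the result depend on row order. The helper `first_match` is hypothetical and uses `match()` only as a stand-in for "the first row of the same gene ID is mapped"; it is not GAGE's actual code.

```r
# Hypothetical toy data: two transcripts map to Entrez ID "7157".
fc <- c("7157" = 2.0, "1017" = 0.5, "7157" = -1.0)

# Hypothetical helper: pick one value per gene ID by first match.
first_match <- function(x, ids) x[match(ids, names(x))]

gene_set <- c("7157", "1017")

# With duplicated IDs, the value picked for "7157" depends on row order.
original  <- first_match(fc, gene_set)
reordered <- first_match(rev(fc), gene_set)
print(original)
print(reordered)

# With unique IDs, reordering the rows changes nothing.
fc_unique <- fc[!duplicated(names(fc))]
order_invariant <- identical(
  first_match(fc_unique, gene_set),
  first_match(rev(fc_unique), gene_set)
)
print(order_invariant)
```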

ADD REPLY • link 7.5 years ago JMallory • 0