Question

gage pathways enrichment analysis

0

Entering edit mode

francesca.defilippis ▴ 100

@francescadefilippis-7043

Last seen 10.3 years ago

European Union

Hello! I'd like to do a pathway enrichment analysis on metagenomic data. I already have the KO IDs for my genes. Does gage support bacteria genomes? than, I saw that you need to specify the organism name, but I'm not interested in a specific bacterial species. Is it possible to use it with KO ids from different species?

Thanks

Francesca

gage pathways kegg • 6.7k views

ADD COMMENT • link updated 10.3 years ago by Luo Weijun ★ 1.6k • written 10.4 years ago by francesca.defilippis ▴ 100

score 0 · Answer 1 · 2014-11-19

Yes, both gage and pathview work with KO. You may generate the pathway gene set data using function kegg.gsets (with species="ko"). Then you can follow the examples in the quick start and basic analysis sections of the gage tutorial:

http://bioconductor.org/packages/release/bioc/vignettes/gage/inst/doc/gage.pdf

you may also work with any individual species (including bacteria) as long as it is included in KEGG. For details:

library(pathview)

data(korg)

head(korg)

you can also check section 7.5 of the pathview tutorial:

http://bioconductor.org/packages/release/bioc/vignettes/pathview/inst/doc/pathview.pdf

score 0 · Answer 2 · 2014-11-21

Hi followed the tutorial starting from DESeq output.

> require(gage)

> datakegg.gs)

> fc.kegg.p <- gage(exp.fc, gsets = kegg.gs, ref = NULL, samp = NULL)
> sel <- fc.kegg.p$greater[, "q.val"] < 0.1 &
+ !is.na(fc.kegg.p$greater[, "q.val"])
> path.ids <- rownames(fc.kegg.p$greater)[sel]
> sel.l <- fc.kegg.p$less[, "q.val"] < 0.1 &
+ !is.na(fc.kegg.p$less[,"q.val"])
> path.ids.l <- rownames(fc.kegg.p$less)[sel.l]
> path.ids2 <- substr(c(path.ids, path.ids.l), 1, 8)

But how can I get a sort of list o up- or down-regulated pathways?

score 0 · Answer 3 · 2014-11-21

0

Entering edit mode

Luo Weijun ★ 1.6k

@luo-weijun-1783

Last seen 22 months ago

United States

In your code above, path.ids and path.ids.l are the list of up and down-regulated pathways selected based on q-val<0.1. the full analysis results tables are stored in fc.kegg.p$greater and fc.kegg.p$less. To take a look:

head(fc.kegg.p$greater)

ADD COMMENT • link 10.4 years ago Luo Weijun ★ 1.6k

0

Entering edit mode

My fc.kegg.p table is all NAs... Moreover, I have all pathways from human genome (i think, the IDs start with hsa). In the code above, where should I specify the organism? And what di I need to use, since I have mixed bacteria population?

ADD REPLY • link 10.4 years ago francesca.defilippis ▴ 100

score 0 · Answer 4 · 2014-11-23

0

Entering edit mode

Luo Weijun ★ 1.6k

@luo-weijun-1783

Last seen 22 months ago

United States

When you call gage function in your code above, you specify gsets = kegg.gs, which is the gene set data. However, you used the human gene set data by following the tutorial example exactly. As I mentioned above, you should create your own gene set data using kegg.gsets function (with species="ko"). Please follow the suggested links/docs above.

ADD COMMENT • link 10.4 years ago Luo Weijun ★ 1.6k

0

Entering edit mode

I'm still having troubles. I created my gset

> myg.sets=kegg.gsets(species = "ko")

And then used it in the gage function

fc.kegg.p <- gage(exp.fc, gsets = myg.sets, ref = NULL, samp = NULL)

this is my exp.fc (fold changes from deseq)

head(exp.fc)
K01866 K01740 K01892 K01583 K00147 K00290
-5.155880 -5.086753 -3.897103 -3.808646 -3.787984 -3.112827

But this still resulted in a NAs table

fc.kegg.p$greater
p.geomean stat.mean p.val q.val set.size exp1
kg.sets NA NaN NA NA 0 NA
sigmet.idx NA NaN NA NA 0 NA
sig.idx NA NaN NA NA 0 NA
met.idx NA NaN NA NA 0 NA
dise.idx NA NaN NA NA 0 NA

Where I'm doing wrong?

ADD REPLY • link 10.4 years ago francesca.defilippis ▴ 100

score 0 · Answer 5 · 2014-11-24

0

Entering edit mode

Luo Weijun ★ 1.6k

@luo-weijun-1783

Last seen 22 months ago

United States

You can create your KO gene set like:

kg.ko=kegg.gsets(species = "ko")

kegg.ko.gs=kg.ko$kg.sets[kg.ko$sigmet.idx]

kegg.ko.gs but not kg.ko is the one you use directly, you can check:

lapplykegg.ko.gs[1:3], head, 4)

names(kg.ko)

all info is documented in page 6 of the gage tutorial and the function help info:

?kegg.gsets

ADD COMMENT • link 10.4 years ago Luo Weijun ★ 1.6k

0

Entering edit mode

I'm sorry to bother you again, but I still have a table of NAs, except for setsize where there are some values

p.geomean stat.mean p.val q.val
ko00970 Aminoacyl-tRNA biosynthesis NA NaN NA NA
ko02010 ABC transporters NA NaN NA NA
ko02020 Two-component system NA NaN NA NA
ko02030 Bacterial chemotaxis NA NaN NA NA
ko02040 Flagellar assembly NA NaN NA NA
ko02060 Phosphotransferase system (PTS) NA NaN NA NA

Where I'm doing wrong?

ADD REPLY • link 10.4 years ago francesca.defilippis ▴ 100

score 0 · Answer 6 · 2014-11-24

0

Entering edit mode

Luo Weijun ★ 1.6k

@luo-weijun-1783

Last seen 22 months ago

United States

Can I see what you data look like? You may do:

str(exp.fc)

head(exp.fc)

and post the output here.

ADD COMMENT • link 10.4 years ago Luo Weijun ★ 1.6k

0

Entering edit mode

Thesev are the output:

str(exp.fc)
Named num [1:238] -5.16 -5.09 -3.9 -3.81 -3.79 ...
- attr(*, "names")= chr [1:238] "K01866" "K01740" "K01892" "K01583" ...

head(exp.fc)
K01866 K01740 K01892 K01583 K00147 K00290
-5.155880 -5.086753 -3.897103 -3.808646 -3.787984 -3.112827

ADD REPLY • link 10.4 years ago francesca.defilippis ▴ 100

score 0 · Answer 7 · 2014-11-25

0

Entering edit mode

Luo Weijun ★ 1.6k

@luo-weijun-1783

Last seen 22 months ago

United States

Okay, you only have 238 KO genes. These seems to be a selected significant gene list, since all are heavily up or down regulated based on log2 fold change values.

Gene set analysis like GAGE requires the full list of genes (proteins, molecules etc), usually thousands of them, instead of a preselected short list. Otherwise, there is no background to compare to in the statistical test. Therefore, you should include data for all KO genes output (from DEseq or other analysis) as described in gage tutorials.

BTW, you saw NA or NaN in your output because each pathways gets none or only a few genes mapped as the list is very short with 238 genes.

ADD COMMENT • link 10.4 years ago Luo Weijun ★ 1.6k

0

Entering edit mode

Hi! I'm trying to apply the gage workflow to the full count matrix (raw counts, not the results from deseq).

I normalized the counts as described in the documentation and created my kegg gsets as above. Then I tried:

counts.kegg.p <- gage(metag.normOV, gsets =kegg.ko.gs, ref = metag.normOV[,names(which(clD == 'O'))], samp = metag.normOV[,names(which(clD == 'V'))])

This is my normalized counts matrix

head(metag.normOV)

15.TO.V 37.TO.O 02.BA.O 32.TO.O 35.TO.V 33.TO.O 25.TO.V 30.BO.O 10.BO.V 24.TO.O 27.TO.V 13.BA.V
K01361 3.000000 3.000000 3.000000 3.000000 3.000000 3.000000 3.000000 3.474514 3.000000 3.000000 3.000000 3.284227
K01362 10.510925 10.180883 9.508538 10.747639 9.783315 9.620501 9.132543 9.513370 9.918586 10.406776 10.114846 9.444985
K05841 3.000000 3.000000 3.000000 3.000000 3.000000 3.000000 3.000000 3.000000 3.000000 3.000000 3.000000 3.000000
K05844 3.000000 3.648295 3.410819 3.000000 3.134276 3.153423 3.000000 3.333004 3.000000 3.000000 3.000000 3.000000
K05845 3.473451 4.992189 4.573684 4.758411 3.664711 3.292087 3.247012 4.738421 5.297531 3.855768 3.131843 3.725325
K05846 7.614699 8.596064 7.457748 6.973650 7.457559 7.209767 7.663303 7.752926 7.837494 8.267429 7.530133 7.813055

and clD is just a class vector in order to select controls vs treated samples (O and V)

But I got this error

Error in saaPrep(exprs, ref = ref, samp = samp, same.dir = same.dir, compare = compare, :
wrong reference column index

Any clue about where the problem could be?

ADD REPLY • link 10.4 years ago francesca.defilippis ▴ 100

score 0 · Answer 8 · 2014-12-04

ref and samp should be column numbers like ref=1:3, samp=4:6 etc. obviously yours are not. You may check the function info:

?gage

In addition, names(which(clD == 'O')) will yeild NULL. Please get yourself familiar with basic R syntax.

You should do something like:

cns=colnames(metag.normOV)

ref1=grep(".O$", cns)

samp1=grep(".V$", cns)

counts.kegg.p = gage(metag.normOV, gsets =kegg.ko.gs, ref=ref1, samp=samp1)