gage pathways enrichment analysis
8
0
Entering edit mode
@francescadefilippis-7043
Last seen 9.9 years ago
European Union

Hello! I'd like to do a pathway enrichment analysis on metagenomic data. I already have the KO IDs for my genes. Does gage support bacteria genomes? than, I saw that you need to specify the organism name, but I'm not interested in a specific bacterial species. Is it possible to use it with KO ids from different species?

 

Thanks

Francesca

gage pathways kegg • 6.3k views
ADD COMMENT
0
Entering edit mode
Luo Weijun ★ 1.6k
@luo-weijun-1783
Last seen 17 months ago
United States

Yes, both gage and pathview work with KO. You may generate the pathway gene set data using function kegg.gsets (with species="ko"). Then you can follow the examples in the quick start and basic analysis sections of the gage tutorial:

http://bioconductor.org/packages/release/bioc/vignettes/gage/inst/doc/gage.pdf

 

you may also work with any individual species (including bacteria) as long as it is included in KEGG. For details:

library(pathview)

data(korg)

head(korg)

 

you can also check section 7.5 of the pathview tutorial:

http://bioconductor.org/packages/release/bioc/vignettes/pathview/inst/doc/pathview.pdf

ADD COMMENT
0
Entering edit mode
@francescadefilippis-7043
Last seen 9.9 years ago
European Union

Hi followed the tutorial starting from DESeq output.

> require(gage)

> datakegg.gs)

> fc.kegg.p <- gage(exp.fc, gsets = kegg.gs, ref = NULL, samp = NULL)
> sel <- fc.kegg.p$greater[, "q.val"] < 0.1 &
+ !is.na(fc.kegg.p$greater[, "q.val"])
> path.ids <- rownames(fc.kegg.p$greater)[sel]
> sel.l <- fc.kegg.p$less[, "q.val"] < 0.1 &
+ !is.na(fc.kegg.p$less[,"q.val"])
> path.ids.l <- rownames(fc.kegg.p$less)[sel.l]
> path.ids2 <- substr(c(path.ids, path.ids.l), 1, 8)

 

But how can I get a sort of list o up- or down-regulated pathways?

 

ADD COMMENT
0
Entering edit mode
Luo Weijun ★ 1.6k
@luo-weijun-1783
Last seen 17 months ago
United States

In your code above, path.ids and path.ids.l are the list of up and down-regulated pathways selected based on q-val<0.1. the full analysis results tables are stored in fc.kegg.p$greater and fc.kegg.p$less. To take a look:

head(fc.kegg.p$greater)

ADD COMMENT
0
Entering edit mode

My fc.kegg.p table is all NAs... Moreover, I have all pathways from human genome (i think, the IDs start with hsa). In the code above, where should I specify the organism? And what di I need to use, since I have mixed bacteria population?

ADD REPLY
0
Entering edit mode
Luo Weijun ★ 1.6k
@luo-weijun-1783
Last seen 17 months ago
United States

When you call gage function in your code above, you specify gsets = kegg.gs, which is the gene set data. However, you used the human gene set data by following the tutorial example exactly. As I mentioned above, you should create your own gene set data using kegg.gsets function (with species="ko"). Please follow the suggested links/docs above.

ADD COMMENT
0
Entering edit mode

I'm still having troubles. I created my gset

> myg.sets=kegg.gsets(species = "ko")

And then used it in the gage function

fc.kegg.p <- gage(exp.fc, gsets = myg.sets, ref = NULL, samp = NULL)

this is my exp.fc (fold changes from deseq)

head(exp.fc)
   K01866    K01740    K01892    K01583    K00147    K00290 
-5.155880 -5.086753 -3.897103 -3.808646 -3.787984 -3.112827 

 

But this still resulted in a NAs table

fc.kegg.p$greater
           p.geomean stat.mean p.val q.val set.size exp1
kg.sets           NA       NaN    NA    NA        0   NA
sigmet.idx        NA       NaN    NA    NA        0   NA
sig.idx           NA       NaN    NA    NA        0   NA
met.idx           NA       NaN    NA    NA        0   NA
dise.idx          NA       NaN    NA    NA        0   NA

 

Where I'm doing wrong?

ADD REPLY
0
Entering edit mode
Luo Weijun ★ 1.6k
@luo-weijun-1783
Last seen 17 months ago
United States

You can create your KO gene set like:

kg.ko=kegg.gsets(species = "ko")

kegg.ko.gs=kg.ko$kg.sets[kg.ko$sigmet.idx]

 

kegg.ko.gs but not kg.ko is the one you use directly, you can check:

lapplykegg.ko.gs[1:3], head, 4)

names(kg.ko)

 

all info is documented in page 6 of the gage tutorial and the function help info:

?kegg.gsets

ADD COMMENT
0
Entering edit mode

I'm sorry to bother you again, but I still have a table of NAs, except for setsize where there are some values

                                        p.geomean stat.mean p.val q.val
ko00970 Aminoacyl-tRNA biosynthesis            NA       NaN    NA    NA
ko02010 ABC transporters                       NA       NaN    NA    NA
ko02020 Two-component system                   NA       NaN    NA    NA
ko02030 Bacterial chemotaxis                   NA       NaN    NA    NA
ko02040 Flagellar assembly                     NA       NaN    NA    NA
ko02060 Phosphotransferase system (PTS)        NA       NaN    NA    NA

Where I'm doing wrong?

 

ADD REPLY
0
Entering edit mode
Luo Weijun ★ 1.6k
@luo-weijun-1783
Last seen 17 months ago
United States

Can I see what you data look like? You may do:

str(exp.fc)

head(exp.fc)

 

and post the output here.

ADD COMMENT
0
Entering edit mode

Thesev are the output:

str(exp.fc)
 Named num [1:238] -5.16 -5.09 -3.9 -3.81 -3.79 ...
 - attr(*, "names")= chr [1:238] "K01866" "K01740" "K01892" "K01583" ...

 

 head(exp.fc)
   K01866    K01740    K01892    K01583    K00147    K00290 
-5.155880 -5.086753 -3.897103 -3.808646 -3.787984 -3.112827 

 

ADD REPLY
0
Entering edit mode
Luo Weijun ★ 1.6k
@luo-weijun-1783
Last seen 17 months ago
United States

Okay, you only have 238 KO genes. These seems to be a selected significant gene list, since all are heavily up or down regulated based on log2 fold change values.

Gene set analysis like GAGE requires the full list of genes (proteins, molecules etc), usually thousands of them, instead of a preselected short list. Otherwise, there is no background to compare to in the statistical test. Therefore, you should include data for all KO genes output (from DEseq or other analysis) as described in gage tutorials.

BTW, you saw NA or NaN in your output because each pathways gets none or only a few genes mapped as the list is very short with 238 genes.

ADD COMMENT
0
Entering edit mode

Hi! I'm trying to apply the gage workflow to the full count matrix (raw counts, not the results from deseq). 

 

I normalized the counts as described in the documentation and created my kegg gsets as above. Then I tried:

counts.kegg.p <- gage(metag.normOV, gsets =kegg.ko.gs, ref = metag.normOV[,names(which(clD == 'O'))], samp = metag.normOV[,names(which(clD == 'V'))])

This is my normalized counts matrix

head(metag.normOV)

         15.TO.V   37.TO.O  02.BA.O   32.TO.O  35.TO.V  33.TO.O  25.TO.V  30.BO.O  10.BO.V   24.TO.O   27.TO.V  13.BA.V
K01361  3.000000  3.000000 3.000000  3.000000 3.000000 3.000000 3.000000 3.474514 3.000000  3.000000  3.000000 3.284227
K01362 10.510925 10.180883 9.508538 10.747639 9.783315 9.620501 9.132543 9.513370 9.918586 10.406776 10.114846 9.444985
K05841  3.000000  3.000000 3.000000  3.000000 3.000000 3.000000 3.000000 3.000000 3.000000  3.000000  3.000000 3.000000
K05844  3.000000  3.648295 3.410819  3.000000 3.134276 3.153423 3.000000 3.333004 3.000000  3.000000  3.000000 3.000000
K05845  3.473451  4.992189 4.573684  4.758411 3.664711 3.292087 3.247012 4.738421 5.297531  3.855768  3.131843 3.725325
K05846  7.614699  8.596064 7.457748  6.973650 7.457559 7.209767 7.663303 7.752926 7.837494  8.267429  7.530133 7.813055

 

and clD is just a class vector in order to select controls vs treated samples (O and V)

 

But I got this error

Error in saaPrep(exprs, ref = ref, samp = samp, same.dir = same.dir, compare = compare,  : 
  wrong reference column index

Any clue about where the problem could be?

 

ADD REPLY
0
Entering edit mode
Luo Weijun ★ 1.6k
@luo-weijun-1783
Last seen 17 months ago
United States

ref and samp should be column numbers like ref=1:3, samp=4:6 etc. obviously yours are not. You may check the function info:

?gage

In addition, names(which(clD == 'O')) will yeild NULL. Please get yourself familiar with basic R syntax.

 

You should do something like:

cns=colnames(metag.normOV)

ref1=grep(".O$", cns)

samp1=grep(".V$", cns)

counts.kegg.p = gage(metag.normOV, gsets =kegg.ko.gs, ref=ref1, samp=samp1)

 

ADD COMMENT

Login before adding your answer.

Traffic: 363 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6