Urgent: Do GAGE applicable for non model species?
4
0
Entering edit mode
jackipchiho ▴ 10
@jackipchiho-8282
Last seen 9.2 years ago
Hong Kong

Hi ALL,

Since I am working on non model species green mussel and intertidal whelk, I would like to know can I use GAGE to perform gene set analysis for control and chemical treatment groups (n=2 / n=3)? If yes, how can I do it?

I have done de novo transcriptome and worked out the KEGG (Ko and K number without specific species) and GO (GO number without specific species). Is that possible to customize my own gene annotation file and perform available analysis in GAGE?

Can anyone help me about that? I have read the GAGE and pathview doc, but I cannot figure out the way to do so. I am running out of time to finish my MPhil study ><".

Many thanks,

Jack

gage • 3.5k views
ADD COMMENT
0
Entering edit mode

Dear Jack,

I think I'm on the same track as yours. Can you please tell me how did you attach the KO number to the genes from non-model species? Thank you very much. 

      Sincerely,

          Kang

 

ADD REPLY
0
Entering edit mode

bioDBNet (https://biodbnet-abcc.ncifcrf.gov/) turns out help me really a lot!

ADD REPLY
0
Entering edit mode
Luo Weijun ★ 1.6k
@luo-weijun-1783
Last seen 17 months ago
United States
I don’t think that your research species are part of KEGG at this time. however, if you label your genes with ko number, then you can generate KO gene set data first: kg.ko=kegg.gsets("ko") kegg.gs=kg.ko$kg.sets[kg.ko$sigmet.idx] you can then follow the gage examples to do the KEGG pathway analysis. Note that the row names (or names) of your gene data matrix need to be ko numbers. To check the gene IDs in ko numbers fpr both the gene set and your data, you can do: lapplykegg.gs[1:3], head, 3) head(your.data) GO analysis is similar if you can generate your GO gene set data with the gene IDs you have.
ADD COMMENT
0
Entering edit mode

I have a follow-up question about this approach.

I ran KAAS on my non-model organism, and I understand your suggestion above changing the gene IDs (i.e. the row names of an expression data matrix) to the KEGG Orthology (KO) numbers assigned by KAAS before running gage.

Since the multiple genes might be assigned to the same KO number, what will happen if the input (exprs argument) to gage has non-unique row names)?

ADD REPLY
0
Entering edit mode
jackipchiho ▴ 10
@jackipchiho-8282
Last seen 9.2 years ago
Hong Kong

I have created a kegg gene set (Rc_kegg.txt) based on my own annotation data:

Ko00010    Glycolysis / Gluconeogenesis    comp10418997_c0    comp11135183_c0    comp11547_c0    comp124788_c0 ......   

Ko00020    Citrate cycle (TCA cycle)    comp106429_c0    comp11438380_c0    comp11852469_c0    comp124788_c0   ........

Ko00030    Pentose phosphate pathway    comp11085364_c0    comp11547_c0    comp17167857_c0    comp17594_c1 .......

 

And the expression data (Rc_30BgF.txt)

                        CgF1    CgF2    CgF3    BgF1    BgF2   BgF3

comp579412_c0    9.88    11.7    13.89    9.42    9.74    8.95
comp579413_c0    0.59    1.74    0.23    2.91    0.69    1.56
comp579422_c0    7.26    3.58    8.02    1.84    6.2   3.6

 

Is that okay for perform GAGE? I don't how to create my own kegg gene set in R, could you use the above file name to demonstrate once?

Many thanks

ADD COMMENT
0
Entering edit mode
jackipchiho ▴ 10
@jackipchiho-8282
Last seen 9.2 years ago
Hong Kong

I tried the following way to do the analysis based on my own kegg gene set data:

> filename=system.file("extdata/Rckegg.gmt", package = "gage")
> Rckegg.gs=readList(filename)

> data(Rc_30BgF)
> Cg=1:3
> Bg=4:6
> Rc.kegg.unp <- gage(Rc_30BgF, gsets = Rckegg.gs, ref = Cg, samp = Bg, compared = "unpaired")

Is that okay to work in this way?

Many thanks

 

ADD COMMENT
0
Entering edit mode
Gage provide an function, readList(), to read in gene data in gmt file, and readExpData() function to read in expression data. Check tutorial dataPrep.pdf for details. You were using the right function, readList(). however, you cannot copy and paste the demo code in the tutorials literally. The following lines load data pre-existing with the gage package, and won’t work for you own data. filename=system.file("extdata/Rckegg.gmt", package = "gage") … data(Rc_30BgF) you should specify the full path + file name to your data files, and use readList() and readExpData() (or read.table() R function) to read in your gene list and expression data. You may want to check the detailed info for these functions too. These are R basics and not specific to gage package, you should really get yourself familiar with R to use gage or any other package/utilities in R.
ADD REPLY
0
Entering edit mode

Thanks Weijun, I have revised the code:

>list.files()

"Rckegg.gs.gmt"    "Rcg30.txt"

>Rckegg.gs <- readList("Rckegg.gs.gmt")

>headRckegg.gs[1:2])

$Ko00010
  [1] "comp10418997_c0" "comp11135183_c0" "comp11547_c0" .......

$Ko00020
 [1] "comp106429_c0"   "comp11438380_c0" "comp11852469_c0" "comp124788_c0" ........

>Rcg30 <- readExpData("Rcg30", row.names=1)

>head(Rcg30)

                            Cg1 Cg2 Cg3 Bg1 Bg2 Bg3 Pg1 Pg2 Pg3
comp10000024_c0    1    2    3    4    5    6    7    8    9
comp10000066_c0    1    2    3    4    5    6    7    8    9

am I using the right function of readList and readExpData?

Besides, in my expt. I have one control and two treatment groups with two time-point in four types of tissue samples. 

If I would like to reveal the toxicity mechanism/pathway between (1) two chemicals or (2) exposure time (time point 1 and time point 2) in same chemical, do I run gage for control vs treatment 1 and control vs treatment 2 one by one? total 16 comparison and compare (1) treatments and (2) time point using gagaComp. or use gagePipe and gageComp to do it?

Do gageComp able to compare (control vs treatment 1) and (control vs treatment 2) or time point 1 vs time point 2?

Thanks for your great help, since I have the first student to work with NGS in my Lab, I need to try in diff. way and seek help for expert ^ ^, like Weijun, to correct me.

Many thanks,

 

 

ADD REPLY
0
Entering edit mode
amywingsze • 0
@amywingsze-8290
Last seen 9.4 years ago
Hong Kong
Hi Weijun,

I am facing exactly the same problem as Jack does! The plant species that I am working on is not a model species and also distantly related to Arabidopsis. I have annotated my de novo assembled transcriptome with Arabidopsis database and ko number. I would like to use the KO gene set data for enrichment analysis as you mentioned above. But there are multiple transcripts/isoforms that are annotated with the same ko number based on homology search. Would gage accept input that has same rowname(ko number) appearing more than once in the expression data file? Or should I just added up the FPKM value of those with the same ko number?

Thanks for your help in advance!

Amy.
ADD COMMENT

Login before adding your answer.

Traffic: 655 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6