GAGE and PATHVIEW packages
@christian-de-santis-6143
Dear Luo and list,

I am successfully using GAGE and pathview for my analyses and I like the package a lot, so thanks for developing it. There are a few points on which I would appreciate some help and/or clarification.

AVERAGE VALUE - The first time I ran the analysis with GAGE, I used the same setup parameters as in the example you prepared in the manual. I have 8 replicates per treatment, and I initially gave each sample a unique column name (i.e. "DIET02_1", "DIET02_2", "DIET02_3", etc.), as in your example with HN and DCIS. However, I discovered (following an accidental mistake) that if the samples are instead named after the treatment they belong to (i.e. "DIET02" for all 8 replicates), the subsequent gage analysis generates a single value for that treatment. Comparing the p-values from the two cases, I found that they are identical. Am I correct to assume that in the latter case every value assigned to the treatment is an average of the replicates?

DUPLICATE PROBES - My array has several duplicate or triplicate probes that are correctly annotated with the same KO number. How are these probes handled by the gage analysis? For example, if I have three probes for my gene X annotated with the same KO number, will they be counted 3 times in the "set size"? Or will the values for that KO number be merged into one?

"COMPARE" argument of "gage" function - My experiment consists of 5 treatments (x 8 replicates), and none of the treatments is a proper "control". Is it correct to use "1ongroup" as the argument, choosing one of the treatments as the ref? I have also tried the "as.group" option, but in the results I do not get a comparison of the chosen reference with the remaining groups; instead I get a single value named "exp1". I have also tried "paired", which gives completely different results.
HEATMAP OUTPUT of "esset.grp" function - Is there a quick way to generate an output heatmap (as for sigGeneSet) after removing the redundant pathways identified with the function "esset.grp"? At the moment I am doing this manually and plotting the results with heatmap.2 from gplots. Is this the only way?

Any help on the above would be greatly appreciated.

Regards.
Christian De Santis

--
The University of Stirling has been ranked in the top 12 of UK universities for graduate employment*. 94% of our 2012 graduates were in work and/or further study within six months of graduation. *The Telegraph The University of Stirling is a charity registered in Scotland, number SC 011159.
Luo Weijun
@luo-weijun-1783
Hi Christian,

Please see my point-by-point answers below.

HTHs,
Weijun

--------------------------------------------
On Fri, 10/4/13, Christian De Santis <christian.desantis at stir.ac.uk> wrote:

Subject: GAGE and PATHVIEW packages
To: luo_weijun at yahoo.com, bioconductor at r-project.org
Date: Friday, October 4, 2013, 11:27 AM

> Dear Luo and list,
> I am successfully using GAGE and pathview for my analyses and I like the package a lot, so thanks for developing it. There are a few points on which I would appreciate some help and/or clarification.

Thanks for the comments.

> AVERAGE VALUE - The first time I ran the analysis with GAGE, I used the same setup parameters as in the example you prepared in the manual. I have 8 replicates per treatment, and I initially gave each sample a unique column name (i.e. "DIET02_1", "DIET02_2", "DIET02_3", etc.), as in your example with HN and DCIS. However, I discovered (following an accidental mistake) that if the samples are instead named after the treatment they belong to (i.e. "DIET02" for all 8 replicates), the subsequent gage analysis generates a single value for that treatment. Comparing the p-values from the two cases, I found that they are identical. Am I correct to assume that in the latter case every value assigned to the treatment is an average of the replicates?

It is the average: the p-value is the geometric mean, while the statistic is the arithmetic mean of the columns with the same name. The averaging mechanism is there to accommodate special needs or mistakes, but using the same name for replicate samples is not recommended.

> DUPLICATE PROBES - My array has several duplicate or triplicate probes that are correctly annotated with the same KO number. How are these probes handled by the gage analysis? For example, if I have three probes for my gene X annotated with the same KO number, will they be counted 3 times in the "set size"? Or will the values for that KO number be merged into one?

Duplicate probes will be counted multiple times, which is not good, because gene set analysis methods like GAGE really assume one independent variable per gene. You may summarize over duplicate probes before feeding the data into GAGE; see ?mol.sum in the pathview package for that.

> "COMPARE" argument of "gage" function - My experiment consists of 5 treatments (x 8 replicates), and none of the treatments is a proper "control". Is it correct to use "1ongroup" as the argument, choosing one of the treatments as the ref? I have also tried the "as.group" option, but in the results I do not get a comparison of the chosen reference with the remaining groups; instead I get a single value named "exp1". I have also tried "paired", which gives completely different results.

If you set ref or samp to anything other than NULL, GAGE assumes a two-state comparison. The compare argument can take one of the values "1ongroup", "paired", "unpaired" or "as.group", depending on your needs. All of them are for two-state comparisons; they differ in whether your samples are paired or not, etc. If you want a multiple-state comparison/test, you should perform it on each gene before GAGE, then feed the single-column results into gage with ref = NULL, samp = NULL. If you want a two-state comparison, you should specify a control state: either all 4 groups other than your group of interest, or the median of all groups for each gene.

> HEATMAP OUTPUT of "esset.grp" function - Is there a quick way to generate an output heatmap (as for sigGeneSet) after removing the redundant pathways identified with the function "esset.grp"? At the moment I am doing this manually and plotting the results with heatmap.2 from gplots. Is this the only way?

You can do this quickly using esset.grp + sigGeneSet, assuming you follow the examples until you get gse16873.kegg.esg.up and gse16873.kegg.esg.dn:

ess.sets=c(gse16873.kegg.esg.up$essentialSets, gse16873.kegg.esg.dn$essentialSets)
gse16873.kegg.p.ess=lapply(gse16873.kegg.p, function(x) x[ess.sets,])
gse16873.kegg.sig.ess=sigGeneSet(gse16873.kegg.p.ess, outname="gse16873.kegg.ess")

> Any help on the above would be greatly appreciated.
> Regards.
> Christian De Santis
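The two routes described in the COMPARE answer above can be sketched in base R. This is an illustrative sketch under stated assumptions, not GAGE's own code: `expr` is assumed to hold log2 expression values (genes in rows, samples in columns), the one-way ANOVA F statistic is just one possible choice of per-gene multi-state test, and the helper names `per_gene_f` and `vs_median`, the simulated data and the gene set list `my_ko_sets` are all hypothetical. Because an F statistic has no direction, the commented-out gage call uses `same.dir = FALSE`; check `?gage` to confirm this suits your data.

```r
# Sketch: summarize a 5-treatment design into per-gene statistics before GAGE.
# Assumes expr is a log2 expression matrix (genes x samples) and groups gives
# the treatment label of each column. Helper names are hypothetical.

# Option 1: multi-state test, one-way ANOVA F statistic per gene.
per_gene_f <- function(expr, groups) {
  groups <- factor(groups)
  f <- apply(expr, 1, function(y) anova(lm(y ~ groups))[["F value"]][1])
  # one-column matrix with gene IDs as row names
  matrix(f, ncol = 1, dimnames = list(rownames(expr), "F"))
}

# Option 2: two-state setup against a per-gene median control:
# each treatment mean minus the median of the treatment means.
vs_median <- function(expr, groups) {
  groups <- factor(groups)
  means <- sapply(levels(groups), function(g) rowMeans(expr[, groups == g, drop = FALSE]))
  sweep(means, 1, apply(means, 1, median))
}

# Simulated data: 20 genes, 5 treatments x 8 replicates; gene 1 is shifted in DIET02.
set.seed(1)
groups <- rep(paste0("DIET0", 1:5), each = 8)
expr <- matrix(rnorm(20 * 40), nrow = 20,
               dimnames = list(sprintf("K%05d", 1:20),
                               paste0(groups, "_", rep(1:8, times = 5))))
expr[1, groups == "DIET02"] <- expr[1, groups == "DIET02"] + 3

f_col <- per_gene_f(expr, groups)  # 20 x 1 matrix of F statistics
vm    <- vs_median(expr, groups)   # 20 x 5 matrix, one column per treatment

# Not run (needs the gage package and a KO-keyed gene set list, here hypothetical):
# gage(f_col, gsets = my_ko_sets, ref = NULL, samp = NULL, same.dir = FALSE)
```

Each column of `vm` could likewise be passed to gage with `ref = NULL, samp = NULL` to test one treatment against the per-gene median.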
Hi Weijun,

Thanks for your prompt reply. It was very helpful in clarifying my doubts, although it raised one more. "mol.sum" is an excellent function; thanks for pointing it out. The default sum.method for this function is "sum". I am not sure what "sum" computes exactly (and, being a novice, I find it difficult to read the code directly), but I assume it returns the sum of the intensities associated with replicate IDs. I am asking because I am using arrays with an unbalanced number of replicate probes (i.e. 3 for gene A, 6 for gene B, etc.). I have the feeling that, in my case, the "sum" option would put greater weight on pathways whose core genes are more heavily represented on the array (i.e. gene B). I tried two different methods to test this hypothesis: using "sum", one of our target pathways was called significant in the top 3, while it did not show up when using "mean", for example (most other pathways were consistent). I would appreciate it if you could help me clarify this doubt and make a decision. Am I correct, given the design of my arrays, to avoid the "sum" method?

This should resolve most of my doubts about your packages for now. Thanks again very much for your help.

Best regards,
Christian

-----Original Message-----
From: Luo Weijun [mailto:luo_weijun@yahoo.com]
Sent: 07 October 2013 01:11
To: Christian De Santis
Cc: bioconductor at r-project.org
Subject: Re: GAGE and PATHVIEW packages
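The weighting concern raised here can be illustrated with a toy example in base R. Assuming every probe of a gene measures the same change, a "sum" summary grows with the number of probes per gene, whereas a "mean" summary does not. The numbers are made up, and `tapply` and `rowsum` only mimic the two summary methods; this is not pathview's mol.sum code.

```r
# Toy illustration: how "sum" and "mean" summaries react to unbalanced probe counts.
# Made-up probe-level log2 fold changes: gene A has 3 probes, gene B has 6,
# and every probe carries the same true change of 1.
probe_fc   <- rep(1, 9)
probe_gene <- c(rep("geneA", 3), rep("geneB", 6))

sum_fc  <- tapply(probe_fc, probe_gene, sum)   # geneA 3, geneB 6
mean_fc <- tapply(probe_fc, probe_gene, mean)  # geneA 1, geneB 1

# Same idea for a probes x samples matrix: rowsum() adds rows per gene,
# and dividing by the probe count per gene turns the sums into means.
fc_mat   <- cbind(s1 = probe_fc, s2 = probe_fc / 2)
n_probes <- table(probe_gene)
sum_mat  <- rowsum(fc_mat, probe_gene)
mean_mat <- sum_mat / as.vector(n_probes[rownames(sum_mat)])
```

With "sum", gene B looks twice as changed as gene A purely because it has twice as many probes, which is the kind of bias suspected above.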
Hi Christian,

mol.sum is written to combine or select multiple entries/probes of the same gene/molecule into one value. It should be applied to differential expression data, i.e. fold changes or t-test statistics, rather than to the original expression data, because it selects probes based on their variances. For your original expression data, you may follow a similar approach to mol.sum. I would recommend using "max.abs" to take the probe set with the maximum variance as the representative of the gene. The gage package has a vignette named "Gene set and data preparation" that addresses your issue in detail in the section "Probe set ID conversion". The vignette is available at: http://bioconductor.org/packages/2.13/bioc/vignettes/gage/inst/doc/dataPrep.pdf

Weijun

--------------------------------------------
On Mon, 10/7/13, Christian De Santis <christian.desantis at stir.ac.uk> wrote:

Subject: RE: GAGE and PATHVIEW packages
Cc: bioconductor at r-project.org
Date: Monday, October 7, 2013, 7:49 AM
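The recommendation for the original expression data, keeping the probe set with the largest variance as each gene's representative, can be sketched in base R as below. This assumes a probes x samples matrix and a vector of KO IDs with one entry per probe. The function name `collapse_max_var` is hypothetical, and the sketch uses the sample variance as the spread measure, which may differ from what mol.sum does internally; the dataPrep vignette describes the package's own approach.

```r
# Keep one representative probe per gene: the row with the largest variance
# across samples. collapse_max_var is a hypothetical helper, not part of
# gage or pathview.
collapse_max_var <- function(expr, probe_gene) {
  v    <- apply(expr, 1, var)                  # per-probe variance across samples
  ord  <- order(v, decreasing = TRUE)          # most variable probes first
  keep <- ord[!duplicated(probe_gene[ord])]    # first (most variable) probe per gene
  out  <- expr[keep, , drop = FALSE]
  rownames(out) <- probe_gene[keep]            # rows now keyed by gene/KO ID
  out[order(rownames(out)), , drop = FALSE]
}

# Made-up example: two KO IDs, each measured by two probes.
expr <- rbind(p1 = c(1, 2, 3), p2 = c(1, 5, 9), p3 = c(2, 2, 2), p4 = c(0, 1, 0))
probe_gene <- c("K00001", "K00001", "K00002", "K00002")
collapsed <- collapse_max_var(expr, probe_gene)
# K00001 keeps p2 (variance 16); K00002 keeps p4 (variance 1/3)
```

The collapsed matrix then has exactly one row per KO ID, so each gene enters the gene set analysis once.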