"automatic association analysis"


Weiwei Shi ★ 1.2k

@weiwei-shi-1407

Last seen 10.6 years ago

Dear Listers:

I have a question that came out of pathway analysis.

Suppose I have found, through pathway analysis, a pathway that is strongly associated with a disease. How can I validate this finding? That is, is there any tool that automatically performs association analysis against the scientific record, such as PubMed, and gives some evaluation of the strength of the association?

Thanks.

--
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.

"Did you always know?"
"No, I did not. But I believed..."
---Matrix III


updated 18.7 years ago by Francois Pepin ▴ 60 • written 18.7 years ago by Weiwei Shi ★ 1.2k


Francois Pepin ▴ 60

@francois-pepin-1163

Last seen 10.6 years ago

Hi Weiwei,

If you want to know whether a given set of genes (i.e. the members of the pathway) behaves differently in a given set of arrays (i.e. your disease samples), there are a few ways to do it. The basic way to do this would be to use a hypergeometric test (often used in the case of GO), although it can be tricky to get right and has a few other issues.

There are other methods, such as the Gene Set Enrichment method in the Category package, that combine a set of t-tests. Other packages, such as safe and sigPathway, offer different methods of doing the same thing. There was a discussion of this on the mailing list recently; you would probably want to look it over.

As far as I can tell, all of those methods require that your pathway already be defined. Some databases, such as KEGG or BioCarta, have pathway definitions, but they don't cover all pathways and few, if any, are up to date with the literature.

If we really care about a given pathway, we create our own list from the database. In that case it is important to create the list before you start looking at the differentially expressed genes; otherwise you would be biasing your results. Of course, it is always good to be able to explain your results biologically afterward, but this is not the same as showing a statistically significant correlation with a pathway.

Hope this helps,

Francois

On Thu, 2006-08-24 at 18:57 -0400, Weiwei Shi wrote:
> Dear Listers:
>
> I have a question originated from pathway analysis:
>
> Suppose I have found a pathway which strongly associates with a
> disease from pathway analysis; my question is on how to validate this
> rule? I mean, is there any tool doing some automatic association
> analysis with scientific record like PubMed and it can give some
> evaluation on the strength of such association.
>
> thanks.
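To make the hypergeometric test mentioned in this reply concrete, here is a minimal sketch in plain Python (not R). It assumes the usual setup: a universe of N genes measured on the array, K of them in the pathway, n genes called differentially expressed, and k of those n falling in the pathway. The function name and the toy numbers are made up for illustration and are not taken from any package named in the thread; in R the same tail probability is normally obtained with `phyper`.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, pathway_size, selected_size, overlap):
    """Upper-tail probability P(X >= overlap) under the hypergeometric model.

    universe_size: number of genes measured on the array (N)
    pathway_size:  pathway genes that are present in the universe (K)
    selected_size: genes called differentially expressed (n)
    overlap:       selected genes that are also pathway members (k)
    """
    if not 0 <= pathway_size <= universe_size:
        raise ValueError("pathway_size must lie between 0 and universe_size")
    if not 0 <= selected_size <= universe_size:
        raise ValueError("selected_size must lie between 0 and universe_size")

    lowest = max(overlap, 0)
    highest = min(pathway_size, selected_size)
    total = comb(universe_size, selected_size)
    # Sum the probability of every overlap at least as large as the observed one.
    tail = sum(
        comb(pathway_size, i) * comb(universe_size - pathway_size, selected_size - i)
        for i in range(lowest, highest + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy numbers (hypothetical): 10 genes on the chip, 4 in the pathway,
    # 3 called differentially expressed, 2 of those in the pathway.
    print(hypergeom_enrichment_p(10, 4, 3, 2))
```

The "success events" here are the selected genes that belong to the pathway. The result depends directly on how the universe and the selected set are defined, which is part of what makes the test tricky to get right.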

written 18.7 years ago by Francois Pepin ▴ 60



Weiwei Shi ★ 1.2k

@weiwei-shi-1407

Last seen 10.6 years ago

Hi Francois and other listers,

Thank you for the detailed reply. I have in fact read those papers on GO enrichment analysis and on gene set analysis. There are basically two statistical approaches: Bayesian and frequentist. The latter can use a hypergeometric test or a t-test to derive p-values. Currently I am using BayGO (implemented in R), which is based on Bayesian inference, and I have some interesting results on a psoriasis dataset.

My initial question was about how to automatically "validate" or "test" the result I get from whatever method I use, with text mining or something like that.

But you mentioned that "The basic way to do this would be to use a hypergeometric test (often used in the case of GO), although it can be tricky to get right and has a few other issues.", which reminds me of another question: how do you define the "success events" in the hypergeometric test? And how do you make sure the sampling has no bias when you pick the genes in your study?

I will look into this myself, but maybe someone here would like to give me some suggestions too.

As for the pathways, I am using GeneGO's internal Metabase.

Thank you,

Weiwei

On 8/25/06, Francois Pepin <fpepin at aei.ca> wrote:
> Hi Weiwei,
>
> If you want to know if a given set of genes (ie members of the pathway)
> are behaving differently in a given set of arrays (ie your disease
> samples), there are a few ways. The basic way to do this would be to use
> a hypergeometric test (often used in the case of GO), although it can
> be tricky to get right and has a few other issues.
>
> There are other methods, such as the Gene Set Enrichment method in the
> Category package, that combine a set of t-tests together. Other packages
> like safe and sigPathway have different methods of doing the same thing.
> There was a discussion on this recently on the mailing list, you would
> probably want to look over it.
>
> As far as I can tell, all of those methods require that you have your
> pathway already defined. Some databases like KEGG or BioCarta have
> pathway definitions, but they don't cover all pathways and few, if
> any, are up-to-date with the literature.
>
> If we really care about a given pathway, we'll go and create our own
> list ourselves from the database. It is important in such a case to
> create the list before you've started looking at the differentially
> expressed genes, because you would be biasing your results. Of course,
> it is always good to be able to explain your results biologically
> afterward, but this is not the same as showing a statistically
> significant correlation with a pathway.
>
> Hope this helps,
>
> Francois
>
> On Thu, 2006-08-24 at 18:57 -0400, Weiwei Shi wrote:
> > Dear Listers:
> >
> > I have a question originated from pathway analysis:
> >
> > Suppose I have found a pathway which strongly associates with a
> > disease from pathway analysis; my question is on how to validate this
> > rule? I mean, is there any tool doing some automatic association
> > analysis with scientific record like PubMed and it can give some
> > evaluation on the strength of such association.
> >
> > thanks.

--
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.

"Did you always know?"
"No, I did not. But I believed..."
---Matrix III

written 18.7 years ago by Weiwei Shi ★ 1.2k


Hi Weiwei,

> My initial question is about
> how to automatic "validate" or "test" the result I get from whatever
> methods i use, like text mining or something like that.

I think some packages may exist, but we do that by hand. Once we're pointed to a specific pathway, we prefer to let humans handle the rest.

> how do u define the "success events" in hypergeometric test? and how
> do you make sure the sampling has no bias when you pick genes in your
> study?

That's one of the tricky issues. People usually use the differentially expressed genes, but choosing a threshold isn't obvious. One of the reasons some people do not like this approach (and I'm starting to feel the same way) is that the values are continuous, so changing the threshold by a hair changes your set of genes, often changing your results significantly.

I'm not sure what you mean by sampling bias. If you filter in an unbiased way and set your universe to be what is available on the chip, you should be OK. You should also deal with duplicate probes (if any) and multiple probes per gene (if any). Again, the archives have a couple of fairly detailed discussions of those issues.

Francois
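The two practical points in this reply, collapsing several probes per gene and the sensitivity of the gene set to the threshold, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the probe IDs, gene names and scores are invented, and keeping the probe with the largest absolute score per gene is one possible collapsing rule, not one prescribed in the thread.

```python
def collapse_probes(probe_scores, probe_to_gene):
    """Return one score per gene: the probe score with the largest absolute value.

    Probes without a gene annotation are dropped, so the keys of the result
    form the gene universe "available on the chip".
    """
    gene_scores = {}
    for probe, score in probe_scores.items():
        gene = probe_to_gene.get(probe)
        if gene is None:
            continue  # unannotated probe: not part of the universe
        if gene not in gene_scores or abs(score) > abs(gene_scores[gene]):
            gene_scores[gene] = score
    return gene_scores


def select_genes(gene_scores, threshold):
    """Genes whose absolute score reaches the threshold (the 'selected' set)."""
    return {gene for gene, score in gene_scores.items() if abs(score) >= threshold}


if __name__ == "__main__":
    # Hypothetical per-probe statistics (e.g. t-like scores) and annotation.
    probe_scores = {"p1": 2.4, "p2": -0.3, "p3": 1.95, "p4": 2.05, "p5": 0.1, "p6": -1.98}
    probe_to_gene = {"p1": "A", "p2": "A", "p3": "B", "p4": "C", "p5": "D", "p6": "E"}

    gene_scores = collapse_probes(probe_scores, probe_to_gene)
    print("universe:", sorted(gene_scores))
    # A small change in the cut-off changes which genes count as selected.
    for threshold in (2.0, 1.9):
        print(threshold, sorted(select_genes(gene_scores, threshold)))
```

With these invented scores, moving the cut-off from 2.0 to 1.9 doubles the selected set, which is the kind of instability described above.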

written 18.7 years ago by Francois Pepin ★ 1.3k


Hi Weiwei and Francois,

If I understand correctly, you are worried about false positive results, aren't you? If so, we usually apply the Benjamini & Hochberg FDR correction to the raw p-values obtained from the hypergeometric test for GO analysis. We do that manually in R/Bioconductor or even in Microsoft Excel.

Cheers!

Jianping

--On Friday, August 25, 2006 2:09 PM -0400 Francois Pepin <fpepin at cs.mcgill.ca> wrote:

> Hi Weiwei
>
>> My initial question is about
>> how to automatic "validate" or "test" the result I get from whatever
>> methods i use, like text mining or something like that.
>
> I think some packages may exist, but we do that by hand. Once we're
> pointed to a specific pathway, we prefer to let humans handle the rest.
>
>> how do u define the "success events" in hypergeometric test? and how
>> do you make sure the sampling has no bias when you pick genes in your
>> study?
>
> That's one of the tricky issues. People usually use differentially
> expressed genes, but putting a threshold there isn't obvious. One of the
> reasons some people do not like it (and I'm starting to feel the same
> way) is that the values are very continuous such that changing the
> threshold by a hair changes your set of genes (often changing your
> results significantly).
>
> I'm not sure what you mean about the sampling bias. If you filter in an
> unbiased way and set your universe to be what is available on the chip
> you should be ok. You should also deal with duplicate probes (if any)
> and duplicate probes per genes (if any). Again the archives have a
> couple of fairly detailed discussions on those issues.
>
> Francois

##################################
Jianping Jin Ph.D.
Bioinformatics scientist
Center for Bioinformatics
Room 3133 Bioinformatics building
CB# 7104
University of Chapel Hill
Chapel Hill, NC 27599
Phone: (919)843-6105
FAX: (919)843-3103
E-Mail: jjin at email.unc.edu
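For reference, the Benjamini & Hochberg adjustment described in this reply can be sketched as follows. This is a plain-Python illustration of the standard step-up procedure, with a made-up function name and made-up p-values; in R the usual route is `p.adjust` with `method = "BH"`.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical raw p-values, one per GO term or pathway tested.
    raw = [0.01, 0.04, 0.03, 0.005]
    for p, q in zip(raw, benjamini_hochberg(raw)):
        print(f"raw={p:.3f}  adjusted={q:.3f}")
```

Each raw p-value is multiplied by the number of tests divided by its rank, and a running minimum taken from the largest p-value downward keeps the adjusted values non-decreasing in the raw ones.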

written 18.7 years ago by Jianping Jin ▴ 890
