probe expression profile to gene expression profile


Weiwei Shi ★ 1.2k

@weiwei-shi-1407

Last seen 10.6 years ago

Dear All:

Here is a general question, and I apologize if it is a little bit off topic (but I believe Bioconductor must have some solution for it).

Is there a guideline or a good tool for getting a "gene" expression profile from a "probe" expression profile? I am concerned that such a tool or guide should address the issues of "multiple probes to one gene" and "one probe to multiple genes".

I believe it is a non-trivial process, and automating it might not be easy. For the former issue, how do you get an "average" expression from multiple probes for one gene? For the latter, which gene do you believe is the "right" one for the probe?

Any recommendation is appreciated!

--
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.

"Did you always know?" "No, I did not. But I believed..." ---Matrix III

probe PROcess • 1.1k views

updated 18.1 years ago by Sean Davis 21k • written 18.1 years ago by Weiwei Shi ★ 1.2k


Sean Davis 21k

@sean-davis-490

Last seen 9 weeks ago

United States

Weiwei Shi wrote:

> Is there a guideline or good tool to get "gene" expression profile from "probe" expression profile? In this process, I am concerned that such tool or guide should address the issues of "multiple probes to one gene" and "one probe to multiple genes".

Don't deal with the first case. Do all of your analyses at the probe level. There is probably no safe, totally general way to aggregate probes in a gene expression context. Instead, do your differential expression testing and then map probes to genes for downstream processing (looking up in PubMed, etc.).

The second case can't be dealt with appropriately without knowing why one probe maps to multiple genes. Unless you do your own annotation (using BLAST, for example), it will be difficult to make a call in the general case. However, in some cases the answer is "obvious". In the case you emailed about earlier today (one probe hitting 3 genes), it was fairly obvious what the answer was, since one of the genes was a "RefSeq" gene while the other two were simply computationally predicted genes. The most important point is to know what sources of annotation are being used, what their limitations are, and how they relate to other sources of annotation. This knowledge is often not easy to come by, but it is invaluable.

> I believe it is a non-trivial process and automation of this process might not be easy:

Automation really isn't possible, since there is no general solution that covers every case. My rule of thumb is to maintain as much information as possible throughout the data analysis and then do some biological knowledge curation once the gene lists are in. Unfortunately, there really isn't a fantastic substitute for this last step.

Just my two cents' worth.

Sean
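As a rough illustration of the workflow described above (test at the probe level first, map to genes afterwards, and keep ambiguous mappings visible for manual curation), here is a minimal Python sketch. The probe IDs, gene names, p-values, and the `annotate_hits` helper are all hypothetical; in practice the annotation would come from whatever annotation source you trust for your array.

```python
# Hypothetical probe-level test results (probe ID -> p-value) and a
# hypothetical probe-to-gene annotation; neither comes from a real array.
probe_pvalues = {"p1": 0.001, "p2": 0.20, "p3": 0.03, "p4": 0.004}
probe_to_genes = {
    "p1": ["GENE_A"],
    "p2": ["GENE_A"],
    "p3": ["GENE_B", "GENE_C"],  # one probe, several candidate genes
    # "p4" deliberately has no annotation entry
}


def annotate_hits(pvalues, annotation, alpha=0.05):
    """Return significant probes with their genes, without collapsing anything.

    Each hit is tagged as "unique", "ambiguous", or "unannotated" so that
    the ambiguous and unannotated ones can be curated by hand later.
    """
    hits = []
    # Walk the probes from smallest to largest p-value.
    for probe, p in sorted(pvalues.items(), key=lambda kv: kv[1]):
        if p >= alpha:
            continue
        genes = annotation.get(probe, [])
        if len(genes) == 1:
            status = "unique"
        elif genes:
            status = "ambiguous"
        else:
            status = "unannotated"
        hits.append({"probe": probe, "p": p, "genes": genes, "status": status})
    return hits


hits = annotate_hits(probe_pvalues, probe_to_genes)
for row in hits:
    print(row["probe"], row["p"], row["status"], ",".join(row["genes"]))
```

Nothing is aggregated or discarded here: the probe stays the unit of analysis, and the gene labels are attached only to the final list.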

18.1 years ago • Sean Davis 21k


To add to Sean's comments, in general probe sets should be considered as independent entities (not necessarily as multiple/replicate measurements of the same entity, i.e. the underlying gene). So the question of which probeset-to-gene map should be used is rather ill posed.

The answer will generally depend on the objective of the study. For example, if the objective is to develop a predictive (classification) model, probe sets are the independent predictors and the question of gene-average expression is not really relevant. As another example, if the objective is to compare the reproducibility of gene expression between two or more platforms, then it is imperative to match data at the probe set level to allow for a meaningful evaluation. Different probe sets map to different parts of the gene and thus tend to behave independently, in many cases driven by allelic effects in the study population.

Finally, if the objective is to understand the biology behind differentially expressed genes, then it is important to first double-check the validity of the "official" probe-to-gene mappings. Then spend some time trying to understand the implications of the relative position of the probe set on the gene sequence.

The following two articles are informative in this respect:

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=pubmed&cmd=Retrieve&dopt=AbstractPlus&list_uids=16284200&query_hl=15&itool=pubmed_docsum

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=pubmed&cmd=Retrieve&dopt=AbstractPlus&list_uids=17224057&query_hl=13&itool=pubmed_docsum

So I would argue that this is more of a biology problem than a bioinformatics problem, and thus not amenable to an automated solution.

-Christos

Christos Hatzis, Ph.D.
Nuvera Biosciences, Inc.
400 West Cummings Park
Suite 5350
Woburn, MA 01801
Tel: 781-938-3830
www.nuverabio.com
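Whether probe sets annotated to the same gene really behave like replicate measurements can be checked directly on your own data before deciding anything about aggregation. The sketch below computes pairwise Pearson correlations across samples; the probe set IDs and log2 intensities are made-up values for illustration, and `pearson` is a small helper written here rather than a function from any Bioconductor package.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical log2 intensities for three probe sets annotated to one gene,
# measured across the same five samples.
probesets = {
    "ps_1": [7.1, 7.9, 8.4, 9.0, 9.6],
    "ps_2": [6.8, 7.7, 8.1, 8.9, 9.2],
    "ps_3": [5.2, 5.0, 5.3, 4.9, 5.1],  # flat, near background
}

# Correlation for every pair of probe sets on this gene.
pairwise = {
    (a, b): pearson(probesets[a], probesets[b])
    for a, b in combinations(sorted(probesets), 2)
}
for (a, b), r in pairwise.items():
    print(f"{a} vs {b}: r = {r:.2f}")
```

In this toy example two probe sets track each other closely and the third does not, which is the kind of pattern that argues for looking at where each probe set sits on the gene rather than averaging all three.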

18.1 years ago • Christos Hatzis ▴ 90


Hi there:

I think my first email was asking more about guidelines, or how people generally deal with the probe2gene issue, rather than about full automation (I did mention it is "not easy"). But the discussion has somehow turned to the stage at which we should do probe2gene, or whether we should do it at all for some study objectives.

I agree in theory that analysis at the probe level keeps information and avoids aggregating it too early at the gene level. However, at some point you still need to perform further analysis at the gene or pathway level to find the biological significance behind the data, if that is the objective of your study. The question then is: is an analysis such as differential testing at the probe level safe (because some probes will have been removed at this step, for example)? It is like a "maximum pick" instead of an "average pick".

Moreover, probes mapped to one gene are supposed to be highly correlated. Highly correlated predictors are not desirable in a supervised learning process, IMO.

Again, in theory, I agree that mappings should be checked manually rather than automatically to make sure each is biologically valid, and that the problem is more a biological one than a bioinformatics one. However, again :), in practice this might not be feasible for high-throughput technology, which, IMHO, tolerates some high-level noise or errors but gives people more statistical significance.

Just my 2 cents,

Weiwei
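To make the "maximum pick" versus "average pick" contrast concrete, here is a minimal Python sketch of the two collapsing rules. It is only one possible reading of those terms ("maximum" here means keeping the single probe with the highest mean signal), and the probe IDs, gene names, expression values, and the `collapse` helper are all hypothetical. It also assumes the simple case where each probe maps to at most one gene.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical probe -> per-sample log2 expression values (three samples).
expr = {
    "p1": [8.0, 8.5, 9.0],
    "p2": [6.0, 6.5, 7.0],
    "p3": [5.0, 5.0, 5.0],
}
# Hypothetical one-to-one probe -> gene annotation.
probe_to_gene = {"p1": "GENE_A", "p2": "GENE_A", "p3": "GENE_B"}


def collapse(expr, probe_to_gene, how="mean"):
    """Collapse probe profiles to one profile per gene.

    how="mean": average the probes sample by sample ("average pick").
    how="max":  keep the one probe with the highest mean signal
                ("maximum pick", under the reading assumed here).
    """
    by_gene = defaultdict(list)
    for probe, values in expr.items():
        gene = probe_to_gene.get(probe)
        if gene is not None:  # unannotated probes are skipped
            by_gene[gene].append(values)

    collapsed = {}
    for gene, rows in by_gene.items():
        if how == "mean":
            collapsed[gene] = [mean(col) for col in zip(*rows)]
        elif how == "max":
            collapsed[gene] = list(max(rows, key=mean))
        else:
            raise ValueError(f"unknown collapsing rule: {how!r}")
    return collapsed


by_mean = collapse(expr, probe_to_gene, how="mean")
by_max = collapse(expr, probe_to_gene, how="max")
print(by_mean["GENE_A"], by_max["GENE_A"])
```

The two rules give different gene-level profiles for the same probes, which is why the choice needs to be stated explicitly and justified by the objective of the study rather than treated as a default.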

18.1 years ago • Weiwei Shi ★ 1.2k
