Hi,
I went through the archive for a while and still did not find a good
answer for this. Sorry for the re-post :(
Suppose I have several probes for the same gene. I am wondering which is
the proper way to get a summary statistic for the expression of this
gene: mean, median, max or min? I think it might depend on the
research target, but I am wondering if there is a reference on it.
btw, is there a reference on data pre-processing (gene selection,
multiple comparison, preferably with a case study) for microarray
analysis other than the Bioconductor book?
thanks
--
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.
"Did you always know?"
"No, I did not. But I believed..."
---Matrix III
On Friday 10 November 2006 13:01, Weiwei Shi wrote:
> suppose i have some probes for the same gene, I am wondering which is
> the proper way to get a statistic for the expression for this gene?
> using mean, median or max or min? I think it might be affected by the
> research target but I wondering if there is some ref on it.
Hi, Weiwei.
There is not a standard, no. The answer does probably depend on the
research question (as you suggest), the type of data, quality metrics,
and probably other factors. Some of those "other factors", such as
cross-hybridization, are not readily available in every case without
some additional work.
> btw, is there some ref on the data pre-processing (gene selection,
> multiple comparison, better with case study) for microarray analysis
> other than bioconductor book?
The microarray literature is relatively large, and there are many
papers on all of the aspects that you mention above. A good place to
start, if you have the Bioconductor book, is with the references therein.
Sean
Hi Weiwei,
I'm pretty sure there was some discussion on this not too long ago,
but I can't recall off the top of my head what the subject line was.
The standard answer is that it depends on why you have different
probes for the gene and what you would expect from them.
In many cases, there are several probes because they give different
results (else they wouldn't waste the space). The canonical example
would be a splice variant or the use of an alternative poly-A site.
Depending on your amplification protocol, you might also be
sensitive to the distance of the probe from the poly-A site.
If you have reason to believe that all probes should give the same
result, then using the average or median would make sense. This happens
if you have the exact same probe at different places on the array.
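For the replicate-probe case, collapsing to one value per gene can be sketched roughly as below. This is only an illustration in plain Python: the function name, the probe IDs, and the dictionary layout are assumptions made for the example, not part of any Bioconductor package.

```python
from collections import defaultdict
from statistics import mean, median

def collapse_probes(values, probe_to_gene, summary=median):
    """Summarise probe-level values to one value per gene and sample.

    values:        dict probe_id -> list of expression values (one per sample)
    probe_to_gene: dict probe_id -> gene symbol
    summary:       function applied across the probes of a gene,
                   separately within each sample (e.g. median or mean)
    """
    rows_by_gene = defaultdict(list)
    for probe, row in values.items():
        rows_by_gene[probe_to_gene[probe]].append(row)
    # zip(*rows) walks the samples; each column holds one value per probe
    return {
        gene: [summary(column) for column in zip(*rows)]
        for gene, rows in rows_by_gene.items()
    }

# Hypothetical toy data: two probes for GENE_A, one for GENE_B, two samples
values = {"p1": [1.0, 2.0], "p2": [3.0, 4.0], "p3": [5.0, 6.0]}
probe_to_gene = {"p1": "GENE_A", "p2": "GENE_A", "p3": "GENE_B"}
per_gene_median = collapse_probes(values, probe_to_gene, summary=median)
per_gene_mean = collapse_probes(values, probe_to_gene, summary=mean)
```

Passing `summary=max` or `summary=min` gives the other two statistics from the original question; whether any of them is appropriate still depends on why the probes differ.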
Otherwise, you might want to take the most interesting probe and say it
represents the whole gene. How you define the most interesting probe can
vary. It could be the one with the largest interquartile range across
samples, or the one giving you the most differential expression. The
most interesting probe might change from one experiment to the next (if
we're talking about splice variants, for example).
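As a rough sketch of the "most interesting probe" idea, the snippet below keeps, for each gene, the probe whose values vary most across samples as measured by the interquartile range. The function names and toy data are made up for the example, and the "inclusive" quartile method is just one of several conventions; a differential-expression score could be passed in as `score` instead.

```python
from statistics import quantiles

def iqr(values):
    """Interquartile range (Q3 - Q1) of one probe across samples."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1

def most_variable_probe(values, probe_to_gene, score=iqr):
    """Return dict gene -> probe_id with the highest score for that gene.

    values:        dict probe_id -> list of expression values (one per sample)
    probe_to_gene: dict probe_id -> gene symbol
    score:         function mapping a probe's values to a number
    """
    best = {}  # gene -> (probe_id, score)
    for probe, row in values.items():
        gene = probe_to_gene[probe]
        s = score(row)
        if gene not in best or s > best[gene][1]:
            best[gene] = (probe, s)
    return {gene: probe for gene, (probe, _) in best.items()}

# Hypothetical toy data: p2 is flat, so p1 is chosen for GENE_A
values = {
    "p1": [1, 2, 3, 4, 5],
    "p2": [3, 3, 3, 3, 3],
    "p3": [0, 10, 0, 10, 0],
}
probe_to_gene = {"p1": "GENE_A", "p2": "GENE_A", "p3": "GENE_B"}
chosen = most_variable_probe(values, probe_to_gene)
```

Because the chosen probe depends on the data, rerunning this on another experiment may pick a different probe for the same gene.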
Another option is to keep them all around. I tend to prefer this option
if I'm not running statistical tests that depend on having a single
measurement per gene (GO and pathway analyses are the main examples that
come to mind). That way, whichever probe works well will come up, and if
several of them show up, you can believe that result some more.
As Sean mentioned, there is an extensive literature on these subjects.
Francois
Hi.
On 11/11/06, Weiwei Shi <helprhelp at gmail.com> wrote:
> Hi,
> I went through the archive for a while and still did not find the good
> answer for that. Sorry for the re-post :(
>
> suppose i have some probes for the same gene, I am wondering which is
> the proper way to get a statistic for the expression for this gene?
> using mean, median or max or min? I think it might be affected by the
> research target but I wondering if there is some ref on it.
Are we talking about finding a function that summarizes the probe
intensities in a probeset (as in Affymetrix arrays) to a single value?
For 3' expression arrays there are plenty of algorithms/publications,
e.g. the single-chip model MAS 5.0 and the multi-chip models MBEI
(dChip) and RMA. The following article lists many more with references:
Irizarry, R.A.; Wu, Z. & Jaffee, H.A. Comparison of Affymetrix
GeneChip expression measures. Bioinformatics, 2006, 22, 789-794
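To give a feel for what a multi-chip summarization does, here is a minimal sketch of Tukey's median polish, the technique RMA uses for its summarization step. It assumes a small probes-by-arrays matrix of log2 intensities that has already been background-corrected and normalized (both steps are omitted here). The function name and toy numbers are illustrative only; this is not the code used by RMA implementations in Bioconductor.

```python
from statistics import median

def median_polish_summary(matrix, max_iter=10, tol=1e-6):
    """Summarise one probeset by median polish.

    matrix: list of rows (probes), each a list of log2 intensities (arrays).
    Returns one expression value per array: overall level + array effect.
    """
    resid = [list(row) for row in matrix]
    n_rows, n_cols = len(resid), len(resid[0])
    row_eff = [0.0] * n_rows   # probe effects
    col_eff = [0.0] * n_cols   # array effects
    overall = 0.0
    for _ in range(max_iter):
        # Sweep out row (probe) medians
        row_med = [median(r) for r in resid]
        for i in range(n_rows):
            resid[i] = [x - row_med[i] for x in resid[i]]
            row_eff[i] += row_med[i]
        shift = median(col_eff)
        col_eff = [c - shift for c in col_eff]
        overall += shift
        # Sweep out column (array) medians
        col_med = [median(col) for col in zip(*resid)]
        for i in range(n_rows):
            resid[i] = [x - m for x, m in zip(resid[i], col_med)]
        col_eff = [c + m for c, m in zip(col_eff, col_med)]
        shift = median(row_eff)
        row_eff = [r - shift for r in row_eff]
        overall += shift
        # Stop once a full sweep no longer changes anything
        if max(abs(m) for m in row_med + col_med) < tol:
            break
    return [overall + c for c in col_eff]

# Hypothetical probeset: 3 probes x 3 arrays, purely additive effects
probeset = [
    [8, 10, 12],
    [9, 11, 13],
    [7, 9, 11],
]
summary = median_polish_summary(probeset)
```

Unlike a plain per-array median, the fit separates probe-specific offsets from array-level expression, which is why it needs several arrays at once.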
Best
Henrik
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>