Dear List,
In light of the recent discussion on multiple testing in GO analysis, a
project we recently finished may be of interest to some members of the
list:
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=15254011&itool=iconabstr
We took a pragmatic approach of carrying out the analysis 1000 times
with randomized data to estimate statistical significance of our
results, which have to do with a genome-wide analysis of 5' CpG Island
genes.
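
A minimal Python sketch of this kind of randomization estimate follows. It is an illustration only, not the code used in the paper: the function names, the toy gene universe, and the choice to randomize by drawing random gene lists of the same size are all assumptions made for the example.

```python
import random

def empirical_p_value(observed, run_on_random_data, n_iter=1000, seed=0):
    """Estimate P(statistic >= observed) by repeating the analysis on random data.

    run_on_random_data is a hypothetical callback: it takes a random.Random
    instance and returns the statistic computed on one randomized data set.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_iter):
        if run_on_random_data(rng) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_iter + 1)

# Toy data (invented): 2000 genes, 100 of them carry the annotation of interest.
universe = list(range(2000))
annotated = set(range(100))
list_size = 50

def annotated_in_random_list(rng):
    """Count annotated genes in a random gene list of the same size."""
    return sum(1 for g in rng.sample(universe, list_size) if g in annotated)

# Suppose the real list of 50 genes contained 12 annotated genes.
p = empirical_p_value(12, annotated_in_random_list)
print(p)
```

With 1000 iterations the smallest value this estimate can return is 1/1001, so it cannot resolve very small p-values; more iterations would be needed for that.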
For the purposes of microarray analysis, however, my 2c is that there is
no substitute for a biologically trained eyeball-o-metric analysis,
since enrichment of relatively specific terms with smaller numbers of
genes (that may not be statistically significant after multiple-testing
corrections) certainly may suggest biological meaning.
-Peter
-----Original Message-----
Date: Thu, 5 Aug 2004 21:01:01 +0200
Subject: Re: [BioC] GOStat and multiple testing
From: rossini@blindglobe.net (A.J. Rossini)
To: Jeremy Gollub <jgollub@genome.stanford.edu>
Correct. And your work continues to confirm the second issue in
general, which is nice.
But it's the first that is particularly nasty to create a reasonable
solution for. I'd really like to see one!
best,
-tony
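
For concreteness, the second issue can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gene counts are invented, the function names are hypothetical, and the FDR adjustment shown is the standard Benjamini-Hochberg step-up procedure, which is not the permutation-based FDR used in the Perl modules mentioned below.

```python
from math import comb

def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when n genes are drawn from N, of which K carry the GO term."""
    total = comb(N, n)
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total

def bonferroni(pvals):
    """FWER-style correction: multiply each p-value by the number of tests."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (assumed stand-in for an FDR method)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Toy case (invented numbers): 6000 genes on the array, a list of 10 test genes,
# and a GO term annotated to only 3 genes, one of which is in the list.
p_single = hypergeom_upper_tail(1, 6000, 3, 10)
print(p_single)

# Comparing the two corrections on a handful of made-up p-values.
raw = [0.01, 0.04, 0.03, 0.005]
print(bonferroni(raw))
print(benjamini_hochberg(raw))
```

With these toy numbers a single annotated gene in a 10-gene list already gets an uncorrected p-value of roughly 0.005, which is the "any term that comes up at all looks significant" effect. Neither correction addresses the dependence between terms in the DAG.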
Jeremy Gollub <jgollub@genome.stanford.edu> writes:
> That's certainly true. We decided to be pragmatic.
>
> If I've understood the problem correctly, there are two major problems
> with determining the significance of a GO annotation. First is the lack
> of independence in the DAG (directed acyclic graph) structure.
> Bootstrapping won't fix that. Second, though, is the problem that, for a
> small group of test genes at least, any GO term that comes up at all will
> appear ridiculously significant when using a hypergeometric test. What we
> found is that FDR calculations seem to deal with this second issue better
> than a FWER correction.
>
> --
> Jeremy Gollub, Ph.D.
> jgollub@genome.stanford.edu
> (W) 650/736-0075
>
> On Thu, 5 Aug 2004, A.J. Rossini wrote:
>
>>
>> It (FDR by bootstrapping) doesn't solve the basic problem with lack of
>> independence, which makes it useful but wrong, or just wrong,
>> depending on how pragmatic you want to be.
>>
>>
>> Jeremy Gollub <jgollub@genome.stanford.edu> writes:
>>
>> > Correcting p-values for multiple hypothesis testing in GO analysis is a
>> > hard problem conceptually. I'm not aware of any general solution.
>> >
>> > In a recently-published set of Perl modules for GO term analysis,
>> >
>> > http://bioinformatics.oupjournals.org/cgi/content/abstract/bth456v1
>> >
>> > we support False Discovery Rate calculations (based on permutations of
>> > results) as a substitute. It's probably not perfect, but according to our
>> > simulations it's better than either uncorrected p-values or a simple
>> > correction (e.g., Bonferroni).
>> >
>> > Our software uses a hypergeometric test on a list of selected genes.
>> > Another approach would be to calculate a p-value (e.g., by Cox
>> > regression) for all genes on a microarray, and test the significance of
>> > each GO term using Fisher meta-analysis. (I'm sure I've seen a
>> > reference to that approach, but can't recall it now.)
>> >
>> >
>> > On Thu, 5 Aug 2004, Robert Gentleman wrote:
>> >
>> >> On Wed, Aug 04, 2004 at 01:06:30PM +0200, Arne.Muller@aventis.com wrote:
>> >> > Hello,
>> >> >
>> >> > I was wondering if one needs to correct the p-values from the
>> >> > hypergeometric test from GOstat for multiple testing, since one performs
>> >> > many tests (over all GO categories found in the gene list). I'm not sure
>> >> > if correction for multiple testing makes sense since the GO terms are
>> >> > highly dependent (terms on the same branch + one gene is annotated in
>> >> > several terms).
>> >> >
>> >> > Robert Gentleman mentions in the GOstats documentation that the
>> >> > multiple testing issue is not solved yet? I assume GOHyperG does not
>> >> > perform any kind of multiple testing correction, is this right?
>> >>
>> >> Hi,
>> >> it does not, and I am unaware of any general solution to the
>> >> problem of adjusting p-values here. The structure of GO is such that
>> >> there are issues due to lack of independence. There are some other
>> >> problems, but I have not had time to write up my ideas yet.
>> >> I have to say that I am also not so convinced that this is
>> >> the best way to do things (classifying genes as interesting or not,
>> >> and then doing the hypergeometric test), although I have yet to come
>> >> up with a better way. I agree with those that have suggested that
>> >> this is best used as a rough guide to interesting categories (other
>> >> projects seem to have different opinions, and I think some do use some
>> >> sort of p-value correction).
>> >>
>> >> Robert
>> >>
>> >> >
>> >> > I'd be happy to receive comments on this and to hear about your
>> >> > experience.
>> >> >
>> >> > kind regards,
>> >> >
>> >> > Arne
>> >> >
>> >> > _______________________________________________
>> >> > Bioconductor mailing list
>> >> > Bioconductor@stat.math.ethz.ch
>> >> > https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor
>> >>
>> >> --
>> >> +---------------------------------------------------------------------------+
>> >> | Robert Gentleman                phone:  (617) 632-5250                    |
>> >> | Associate Professor             fax:    (617) 632-2444                    |
>> >> | Department of Biostatistics     office: M1B20                             |
>> >> | Harvard School of Public Health email:  rgentlem@jimmy.harvard.edu        |
>> >> +---------------------------------------------------------------------------+
>> >>
>> >>
>> >
>> >
>>
>> --
>> Anthony Rossini                     Research Associate Professor
>> rossini@u.washington.edu            http://www.analytics.washington.edu/
>> Biomedical and Health Informatics   University of Washington
>> Biostatistics, SCHARP/HVTN          Fred Hutchinson Cancer Research Center
>> UW (Tu/Th/F): 206-616-7630 FAX=206-543-3461 | Voicemail is unreliable
>> FHCRC (M/W): 206-667-7025 FAX=206-667-4812 | use Email
>>
>> CONFIDENTIALITY NOTICE: This e-mail message and any attachments may be
>> confidential and privileged. If you received this message in error,
>> please destroy it and notify the sender. Thank you.
>>
>
>