Question

geneSetTest() / GSEA



Simon Lin ▴ 270

@simon-lin-1272


Dear Gordon,

Is geneSetTest() fast to calculate? I'm not sure whether you used a permutation test under the hood.

For GSEA and GSA, we sometimes see artifacts when the size of the set is too small. Is the same true for geneSetTest()?

Thanks!

Simon

Date: Sun, 04 Mar 2007 18:51:00 +1100
From: Gordon Smyth <smyth@wehi.edu.au>
Subject: [BioC] GSEA with one class metaanalysis
To: Mark W Kimpel <mwkimpel at gmail.com>
Cc: bioconductor at stat.math.ethz.ch

Dear Mark,

If I understand your problem correctly, neither GSEA nor GSA will accommodate it. The only option I know of is geneSetTest() in the limma package. This generally works well, although it will give you somewhat over-optimistic p-values if there are strong positive correlations between the genes in your gene sets.

Best wishes
Gordon


score 0 · Answer 1 · 2007-03-06

Dear Simon,

geneSetTest() is very fast if you use the default settings. In that case it's a closed-form calculation. It's intended for use with individual gene sets and has no problem with small gene sets. It's usable down to size=1.

GSEA and especially GSA are very sophisticated methods which use permutation over arrays as well as standardization over genes to control for possible dependence between the genes in the test set. I'm not an expert on either method, but they seem intended for two-sample situations with at least half a dozen arrays in each group, many gene sets, and many genes in each set.

geneSetTest() is a far simpler (hence more flexible) approach which is aimed at a class of problems that we see regularly at the WEHI. Here the aim is to relate a gene ranking, usually obtained by fitting a linear model, to a prior set of genes of special interest. It's based on permuting the genes, not the arrays. The default method is simply a Wilcoxon test using the ranks of the genes.

The caveat with geneSetTest() is that significance can in theory arise from high correlations between genes in the test set rather than from a shift in the mean, so this possibility should ideally be checked or ruled out separately.

Best wishes
Gordon
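To make the idea of a rank-based test that permutes genes rather than arrays concrete, here is a minimal Python sketch of an exact rank-sum test for one gene set. It is not limma's implementation: the function name `rank_sum_pvalue`, its arguments and the `alternative` values are made up for illustration, and it assumes there are no tied statistics. Instead of simulating permutations, it counts every way the set could have been placed among the gene ranks, so no random sampling is involved.

```python
from math import comb


def rank_sum_pvalue(statistics, selected, alternative="up"):
    """Exact rank-sum p-value for one gene set (illustrative sketch only).

    statistics: one number per gene, e.g. a t-statistic from a linear model.
    selected: indices of the genes in the set of interest.
    alternative: "up", "down" or "either" (hypothetical option names).
    Assumes no tied statistics.
    """
    n = len(statistics)
    members = set(selected)
    k = len(members)
    if not 0 < k < n:
        raise ValueError("gene set must be a non-empty proper subset")
    if len(set(statistics)) != n:
        raise ValueError("this sketch does not handle tied statistics")

    # Rank the genes: rank 1 is the smallest statistic, rank n the largest.
    order = sorted(range(n), key=lambda i: statistics[i])
    rank = {gene: r for r, gene in enumerate(order, start=1)}
    observed = sum(rank[g] for g in members)

    # ways[j][s] counts the j-gene subsets whose ranks sum to s.
    # This covers every assignment of the set to genes without sampling.
    max_sum = sum(range(n - k + 1, n + 1))
    ways = [[0] * (max_sum + 1) for _ in range(k + 1)]
    ways[0][0] = 1
    for r in range(1, n + 1):
        # Descending j so that each rank is used at most once per subset.
        for j in range(min(k, r), 0, -1):
            smaller, current = ways[j - 1], ways[j]
            for s in range(r, max_sum + 1):
                current[s] += smaller[s - r]

    total = comb(n, k)
    upper = sum(ways[k][observed:]) / total       # P(rank sum >= observed)
    lower = sum(ways[k][:observed + 1]) / total   # P(rank sum <= observed)

    if alternative == "up":
        return upper
    if alternative == "down":
        return lower
    if alternative == "either":
        return min(1.0, 2 * min(upper, lower))
    raise ValueError("alternative must be 'up', 'down' or 'either'")


# Made-up statistics for ten genes; indices 5, 0 and 8 hold the largest values.
t_stats = [2.9, -0.4, 1.7, 0.3, -1.2, 3.4, 0.8, -0.1, 2.2, -2.0]
p_set = rank_sum_pvalue(t_stats, [0, 5, 8])   # the three top-ranked genes
p_single = rank_sum_pvalue(t_stats, [5])      # a gene set of size 1
print(p_set, p_single)
```

With these ten genes, a set of size 1 holding the top-ranked gene gets p = 1/10, and the set of the three top-ranked genes gets p = 1/120, since only one of the 120 possible three-gene sets ranks that high. The sketch treats genes as exchangeable, so it shares the caveat about correlated genes described above.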