[topGO] More flexible way to select 'interesting' genes

enricoferrero ▴ 660

@enricoferrero-6037

Last seen 3.0 years ago

Switzerland

Hi,

Adrian, I'm using your topGO package and really appreciate how powerful and customizable it is with respect to the choice of algorithms and statistical tests.

There's just one limitation that I don't really understand; hopefully you or somebody else on the list can shed some light on this.

I usually select 'interesting' genes from gene expression experiments based on two different parameters: their p-value and their log(fold-change).

As far as I understand, if I want to run a GO enrichment analysis in topGO using a statistical test that uses ranked gene lists, such as KS (the Kolmogorov-Smirnov test, also used by GSEA), I can only filter on one parameter (typically the p-value).

This is a consequence of the way the topGOdata objects are built, e.g.:

    myData <- new("topGOdata", description="myData", ontology="BP",
                  allGenes=myAllGenes, geneSel=geneSelFunc, nodeSize=5,
                  annot=annFUN.db, affyLib="hgu133plus2.db")

where:

- myAllGenes is a named vector of all p-values for each probe on the array, named after their probeID, and
- geneSelFunc is a function to select the interesting ones, such as:

    geneSelFunc <- function (score) {
      return(score <= 0.05)
    }

I'm basically looking for a more flexible way to perform the selection of my interesting probes: for example, I'd like to select only probes that have a p-value <= 0.05 and a |log(fold-change)| >= 1.

Is there any way to do this?

Thank you.
Best,

--
Enrico Ferrero
PhD Student
Department of Genetics
Cambridge Systems Biology Centre
University of Cambridge
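To make the limitation concrete, here is a minimal base-R sketch. The probe names, p-values and fold changes are invented for illustration; only geneSelFunc comes from the post. It shows that a selection function which sees only the scores cannot apply the second criterion.

```r
# Hypothetical toy values: probe names, p-values and fold changes are made up.
pvals <- c(probeA = 0.001, probeB = 0.03, probeC = 0.20, probeD = 0.04)
logFC <- c(probeA = 2.1,   probeB = 0.3,  probeC = -1.8, probeD = -1.2)

# Selection function as in the question: it only ever receives the scores.
geneSelFunc <- function(score) {
  score <= 0.05
}

# What the score-only selection picks.
byScoreOnly <- names(pvals)[geneSelFunc(pvals)]

# What the desired two-criterion selection would pick.
byBoth <- names(pvals)[pvals <= 0.05 & abs(logFC) >= 1]

print(byScoreOnly)
print(byBoth)
```

In this toy data probeB passes the p-value cut-off but has a small fold change, so it is selected by the score-only function and excluded by the combined filter.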

GO probe topGO • 4.5k views

updated 11.3 years ago by Adrian Alexa ▴ 400 • written 11.3 years ago by enricoferrero ▴ 660

Kevin Coombes ▴ 430

@kevin-coombes-3935

Last seen 2.1 years ago

United States

That's because you (and many other people) actually want to filter on the (posterior) probability that the log fold change is bigger than some (prespecified) value that you believe is biologically meaningful. I have no idea how to accomplish this with topGO, but if you can find a way to compute that probability, then topGO can filter your list.

11.3 years ago • Kevin Coombes ▴ 430

On Wed, Aug 7, 2013 at 1:24 PM, Kevin Coombes wrote:

> That's because you (and many other people) actually want to filter on the
> (posterior) probability that the log fold change is bigger than some
> (prespecified) value that you believe is biologically meaningful.

This isn't a posterior probability, but how about ranking using limma's TREAT method?

McCarthy, D. J., & Smyth, G. K. (2009). Testing significance relative to a fold-change threshold is a TREAT. *Bioinformatics*, *25*(6), 765-771.
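A rough sketch of what the TREAT suggestion could look like, assuming the limma package is installed. The simulated expression matrix, the two-group design and the lfc = 1 threshold are all invented for illustration and are not taken from the thread.

```r
library(limma)

set.seed(1)
# Hypothetical toy data: 100 probes x 6 arrays (3 control, 3 treated).
expr <- matrix(rnorm(600), nrow = 100,
               dimnames = list(paste0("probe", 1:100), paste0("s", 1:6)))
expr[1:5, 4:6] <- expr[1:5, 4:6] + 3   # spike in five probes

group  <- factor(rep(c("control", "treated"), each = 3))
design <- model.matrix(~ group)

fit  <- lmFit(expr, design)
# TREAT tests against the null hypothesis |logFC| <= lfc rather than logFC = 0.
tfit <- treat(fit, lfc = 1)

# One p-value per probe for the treated-vs-control coefficient,
# named by probe ID like the myAllGenes vector in the question.
treatP <- tfit$p.value[, 2]
```

Because each TREAT p-value already accounts for the fold-change threshold, a vector like treatP might serve as the single score that topGO ranks and filters on. Whether that is appropriate for a given experiment is a judgement call that this sketch does not settle.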

11.3 years ago • Levi Waldron ★ 1.1k

enricoferrero ▴ 660

@enricoferrero-6037

Last seen 3.0 years ago

Switzerland

Hi Hongxing,

Yes, you can, but then you can only perform statistical tests based on counts of selected genes, such as Fisher's exact test. You can't use tests that incorporate a ranking of some sort in the analysis, like the KS test.

Cheers,
Enrico Ferrero

On 7 August 2013 21:05, Hongxing Yang wrote:

> Hi,
>
> I think you can directly feed topGO your list of studied genes. Create a
> logical vector for all genes, with the values for wanted genes being TRUE
> and the others FALSE; then set the selection function to select out only
> those TRUE genes.
>
> cheers
> Hongxing
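Hongxing's suggestion can be sketched in base R as follows. The toy p-values and fold changes are invented, and the encoding as a two-level factor follows the "predefined list of interesting genes" usage described in the topGO documentation rather than anything stated in this thread. The topGO call is left commented out because it needs the package and an annotation library.

```r
# Hypothetical toy values: probe names, p-values and fold changes are made up.
pvals <- c(probeA = 0.001, probeB = 0.03, probeC = 0.20, probeD = 0.04)
logFC <- c(probeA = 2.1,   probeB = 0.3,  probeC = -1.8, probeD = -1.2)

# Apply both criteria outside topGO.
selected <- pvals <= 0.05 & abs(logFC) >= 1

# Encode the result as a named 0/1 factor covering all probes.
geneList <- factor(as.integer(selected), levels = c(0, 1))
names(geneList) <- names(pvals)

# With real data, the factor would take the place of the p-value vector:
# myData <- new("topGOdata", description = "myData", ontology = "BP",
#               allGenes = geneList, nodeSize = 5,
#               annot = annFUN.db, affyLib = "hgu133plus2.db")
```

As noted in the reply above, an object built this way carries no scores, so only count-based tests such as Fisher's exact test apply.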

11.3 years ago • enricoferrero ▴ 660

Adrian Alexa ▴ 400

@adrian-alexa-936

Last seen 10.2 years ago

Hi Enrico,

Unfortunately, the simple answer is no. There is no facility to have the geneSelectionFun filter on anything but the gene scores. It is a bit of an old design with limitations, sorry for that.

You might be able to hack a solution, but it can get messy. Two methods use the geneSelectionFun, sigGenes() and numSigGenes(), and you could re-implement those to fit your needs. To do that, you will have to extend the topGOdata class and define those two methods for the extended class. Even then, you will have to use a global variable holding the filters (fold-change, etc.), unless your extended class keeps track of them. All of this is possible, but you will need quite a bit of code on top of the topGO package.

Let me know if you want to pursue such a solution and I can try to help you with that.

Kind regards,
Adrian
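A lighter alternative to extending the class is to fold the second criterion into the score before it reaches topGO. This is not the approach Adrian describes, only a sketch under stated assumptions: the toy values are invented, and setting the score of low fold-change probes to 1 is an arbitrary choice.

```r
# Hypothetical toy values: probe names, p-values and fold changes are made up.
pvals <- c(probeA = 0.001, probeB = 0.03, probeC = 0.20, probeD = 0.04)
logFC <- c(probeA = 2.1,   probeB = 0.3,  probeC = -1.8, probeD = -1.2)

# Push probes that fail the fold-change criterion to the worst possible score,
# so the unchanged score-only selection function can never pick them.
maskedScore <- pvals
maskedScore[abs(logFC[names(pvals)]) < 1] <- 1

geneSelFunc <- function(score) {
  score <= 0.05
}

selected <- names(maskedScore)[geneSelFunc(maskedScore)]
print(selected)
```

Note that the masking also changes the ranking seen by rank-based tests such as KS and introduces ties at 1, so results would need to be interpreted with that in mind.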

11.3 years ago • Adrian Alexa ▴ 400
