Strange FDR values using GSA package with large numbers of permutations

0

Entering edit mode

Alexander C Cambon ▴ 30

@alexander-c-cambon-2336

Last seen 10.4 years ago

I am using the GSA package for gene set analysis. I have normalize data from a microarray experiment with four groups and four replicates (16 total arrays). I downloaded the set of curated gene sets ("c2.all.v2.5.symbols.gmt") from the Broad Institute web page MSigDB (I did register) and read the data into R using the GSA package as follows: geneset.obj<- GSA.read.gmt("c2.all.v2.5.symbols.gmt") GSA.obj3<-GSA(x,y, genenames=gn, genesets=geneset.obj$genesets, resp.type="Multiclass", nperms=10000) (I also tried a smaller number of permutations). Then, I got the gene set list using GSA.3<-GSA.listsets(GSA.obj3, geneset.names=geneset.obj$geneset.names,FDRcut=.5) > dim(GSA.3$positive) [1] 721 5 I noticed that when I used 10000 permutations, the p-values for the gene sets varied all the way from 0.0013 for the top gene set to 0.9434 for the gene sets at the bottom, but the FDR values for all but the last of the 721 gene sets stayed at 0.0749. This did not happen, at least not this severely, with a smaller number of permutations. Does anyone have an explanation? Alexander Cambon Biostatistician Dept of Bioinformatics and Biostatistics School of Public Health and Information Sciences University of Louisville Louisville, KY Here is a sample of some of the first ones (I x'ed out the gene set numbers and names) Gene_set Gene_set_name Score p-value FDR "xxx" "xxx" "0.0994" "0.0013" "0.0749" "xxx" "xxx" "0.1622" "0.0013" "0.0749" "xx" "xxxx" "0.3668" "0.0016" "0.0749" "xxx" "xxxx" "0.2887" "0.0016" "0.0749" Here are the last ones Gene_set Gene_set_name Score p-value FDR "xxx" "xxx" "0.0285" "0.8882" "0.0749" "xxx" " xxx" "0.024" "0.901" "0.0749" "xxx" "xxx" "0.0035" "0.9434" "0.0749" I am using Widows XP > sessionInfo() R version 2.9.1 (2009-06-26) i386-pc-mingw32 locale: LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] GSA_1.0

Microarray Microarray • 1.0k views

ADD COMMENT • link updated 15.4 years ago by Wolfgang Huber ★ 13k • written 15.4 years ago by Alexander C Cambon ▴ 30

0

Entering edit mode

Wolfgang Huber ★ 13k

@wolfgang-huber-3550

Last seen 5 months ago

EMBL European Molecular Biology Laborat…

Hi Alexander it would be great if someone subscribing to this list has an insight on this question. However, note that GSA is not a Bioconductor package, so you may need to explore further avenue such as contacting its authors or examining the source code of the function. Also, please note that the probability of getting a useful answer is increased if you provide a readily reproducible example. Best wishes Wolfgang PS It is curious that the numeric values in your example are reported as character strings. Alexander C Cambon wrote: > I am using the GSA package for gene set analysis. I have normalize data from a microarray experiment with four groups and four replicates (16 total arrays). > > I downloaded the set of curated gene sets ("c2.all.v2.5.symbols.gmt") from the Broad Institute web page MSigDB (I did register) and read the data into R using the GSA package as follows: > > geneset.obj<- GSA.read.gmt("c2.all.v2.5.symbols.gmt") > > GSA.obj3<-GSA(x,y, genenames=gn, genesets=geneset.obj$genesets, resp.type="Multiclass", nperms=10000) > > (I also tried a smaller number of permutations). > > Then, I got the gene set list using > > GSA.3<-GSA.listsets(GSA.obj3, geneset.names=geneset.obj$geneset.names,FDRcut=.5) > >> dim(GSA.3$positive) > [1] 721 5 > > > I noticed that when I used 10000 permutations, the p-values for the gene sets varied all the way from 0.0013 for the top gene set to 0.9434 for the gene sets at the bottom, but the FDR values for all but the last of the 721 gene sets stayed at 0.0749. This did not happen, at least not this severely, with a smaller number of permutations. > > Does anyone have an explanation? > > Alexander Cambon > Biostatistician > Dept of Bioinformatics and Biostatistics > School of Public Health and Information Sciences > University of Louisville > Louisville, KY > > > > Here is a sample of some of the first ones (I x'ed out the gene set numbers and names) > > Gene_set Gene_set_name Score p-value FDR > > "xxx" "xxx" "0.0994" "0.0013" "0.0749" > "xxx" "xxx" "0.1622" "0.0013" "0.0749" > "xx" "xxxx" "0.3668" "0.0016" "0.0749" > "xxx" "xxxx" "0.2887" "0.0016" "0.0749" > > > Here are the last ones > Gene_set Gene_set_name Score p-value FDR > > > "xxx" "xxx" "0.0285" "0.8882" "0.0749" > "xxx" " xxx" "0.024" "0.901" "0.0749" > "xxx" "xxx" "0.0035" "0.9434" "0.0749" > > > I am using Widows XP >> sessionInfo() > R version 2.9.1 (2009-06-26) > i386-pc-mingw32 > > locale: > LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252 > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] GSA_1.0 > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD COMMENT • link 15.4 years ago Wolfgang Huber ★ 13k

Login before adding your answer.