reasonable Illumina hyperG test
@sebastien-gerega-2229
Hi,

I have been looking around at examples of the hyperGTest (in the GOstats, lumi, and other documentation) and feel like I have seen many slight variations on the methodology. These variations are usually found in the way the non-specific filtering is performed. I haven't come across many examples of a hyperGTest for KEGG pathways and would like to ask whether my approach seems reasonable or whether I should make any changes.

Here is my code ("sig" is a vector of Entrez IDs):

    uni = exprs(lumi.N.P)

    # Remove those without PATH annotation
    havePATH = sapply(mget(allFeatures, lumiHumanAllPATH),
        function(x) {
            if (length(x) == 1 && is.na(x))
                FALSE
            else TRUE
        })
    uni <- uni[names(which(havePATH == TRUE)),]

    # Remove those with little variation across samples
    iqrCutoff = 0.5
    uni.IQR = apply(uni, 1, IQR)
    uni = uni[which((uni.IQR > iqrCutoff) == TRUE),]

    # Keep probes w/largest IQR
    uni = uni[findLargest(rownames(uni), uni.IQR[rownames(uni)],
        "lumiHumanAll"),]
    uni = mget(rownames(uni), lumiHumanAllENTREZID)

    params = new("KEGGHyperGParams", geneIds=sig, universeGeneIds = uni,
        annotation="lumiHumanAll", pvalueCutoff=0.05, testDirection="over")

    hgOver = hyperGTest(params)

Does this code/approach seem reasonable? Should I correct for multiple testing after the hyperGTest? Would it be fair to perform a test on gene ontologies in the same way (obviously after having changed the param type and specifying an ontology branch)?

thanks,
Sebastien
@michael-watson-iah-c-378
To me, it depends on where sig comes from. Did you select "sig" before or after you filtered for IQR? If you did it before, then (to me) you have falsely reduced your universe; however, if you did it after, everything seems ok.
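The point about the universe can be illustrated with a minimal base-R sketch of the hypergeometric calculation that underlies an over-representation test. Everything here is an assumption made for illustration: the IDs and counts are made-up toy numbers, and `overRepP` is a hypothetical helper, not a GOstats function. It checks that every selected gene lies in the universe and shows that the same overlap gives a different p-value when the universe shrinks.

```r
# Toy over-representation p-value using the hypergeometric distribution.
# overRepP is a hypothetical helper for illustration, not a GOstats function.
overRepP <- function(sig, universe, pathway) {
  # The selected genes must be drawn from the universe being tested against.
  stopifnot(all(sig %in% universe))
  pathway <- intersect(pathway, universe)
  k <- length(intersect(sig, pathway))   # selected genes in the pathway
  m <- length(pathway)                   # pathway genes in the universe
  n <- length(universe) - m              # non-pathway genes in the universe
  # P(X >= k) when drawing length(sig) genes without replacement
  phyper(k - 1, m, n, length(sig), lower.tail = FALSE)
}

# Made-up Entrez-like IDs
universeFull  <- as.character(1:1000)
universeSmall <- as.character(1:200)     # e.g. after aggressive filtering
pathway <- as.character(1:40)
sig <- as.character(c(1:10, 101:140))    # 50 selected genes, 10 in the pathway

pFull  <- overRepP(sig, universeFull, pathway)
pSmall <- overRepP(sig, universeSmall, pathway)
```

With these toy numbers `pFull` is much smaller than `pSmall`, because 10 pathway hits out of 50 selected genes is surprising against a universe of 1000 but close to expectation against a universe of 200. The `stopifnot()` guard fails if "sig" contains genes that were filtered out of the universe, which is the situation to rule out before running the test.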
@james-w-macdonald-5106
Hi Sebastien,

Sebastien Gerega wrote:
> Hi,
> I have been looking around at examples of the hyperGTest (in the
> GOstats, lumi, and other documentation) and feel like I have seen many
> slight variations on the methodology.
> These variations are usually found in the way the non-specific filtering
> is performed. I haven't come across many examples of a hyperGTest for
> KEGG pathways and would like to ask whether my approach seems reasonable
> or whether I should make any changes.
> Here is my code ("sig" is a vector of EntrezID):
>
> uni = exprs(lumi.N.P)
>
> #Remove those without PATH annotation
> havePATH = sapply(mget(allFeatures, lumiHumanAllPATH),
>     function(x){
>         if (length(x) == 1 && is.na(x))
>             FALSE
>         else TRUE
>     })
> uni <- uni[names(which(havePATH == TRUE)),]
>
> #Remove those with little variation across samples
> iqrCutoff = 0.5
> uni.IQR = apply(uni, 1, IQR)
> uni = uni[which((uni.IQR > iqrCutoff) == TRUE),]
>
> #Keep probes w/largest IQR
> uni = uni[findLargest(rownames(uni), uni.IQR[rownames(uni)],
>     "lumiHumanAll"),]
> uni = mget(rownames(uni), lumiHumanAllENTREZID)

This may have by chance removed all duplicate Entrez IDs, but maybe not. You should also ensure that you have unique Entrez Gene IDs, as duplicates will bias your results (although I believe duplicates will be stripped anyway).

>
> params = new("KEGGHyperGParams", geneIds=sig, universeGeneIds = uni,
>     annotation="lumiHumanAll", pvalueCutoff=0.05, testDirection="over")
>
> hgOver = hyperGTest(params)
>
>
> Does this code/approach seem reasonable? Should I correct for multiple
> testing after the hyperGTest?

How to correct for multiple testing with such highly dependent data is not really clear, and is probably not necessary, especially with KEGG data. You will likely only have a few significant terms, and it is even less likely that they will all be interesting to you or your collaborators.

> Would it be fair to perform a test on gene ontologies in the same way
> (obviously after having changed the param type and specifying an
> ontology branch)?

Yes, with the addition of removing duplicate Entrez Gene IDs.

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
Hildebrandt Lab
8220D MSRB III
1150 W. Medical Center Drive
Ann Arbor MI 48109-0646
734-936-8662
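The advice about unique Entrez Gene IDs can be sketched in base R. `mget()` returns a named list, so the universe needs flattening to a character vector before it is used as `universeGeneIds`. The list below is a made-up stand-in for the result of `mget(rownames(uni), lumiHumanAllENTREZID)`; the probe names and the IDs in `sig` are hypothetical and chosen only to show the clean-up steps.

```r
# Hypothetical stand-in for mget(rownames(uni), lumiHumanAllENTREZID):
# a named list mapping probe IDs to Entrez Gene IDs (NA when unmapped).
probeToEntrez <- list(
  probeA = "1017",
  probeB = "7157",
  probeC = "1017",          # second probe for the same gene
  probeD = NA_character_    # probe without an Entrez Gene ID
)

# Flatten to a character vector, drop missing IDs, keep each gene once.
ids <- unlist(probeToEntrez, use.names = FALSE)
universeIds <- unique(ids[!is.na(ids)])

# Apply the same clean-up to the selected genes and keep only those
# that are actually in the universe.
sig <- c("7157", "7157", "999999")
sigIds <- intersect(unique(sig), universeIds)
```

In the original code, `universeIds` and `sigIds` would take the place of `uni` and `sig` in the call to `new("KEGGHyperGParams", ...)`. If adjusted p-values are wanted despite the caveat about dependence between terms, base R's `p.adjust()` (for example with `method = "BH"`) can be applied to the vector of p-values afterwards.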