How to create a GO2gene object for topGO?


Quin Wills ▴ 100

@quin-wills-2709

Last seen 10.6 years ago

Hello all,

I have some significant Illumina v1 gene expression probes (and their probe 'universe') that I want to run GO enrichment analysis on. I assume that:

(i) I need illuminaHumanv1.db for the GO2PROBE mappings
(ii) I need to create a GO2gene object for input into topGO as:

    new("topGOdata", ontology="BP", allGenes=my.probe.list,
        annot=annFUN.GO2genes, GO2gene=my.GO2gene)

I'm just not joining the mental dots between (i) and (ii). Or am I completely missing the point? Any quick/simple guidance to get from my probes to a topGOdata object would be very, very welcome - thanks!

Quin
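For readers following along, here is a minimal base R sketch of what a GO-to-gene list of the kind described in (ii) could look like, and how it relates to the probe universe. All GO and probe identifiers are invented, and the assumption that an annotation package's GO2PROBE map converts to such a named list is not confirmed anywhere in this thread.

```r
# Hypothetical GO -> probe mapping: a named list keyed by GO ID, each
# element a character vector of probe IDs (all identifiers invented).
go2probe <- list(
  "GO:0000001" = c("probeA", "probeB"),
  "GO:0000002" = c("probeB", "probeC", "probeX"),
  "GO:0000003" = c("probeX")
)

# Hypothetical probe universe: the probes actually on the array.
universe <- c("probeA", "probeB", "probeC")

# Restrict each GO term to probes in the universe, then drop GO terms
# that are left with no probes at all.
my.GO2gene <- lapply(go2probe, intersect, universe)
my.GO2gene <- my.GO2gene[lengths(my.GO2gene) > 0]

# The same information keyed the other way round (probe -> GO IDs).
long <- stack(my.GO2gene)  # columns: values (probe ID), ind (GO ID)
my.gene2GO <- split(as.character(long$ind), long$values)

str(my.GO2gene)
str(my.gene2GO)
```

Either orientation carries the same annotation; which one is needed depends on the annotation function passed as annot.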

GO topGO • 2.7k views

ADD COMMENT • link updated 16.5 years ago by michael watson IAH-C ★ 3.4k • written 16.5 years ago by Quin Wills ▴ 100


michael watson IAH-C ★ 3.4k

@michael-watson-iah-c-378

Last seen 10.6 years ago

This code might work.

In this example, my data is in a data.frame called array2go.

The column "Accn" contains the identifier for spots on the array.
The column "GO_ID" contains the GO identifier.
The column "Category" contains the GO category.

geneNames is all Accns on the array.
sigGenes is the significant set.

    library(topGO)

    # keep only the molecular function annotations
    mf <- array2go[array2go$Category=="Function",]

    # named list: one character vector of GO IDs per Accn
    # (simplify=FALSE stops sapply from collapsing the result to a
    # matrix or vector when every Accn has the same number of GO IDs)
    mygene2GO <- sapply(unique(as.vector(mf$Accn)),
                        function(x)
                          as.character(unique(mf$GO_ID[mf$Accn==x])),
                        simplify=FALSE)
    geneNames <- unique(array2go$Accn)
    sigGenes # this comes from somewhere!

    # 0/1 factor over the whole array, 1 = significant, named by Accn
    geneList <- factor(as.integer(geneNames %in% sigGenes))
    names(geneList) <- geneNames

    GOdata <- new("topGOdata",
                  ontology="MF",
                  allGenes=geneList,
                  annot=annFUN.gene2GO,
                  gene2GO=mygene2GO)

    test.stat <- new("classicCount",
                     testStatistic=GOFisherTest,
                     name="Fisher Test")

    resultFis <- getSigGroups(GOdata, test.stat)

    # table of the top-scoring GO terms
    res <- GenTable(GOdata, classic=resultFis, topNodes=288)
    res[1:10,]
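The two inputs prepared in the answer above (the gene-to-GO list and the 0/1 factor of genes) can be tried on toy data before touching real annotation. The sketch below is base R only and stops short of the topGO calls; the table contents and the significant set are invented, and split() is swapped in for the per-gene sapply() loop as one possible alternative.

```r
# Toy annotation table with the same three columns as array2go in the
# answer (all identifiers are invented for illustration).
array2go <- data.frame(
  Accn     = c("p1", "p1", "p2", "p3", "p3", "p3", "p4"),
  GO_ID    = c("GO:0000001", "GO:0000002", "GO:0000001",
               "GO:0000002", "GO:0000003", "GO:0000003", "GO:0000004"),
  Category = c("Function", "Function", "Function",
               "Function", "Function", "Function", "Process"),
  stringsAsFactors = FALSE
)
sigGenes <- c("p1", "p3")  # hypothetical significant set

# Keep only the molecular function rows.
mf <- array2go[array2go$Category == "Function", ]

# split() groups GO IDs by Accn and always returns a named list;
# unique() removes repeated GO IDs within a gene.
mygene2GO <- lapply(split(mf$GO_ID, mf$Accn), unique)

# 0/1 factor over every identifier on the array, named by identifier.
geneNames <- unique(array2go$Accn)
geneList <- factor(as.integer(geneNames %in% sigGenes))
names(geneList) <- geneNames

str(mygene2GO)
print(geneList)
```

Note that p4 stays in geneList even though it has no "Function" annotation, because the universe is taken from the whole table rather than from mf, as in the answer.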

ADD COMMENT • link 16.5 years ago michael watson IAH-C ★ 3.4k


Hello all,

I have been looking at the mdqc package for automatic quality assessment of a large set of Affy SNP 6.0 data. I have already generated a set of QC stats using Affy's own software and they exclude outlier arrays using a fixed cut-off of the contrast QC scores (basically a measure of how separated the three genotype clouds are). I wanted to see if mdqc would give me the same answers.

Here are some of the contrast QC scores for the first 6 arrays (out of 140). A value less than 0.4 in any of these columns could be a quality problem according to Affy.

    > allQC[1:6,]
      Contrast.QC Contrast.QC..Random. Contrast.QC..Nsp. Contrast.QC..Sty. Contrast.QC..Nsp.Sty.Overlap.
    1        0.72                 0.72              0.79              1.00                          1.38
    2        0.42                 0.42              0.72              0.35                          0.99
    3        1.08                 1.08              0.97              1.28                          1.30
    4        0.50                 0.50              0.75              0.79                          0.64
    5        0.00                 0.00              0.00             -0.22                          0.00
    6        0.47                 0.47              0.76              0.49                          0.71

As you can see, array 5 is clearly an outlier (<0.4) in all 5 columns and we flagged it as such originally. However, when running mdqc, it does not call array 5 an outlier at the greatest significance level. Intuitively I would expect this array to have the most extreme quality measure.

    > mout=mdqc(allQC)
    > mout

    Method used: nogroups
    Number of groups: 1
    Robust estimator: S-estimator

    MDs exceeding the square root of the 90 % percentile of the Chi-Square distribution
    [1] 5 8 14 16 48 63 75 78 81 86 91 114 117 122 126 131 132 134 137 138

    MDs exceeding the square root of the 95 % percentile of the Chi-Square distribution
    [1] 5 8 14 48 75 78 81 86 91 114 122 126 131 132 137 138

    MDs exceeding the square root of the 99 % percentile of the Chi-Square distribution
    [1] 48 78 81 86 122 126 131 137 138

Which leads me (finally!) to my questions:

- Is mdqc getting confused by the fact that array 5 is consistently low in all QC measures?
- Does mdqc automatically assume that higher values indicate lower array quality or vice versa?

Many thanks in advance for any input,

Cheers,

Mark

PS here is my sessionInfo()

    > sessionInfo()
    R version 2.8.0 alpha (2008-10-04 r46598)
    i386-pc-mingw32

    locale:
    LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] mdqc_1.4.0 MASS_7.2-44 cluster_1.11.11
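As background to the second question, the printed output refers to Mahalanobis distances (MDs) compared against a chi-square cut-off. The sketch below uses base R's mahalanobis() with the classical mean and covariance on simulated scores. That is an assumption made for illustration: it is not mdqc's robust S-estimator and not the poster's data. It shows only that a Mahalanobis distance is the same for a point and for its mirror image about the centre, so the distance itself has no notion of "higher is better".

```r
set.seed(1)

# Simulated QC matrix: 140 arrays x 5 scores (made-up values).
qc <- matrix(rnorm(140 * 5, mean = 0.8, sd = 0.2), ncol = 5)

# Classical location and scatter (a robust method would estimate
# these differently; this is a simplification).
centre <- colMeans(qc)
covmat <- cov(qc)

# Two hypothetical arrays: one uniformly low, one uniformly high,
# placed symmetrically about the centre.
low  <- centre - 0.6
high <- centre + 0.6

# mahalanobis() returns squared distances, so take the square root.
md <- sqrt(mahalanobis(rbind(low = low, high = high), centre, covmat))

# Cut-off of the form named in the printed output: the square root of
# the 99% quantile of a chi-square with one degree of freedom per score.
cutoff <- sqrt(qchisq(0.99, df = ncol(qc)))

print(md)
print(cutoff)
```

Whether a consistently low array ends up beyond the cut-off therefore depends on the estimated centre and covariance, not on the direction of the deviation.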

ADD REPLY • link 16.5 years ago Mark Dunning ▴ 320


Thanks a stack, Michael... that's crystal clear. Mental dots are joined and I think I can take it from here. The topGO documentation was just really not clear.

Quin

ADD REPLY • link 16.5 years ago Quin Wills ▴ 100
