You are making things more difficult for yourself. Rather than coming up with a vector of p-values with HUGO gene symbols as the names, you should be using the Illumina IDs as names, and using annFUN.db, just like in the vignette. That way you can just follow along with code that makes sense.
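For reference, the vignette-style call looks roughly like the sketch below. It uses the vignette's own hgu95av2 example data, since I don't have yours. For your data you would substitute your p-value vector named by Illumina ID, your own selection function, and the annotation package for your chip (the package name is something you would need to fill in; I am not assuming a particular one here).

```r
library(topGO)
library(hgu95av2.db)

## example data shipped with topGO: p-values named by probe ID,
## plus topDiffGenes, a function that flags the 'significant' ones
data(geneList)

## annFUN.db looks up GO terms through the chip annotation package
## named in affyLib, so the names of allGenes must be that chip's IDs
sampleGOdata <- new("topGOdata",
                    description = "probe IDs as names",
                    ontology = "BP",
                    allGenes = geneList,
                    geneSel = topDiffGenes,
                    nodeSize = 10,
                    annot = annFUN.db,
                    affyLib = "hgu95av2.db")
```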
You could use the vector you have, but the help page for annFUN.org() is not very helpful. So I can show you how to use your vector, but without being able to say how or why I know that this is the way to do it. I will use the data from the vignette as an example.
## load the packages and the vignette's example data
> library(topGO)
> data(geneList)
> library(hgu95av2.db)
## we need a vector like yours, so map the probe IDs to gene symbols,
## keep the first symbol for each probe, and use the symbols as names
> z <- select(hgu95av2.db, names(geneList), "SYMBOL")
'select()' returned 1:many mapping between keys and columns
> z <- z[!duplicated(z[,1]),]
> geneList2 <- geneList
> names(geneList2) <- z[,2]
## the original geneList
> head(geneList)
1095_s_at 1130_at 1196_at 1329_s_at 1340_s_at 1342_g_at
1.0000000 1.0000000 0.6223795 0.5412240 1.0000000 1.0000000
## something similar to what you have
> head(geneList2)
HGF MAP2K1 RCC1 TERF1 HGF TERF1
1.0000000 1.0000000 0.6223795 0.5412240 1.0000000 1.0000000
> sampleGOdata <- new("topGOdata", description = "whatevs", ontology = "BP",
+                     allGenes = geneList2, geneSel = topDiffGenes, nodeSize = 10,
+                     annot = annFUN.org, ID = "alias", mapping = "org.Hs.eg")
Building most specific GOs ..... ( 1566 GO terms found. )
Build GO DAG topology .......... ( 4215 GO terms and 9916 relations. )
Annotating nodes ............... ( 225 genes annotated to the GO terms. )
> resultFisher <- runTest(sampleGOdata, algorithm = "classic", statistic = "fisher")
-- Classic Algorithm --
the algorithm is scoring 776 nontrivial nodes
parameters:
test statistic: fisher
> resultFisher
Description: whatevs
Ontology: BP
'classic' algorithm with the 'fisher' test
797 GO terms scored: 11 terms with p < 0.01
Annotation data:
Annotated genes: 310
Significant genes: 46
Min. no. of genes annotated to a GO: 10
Nontrivial nodes: 776
Note that I get fewer GO terms this way (compare to the results on page 4 of the vignette), which is probably because gene symbols are really not useful for most data analysis. If you want to do things 'the right way', you will instead rely on actual IDs like the Illumina IDs, or Entrez Gene or Ensembl IDs, which are more likely to be unique.
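To see the uniqueness problem concretely, here is a small base R sketch using the first six entries of geneList2 shown above. Several probes map to the same symbol, so the names repeat. Collapsing to the smallest p-value per symbol is only one possible policy, and it is my assumption for illustration, not something topGO requires or does for you.

```r
## first six entries of geneList2 from the session above
pvals <- c(HGF = 1.0000000, MAP2K1 = 1.0000000, RCC1 = 0.6223795,
           TERF1 = 0.5412240, HGF = 1.0000000, TERF1 = 1.0000000)

## which symbols appear more than once?
dup_symbols <- unique(names(pvals)[duplicated(names(pvals))])
print(dup_symbols)

## one (assumed) way to get unique names: keep the smallest p-value per symbol
collapsed <- tapply(pvals, names(pvals), min)
print(collapsed)
```

With probe IDs as names none of this is necessary, because each name already identifies exactly one measurement.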
This is perfect, thank you.