Hi All,
I'm trying to perform an enrichment analysis using topGO for ~700 genes from microarray data. With this data set I'm unable to build the topGOdata object for either the BP or the MF ontology; however, the CC ontology works. I've used this same code with a smaller gene list and had success with the BP ontology, so I'm fairly confident the problem is not in the code itself (see below).
> data <- read.csv(file.choose(),header=FALSE, stringsAsFactors=FALSE)
> rn<-paste(data[,1], sep="")
> P_values=data[,-1]
> names(P_values)<-rn
> myGOdata=P_values
> relevant.genes <- factor(as.integer(all.genes %in% myGOdata)
+ )
> names(relevant.genes) <- all.genes
> GOdata.BP <- new("topGOdata", ontology='BP', allGenes = myGOdata, annotationFun = annFUN.org, geneSel = topDiffGenes, nodeSize=10, mapping = "org.Hs.eg.db",ID = "symbol")
Building most specific GOs ..... ( 2680 GO terms found. )
Build GO DAG topology ..........
There are no adj nodes for node: GO:1905313
Error in switch(type, isa = 0, partof = 1, -1) : EXPR must be a length 1 vector
> GOdata.BP <- new("topGOdata", ontology='MF', allGenes = myGOdata, annotationFun = annFUN.org, geneSel = topDiffGenes, nodeSize=10, mapping = "org.Hs.eg.db",ID = "symbol")
Building most specific GOs ..... ( 746 GO terms found. )
Build GO DAG topology ..........
There are no adj nodes for node: GO:0102132
Error in switch(type, isa = 0, partof = 1, -1) : EXPR must be a length 1 vector
Whereas the CC ontology does work:
> GOdata.BP <- new("topGOdata", ontology='CC', allGenes = myGOdata, annotationFun = annFUN.org, geneSel = topDiffGenes, nodeSize=10, mapping = "org.Hs.eg.db",ID = "symbol")
Building most specific GOs ..... ( 391 GO terms found. )
Build GO DAG topology .......... ( 620 GO terms and 1240 relations. )
Annotating nodes ............... ( 609 genes annotated to the GO terms. )
Digging through past posts, I found that a similar problem arose nearly six years ago, and the error message was linked to a problem in the annotation package itself (see the thread "topGO and Arabidopsis data"). I'm not too familiar with topGO, so I haven't had any success with the quick fix suggested there, but it does seem that a more permanent solution is warranted.
Any advice on how to proceed? Thanks in advance.
Which versions of topGO and org.Hs.eg.db are you using? I have used topGO several times and I have never found that error.
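In case it helps with answering, here is a minimal sketch for collecting the version information. The helper name `installed_version` is made up for illustration, and including GO.db and AnnotationDbi in the list is my own assumption (topGO builds the GO graph from GO.db, so its version may matter too).

```r
# Packages whose versions seem relevant to this error (the choice is an assumption).
pkgs <- c("topGO", "org.Hs.eg.db", "GO.db", "AnnotationDbi")

# Hypothetical helper: installed version as a string, or NA if the package is absent.
installed_version <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}

# Named character vector, one entry per package.
versions <- vapply(pkgs, installed_version, character(1))
print(versions)

# The full session details are usually worth pasting into the post as well.
print(sessionInfo())
```

Pasting both outputs into the thread lets others check whether the annotation packages and topGO come from the same Bioconductor release.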
I am using the most up-to-date version of Bioconductor (3.4) along with the topGO package, which I believe is version 2.26.0. I hadn't thought about updating the annotation package, so thanks for that suggestion. After updating it, however, the code still returns the same error message.
Of note, the same error occurs when using other annotation packages for the mapping argument, such as clariomdhumantranscriptcluster.db.
You will need to provide a self-contained, reproducible example that people can use to test. Another alternative is to use the GOstats package, or the goana function in limma.
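Something along these lines might serve as a starting point for such an example. It is only a sketch under stated assumptions: the p-values are simulated rather than taken from the original data, the 0.01 cutoff in `topDiffGenes` is a guess (the thread never shows its definition), `make_gene_list` is a made-up helper, and a random draw of 700 symbols may or may not trigger the same error.

```r
# Selection function: TRUE for genes treated as significant.
# The 0.01 cutoff is an assumption, not the original poster's definition.
topDiffGenes <- function(p) p < 0.01

# Hypothetical helper: attach simulated p-values to a vector of gene symbols.
make_gene_list <- function(symbols, seed = 1) {
  set.seed(seed)
  stats::setNames(stats::runif(length(symbols)), symbols)
}

# Only attempt the topGO step when both packages are installed.
if (requireNamespace("topGO", quietly = TRUE) &&
    requireNamespace("org.Hs.eg.db", quietly = TRUE)) {
  library(topGO)
  library(org.Hs.eg.db)

  # Draw 700 gene symbols so that anyone can rerun this without the original file.
  set.seed(42)
  symbols <- sample(AnnotationDbi::keys(org.Hs.eg.db, keytype = "SYMBOL"), 700)
  geneList <- make_gene_list(symbols)

  # Same call as in the question, with the simulated gene list.
  GOdata.BP <- new("topGOdata", ontology = "BP", allGenes = geneList,
                   annotationFun = annFUN.org, geneSel = topDiffGenes,
                   nodeSize = 10, mapping = "org.Hs.eg.db", ID = "symbol")
  print(GOdata.BP)
  print(sessionInfo())
}
```

If the simulated list does not reproduce the error, posting a small subset of the real gene symbols (with made-up p-values) together with the `sessionInfo()` output would be the next thing to try.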
Are the parentChild, weight, weight01, lea, and elim algorithms implemented in other packages? AFAIK, GOstats only implements the hypergeometric test and a conditional test similar to the elim algorithm.