topGO question


Heike Pospisil ▴ 310

@heike-pospisil-1097

Last seen 10.5 years ago

Hello list,

I am trying to use topGO for GO enrichment analysis. My data come from an array that is not yet supported by BioC (maize array).

I have a mapping of genes to GO terms named go_list:

    $TM00000001
    [1] "GO:0009058" "GO:0016757"

    $TM00000002
     [1] "GO:0003700" "GO:0007275" "GO:0005634" "GO:0009414" "GO:0016563"
     [6] "GO:0009737" "GO:0045449" "GO:0010072" "GO:0046982" "GO:0009651"
    [11] "GO:0009733" "GO:0009723" "GO:0009734" "GO:0048527" "GO:0042803"
    [16] "GO:0009867" "GO:0010150" "GO:0009825" "GO:0009908" "GO:0003713"
    [21] "GO:0051607" "GO:0009790" "GO:0010014" "GO:0048467" "GO:0030528"
    [26] "GO:0009741" "GO:0009735" "GO:0010089" "GO:0009834" "GO:0009901"
    [31] "GO:0009611" "GO:0008361" "GO:0009416" "GO:0009620" "GO:0009744"
    [36] "GO:0009753" "GO:0009751" "GO:0010199"

Moreover, geneList is the named factor that indicates which genes are of interest:

    > str(geneList)
     Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
     - attr(*, "names")= chr [1:56321] "MZ00000001" "MZ00000002" "MZ00000003" "MZ00000004" ...

I used annFUN.gene2GO as the annotation function:

    GOdata <- new("topGOdata", ontology = "MF", allGenes = geneList,
                  annot = annFUN.gene2GO, gene2GO = go_list)

Unfortunately, I got the following error message:

    Building most specific GOs .....Error in order(allGO) : argument 1 is not a vector

Does anybody have an idea what is wrong in my code?

Thanks and best,
Heike

Annotation GO topGO • 1.6k views

updated 16.4 years ago by Adrian Alexa ▴ 400 • written 16.4 years ago by Heike Pospisil ▴ 310


Adrian Alexa ▴ 400

@adrian-alexa-936

Last seen 10.5 years ago

Hi Heike,

it seems that there is a problem with the go_list object. It should be a list of character vectors. However, it is hard to tell what is wrong from the information you provided alone. Please also post your session info so that we know which versions of the software you are using.

The error comes from the annFUN.gene2GO() function. If go_list is correct, then the following line should run without error:

    go2genes <- annFUN.gene2GO(whichOnto = "MF", gene2GO = go_list)

If you get an error here, can you post the results of the following lines:

    allGO = unlist(go_list, use.names = FALSE)
    str(allGO)
    sum(is.na(allGO))
    sum(is.null(allGO))

Regards,
Adrian

On Fri, Sep 12, 2008 at 11:42 AM, Heike Pospisil <pospisil at zbh.uni-hamburg.de> wrote:
> [...]
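The structural checks suggested above can be bundled into one small base-R helper. This is only a sketch under stated assumptions: check_gene2GO is a made-up name for illustration (it is not a topGO function), and the go_list below is a two-gene toy stand-in for the real mapping.

```r
## Toy stand-in for the real gene-to-GO mapping (assumption for illustration)
go_list <- list(
  TM00000001 = c("GO:0009058", "GO:0016757"),
  TM00000002 = c("GO:0003700", "GO:0007275", "GO:0005634")
)

## Hypothetical helper, not part of topGO: returns one TRUE/FALSE per check
check_gene2GO <- function(gene2GO) {
  all_go <- unlist(gene2GO, use.names = FALSE)
  c(
    is_list       = is.list(gene2GO),
    has_names     = !is.null(names(gene2GO)) && !any(names(gene2GO) == ""),
    all_character = all(vapply(gene2GO, is.character, logical(1))),
    no_NA         = !anyNA(all_go),
    ## GO identifiers are "GO:" followed by seven digits
    go_format     = all(grepl("^GO:[0-9]{7}$", all_go))
  )
}

checks <- check_gene2GO(go_list)
print(checks)
```

If every entry is TRUE, the list has the shape annFUN.gene2GO expects, and the cause of the error is more likely to be elsewhere, for example in how the list relates to geneList.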

written 16.4 years ago by Adrian Alexa ▴ 400


Hi Adrian,

thanks for your reply.

I ran the annFUN.gene2GO line and got no errors. allGO contains no NA and no NULL. go_list is a list of character vectors:

    > str(go_list2)
    List of 9
     $ TM00000001: chr [1:2] "GO:0009058" "GO:0016757"
     $ TM00000002: chr [1:38] "GO:0003700" "GO:0007275" "GO:0005634" "GO:0009414" ...
     $ TM00000003: chr [1:38] "GO:0003700" "GO:0007275" "GO:0045449" "GO:0005634" ...
     $ TM00000004: chr [1:5] "GO:0005634" "GO:0045449" "GO:0009943" "GO:0009947" ...
     $ TM00000005: chr [1:13] "GO:0016165" "GO:0040007" "GO:0006952" "GO:0009695" ...
     $ TM00000006: chr [1:38] "GO:0003700" "GO:0007275" "GO:0045449" "GO:0005634" ...
     $ TM00000007: chr [1:38] "GO:0003700" "GO:0007275" "GO:0045449" "GO:0005634" ...
     $ TM00000009: chr [1:42] "GO:0016301" "GO:0004672" "GO:0004674" "GO:0009733" ...
     $ TM00000010: chr [1:8] "GO:0009736" "GO:0005886" "GO:0004673" "GO:0009884" ...

And here is my sessionInfo:

    > sessionInfo()
    R version 2.7.2 (2008-08-25)
    i486-pc-linux-gnu

    locale:
    LC_CTYPE=de_DE.UTF-8;LC_NUMERIC=C;LC_TIME=de_DE.UTF-8;LC_COLLATE=de_DE.UTF-8;LC_MONETARY=de_DE.UTF-8;LC_MESSAGES=de_DE.UTF-8;LC_PAPER=de_DE.UTF-8;LC_NAME=de_DE.UTF-8;LC_ADDRESS=de_DE.UTF-8;LC_TELEPHONE=de_DE.UTF-8;LC_MEASUREMENT=de_DE.UTF-8;LC_IDENTIFICATION=de_DE.UTF-8

    attached base packages:
     [1] splines   grid      tcltk     tools     stats     graphics  grDevices
     [8] utils     datasets  methods   base

    other attached packages:
     [1] maizeprobe_2.2.0     matchprobes_1.12.1   maizecdf_2.2.0
     [4] GO_2.2.0             topGO_1.8.1          SparseM_0.78
     [7] biomaRt_1.14.1       RCurl_0.9-4          GOstats_2.6.0
    [10] Category_2.6.0       genefilter_1.20.0    survival_2.34-1
    [13] RBGL_1.16.0          annotate_1.18.0      xtable_1.5-3
    [16] GO.db_2.2.0          AnnotationDbi_1.2.2  RSQLite_0.7-0
    [19] DBI_0.2-4            graph_1.18.1         qvalue_1.14.0
    [22] maanova_1.10.0       arrayQuality_1.18.0  RColorBrewer_1.0-2
    [25] gridBase_0.4-3       hexbin_1.14.0        colorspace_0.95
    [28] convert_1.16.0       marray_1.18.0        tkWidgets_1.18.0
    [31] DynDoc_1.18.0        widgetTools_1.16.0   statmod_1.3.6
    [34] vsn_3.6.0            lattice_0.17-14      affy_1.18.2
    [37] preprocessCore_1.2.1 affyio_1.8.1         Biobase_2.0.1
    [40] limma_2.14.6         rkward_0.4.9

    loaded via a namespace (and not attached):
    [1] cluster_1.11.11 XML_1.96-0

I would be very glad if you have any idea what went wrong.

Thanks,
Heike

Adrian Alexa wrote:
> [...]

written 16.4 years ago by Heike Pospisil ▴ 310


Hi Heike,

I just looked at your data and the problem seems to be with geneList. More precisely, geneList and go_list don't match at all: the gene identifiers that index go_list are completely different from the gene identifiers found in geneList!

    library(topGO)

    ## load the data
    load("myGOdata.RData")

    ## the list of all gene IDs
    geneNames <- names(go_list)
    str(geneNames)

    length(intersect(names(geneList), geneNames))  ## this is 0!

I think you mixed up the annotations, or something went wrong when you built the list of interesting genes. There needs to be an overlap between the identifiers in the gene-to-GO mapping and the list of interesting genes.

Below I generated a random set of interesting genes, just to test whether one can build the topGOdata object from go_list:

    ## generate a random list of interesting genes
    ## select (or define) the list of interesting genes
    myInterestedGenes <- sample(geneNames, 100)

    ## make an indicator vector showing which genes are interesting
    myGeneList <- factor(as.integer(geneNames %in% myInterestedGenes))
    names(myGeneList) <- geneNames
    str(myGeneList)

    sum(as.integer(myGeneList) == 2)  ## should be 100

    ## build the topGOdata class
    ## there are three annotation functions available:
    ## 1. annFUN.db       -- used for bioconductor annotation chips
    ## 2. annFUN.gene2GO  -- used when you have mappings from each gene to GOs
    ## 3. annFUN.GO2genes -- used when you have mappings from each GO to genes
    ##
    GOdata <- new("topGOdata", ontology = "MF", allGenes = myGeneList,
                  annot = annFUN.gene2GO,  ## the new annotation function
                  gene2GO = go_list)       ## the gene ID to GO dataset

    ## display the GOdata object
    GOdata

    ------------------------- topGOdata object -------------------------

     Description:
       -

     Ontology:
       -  MF

     20623 available genes (all genes from the array):
       - symbol:  TM00000001 TM00000002 TM00000003 TM00000004 TM00000005  ...
       - 100  significant genes.

     18098 feasible genes (genes that can be used in the analysis):
       - symbol:  TM00000001 TM00000002 TM00000003 TM00000004 TM00000005  ...
       - 90  significant genes.

     GO graph (nodes with at least  0  genes):
       - a graph with directed edges
       - number of nodes = 1556
       - number of edges = 1853

    ------------------------- topGOdata object -------------------------

    sessionInfo()
    R version 2.7.1 (2008-06-23)
    i686-pc-linux-gnu

    locale:
    LC_CTYPE=en_US.ISO-8859-15;LC_NUMERIC=C;LC_TIME=en_US.ISO-8859-15;LC_COLLATE=en_US.ISO-8859-15;LC_MONETARY=C;LC_MESSAGES=en_US.ISO-8859-15;LC_PAPER=en_US.ISO-8859-15;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.ISO-8859-15;LC_IDENTIFICATION=C

    attached base packages:
    [1] tools     stats     graphics  grDevices utils     datasets  methods
    [8] base

    other attached packages:
    [1] topGO_1.8.1         SparseM_0.78        GO.db_2.2.0
    [4] AnnotationDbi_1.2.2 RSQLite_0.6-9       DBI_0.2-4
    [7] Biobase_2.0.1       graph_1.18.1

    loaded via a namespace (and not attached):
    [1] cluster_1.11.11

Hope that this helps,
Adrian

On Sun, Sep 14, 2008 at 8:02 AM, Heike Pospisil <pospisil at zbh.uni-hamburg.de> wrote:
> [...]
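For readers without the original myGOdata.RData file, the identifier mismatch can be reproduced with toy data in base R alone. This is a minimal sketch, not the code used in the thread: the three-gene go_list, the MZ-named geneList, and the choice of TM00000002 as the interesting gene are all assumptions made for illustration.

```r
## Toy annotation indexed by TM identifiers (assumption for illustration)
go_list <- list(
  TM00000001 = c("GO:0009058", "GO:0016757"),
  TM00000002 = c("GO:0003700", "GO:0007275"),
  TM00000003 = c("GO:0003700", "GO:0045449")
)

## Toy indicator factor named with MZ identifiers, as in the question
geneList <- factor(c(0, 1, 0), levels = c(0, 1))
names(geneList) <- c("MZ00000001", "MZ00000002", "MZ00000003")

## Diagnostic: how many identifiers do the two objects share?
shared <- intersect(names(geneList), names(go_list))
length(shared)  ## 0 for this toy data, so no gene can be annotated

## One possible repair: rebuild the indicator on the annotation's own IDs.
## 'interesting' is a hypothetical selection, not taken from the thread.
interesting <- c("TM00000002")
myGeneList <- factor(as.integer(names(go_list) %in% interesting),
                     levels = c(0, 1))
names(myGeneList) <- names(go_list)
str(myGeneList)
```

Fixing the levels to c(0, 1) keeps both levels present even when no gene, or every gene, is selected. Whether rebuilding the indicator on the annotation IDs is the right repair for real data depends on how the MZ and TM identifiers relate, which the thread does not say.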

written 16.4 years ago by Adrian Alexa ▴ 400
