topGO w/custom annotation

Sebastien Gerega ▴ 370

@sebastien-gerega-2229

Last seen 10.6 years ago

Hi,

I would like to identify overrepresented GO in my dataset using the topGO package. I am working with a C. neoformans oligo array and have the GO annotation in a list with the following format:

    $`162.m02116`
    [1] GO:0009536
    4026 Levels: GO:0 GO:00 GO:000 GO:0000 GO:00000 GO:000000 GO:0000001 GO:0000002 GO:0000004 GO:0000006 GO:0000009 GO:0000010 ... GO:0051861

    $`162.m02150`
    [1] GO:0009536
    4026 Levels: GO:0 GO:00 GO:000 GO:0000 GO:00000 GO:000000 GO:0000001 GO:0000002 GO:0000004 GO:0000006 GO:0000009 GO:0000010 ... GO:0051861

    $`162.m02156`
    [1] GO:0005554 GO:0008372 GO:0000004
    4026 Levels: GO:0 GO:00 GO:000 GO:0000 GO:00000 GO:000000 GO:0000001 GO:0000002 GO:0000004 GO:0000006 GO:0000009 GO:0000010 ... GO:0051861

How do I go about creating the GOdata object?

    GOdata = new("topGOdata",
                 ontology = "MF",
                 allGenes = selectedList,
                 annot = annFUN.GO2genes,  ## the new annotation function
                 GO2genes = myGO2genes)    ## the GO to gene ID's dataset

Specifically what do I use for the annotation function? It is unclear to me how to write this aspect. Am I correct in understanding that I supply my list as the final argument?

thanks,
Sebastien
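One possible way to set this up, sketched under assumptions: the list printed above maps gene IDs to GO terms (not GO terms to genes), so the matching topGO annotation function would be annFUN.gene2GO with a gene2GO argument, rather than annFUN.GO2genes. The list elements are factors, so the sketch converts them to character vectors first. The names gene2GO_factor, interesting and build_godata are hypothetical, the level set is a toy subset of the 4026 levels, and the choice of "interesting" genes is made up for illustration.

```r
## Toy gene-to-GO list mimicking the structure printed in the question
## (each element is a factor; the real list shares 4026 levels).
go_levels <- c("GO:0000004", "GO:0005554", "GO:0008372", "GO:0009536")
gene2GO_factor <- list(
  "162.m02116" = factor("GO:0009536", levels = go_levels),
  "162.m02150" = factor("GO:0009536", levels = go_levels),
  "162.m02156" = factor(c("GO:0005554", "GO:0008372", "GO:0000004"),
                        levels = go_levels)
)

## Convert every element to a plain character vector of GO IDs.
gene2GO <- lapply(gene2GO_factor, as.character)

## allGenes as a named two-level factor: 1 = gene of interest, 0 = background.
## Which genes count as "interesting" is an assumption for illustration.
interesting <- c("162.m02116", "162.m02156")
all_genes <- factor(as.integer(names(gene2GO) %in% interesting),
                    levels = c(0, 1))
names(all_genes) <- names(gene2GO)

## Hypothetical wrapper; it needs topGO installed only when it is called.
build_godata <- function(all_genes, gene2GO, ontology = "MF") {
  library(topGO)
  new("topGOdata",
      ontology = ontology,
      allGenes = all_genes,
      annot = annFUN.gene2GO,  # annotation function for gene -> GO lists
      gene2GO = gene2GO)       # argument name must match that function
}
```

With the real annotation list in place of the toy one, the call would be build_godata(all_genes, gene2GO). The wrapper is not run here because a three-gene toy set is too small for a meaningful GO graph.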

Annotation GO oligo • 3.0k views

updated 16.9 years ago by James W. MacDonald 68k • written 16.9 years ago by Sebastien Gerega ▴ 370

James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 14 hours ago

United States

Hi Sebastien,

Sebastien Gerega wrote:
> Specifically what do I use for the annotation function? It is unclear to
> me how to write this aspect.

The documentation is a bit unclear, but I believe you have things set up correctly. I assume you have done the rest of the analysis and had no errors?

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623

written 16.9 years ago by James W. MacDonald 68k

Hi,

actually, after attempting with what I thought were the correct parameters, I obtained the following error:

    Building most specific GOs .....Error in annotationFun(ontology, .Object@allGenes, ...) :
      unused argument(s) (GO2genes = list("162.m02116" = 2403, "162.m02150" = 2403,
      "162.m02156" = c(1133, 2157, 9), "162.m02171" = 1199, "162.m02176" = c(2157, 1133, 9),
      "162.m02195" = 9, "162.m02197" = c(1199, 2403),
      "162.m02200" = c(1501, 1114, 3855, 1161, 1156, 3390, 1195),
      "162.m02222" = c(208, 409, 1185), "162.m02236" = c(410, 8, 1217, 3569),
      "162.m02241" = c(1133, 9, 2157), "162.m02255" = c(277, 1133),
      "162.m02262" = c(1561, 2283), "162.m02279" = c(392, 1156),
      "162.m02281" = c(1301, 1143, 1302, 1297), "162.m02286" = 2403,

Perhaps this has to do with the annFUN.gene2GO function I used. I am not sure how to alter the function (if necessary) for my gene IDs. Do the gene IDs have to be Entrez IDs, or does that not matter?

thanks,
Sebastien
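The error is consistent with how extra arguments are forwarded: the message shows the annotation function being called as annotationFun(ontology, .Object@allGenes, ...), so any extra argument given to new() has to match a parameter name of that function. The base R sketch below reproduces the same kind of failure with hypothetical stand-ins (ann_gene2GO and build) rather than topGO itself, so it only illustrates the mechanism.

```r
## Hypothetical stand-in for an annotation function whose third parameter
## is named gene2GO (not GO2genes).
ann_gene2GO <- function(whichOnto, feasibleGenes, gene2GO) {
  gene2GO[names(gene2GO) %in% feasibleGenes]
}

## Hypothetical stand-in for the constructor: extra arguments in ... are
## passed straight through to the annotation function.
build <- function(ontology, allGenes, annot, ...) {
  annot(ontology, allGenes, ...)
}

mapping <- list("162.m02116" = "GO:0009536")

## Mismatched argument name: R reports an unused argument.
msg <- tryCatch(
  build("MF", names(mapping), ann_gene2GO, GO2genes = mapping),
  error = function(e) conditionMessage(e)
)
print(msg)

## Matching argument name: the call goes through.
ok <- build("MF", names(mapping), ann_gene2GO, gene2GO = mapping)
print(ok)
```

If this is what happened in the thread, pairing annFUN.gene2GO with gene2GO = ..., or annFUN.GO2genes with GO2genes = ..., should avoid the mismatch.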

written 16.9 years ago by Sebastien Gerega ▴ 370

What happens if you do annFUN.GO2genes(GO2genes = myGO2genes)?

I don't think you need Entrez Gene IDs for topGO. What I don't understand is the weird mappings you are getting in the error message you posted.

Best,

Jim

BTW, have you emailed Adrian Alexa directly? Not sure how closely he follows BioC help.

--
James W. MacDonald, MS
Biostatistician
UMCCC cDNA and Affymetrix Core
University of Michigan
1500 E Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623
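A likely explanation for the numeric mappings, offered as an inference from the thread rather than a confirmed diagnosis: the list elements print with "4026 Levels", so they are factors, and a factor stores integer codes that index its level set. In the printed levels GO:0000004 is the ninth entry, and the error message shows 9 in the matching position for gene 162.m02156. The sketch below uses a toy four-level set, so its codes differ from the real ones.

```r
## Toy level set; the real factor has 4026 levels.
go_levels <- c("GO:0000004", "GO:0005554", "GO:0008372", "GO:0009536")
go_terms <- factor(c("GO:0005554", "GO:0008372", "GO:0000004"),
                   levels = go_levels)

## The underlying integer codes are positions in the level set.
codes <- as.integer(go_terms)
print(codes)

## as.character() recovers the GO IDs themselves.
ids <- as.character(go_terms)
print(ids)
```

Converting each list element with as.character() before handing the list to topGO would keep the GO IDs intact.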

written 16.9 years ago by James W. MacDonald 68k

Hi Sebastien,

I haven't been using topGO for long, and I also depend on a custom annotation. I think I figured out how to do this, but there might be some mistake that I haven't recognized yet.

I think that you are getting this error because your annotation function has no GO2genes argument. My understanding is that the annotation function is supposed to return a list where the names of the list are the GO categories and the components of the list are the gene identifiers. For instance, when I use my annotation function I get:

    > my.go.function("BP",names(geneList),list.annotation)
    $`GO:0000003`
    [1] "CK743206" "CK571803" "CO583848" "BQ108196" "CK591451" "CK591731" "CK592215"

    $`GO:0000022`
     [1] "CK594625" "CK571614" "CK571618" "CK571711" "CK571828" "BF024115"
     [7] "BF024122" "BF024255" "BF023849" "BF023957" "BF024087" "BE188450"
    [13] "CK572914" "CK591204" "CK592173"

    $`GO:0000045`
    [1] "CK572330"

    $`GO:0000077`
    [1] "CK591657"

    $`GO:0000079`
    [1] "BQ108504"

The function that I use takes three arguments. The first one is the ontology (BP, MF or CC). The second one is a vector containing the identifiers of the genes that I am using. The third one is a list containing the three ontologies and, within each ontology, another list containing the genes and their GO numbers. For example:

    > head(names(geneList))
    [1] "CK570393" "CK570476" "CK570477" "CK570482" "CK570483" "CK570484"
    > names(list.annotation)
    [1] "BP" "MF" "CC"
    > list.annotation$BP[1:3]
    $CK570482
    [1] "GO:0008088"

    $CK570483
    [1] "GO:0006412" "GO:0043039"

    $CK570484
    [1] "GO:0016310" "GO:0006096" "GO:0001666"

And the code for the function that I use is:

    my.go.function <- function(selected.ontology = c("BP", "MF", "CC"), the.genes, go.source) {
      ## Resolve the ontology to a single value ("BP" if none is supplied)
      selected.ontology <- match.arg(selected.ontology)
      ## Either condition alone is an error, hence || rather than &
      if (!is.list(go.source) || length(go.source) != 3) {
        stop("go.source should be a list with the three ontologies in it")
      }
      if (!is.character(the.genes)) {
        stop("the.genes should be a character vector of gene names")
      }
      ## Pick the gene-to-GO list for the requested ontology
      ## (go.source is expected in the order BP, MF, CC)
      go.source <- go.source[[match(selected.ontology, c("BP", "MF", "CC"))]]
      ## Keep only the requested genes; match() compares IDs exactly, whereas
      ## grep() would also return partial matches
      selected.genes <- match(the.genes, names(go.source))
      genes.to.use <- go.source[selected.genes[!is.na(selected.genes)]]
      genes.to.use.df <- data.frame(
        gene = rep(names(genes.to.use), lengths(genes.to.use)),
        go = as.character(unlist(genes.to.use)),
        stringsAsFactors = FALSE)
      ## Invert: one list component per GO term, holding its gene IDs
      split(genes.to.use.df$gene, genes.to.use.df$go)
    }

I hope this helps.

All the best,

Artur Veloso
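The inversion described in this reply, from a gene-to-GO list to a GO-to-gene list, can be sketched compactly in base R. This is an illustrative variant, not the poster's function: invert_gene2GO is a hypothetical helper, the toy list reuses the three genes printed in the reply, and it selects genes by exact comparison with %in% instead of pattern matching.

```r
## Toy gene-to-GO list using the IDs shown in the reply.
gene2GO <- list(
  CK570482 = "GO:0008088",
  CK570483 = c("GO:0006412", "GO:0043039"),
  CK570484 = c("GO:0016310", "GO:0006096", "GO:0001666")
)

## Hypothetical helper: invert to a GO-to-gene list, keeping only
## genes whose IDs match the_genes exactly.
invert_gene2GO <- function(gene2GO, the_genes) {
  kept <- gene2GO[names(gene2GO) %in% the_genes]
  genes <- rep(names(kept), lengths(kept))
  terms <- unlist(kept, use.names = FALSE)
  split(genes, terms)
}

GO2genes <- invert_gene2GO(gene2GO, c("CK570483", "CK570484"))
print(GO2genes[["GO:0006412"]])

## grep() treats the ID as a pattern, so a truncated ID still matches
## every gene that contains it; %in% requires the whole ID to be equal.
partial <- grep("CK57048", names(gene2GO))
exact <- which(names(gene2GO) %in% "CK57048")
print(partial)
print(exact)
```

Because the error message shows extra arguments being forwarded to the annotation function by name, a custom function like the one in this reply would presumably be supplied as annot = my.go.function together with go.source = list.annotation; that pairing is an assumption and is not confirmed in the thread.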

written 16.9 years ago by Artur Veloso ▴ 340
