GOstats minus IEA?
Loren Engrav (@loren-engrav-2040)
GO.db and org.Hs.egGO2EG, and how to manipulate their contents, were discussed on the list a month or so ago, and it is somewhat complicated. Is it possible to run GOstats and exclude IEA (Inferred from Electronic Annotation) evidence without serious custom work? I searched gmane.science.biology.informatics.conductor and the four GOstats PDFs and did not find anything.
Vincent Carey (@vincent-j-carey-jr-4)
There does not seem to be a direct way within the GOstats tools to perform this kind of filtering. However, if you have the annotate package installed, help.search("evidence") will find a function called dropECode that addresses this concern. You would need to use it when you define your gene list and universe, to exclude genes that have undesirable evidence profiles.

For example, running the vignette GOstatsHyperG.Rnw creates an object called params, which includes examples of geneIds and universe vectors that are in fact Entrez Gene IDs. Briefly, to see how dropECode can be used, consider:

> Sweave("GOstatsHyperG.Rnw")
> ids = params@geneIds
> gids = mget(ids, org.Hs.egGO)
> dgids = lapply(gids, dropECode)
> table(sapply(gids, length))
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
12 18 28 17 33 51 59 63 56 44 39 34 24 22 25 19 13  7  7 12  7  6  9  6  5  2
27 28 29 30 31 32 33 34 35 36 37 39 40 41 42 43 44 45 50 54 64 65 72 79
 2  1  3  5  3  1  5  2  1  2  1  1  1  4  1  1  1  2  1  1  1  1  1  1
> table(sapply(dgids, length))
 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 25 26 27
91 58 77 85 89 53 41 29 20 25 12 15  7 10  1  4  8  4  2  4  3  1  1  1  6  2
30 32 35 36 40 47
 2  3  1  3  2  1

This shows that before dropECode (which by default drops terms annotated via IEA) there were 12 genes with a single association; after dropECode, 91 genes had none and 58 had only one. Further exploration indicates that gene 10265 is one that has 11 associations, all of them coded IEA.
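As a rough illustration of what dropECode does to the output of mget(ids, org.Hs.egGO), here is a base-R sketch that runs without Bioconductor. It assumes each gene maps to a named list of GO entries carrying GOID, Evidence and Ontology fields; the gene IDs and GO terms are made up, and drop_evidence() is a hypothetical stand-in, not the annotate implementation.

```r
# Mock of the per-gene structure returned by mget(ids, org.Hs.egGO):
# each gene maps to a named list of GO entries, each entry a list with
# GOID, Evidence and Ontology fields. All IDs here are made up.
gids <- list(
  "1001" = list(
    "GO:0000001" = list(GOID = "GO:0000001", Evidence = "IEA", Ontology = "BP"),
    "GO:0000002" = list(GOID = "GO:0000002", Evidence = "TAS", Ontology = "BP")
  ),
  "1002" = list(
    "GO:0000003" = list(GOID = "GO:0000003", Evidence = "IEA", Ontology = "MF")
  ),
  "1003" = list(
    "GO:0000004" = list(GOID = "GO:0000004", Evidence = "IDA", Ontology = "CC"),
    "GO:0000005" = list(GOID = "GO:0000005", Evidence = "IMP", Ontology = "BP")
  )
)

# Hypothetical base-R stand-in for annotate::dropECode: keep only the
# GO entries whose evidence code is not in `code` (default "IEA").
drop_evidence <- function(annots, code = "IEA") {
  keep <- vapply(annots, function(x) !(x$Evidence %in% code), logical(1))
  annots[keep]
}

dgids <- lapply(gids, drop_evidence)

# Number of GO associations per gene, before and after filtering
table(sapply(gids, length))
table(sapply(dgids, length))
```

With the real packages, lapply(gids, dropECode) plays the role of lapply(gids, drop_evidence) here; a gene such as the mock "1002", annotated only via IEA, ends up with zero associations, like the 91 genes in the 0 column of the table above.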
> sessionInfo()
R version 2.12.0 Under development (unstable) (2010-04-16 r51754)
x86_64-apple-darwin10.3.0

locale:
[1] C

attached base packages:
[1] grid      stats     graphics  grDevices datasets  tools     utils
[8] methods   base

other attached packages:
 [1] Rgraphviz_1.27.0    xtable_1.5-5        RColorBrewer_1.0-2
 [4] GOstats_2.13.0      graph_1.25.1        Category_2.13.3
 [7] genefilter_1.29.2   annotate_1.25.0     GO.db_2.4.0
[10] hgu95av2.db_2.4.0   org.Hs.eg.db_2.4.0  RSQLite_0.8-4
[13] DBI_0.2-5           AnnotationDbi_1.9.8 ALL_1.4.7
[16] Biobase_2.7.6       weaver_1.13.0       codetools_0.2-2
[19] digest_0.4.1

loaded via a namespace (and not attached):
[1] GSEABase_1.9.0  RBGL_1.23.0     XML_2.6-0       splines_2.12.0
[5] survival_2.35-8

On Mon, Apr 26, 2010 at 10:54 AM, Loren Engrav <engrav@u.washington.edu> wrote:
> Is it possible to run GOstats and exclude IEA evidence without serious
> custom work?
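Applying the filter "as you define your gene list and universe" could look roughly like the base-R sketch below. genes_with_evidence() is a hypothetical helper: it keeps genes that retain at least one GO entry whose evidence code is not excluded, and treats non-list entries (such as an NA placeholder for an unannotated gene) as having nothing left. All IDs are made up.

```r
# Hypothetical helper: given a per-gene list of GO entries (shaped like
# mget(ids, org.Hs.egGO) output), return the IDs of genes that still have
# at least one entry whose evidence code is not in `code`.
genes_with_evidence <- function(annots, code = "IEA") {
  has_other <- vapply(annots, function(entries) {
    if (!is.list(entries)) return(FALSE)  # e.g. an NA placeholder
    any(vapply(entries, function(x) !(x$Evidence %in% code), logical(1)))
  }, logical(1))
  names(annots)[has_other]
}

# Made-up annotations: 1002 is IEA-only, 1004 has no annotation at all
annots <- list(
  "1001" = list(list(GOID = "GO:0000001", Evidence = "IEA", Ontology = "BP"),
                list(GOID = "GO:0000002", Evidence = "TAS", Ontology = "BP")),
  "1002" = list(list(GOID = "GO:0000003", Evidence = "IEA", Ontology = "MF")),
  "1003" = list(list(GOID = "GO:0000004", Evidence = "IDA", Ontology = "CC")),
  "1004" = NA
)
universe <- names(annots)
selected <- c("1001", "1002")

# Restrict the universe first, then keep only selected genes inside it
keep <- genes_with_evidence(annots)
universe_noIEA <- intersect(universe, keep)
selected_noIEA <- intersect(selected, universe_noIEA)
```

The filtered vectors would then be passed as geneIds and universeGeneIds to new("GOHyperGParams", ...). Note that this only changes which genes enter the test; as far as I can tell it does not remove the IEA-coded gene-to-term links of the genes that are kept, since GOstats still takes those from the annotation package.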
Thank you, looks clever. I am working through it but am stuck:

> GOstatsentrezUniverse <- unlist(mget(featureNames(GOstats1842v2exprs), hgu133plus2ENTREZID))
> GOstatsentrezSelected <- unlist(mget(featureNames(GOstats153v2exprs), hgu133plus2ENTREZID))
> GOstats_params_BP.001over <- new("GOHyperGParams", geneIds = GOstatsentrezSelected, universeGeneIds = GOstatsentrezUniverse, annotation = "hgu133plus2.db", ontology = "BP", pvalueCutoff = .001, conditional = FALSE, testDirection = "over")
Warning messages:
1: In makeValidParams(.Object) : removing duplicate IDs in geneIds
2: In makeValidParams(.Object) : removing duplicate IDs in universeGeneIds
> ids <- GOstats_params_BP.001over@geneIds
> gids = mget(ids, org.Hs.egGO)
Error in .checkKeysAreWellFormed(keys) :
  keys must be supplied in a character vector with no NAs

How do I unstick gids?

> sessionInfo()
R version 2.11.0 (2010-04-22)
x86_64-apple-darwin9.8.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] grid      tools     stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] codetools_0.2-2      genefilter_1.30.0    RColorBrewer_1.0-2   xtable_1.5-6         Rgraphviz_1.26.0
 [6] GO.db_2.4.1          hgu133plus2.db_2.4.1 org.Hs.eg.db_2.4.1   annotate_1.26.0      GOstats_2.14.0
[11] RSQLite_0.8-4        DBI_0.2-5            graph_1.26.0         Category_2.14.0      AnnotationDbi_1.10.0
[16] Biobase_2.8.0

loaded via a namespace (and not attached):
[1] GSEABase_1.10.0 RBGL_1.24.0     splines_2.11.0  survival_2.35-8 XML_2.8-1

From: Vincent Carey <stvjc@channing.harvard.edu>
Date: Mon, 26 Apr 2010 11:52:00 -0400
To: Loren Engrav <engrav@u.washington.edu>
Cc: "bioconductor@stat.math.ethz.ch" <bioconductor@stat.math.ethz.ch>
Subject: Re: [BioC] GOstats minus IEA?

> There does not seem to be a direct way within the GOstats tools to perform
> this kind of filtering. [...]
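The error most likely comes from probes that have no Entrez Gene ID: mget(..., hgu133plus2ENTREZID) returns NA for them, and unlist() keeps those NAs. The base-R sketch below mimics that lookup with a made-up named list (all probe and gene IDs are hypothetical) to show that both duplicates and NAs end up in the ID vector, and that removing duplicates alone still leaves one NA behind.

```r
# Mock of mget(probeIDs, hgu133plus2ENTREZID): each probe maps to at most
# one Entrez Gene ID, and unmapped probes come back as NA.
# All probe and gene IDs are hypothetical.
probe2eg <- list(
  p1_at = "1001",
  p2_at = "1001",  # a second probe for the same gene
  p3_at = NA,      # probe with no Entrez Gene ID
  p4_at = "1003",
  p5_at = NA
)

ids <- unlist(probe2eg)  # named character vector, NAs included
ids

any(duplicated(ids))  # TRUE: duplicates, as in the "removing duplicate IDs" warnings
any(is.na(ids))       # TRUE: NAs, as in the "no NAs" error

# De-duplicating alone is not enough: one NA survives unique()
unique(ids)
```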
On Mon, Apr 26, 2010 at 9:26 PM, Loren Engrav <engrav@u.washington.edu> wrote:
> > ids <- GOstats_params_BP.001over@geneIds
> > gids = mget(ids, org.Hs.egGO)
> Error in .checkKeysAreWellFormed(keys) :
>   keys must be supplied in a character vector with no NAs

Try any(is.na(ids)). If this is TRUE, you will need to do something like mget(na.omit(ids), ...). If it is not TRUE, then you will have to send some exemplars from gids for diagnosis.

> How do I unstick gids?
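The check suggested above, and two ways of dropping the NAs before the lookup, can be sketched in base R as follows (the IDs are made up). na.omit() keeps the non-NA values but also attaches an "na.action" attribute recording what it dropped; plain logical subsetting returns a bare character vector, which may be simpler if downstream code is picky about attributes.

```r
# Made-up Entrez Gene IDs with one missing value
ids <- c("1001", NA, "1003", "1004")

# The diagnostic: TRUE means a lookup that requires "no NAs" would fail
any(is.na(ids))

# Option 1: na.omit(), as suggested; the "na.action" attribute records
# which positions were dropped
ids1 <- na.omit(ids)
attr(ids1, "na.action")

# Option 2: logical subsetting, which returns a plain character vector
ids2 <- ids[!is.na(ids)]
ids2
```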
Thank you. I gave GOstats the Entrez IDs directly, and that solved the NA problem; somehow, extracting them from hgu133plus2ENTREZID was problematic, with funny NAs. Then I used your code to fish out the non-IEA annotations. So it works, thank you.

From: Vincent Carey <stvjc@channing.harvard.edu>
Date: Mon, 26 Apr 2010 21:55:12 -0400
To: Loren Engrav <engrav@u.washington.edu>
Cc: "bioconductor@stat.math.ethz.ch" <bioconductor@stat.math.ethz.ch>
Subject: Re: [BioC] GOstats minus IEA?

> Try any(is.na(ids)). If this is TRUE, you will need to do something like
> mget(na.omit(ids), ...). [...]