postForm() with KEGG
Voke AO (@voke-ao-4830)
Hi Duncan and Martin,

My bad, no bug whatsoever... it was me. Got my code sorted for the most part.
Thanks again for all the help. It's much appreciated.

-Avoks

On Wed, Feb 29, 2012 at 12:19 PM, Ovokeraye Achinike-Oduaran <ovokeraye at gmail.com> wrote:
> Hi Morgan,
>
> Thanks. I think there's possibly a bug with
> getHTMLFormDescription() but I do understand what you've explained.
>
> Thanks again.
>
> -Avoks
>
> On Tue, Feb 28, 2012 at 6:19 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
>> On 02/28/2012 06:14 AM, Ovokeraye Achinike-Oduaran wrote:
>>>
>>> Hi Duncan,
>>>
>>> My understanding is that xpathSApply() combines both getNodeSet()
>>> and sapply(). I hope that this is a correct assumption. In
>>> attempting to retrieve nodes in general from the pathway, I used both
>>>
>>> xpathSApply(doc, "//li/node()", xmlGetAttr, "href")
>>> and
>>> xpathSApply(doc, "//li/a/node()", xmlGetAttr, "href")
>>>
>>> and then I get nothing (null) back even though no visible error pops
>>> up. Is something wrong with the way I'm using the path or do I just not
>>> yet grasp the whole XPath concept (I did read the online tutorial)?
>>
>> the NULL means that no nodes match your xpath query.
>>
>>>
>>> Sorry to drag this on, but please help.
>>
>> I used Duncan's RHTMLForms suggestion
>>
>>   library(RHTMLForms)
>>   url = "http://www.genome.jp/kegg/tool/map_pathway1.html"
>>   u = "http://www.genome.jp/kegg-bin/search_pathway_object"
>>   ff = getHTMLFormDescription(url)
>>
>>   fun = createFunction(ff[[1]])
>>   txt = fun(unclassified = "ko:K01803 cpd:C00111 cpd:C00118 K00134 C00236",
>>             target = "alias", .url = u)
>>
>> to retrieve the text and then
>>
>>   library(XML)
>>   xml = htmlTreeParse(txt, asText=TRUE, useInternalNodes=TRUE)
>>
>> to parse to xml (maybe there is a more direct way, using the reader argument
>> to createFunction?). If I experiment a little, I see for instance that
>>
>>   getNodeSet(xml, "//li/a")
>>
>> returns the 'a' elements nested directly inside 'li' elements, and
>>
>>   getNodeSet(xml, "//li/a[@target]")
>>
>> returns the subset of those elements that have a 'target' attribute. Finally
>>
>> > head(xpathSApply(xml, "//li/a[@target]", xmlValue))
>> [1] "ko00010 Glycolysis / Gluconeogenesis"
>> [2] "ko01100 Metabolic pathways"
>> [3] "ko01110 Biosynthesis of secondary metabolites"
>> [4] "ko01120 Microbial metabolism in diverse environments"
>> [5] "ko00710 Carbon fixation in photosynthetic organisms"
>> [6] "ko00562 Inositol phosphate metabolism"
>>
>> seems to be about what you want, or
>>
>> > head(xpathSApply(xml, "//li/a/@href"))
>>                                                 href
>> "/kegg-bin/show_pathway?13304448561022/ko00010.args"
>>                                                 href
>>                      "javascript:display('ko00010')"
>>                                                 href
>> "/kegg-bin/show_pathway?13304448561022/ko01100.args"
>>                                                 href
>>                      "javascript:display('ko01100')"
>>                                                 href
>> "/kegg-bin/show_pathway?13304448561022/ko01110.args"
>>                                                 href
>>                      "javascript:display('ko01110')"
>>
>> Maybe the KEGGSOAP package already does what you're interested in? The web
>> scraping you're doing is going to break as soon as the web site tweaks its
>> presentation.
>>
>> Or maybe
>>
>> > library(org.Hs.eg.db)
>> > head(toTable(revmap(org.Hs.egPATH)[c("00232", "04142")]))
>>   gene_id path_id
>> 1       9   00232
>> 2      10   00232
>> 3      20   04142
>> 4      53   04142
>> 5      54   04142
>> 6     162   04142
>>
>> (The KEGG information in the org.* and KEGG packages dates to the last free
>> public release, and so is starting to be dated.)
>>
>> Martin
>>
>>>
>>> Thanks.
>>>
>>> Avoks
>>>
>>> On Mon, Feb 27, 2012 at 4:09 PM, Ovokeraye Achinike-Oduaran
>>> <ovokeraye at gmail.com> wrote:
>>>>
>>>> Thank you so very much, Duncan. I will go get myself enlightened :).
>>>> Thanks again.
>>>>
>>>> Avoks
>>>>
>>>> On Mon, Feb 27, 2012 at 3:50 PM, Duncan Temple Lang
>>>> <duncan at wald.ucdavis.edu> wrote:
>>>>>
>>>>> Use
>>>>>
>>>>>   target = "alias"
>>>>>
>>>>> in the call.
>>>>>
>>>>> If you don't know how to map form elements to parameters in the request,
>>>>> you can either read a tutorial on HTML forms, or alternatively, use
>>>>> the RHTMLForms package which you have loaded according to your search
>>>>> path, e.g.
>>>>>
>>>>>  # read the form and then turn the information into an R function.
>>>>> ff = getHTMLFormDescription("http://www.genome.jp/kegg/tool/map_pathway1.html")
>>>>> fun = createFunction(ff[[1]])
>>>>>
>>>>>  # Since the action in the form is javascript, we'll provide the
>>>>>  # URL manually.
>>>>> u = "http://www.genome.jp/kegg-bin/search_pathway_object"
>>>>> out = fun(unclassified = "ko:K01803 cpd:C00111 cpd:C00118 K00134 C00236",
>>>>>           target = "alias", .url = u)
>>>>>
>>>>> The benefits of RHTMLForms include using the same defaults
>>>>> as the form on the Web page, adding hidden parameters, and identifying
>>>>> the names of the parameters.
>>>>>
>>>>>   D
>>>>>
>>>>> On 2/27/12 3:08 AM, Ovokeraye Achinike-Oduaran wrote:
>>>>>>
>>>>>> Hi Duncan,
>>>>>>
>>>>>> I noticed that with the script as is, it doesn't take into
>>>>>> consideration the "include alias" checkbox. I tried modifying the
>>>>>> script to force include that option but it still did not work. Any
>>>>>> ideas?
>>>>>>
>>>>>> u = "http://www.genome.jp/kegg-bin/search_pathway_object"
>>>>>> data = postForm(u,
>>>>>>                 .params = list(org_name = "hsadd",
>>>>>>                 unclassified = paste(readLines(file.choose()), collapse = "\n"),
>>>>>>                 file = "", checkbox = "alias", submit = "Exec"))
>>>>>>
>>>>>> Thanks again.
>>>>>>
>>>>>> Avoks
>>>>>>
>>>>>> On Mon, Feb 27, 2012 at 10:24 AM, Ovokeraye Achinike-Oduaran
>>>>>> <ovokeraye at gmail.com> wrote:
>>>>>>>
>>>>>>> Hi Duncan,
>>>>>>>
>>>>>>> Thanks a bunch.
>>>>>>>
>>>>>>> -Avoks
>>>>>>>
>>>>>>> On Fri, Feb 24, 2012 at 11:09 PM, Duncan Temple Lang
>>>>>>> <duncan at wald.ucdavis.edu> wrote:
>>>>>>>>
>>>>>>>> Hi Avoks
>>>>>>>>
>>>>>>>> While the form is provided by KEGG and so bio-related,
>>>>>>>> you might have been better posting this to the more general r-help
>>>>>>>> mailing list.
>>>>>>>>
>>>>>>>> You are posting the HTTP request to the wrong URL. That is the URL
>>>>>>>> of the Web page that displays the form, not the URL that processes
>>>>>>>> the input from the form.
>>>>>>>> You have to look at the JavaScript that is referenced in the action
>>>>>>>> attribute of the HTML form element.
>>>>>>>>
>>>>>>>> The second issue is that you are submitting the name of a local file.
>>>>>>>> This won't work as is. You either need to identify that this is the
>>>>>>>> name of a file and not the contents of the file to send, or else send
>>>>>>>> the contents. In this form, you can send the contents via the
>>>>>>>> unclassified parameter.
>>>>>>>>
>>>>>>>> u = "http://www.genome.jp/kegg-bin/search_pathway_object"
>>>>>>>> data = postForm(u,
>>>>>>>>                 .params = list(org_name = "hsadd",
>>>>>>>>                                unclassified = "hsa:7167 hsa:GPI cpd:C00118\nALDOA 1.2.1.12 C00236",
>>>>>>>>                                file = "", submit = "Exec"))
>>>>>>>>
>>>>>>>> If your input is in a file, you can use
>>>>>>>>
>>>>>>>>  unclassified = paste(readLines(file.choose()), collapse = "\n")
>>>>>>>>
>>>>>>>> as the value for the unclassified parameter.
>>>>>>>>
>>>>>>>> There are additional parameters that the form accepts that may be
>>>>>>>> relevant for your search.
>>>>>>>>
>>>>>>>> As for processing the results, you will want to use
>>>>>>>>
>>>>>>>>  doc = htmlParse(data, asText = TRUE)
>>>>>>>>
>>>>>>>> and then use getNodeSet()/xpathSApply() or direct tree extraction to
>>>>>>>> access the nodes you want, e.g.
>>>>>>>>
>>>>>>>>  xpathSApply(doc, "//li/a", xmlGetAttr, "href")
>>>>>>>>
>>>>>>>>  D.
>>>>>>>>
>>>>>>>> On 2/24/12 6:09 AM, Ovokeraye Achinike-Oduaran wrote:
>>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> I am trying to use postForm() with the KEGG website but I am stuck
>>>>>>>>> on how to get my results. Is it possible (code below) or am I using
>>>>>>>>> postForm() wrongly? The code appears to run but I'm not quite sure
>>>>>>>>> how to read the results, assuming there are any. Please help.
>>>>>>>>>
>>>>>>>>> Thanks.
>>>>>>>>>
>>>>>>>>> Avoks
>>>>>>>>> ____
>>>>>>>>>
>>>>>>>>> data = postForm("http://www.genome.jp/kegg/tool/map_pathway1.html",
>>>>>>>>> org_name = "hsadd",
>>>>>>>>> file = file.choose(),
>>>>>>>>> submit = "Exec")
>>>>>>>>>
>>>>>>>>> > sessionInfo()
>>>>>>>>>
>>>>>>>>> R version 2.14.1 (2011-12-22)
>>>>>>>>> Platform: i386-pc-mingw32/i386 (32-bit)
>>>>>>>>>
>>>>>>>>> locale:
>>>>>>>>> [1] LC_COLLATE=English_xxx.1252  LC_CTYPE=English_xxx.1252
>>>>>>>>> [3] LC_MONETARY=English_xxx.1252 LC_NUMERIC=C
>>>>>>>>> [5] LC_TIME=English_xxx.1252
>>>>>>>>>
>>>>>>>>> attached base packages:
>>>>>>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>>>>>>>
>>>>>>>>> other attached packages:
>>>>>>>>> [1] RHTMLForms_0.5-1 XML_3.9-4.1      RCurl_1.91-1.1   bitops_1.0-4.1
>>>>>>>>>
>>>>>>>>> loaded via a namespace (and not attached):
>>>>>>>>> [1] tools_2.14.1
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Bioconductor mailing list
>>>>>>>>> Bioconductor at r-project.org
>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>>>>> Search the archives:
>>>>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>> --
>> Computational Biology
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
>>
>> Location: M1-B861
>> Telephone: 206 667-2793
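
For anyone following the XPath part of this thread, here is a minimal sketch of the queries discussed above, run against a small inline HTML fragment instead of the live KEGG page. The markup is a hypothetical stand-in (the real result page may well be structured differently), the variable names are only for illustration, and the XML package is assumed to be installed. It is meant to show that "//li/a" selects the 'a' elements themselves, that a predicate such as [@target] narrows the match, and that a path matching nothing gives back an empty result with no error.

```r
library(XML)

# Hypothetical stand-in for a KEGG result page; NOT the real markup.
html <- '<html><body><ul>
  <li><a href="/kegg-bin/show_pathway?ko00010" target="_map">ko00010 Glycolysis</a>
      <a href="javascript:display(1)">(2)</a></li>
  <li><a href="/kegg-bin/show_pathway?ko01100" target="_map">ko01100 Metabolic pathways</a></li>
</ul></body></html>'

# Parse the text into an internal document that XPath can query.
doc <- htmlParse(html, asText = TRUE)

# Every <a> element that is a direct child of an <li>.
all_links <- getNodeSet(doc, "//li/a")

# Only the <a> elements carrying a 'target' attribute:
# their text content and their href attribute.
titles <- xpathSApply(doc, "//li/a[@target]", xmlValue)
hrefs  <- xpathSApply(doc, "//li/a[@target]", xmlGetAttr, "href")

# A path that matches no node: the node set is simply empty.
none <- getNodeSet(doc, "//li/span")

print(length(all_links))
print(titles)
print(hrefs)
print(length(none))
```

Checking length() on the node set before applying a function is a cheap way to tell "the path matched nothing" apart from "the function returned nothing useful for the matched nodes".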
GO Cancer KEGGSOAP