"Special" characters in URI
@gorjanc-gregor-1198
Hello!

I am crossposting this to R-help and BioC, since it is relevant to both groups.

I wrote a wrapper for the Entrez search utility (link provided below), which can add some new search functionality to the existing code in Bioconductor's package 'annotate'*.

http://eutils.ncbi.nlm.nih.gov/entrez/query/static/esearch_help.html

The Entrez search utility returns an XML document, but I have a problem using a URI to retrieve that file, since the URI can contain characters that should not be there according to

http://www.faqs.org/rfcs/rfc2396.html

I encountered problems with "[" and "]" as well as with space characters. However, there might also be problems with others, i.e. reserved characters in URI syntax.

My R example is:

R> library("annotate")
Loading required package: Biobase
Loading required package: tools
Welcome to Bioconductor
Vignettes contain introductory material. To view,
simply type: openVignette()
For details on reading vignettes, see
the openVignette help page.
R> library(XML)
R> tmp$term <- "gorjanc g[au]"
R> tmp$URL <- "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=gorjanc g[au]"
R> tmp
$term
[1] "gorjanc g[au]"

$URL
[1] "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=gorjanc g[au]"

R> xmlTreeParse(tmp$URL, isURL=TRUE, handlers=NULL, asTree=TRUE)
Error in xmlTreeParse(tmp$URL, isURL = TRUE, handlers = NULL, asTree = TRUE) :
        error in creating parser for http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=gorjanc g[au]

# so I have a problem with space and [ and ]
# let's reduce the problem to just space or [] to be sure
R> tmp$URL <- "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=gorjanc g"
R> xmlTreeParse(tmp$URL, isURL=TRUE, handlers=NULL, asTree=TRUE)
Error in xmlTreeParse(tmp$URL, isURL = TRUE, handlers = NULL, asTree = TRUE) :
        error in creating parser for http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=gorjanc g
R> tmp$URL <- "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=gorjanc[au]"
R> xmlTreeParse(tmp$URL, isURL=TRUE, handlers=NULL, asTree=TRUE)
Error in xmlTreeParse(tmp$URL, isURL = TRUE, handlers = NULL, asTree = TRUE) :
        error in creating parser for http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=gorjanc[au]

# now show that it works fine without special chars
R> tmp$URL <- "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=gorjanc"
R> xmlTreeParse(tmp$URL, isURL=TRUE, handlers=NULL, asTree=TRUE)
$doc
$file
[1] "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=gorjanc"

$version
[1] "1.0"

$children
...

# now show a workaround for space
R> tmp$URL <- "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=gorjanc%20g"
R> xmlTreeParse(tmp$URL, isURL=TRUE, handlers=NULL, asTree=TRUE)
$doc
$file
[1] "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=gorjanc%20g"

$version
[1] "1.0"

$children
...

As can be seen above, it is possible to handle these special characters, and I wonder whether this has already been done somewhere. If not, I thought of a function fixURLchar, which would replace reserved characters with their escaped sequences. Any comments, pointers, ...?

from = c(" ", "\"", ",", "#"),
to = c("%20", "%22", "%2c", "%23"))

*When I solve the problem, I will send my code to the 'annotate' maintainer, and he can include it in the package at his discretion.
Lep pozdrav / With regards,
    Gregor Gorjanc

----------------------------------------------------------------------
University of Ljubljana
Biotechnical Faculty       URI: http://www.bfro.uni-lj.si/MR/ggorjan
Zootechnical Department    mail: gregor.gorjanc <at> bfro.uni-lj.si
Groblje 3                  tel: +386 (0)1 72 17 861
SI-1230 Domzale            fax: +386 (0)1 72 17 888
Slovenia, Europe
----------------------------------------------------------------------
"One must learn by doing the thing; for though you think you know it,
 you have no certainty until you try." Sophocles ~ 450 B.C.
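The fixURLchar idea can be sketched in R as a loop of literal substitutions over a from/to table. This is a hypothetical helper: only the name and the first four table entries come from the post, and the "[" and "]" entries are added here as an assumption. It is meant to be applied to the search term only, not to the whole URL, so that "?", "=" and "/" keep their meaning. Note that "%" itself is not in the table; if it were added, it would have to be replaced first, or the escapes produced by the other entries would be escaped again.

```r
# Hypothetical helper based on the from/to tables proposed in the post;
# the "[" and "]" entries are an assumption added for this example.
fixURLchar <- function(x,
                       from = c(" ", "\"", ",", "#", "[", "]"),
                       to   = c("%20", "%22", "%2c", "%23", "%5B", "%5D")) {
  stopifnot(length(from) == length(to))
  for (i in seq_along(from)) {
    # fixed = TRUE treats "[" and "]" literally instead of as regex syntax
    x <- gsub(from[i], to[i], x, fixed = TRUE)
  }
  x
}

# Escape only the search term, then append it to the query string
term <- "gorjanc g[au]"
url <- paste0("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=",
              fixURLchar(term))
print(url)
```

Because gsub() is vectorised, the helper also works on a character vector of terms. RFC 2396 allows upper- or lower-case hex digits in escapes, so "%2c" from the post and "%2C" are equivalent.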
Francois Pepin
@francois-pepin-1012
There are safe ways of encoding URLs that contain funny characters:

(space)  %20
[        %5B
]        %5D

so your URL would be:

URL <- 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=gorjanc%20g%5Bau%5D'

That makes your snippet work just fine.

http://www.macromedia.com/cfusion/knowledgebase/index.cfm?id=tn_14143 has the list.

Francois

On Mon, 2005-05-02 at 19:46, Gorjanc Gregor wrote:
> [...]
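The codes in this list are not arbitrary: each one is "%" followed by the character's byte value written as two hexadecimal digits (the "escaped" form in RFC 2396). A small sketch derives them directly; the helper name pct_escape is chosen here for illustration and is not part of any package. For non-ASCII characters the result depends on the string's encoding, since each byte is escaped separately.

```r
# Hypothetical helper: percent-escape every byte of a single character
pct_escape <- function(ch) {
  bytes <- as.integer(charToRaw(ch))
  # "%%" prints a literal "%"; "%02X" prints the byte as two uppercase hex digits
  paste0(sprintf("%%%02X", bytes), collapse = "")
}

codes <- vapply(c(" ", "[", "]"), pct_escape, character(1))
print(codes)
```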
@gorjanc-gregor-1198
> -----Original Message-----
> From: Francois Pepin [mailto:fpepin@cs.mcgill.ca]
> Sent: Tue 2005-05-03 17:24
> To: Gorjanc Gregor
> Cc: r-help@stat.math.ethz.ch; bioconductor@stat.math.ethz.ch
> Subject: Re: [BioC] "Special" characters in URI
>
> There are safe ways of encoding URLs that contain funny characters:
> [...]

Francois, thank you for this. I know of this solution; however, I just wanted to know how others handle it and whether there is any R utility for that.

Thanks again!

Lep pozdrav / With regards,
    Gregor Gorjanc
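As for an existing R utility: current versions of R's utils package provide URLencode() and URLdecode(). By default URLencode() leaves reserved characters such as "[" and "]" alone; with reserved = TRUE it escapes them too. That is why, in this sketch (assuming a recent R), it is applied to the search term rather than to the whole URL, where it would also escape ":", "/", "?" and "=".

```r
term <- "gorjanc g[au]"

# Default: only characters outside the unreserved and reserved sets are
# escaped, so the space is encoded but "[" and "]" are kept
utils::URLencode(term)

# reserved = TRUE: "[" and "]" are escaped as well
encoded <- utils::URLencode(term, reserved = TRUE)
url <- paste0("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=",
              encoded)
print(url)

# URLdecode() reverses the escaping
utils::URLdecode(encoded)
```

In recent R versions URLencode() also has a repeated argument; with its default of FALSE, a string that already contains %xx escapes is returned unchanged, so an already-encoded term is not encoded twice.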