Hello!

I am crossposting this to R-help and BioC, since it is relevant to both
groups.

I wrote a wrapper for the Entrez search utility (link provided below),
which can add some new search functionality to existing code in
Bioconductor's package 'annotate'*.

http://eutils.ncbi.nlm.nih.gov/entrez/query/static/esearch_help.html

The Entrez search utility returns an XML document, but I have a problem
using a URI to retrieve that file, since the URI can contain characters
that should not be there according to

http://www.faqs.org/rfcs/rfc2396.html

I encountered problems with "[" and "]" as well as with space characters.
However, there might also be a problem with others, i.e. the reserved
characters in URI syntax.

My R example is:

R> library("annotate")
Loading required package: Biobase
Loading required package: tools
Welcome to Bioconductor
         Vignettes contain introductory material.  To view,
         simply type: openVignette()
         For details on reading vignettes, see
         the openVignette help page.
R> library(XML)
R> tmp$term <- "gorjanc g[au]"
R> tmp$URL <- "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=gorjanc g[au]"
R> tmp
$term
[1] "gorjanc g[au]"

$URL
[1] "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=gorjanc g[au]"

R> xmlTreeParse(tmp$URL, isURL=TRUE, handlers=NULL, asTree=TRUE)
Error in xmlTreeParse(tmp$URL, isURL = TRUE, handlers = NULL, asTree = TRUE) :
        error in creating parser for http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=gorjanc g[au]

# so I have a problem with space and [ and ]
# let's reduce the problem to just space or [] to be sure
R> tmp$URL <- "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=gorjanc g"
R> xmlTreeParse(tmp$URL, isURL=TRUE, handlers=NULL, asTree=TRUE)
Error in xmlTreeParse(tmp$URL, isURL = TRUE, handlers = NULL, asTree = TRUE) :
        error in creating parser for http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=gorjanc g
R> tmp$URL <- "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=gorjanc[au]"
R> xmlTreeParse(tmp$URL, isURL=TRUE, handlers=NULL, asTree=TRUE)
Error in xmlTreeParse(tmp$URL, isURL = TRUE, handlers = NULL, asTree = TRUE) :
        error in creating parser for http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=gorjanc[au]

# now show that it works fine without special chars
R> tmp$URL <- "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=gorjanc"
R> xmlTreeParse(tmp$URL, isURL=TRUE, handlers=NULL, asTree=TRUE)
$doc
$file
[1] "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=gorjanc"

$version
[1] "1.0"

$children
...

# now show a workaround for space
R> tmp$URL <- "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=gorjanc%20g"
R> xmlTreeParse(tmp$URL, isURL=TRUE, handlers=NULL, asTree=TRUE)
$doc
$file
[1] "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=gorjanc%20g"

$version
[1] "1.0"

$children
...

As can be seen from the above, it is possible to handle these special
characters, and I wonder if this has already been done somewhere? If not,
I thought of a function fixURLchar, which would replace reserved characters
with their escaped sequences. Any comments, pointers, ... ?

    from = c(" ", "\"", ",", "#"),
    to = c("%20", "%22", "%2c", "%23"))

*When I solve the problem I will send my code to the 'annotate' maintainer,
and he can include it in the package as he sees fit.

Lep pozdrav / With regards,
    Gregor Gorjanc

----------------------------------------------------------------------
University of Ljubljana
Biotechnical Faculty        URI: http://www.bfro.uni-lj.si/MR/ggorjan
Zootechnical Department     mail: gregor.gorjanc <at> bfro.uni-lj.si
Groblje 3                   tel: +386 (0)1 72 17 861
SI-1230 Domzale             fax: +386 (0)1 72 17 888
Slovenia, Europe
----------------------------------------------------------------------
"One must learn by doing the thing; for though you think you know it,
 you have no certainty until you try." Sophocles ~ 450 B.C.

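
A minimal sketch of the fixURLchar idea proposed in the message above, assuming a fixed lookup table is enough for Entrez search terms. Only the from/to pairs for space, double quote, comma and "#" come from the original post; the function body, the argument name `term`, and the extra entries for "%", "[" and "]" are assumptions added for illustration.

```r
# Hypothetical helper (not part of 'annotate' or base R): replace a fixed
# set of characters in a search term with their percent-escaped sequences.
fixURLchar <- function(term,
                       from = c("%", " ", "\"", ",", "#", "[", "]"),
                       to   = c("%25", "%20", "%22", "%2c", "%23", "%5B", "%5D")) {
  stopifnot(length(from) == length(to))
  # "%" is handled first so that the escapes inserted by later
  # replacements are not themselves escaped again.
  for (i in seq_along(from)) {
    # fixed = TRUE matches "[" and "]" literally instead of as regex syntax
    term <- gsub(from[i], to[i], term, fixed = TRUE)
  }
  term
}

base <- "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term="
url <- paste0(base, fixURLchar("gorjanc g[au]"))
print(url)
```

Only the search term is escaped, not the whole URL, so the "?" and "=" that structure the query string are left alone. A lookup table like this covers only the characters listed in it.
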
On Tue, 3 May 2005, Gorjanc Gregor wrote:

> I am crossposting this to R-help and BioC, since it is relevant to both
> groups.

I don't see the relevance to R-help. But the answer to your subject is
unambiguous: valid URLs do not contain `special' characters -- they must
be encoded. See RFC1738 at e.g.
ftp://ftp.funet.fi/pub/doc/rfc/rfc1738.txt

At some point (probably 2.2.0) I intend to ensure that the mapping to
file:// URLs that is done in a few places is encoded as necessary. This
will likely result in a utility function filePathToURL or some such.

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

There are safe ways of encoding URLs that contain funny characters:

    (space)  %20
    [        %5B
    ]        %5D

so your URL would be:

    URL <- 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=gorjanc%20g%5Bau%5D'

That makes your snippet work just fine.

http://www.macromedia.com/cfusion/knowledgebase/index.cfm?id=tn_14143
has the list.

Francois

On Mon, 2005-05-02 at 19:46, Gorjanc Gregor wrote:
> I encountered problems with "[" and "]" as well as with space characters.
> However there might also be a problem with others i.e. reserved characters
> in URI syntax.
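
The encoding described in this reply can also be produced without a hand-written table. The sketch below assumes an R version that provides URLencode() in the utils package; the variable names are illustrative, and nothing is fetched from NCBI here.

```r
# Percent-encode only the search term, then append it to the base URL.
# reserved = TRUE asks URLencode() to escape reserved characters as well,
# which covers "[" and "]" in addition to the space.
term <- "gorjanc g[au]"
encoded <- utils::URLencode(term, reserved = TRUE)

base <- "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term="
url <- paste0(base, encoded)
print(url)
```

The resulting string could then be passed to xmlTreeParse(url, isURL=TRUE) as in the original session. Encoding the term separately from the base URL keeps the "?" and "=" of the query string intact.
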