Tal Galili
2010-Dec-07 12:30 UTC
[R] Encoding problem - I fail to read Hebrew text from online
Hello all,

I am trying to read the text at this URL:

    u <- "http://google.com/complete/search?output=toolbar&q=%d7%a9%d7%9c%d7%95%d7%9d"

using this command:

    readLines(u)

No matter what variation I try, I keep getting this output:

    [1] "<?xml version=\"1.0\"?><toplevel><CompleteSuggestion><suggestion data=\"שלום\"/>< (etc...)

instead of this output:

    <?xml version="1.0"?><toplevel><CompleteSuggestion><suggestion data="שלום"/><num_queries int="16800000"/></CompleteSuggestion><CompleteSuggestion><suggestion data="שלום חנוך"/><num_queries int="232000"/></CompleteSuggestion><CompleteSuggestion><suggestion data="שלום עליכם"/ (etc....)

I tried:

    readLines(u, encoding = "latin1")
    readLines(u, encoding = "UTF-8")

and also changing the locale with Sys.setlocale:

    Sys.setlocale("LC_ALL", "Hebrew")   # must be done for Hebrew to work.
    Sys.setlocale("LC_ALL", "English")  # must be done for Hebrew to work.

Are there any more options I could try to get this text properly encoded?

Thanks!
Tal
Tal Galili
2010-Dec-09 17:21 UTC
[R] Encoding problem - I fail to read Hebrew text from online
I am bumping this question in the hope that someone might be able to advise. This Hebrew and R business is not as smooth as I had hoped...

Thanks,
Tal
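One thing that may be worth checking for the readLines() attempts in the question: the `encoding` argument of readLines() only marks the returned strings as Latin-1 or UTF-8; it does not convert any bytes. The sketch below is a minimal offline illustration, not a confirmed fix for the original problem. It assumes the server replies in UTF-8 (as the percent-encoded query suggests), uses a temporary file as a stand-in for the response, and `read_utf8_lines()` is a hypothetical helper name, not part of base R.

```r
# Sketch only: read_utf8_lines() is a hypothetical helper, not a base R function.
# readLines(encoding = "UTF-8") marks the returned strings as UTF-8;
# it does not re-encode the bytes that were read.
read_utf8_lines <- function(con) {
  readLines(con, encoding = "UTF-8", warn = FALSE)
}

# The percent-escapes in the query are the UTF-8 bytes of the search word.
word <- URLdecode("%d7%a9%d7%9c%d7%95%d7%9d")
Encoding(word) <- "UTF-8"   # declare those bytes as UTF-8

# Offline stand-in for the server response (assumed to be UTF-8, like the query).
xml_line <- paste0("<suggestion data=\"", word, "\"/>")
tmp <- tempfile(fileext = ".xml")
writeBin(c(charToRaw(xml_line), as.raw(0x0a)), tmp)   # exact bytes plus a newline

lines <- read_utf8_lines(tmp)

Encoding(lines)    # how the string that was read is marked
utf8ToInt(word)    # Unicode code points, independent of the console locale
nchar(word)        # number of characters, not bytes

# Against the live URL the call would look like this (not run here):
# lines <- read_utf8_lines(url(u))
```

Whether the console then displays Hebrew glyphs depends on the locale and the terminal font, so inspecting `Encoding()` and `utf8ToInt()` is a way to tell a reading problem apart from a display problem.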