Tal Galili
2010-Dec-07  12:30 UTC
[R] Encoding problem - I fail to read Hebrew text from online
Hello all,
# I am trying to read the text at this URL:
u <- "http://google.com/complete/search?output=toolbar&q=%d7%a9%d7%9c%d7%95%d7%9d"
# by using this command:
readLines(u)
And no matter what variation I tried, I keep getting this output:
[1] "<?xml version=\"1.0\"?><toplevel><CompleteSuggestion><suggestion data=\"שלום\"/><
(etc...)
Instead of this output:
<?xml version="1.0"?><toplevel><CompleteSuggestion><suggestion data="שלום"/><num_queries int="16800000"/></CompleteSuggestion><CompleteSuggestion><suggestion data="שלום חנוך"/><num_queries int="232000"/></CompleteSuggestion><CompleteSuggestion><suggestion data="שלום עליכם"/
(etc....)
I tried:
  readLines(u, encoding= "latin1")
  readLines(u, encoding= "UTF-8")
And also changing Sys.setlocale:
  Sys.setlocale("LC_ALL", "Hebrew") # must be done for Hebrew to work.
  Sys.setlocale("LC_ALL", "English") # must be done for Hebrew to work.
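A minimal offline sketch of what the encoding= argument of readLines() does in the two attempts above. According to the R documentation, it only marks the strings that are read as being in the stated encoding and does not convert any bytes. The temporary file and the variable names are hypothetical stand-ins, used so the example runs without network access; this is not a confirmed fix for the problem.

```r
# UTF-8 bytes of the word percent-encoded in the query string above
# (%d7%a9%d7%9c%d7%95%d7%9d).
hebrew_bytes <- as.raw(c(0xd7, 0xa9, 0xd7, 0x9c, 0xd7, 0x95, 0xd7, 0x9d))

# Hypothetical stand-in for the URL: a temporary file holding those bytes.
tmp <- tempfile(fileext = ".txt")
writeBin(hebrew_bytes, tmp)

# encoding= only marks the strings that are read; no bytes are converted.
as_utf8   <- readLines(tmp, encoding = "UTF-8",  warn = FALSE)
as_latin1 <- readLines(tmp, encoding = "latin1", warn = FALSE)

Encoding(as_utf8)     # "UTF-8"
Encoding(as_latin1)   # "latin1"

# Both results hold the same eight bytes ...
identical(charToRaw(as_utf8), charToRaw(as_latin1))   # TRUE

# ... and the UTF-8 mark makes R count them as four characters.
nchar(as_utf8, type = "chars")   # 4

unlink(tmp)
```

If the input should be re-encoded rather than only marked, the documented mechanism is the encoding argument of the connection itself, for example url(u, encoding = "UTF-8"). Whether that gives readable Hebrew here depends on the locale and has not been tested.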
Are there any more options I could try to get this text properly encoded?
Thanks!
Tal
Tal Galili
2010-Dec-09  17:21 UTC
[R] Encoding problem - I fail to read Hebrew text from online
I am bumping this question in the hope that someone might be able to advise.
This Hebrew and R business is not as smooth as I had hoped...

Thanks,
Tal