Erwin
2012-Jun-11 09:26 UTC
[Rails 3.2] REXML::ParseException ... invalid byte sequence in UTF-8
When parsing an XML response (declared as UTF-8), I get a parsing error.
response =>
"<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n
<rss xmlns:opensearch=\"http://a9.com/-/spec/opensearch/1.1/\" xmlns:dc=\"http://purl.org/dc/elements/1.1/\" version=\"2.0\">\n
 <channel>\n
 <title>link:http://lvh.me:3000 - Google Recherche de blogs</title>\n
 <link>http://www.google.com/search?q=link:http://lvh.me:3000&tbm=blg</link>\n
 <description>Aucun document ne correspond aux termes de recherche sp\xE9cifi\xE9s (<b>link:http://lvh.me:3000</b>).</description>\n
 <opensearch:totalResults>0</opensearch:totalResults>\n
 <opensearch:startIndex>1</opensearch:startIndex>\n
 <opensearch:itemsPerPage>10</opensearch:itemsPerPage>\n
 </channel>\n
</rss>"
parse_rss(response)

def parse_rss(body)
  xml = REXML::Document.new(body)
end

REXML::ParseException Exception: #<REXML::ParseException:
#<ArgumentError: invalid byte sequence in UTF-8>
This seems to be raised by the <description> tag, which contains French
text with accented characters, like sp\xE9cifi\xE9s.
Is it a REXML bug? (In that case I may switch to Nokogiri...)
Or did I miss a mandatory parameter in my request?
Thanks for your feedback.
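
One way to check the suspicion about the accented characters, before blaming REXML, is to ask Ruby 1.9+ whether the string is valid in the encoding it is tagged with. This is only a minimal sketch: `body` here is a shortened, made-up sample of the response, not the full feed, and `bad_bytes` is just an illustrative name.

```ruby
# Shortened sample of the response body (assumption: same bytes as the
# <description> text above). In a UTF-8 source file this literal is tagged
# UTF-8 even though \xE9 is not a valid UTF-8 sequence on its own.
body = "termes de recherche sp\xE9cifi\xE9s"

# The encoding Ruby assumes for the string.
tagged_as = body.encoding

# false means the bytes do not form valid UTF-8, whatever the XML
# declaration claims.
valid = body.valid_encoding?

# Collect the offending bytes as hex, to see what is really in there.
bad_bytes = body.each_char
                .reject(&:valid_encoding?)
                .flat_map { |c| c.bytes.map { |b| "%02X" % b } }

puts "tagged as #{tagged_as}, valid: #{valid}, bad bytes: #{bad_bytes.inspect}"
```

A lone E9 byte is "é" in ISO-8859-1, which suggests the body is Latin-1 data carrying a UTF-8 label rather than a parser bug.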
--
You received this message because you are subscribed to the Google Groups
"Ruby on Rails: Talk" group.
To post to this group, send email to
rubyonrails-talk-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To unsubscribe from this group, send email to
rubyonrails-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
For more options, visit this group at
http://groups.google.com/group/rubyonrails-talk?hl=en.
[SOLVED] Found the answer here:
http://yehudakatz.com/2010/05/05/ruby-1-9-encodings-a-primer-and-the-solution-for-rails/
I forgot to mention that I am using Ruby 1.9.3. The body is declared as
UTF-8, but the \xE9 bytes are actually ISO-8859-1 (Latin-1) for "é", so
xml = REXML::Document.new(body.force_encoding("ISO-8859-1").encode("UTF-8"))
is the way to handle this response.
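
The fix above can be wrapped in a small helper so that bodies which really are UTF-8 are not transcoded a second time. This is only a sketch under assumptions: `to_utf8` is a hypothetical name, the sample string is a cut-down stand-in for the response, and falling back to ISO-8859-1 is a guess that fits this feed, not a general rule.

```ruby
require "rexml/document"

# Hypothetical helper: return a UTF-8 copy of body. If the bytes are already
# valid UTF-8 they are kept as-is; otherwise they are assumed (not detected)
# to be ISO-8859-1 and transcoded.
def to_utf8(body)
  utf8 = body.dup.force_encoding("UTF-8")
  return utf8 if utf8.valid_encoding?

  body.dup.force_encoding("ISO-8859-1").encode("UTF-8")
end

# Cut-down stand-in for the response: Latin-1 bytes in a UTF-8-tagged string.
raw = "<description>sp\xE9cifi\xE9s</description>"

doc  = REXML::Document.new(to_utf8(raw))
text = doc.root.text

puts text
```

Working on a `dup` leaves the caller's string untouched, whereas calling `force_encoding` directly on `body` changes its encoding tag in place.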