Erwin
2012-Jun-11 09:26 UTC
[Rails 3.2] REXML::ParseException ... invalid byte sequence in UTF-8
When parsing an XML response (declared as UTF-8), I get a parsing error.

    response =>
    "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<rss xmlns:opensearch=\"http://a9.com/-/spec/opensearch/1.1/\" xmlns:dc=\"http://purl.org/dc/elements/1.1/\" version=\"2.0\">\n <channel>\n <title>link:http://lvh.me:3000 - Google Recherche de blogs</title>\n <link>http://www.google.com/search?q=link:http://lvh.me:3000&tbm=blg</link>\n <description>Aucun document ne correspond aux termes de recherche sp\xE9cifi\xE9s (<b>link:http://lvh.me:3000</b>).</description>\n <opensearch:totalResults>0</opensearch:totalResults>\n <opensearch:startIndex>1</opensearch:startIndex>\n <opensearch:itemsPerPage>10</opensearch:itemsPerPage>\n </channel>\n</rss>"

    parse_rss(response)

    def parse_rss(body)
      xml = REXML::Document.new(body)
    end

    REXML::ParseException Exception: #<REXML::ParseException: #<ArgumentError: invalid byte sequence in UTF-8>

The exception seems to be raised by the <description> tag, which contains French text with accented characters such as sp\xE9cifi\xE9s.

Is this a REXML bug? (In that case I may switch to Nokogiri.) Or did I miss a mandatory parameter in my request?

Thanks for your feedback.

--
You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.
[SOLVED] I found the answer here: http://yehudakatz.com/2010/05/05/ruby-1-9-encodings-a-primer-and-the-solution-for-rails/

I forgot to mention that I am using Ruby 1.9.3, so

    xml = REXML::Document.new(body.force_encoding("ISO-8859-1").encode("UTF-8"))

is the right way to handle the response.
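For anyone landing on this thread later, here is a minimal sketch of the same fix wrapped in a guard, so that a body which is already valid UTF-8 is not transcoded a second time. The helper name to_utf8 and the shortened sample feed are made up for illustration. ISO-8859-1 as the fallback is an assumption based on what worked for this response: \xE9 is "é" in ISO-8859-1 but not a valid UTF-8 sequence on its own. Other feeds may really be in a different encoding, such as Windows-1252.

```ruby
require "rexml/document"

# Hypothetical helper (not part of REXML or Rails): return the body as
# valid UTF-8. If the bytes are already valid UTF-8 they are kept as-is;
# otherwise they are reinterpreted as `fallback` and transcoded.
# ISO-8859-1 as the fallback is an assumption that fits this thread's feed.
def to_utf8(body, fallback = Encoding::ISO_8859_1)
  utf8 = body.dup.force_encoding(Encoding::UTF_8)
  return utf8 if utf8.valid_encoding?

  body.dup.force_encoding(fallback).encode(Encoding::UTF_8)
end

def parse_rss(body)
  REXML::Document.new(to_utf8(body))
end

# Shortened stand-in for the response above: it declares UTF-8 but
# carries single-byte ISO-8859-1 accents.
raw = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" \
      "<rss version=\"2.0\"><channel>" \
      "<description>sp\xE9cifi\xE9s</description>" \
      "</channel></rss>"

doc = parse_rss(raw)
puts doc.elements["rss/channel/description"].text
```

The valid_encoding? check matters because calling force_encoding("ISO-8859-1").encode("UTF-8") on a body that really is UTF-8 would turn each accented character into two garbage characters.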