Aaron Starr
2009-Jul-06 02:35 UTC
[Mechanize-users] Encoding in meta http-equiv tags, but not in response headers
Hey, all,

I just ran into a situation where a web site is not specifying a character encoding in the response headers. It then specifies UTF-8 in a content-type meta tag. Mechanize doesn't read the meta tags to find the encoding, and ends up using an encoding that doesn't work, ISO-8859-1.

Here's a quick monkey patch that checks for a meta http-equiv tag in the page body and uses it to set the encoding if the encoding isn't specified in the response headers:

    http://pastie.org/535231

I don't have time to pretend to be smart by making this pretty, but I think it may still be useful for someone, and similar functionality would be welcome in the core sources.

Aaron
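The pastie is not reproduced here. As a rough sketch of the fallback order described above (header first, then the meta http-equiv tag, then a default), something like the following plain-Ruby code could work. The names `charset_from_header`, `charset_from_meta`, and `pick_charset` are hypothetical and not part of Mechanize, and the regex is a simplification that assumes `http-equiv` appears before `content` in the tag.

```ruby
# Hypothetical helpers, NOT Mechanize API: pick a charset for a response.

# Matches <meta http-equiv="Content-Type" content="...; charset=X">.
# Assumption: http-equiv comes before content inside the tag.
META_CHARSET = /<meta[^>]+http-equiv\s*=\s*["']?content-type["']?[^>]*charset\s*=\s*["']?([\w\-]+)/i

# Pull "charset=..." out of a Content-Type header value, if present.
def charset_from_header(content_type)
  return nil if content_type.nil?
  content_type[/charset\s*=\s*["']?([\w\-]+)/i, 1]
end

# Look for a meta http-equiv charset in the raw page body.
# The body is scanned as binary so invalid byte sequences cannot raise.
def charset_from_meta(body)
  body.b[META_CHARSET, 1]
end

# Header wins; the meta tag is only consulted when the header is silent.
def pick_charset(content_type, body, default = "ISO-8859-1")
  charset_from_header(content_type) || charset_from_meta(body) || default
end
```

A real patch would probably want an HTML parser rather than a regex, but this shows the order of precedence.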
Aaron Patterson
2009-Jul-06 03:17 UTC
[Mechanize-users] Encoding in meta http-equiv tags, but not in response headers
On Sun, Jul 5, 2009 at 7:35 PM, Aaron Starr <astarr at wiredquote.com> wrote:
> Hey, all,
> I just ran into a situation where a web site is not specifying a character
> encoding in the response headers. It then specifies UTF-8 in a content-type
> meta tag. Mechanize doesn't read the meta tags to find the encoding, and
> ends up using an encoding that doesn't work, ISO-8859-1.
> Here's a quick monkey patch that checks for a meta http-equiv tag in the
> page body and uses it to set the encoding if the encoding isn't specified in
> the response headers:
>     http://pastie.org/535231
> I don't have time to pretend to be smart by making this pretty, but I think
> it may still be useful for someone, and similar functionality would be
> welcome in the core sources.

Can you write a test to reproduce this error? Nokogiri *should* pick up the encoding from the meta tag.

Can you try printing out:

  puts page.parser.encoding

That should contain the encoding that nokogiri intuited.

--
Aaron Patterson
http://tenderlovemaking.com/
Aaron Starr
2009-Jul-06 03:55 UTC
[Mechanize-users] Encoding in meta http-equiv tags, but not in response headers
> > I don't have time to pretend to be smart by making this pretty [...]
>
> Can you write a test to reproduce this error? Nokogiri *should* pick
> up the encoding from the meta tag.
>
> Can you try printing out:
>
>   puts page.parser.encoding
>
> That should contain the encoding that nokogiri intuited.

Sure, when I get a chance I'll fuss with it some more. There is a distinct difference in behavior for me, though, with and without the monkey patch. Unfortunately, I can't just direct you to the page, because it's hidden behind passwords and byzantine AJAX protocols, and whatnot.

Aaron
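Since the real page can't be shared, a stand-alone reproduction has to start from synthetic bytes. The sketch below uses only the Ruby standard library (no Mechanize or Nokogiri) and shows the symptom a test could assert on: UTF-8 bytes that are labelled ISO-8859-1 come out as mojibake. The helper name `decode_as` is made up for illustration.

```ruby
# Hypothetical helper: label raw bytes with an encoding, then transcode
# to UTF-8 the way a client would for display.
def decode_as(raw_bytes, encoding_name)
  raw_bytes.dup.force_encoding(encoding_name).encode("UTF-8")
end

# "café" as raw UTF-8 bytes, as a server might send them without a
# charset in the Content-Type header.
raw = "caf\u00e9".b

puts decode_as(raw, "ISO-8859-1")  # wrong label: the accented letter is garbled
puts decode_as(raw, "UTF-8")       # right label: prints the original word
```

A test against Mechanize could serve the same bytes from a local fixture with a meta tag and no header charset, then compare the parsed text to the expected string.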