SpringFlowers AutumnMoon
2008-Sep-28 03:01 UTC
h() doesn't have any parameter for encoding being used?
It seems that the function h() (html_escape()) has no parameter to indicate the character encoding being used?

PHP's htmlspecialchars() function, by contrast, accepts about a dozen encodings, such as UTF-8, Chinese Big5, Chinese GB, Russian, and Japanese.

I think, though, that h() will work for UTF-8, since h() only touches the 4 special characters

< > & "

and replaces them with &lt; etc. Those 4 characters are all in the 0x00 to 0x7F range, and h() leaves all other bytes unchanged. A character in UTF-8 takes 1 to 4 bytes. Any ASCII character is represented as 1 byte in the 0x00 to 0x7F range, which is the ASCII value itself. Code points 0x80 to 0xFF and all other Unicode characters take 2 to 4 bytes, and every one of those bytes is in the 0x80 to 0xFF range (see UTF-8 http://en.wikipedia.org/wiki/Utf-8 ). So h() will match those 4 characters only where they appear as 1-byte characters, and it will skip over every byte of a multi-byte character, because those bytes are in the 0x80 to 0xFF range and can never match one of the 4 special characters. The replacement is therefore done with no side effects on the non-ASCII characters.

--
Posted via http://www.ruby-forum.com/.

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en
-~----------~----~----~----~------~----~------~--~---
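The byte-range argument in this post can be checked with a short Ruby sketch. It assumes Ruby 1.9 or later, where strings carry an encoding. The names BYTE_ESCAPES and escape_bytewise are made up for illustration and are not part of ERb or Rails. The function deliberately works on raw bytes and knows nothing about UTF-8, to show that multi-byte characters still come through intact.

```ruby
# Hypothetical helper, not the ERb implementation: escapes the 4 special
# characters by looking only at individual bytes.
BYTE_ESCAPES = {
  0x26 => "&amp;",   # &
  0x22 => "&quot;",  # "
  0x3E => "&gt;",    # >
  0x3C => "&lt;"     # <
}.freeze

def escape_bytewise(str)
  out = str.bytes.flat_map do |byte|
    replacement = BYTE_ESCAPES[byte]
    replacement ? replacement.bytes : [byte]
  end
  # Reassemble the bytes and label the result as UTF-8 again.
  out.pack("C*").force_encoding(Encoding::UTF_8)
end

# \u escapes keep the example independent of the source file's encoding.
input = "<b>\u65E5\u672C\u8A9E & \"caf\u00E9\"</b>"

# Every byte belonging to a non-ASCII character is 0x80 or above ...
high_bytes = input.each_char.reject(&:ascii_only?).flat_map(&:bytes)
puts high_bytes.all? { |b| b >= 0x80 }

# ... so none of them can be mistaken for one of the 4 special bytes.
escaped = escape_bytewise(input)
puts escaped
puts escaped.valid_encoding?
```

The escaped string is still valid UTF-8 even though the helper never decoded a single character, which is the property the post relies on.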
Ryan Bigg
2008-Sep-28 04:18 UTC
Re: h() doesn't have any parameter for encoding being used?
I don't think Rails supports UTF8 yet... but I could be wrong.

-----
Ryan Bigg
Freelancer
http://frozenplague.net
Frederick Cheung
2008-Sep-28 09:21 UTC
Re: h() doesn't have any parameter for encoding being used?
On 28 Sep 2008, at 05:18, Ryan Bigg wrote:
> I don't think Rails supports UTF8 yet... but I could be wrong.

Actually it should handle UTF-8 just fine. Rails 1.2 added a whole bunch of stuff to augment Ruby's somewhat lackluster support. What does h do to UTF-8 strings that it shouldn't?

Fred
Conrad Taylor
2008-Sep-28 09:21 UTC
Re: h() doesn't have any parameter for encoding being used?
On Sat, Sep 27, 2008 at 9:18 PM, Ryan Bigg wrote:
> I don't think Rails supports UTF8 yet... but I could be wrong.

The default charset for action renderings has been UTF-8 since Rails 1.2.

-Conrad
Xavier Noria
2008-Sep-28 19:13 UTC
Re: h() doesn't have any parameter for encoding being used?
Ruby 1.8 has a global idea of character encoding, which is configured in the $KCODE global variable. Rails 1.2 and above by default set $KCODE to a value that means everything is UTF-8: source code, strings, regexps, etc. It also sets an HTTP header that tells the client the (X)HTML is in UTF-8. Thus, the client sends form data back in UTF-8 as well, and everything works transparently.

When you do I/O you are responsible for knowing the encoding of incoming data, and the expected encoding of outgoing data. You use iconv if needed to guarantee them.
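As a rough illustration of converting at the I/O boundary: iconv was the tool in Ruby 1.8, but it was removed from the standard library in Ruby 2.0, so this sketch swaps in String#encode, which is available from Ruby 1.9 on. The sample bytes, and the assumption that the incoming data is ISO-8859-1, are invented for the example.

```ruby
# Bytes as they might arrive from a source known (by assumption) to be
# ISO-8859-1: "café" with the final letter as the single byte 0xE9.
incoming = [0x63, 0x61, 0x66, 0xE9].pack("C*")

# Tell Ruby what the bytes mean, then convert to UTF-8 for internal use.
latin1 = incoming.force_encoding(Encoding::ISO_8859_1)
utf8 = latin1.encode(Encoding::UTF_8)

puts utf8.bytes.inspect    # the single byte 0xE9 becomes the two bytes 0xC3 0xA9
puts utf8.valid_encoding?

# Going the other way, for output that is expected to be ISO-8859-1.
outgoing = utf8.encode(Encoding::ISO_8859_1)
puts outgoing.bytes == [0x63, 0x61, 0x66, 0xE9]
```

The point is the same as with iconv: the program has to be told the encoding of the incoming bytes, because nothing in the bytes themselves says which encoding they use.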
Any I/O operation has to be in control of the character encodings involved.

Some stuff in Ruby 1.8 does not play well with UTF-8. For example, you cannot compute the length of a string in characters with String#length, because that method counts bytes. But some other stuff does work, like pattern matching. For example, "." really matches a character, which may be more than one byte in UTF-8, as you point out. So, if you are using regexps you are safe in that regard.

The helper #h is really an ERb alias of the ERb method #html_escape (it is not a Rails helper), and that method is implemented using regexps:

    def html_escape(s)
      s.to_s.gsub(/&/, "&amp;").gsub(/\"/, "&quot;").gsub(/>/, "&gt;").gsub(/</, "&lt;")
    end

Hence, it works correctly in UTF-8.
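A small sketch of the byte versus character distinction, written for Ruby 1.9 or later, where the behavior differs from the 1.8 description in this message: String#length now counts characters, and String#bytesize gives the byte count that 1.8's length reported. html_escape_sketch is a local stand-in that mirrors the regexp-based approach; it is not the ERb source.

```ruby
# Local stand-in mirroring the gsub chain; not the actual ERb method.
def html_escape_sketch(s)
  s.to_s.gsub(/&/, "&amp;").gsub(/"/, "&quot;").gsub(/>/, "&gt;").gsub(/</, "&lt;")
end

# 5 characters; the i with diaeresis (U+00EF) takes 2 bytes in UTF-8.
word = "na\u00EFve"

puts word.bytesize           # byte count (what Ruby 1.8's String#length returned)
puts word.length             # character count on Ruby 1.9+
puts word.scan(/./).size     # "." matches whole characters, not bytes

escaped = html_escape_sketch("<p>#{word} & co</p>")
puts escaped
```

Escaping & first matters in the chain: if it ran last, it would re-escape the ampersands introduced by the other three replacements.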