buddycat
2009-Apr-12 07:41 UTC
invalid byte sequence utf-8 OR best option to sanitize content brought in with net::http? single non-utf character causes rails to crash
hi all,

platform: debian lenny, ruby 1.9.1-p0, passenger/apache-multithread, rails 2.3 in vendor/, postgres and sql server via odbc. all current gems.

i have legacy asp content on win2k servers that i wrap in rails controllers. this all worked great with ruby 1.8, but now that we are dealing with encoded strings in ruby 1.9, i am getting random page crashes because users have cut and pasted "high ascii" characters (e.g. byte 150, the windows-1252 en dash) that are microsoft-only and non-standard.

normally i wouldn't have cared or worried about it much; however, after a few mysterious rails page crashes, i did more experimenting. i found that if i put the following in my asp page, the rails page fails with "invalid byte sequence in UTF-8" at ror/vendor/rails/activesupport/lib/active_support/core_ext/blank.rb:50

the offending asp code is:

    <%= chr(150) %>

this is my own doing to reproduce the issue, but there are probably many non-standard windows characters that are not valid utf-8 riddling my sql server database, because users like to cut and paste content from word and other places.

it turns out that because the content i bring in via ruby's net::http has non-utf-8 characters, its encoding is ASCII-8BIT, and when i do force_encoding("UTF-8"), valid_encoding? returns false and the page just fails. html sanitizing isn't an option as i don't want to strip the tags; the content comes from internal, trusted servers that i control. i just need to sanitize, i guess, the bad characters.

my thoughts/questions:

1) it seems like rails should be less brittle about managing encoding, so that blank? doesn't just fail when valid_encoding? is false. or you shouldn't be able to create a string if the encoding is bad. or it should make a best effort to transliterate the bad characters. something.

2) is iconv my best option? it seems kind of nuts that i have to re-encode the entire html page for one character. using the //TRANSLIT//IGNORE options does work and i get my pages, but i wonder about the overhead.

3) as usual, trying to make my ms iis5 servers do anything useful is a non-starter. sure, it says it can generate utf-8, but trying the 25 different (typically confused and poorly documented) ways to make it do so results in nothing but more wasted time. so i need a good rails solution that "just works."

4) it occurs to me that ruby could also be defaulting net::http content to ascii regardless of how iis is sending it. how do i check/set Encoding.default_external in rails? and why does rails remove the Encoding class? it isn't there in console, but it is in irb. i dislike rails removing native ruby classes.

please. i am so close to having ruby 1.9/rails 2.3 working, but this encoding stuff is really a hassle.

--~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en -~----------~----~----~----~------~----~------~--~---
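A minimal sketch of the failure described in the first post and one way around it, assuming the legacy ASP content really is Windows-1252 (the usual code page for text pasted from Word on Western-locale Windows servers; the thread does not confirm this). The sample bytes are a hypothetical stand-in for a Net::HTTP response body, and the `!~ /\S/` check only approximates the kind of regexp test a blank?-style helper performs; it is not a copy of ActiveSupport's code.

```ruby
# Hypothetical stand-in for a Net::HTTP response body: ASCII text plus
# byte 150 (0x96), an en dash in Windows-1252 but invalid on its own in UTF-8.
raw = "2008 \x96 2009".dup.force_encoding(Encoding::ASCII_8BIT)

# force_encoding only relabels the bytes, so the result is broken UTF-8.
relabeled = raw.dup.force_encoding(Encoding::UTF_8)
puts relabeled.valid_encoding?   # false

# Regexp matching on a broken string raises ArgumentError, the same error
# reported from blank.rb (approximation of a blank?-style check).
error_message = begin
  relabeled !~ /\S/
  nil
rescue ArgumentError => e
  e.message
end
puts error_message               # invalid byte sequence in UTF-8

# Assumption: the source really is Windows-1252. Label it as such, then transcode.
fixed = raw.dup.force_encoding("Windows-1252").encode("UTF-8")
puts fixed.valid_encoding?       # true
puts fixed == "2008 \u2013 2009" # true
```

The key point is that force_encoding never changes bytes, while encode does; naming the correct source encoding turns byte 150 into a real en dash instead of discarding it.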
Hector Gomez
2009-Apr-12 15:49 UTC
Re: invalid byte sequence utf-8 OR best option to sanitize content brought in with net::http? single non-utf character causes rails to crash
1) 1.9 is the wild wild west, unfortunately, even more so in all this encoding mess, so right now it is your responsibility as a developer to transcode any external data to UTF-8 (or your encoding of choice). I have sent a GSoC proposal to resolve these problems and let rails handle them for you, so it will "just work".

2) You can use the String#encode method supplied in ruby 1.9, which converts between the encodings ruby supports. Its options let you replace bad characters with a placeholder value instead of raising an error: invalid: :replace covers malformed byte sequences, undef: :replace covers characters that have no equivalent in the target encoding, and replace: "" drops them instead.

    # encoding: utf-8
    pi = "pi = π "
    puts pi.encode("iso-8859-1", undef: :replace, replace: "??")

prints: pi = ??

4) What you really want is to set the internal encoding (Encoding.default_internal). If you have set your program's internal encoding, every IO transcodes from its external encoding to your internal encoding transparently. I recommend you read this blog post:

http://blog.grayproductions.net/articles/ruby_19s_three_default_encodings

Rails doesn't remove the Encoding class; it is available in the console. I think your console is using ruby 1.8 for some reason.
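A short sketch of the replacement options from point 2, applied to raw bytes whose encoding is unknown (the sample string is hypothetical). When converting from ASCII-8BIT to UTF-8, every byte above 127 counts as an undefined conversion, so undef: :replace is what catches it; invalid: :replace is included for sources that do have an encoding but contain malformed sequences. This is lossy, so it is a fallback rather than a substitute for naming the right source encoding.

```ruby
raw = "2008 \x96 2009".dup.force_encoding(Encoding::ASCII_8BIT)

# Replace anything that cannot be converted with a placeholder...
with_placeholder = raw.encode("UTF-8", invalid: :replace, undef: :replace, replace: "?")
puts with_placeholder        # 2008 ? 2009

# ...or drop it entirely, roughly what iconv's //IGNORE did.
dropped = raw.encode("UTF-8", invalid: :replace, undef: :replace, replace: "")
puts dropped                 # 2008  2009

# The example from point 2: a character that exists in UTF-8 but not in ISO-8859-1.
latin1 = "pi = \u03c0 ".encode("ISO-8859-1", undef: :replace, replace: "??")
puts latin1 == "pi = ?? "    # true
```

On Ruby 2.1 and later, String#scrub performs a similar replacement for a string already labeled UTF-8; it is not available in 1.9.1.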
buddycat
2009-Apr-13 00:17 UTC
Re: invalid byte sequence utf-8 OR best option to sanitize content brought in with net::http? single non-utf character causes rails to crash
thanks hector,

i think you are right about the console. i tried the old colon syntax in case statements (when x: ...), which ruby 1.9 no longer accepts, and console was fine with it. so somehow, even though i changed the shebang in script/console to #!/usr/local/ruby1.9/bin/ruby, or even commented out the shebang, renamed it script/console.rb and ran it with my /usr/local ruby, i still get 1.8, i guess.

is there a way to set the ruby that console runs? this is one of those things that i think is pretty convoluted in rails. we should just have an external config file and set these things there. all the calculated paths and other "convention" stuff works most of the time, but sometimes it just creates confusion, imho.

regardless, any ideas about how to set the ruby version for console?
...gg
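A quick way to confirm which interpreter script/console actually started, rather than inferring it from syntax that only one version accepts: paste these lines into the console. RbConfig is Ruby's build-configuration module; the bindir, ruby_install_name and EXEEXT keys are standard, but their values depend on how each Ruby was built.

```ruby
require 'rbconfig'

puts RUBY_VERSION   # e.g. 1.8.x or 1.9.x
puts Object.const_defined?(:Encoding) ? "Encoding is defined" : "no Encoding class (ruby 1.8)"

# Full path of the running interpreter, built from its own configuration.
ruby_path = File.join(RbConfig::CONFIG["bindir"],
                      RbConfig::CONFIG["ruby_install_name"] + RbConfig::CONFIG["EXEEXT"])
puts ruby_path
```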
buddycat
2009-Apr-13 04:40 UTC
Re: invalid byte sequence utf-8 OR best option to sanitize content brought in with net::http? single non-utf character causes rails to crash
ok. found this patch to ../railties/lib/commands/console.rb:

https://rails.lighthouseapp.com/attachments/93770/script-console-invoke-used-rubys-irb.diff

as script/console is just a wrapper for console.rb, that is the place to intervene. stock rails just ends up calling your default irb without bothering to check which version of ruby you are running.

this fixes my immediate problem, so thanks. i am going to grep for RUBY_PLATFORM to see if it can just be set somewhere in rails, as it seems to be referenced before rails searches for the system location of irb.

...gg
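The idea behind that patch, as described in the post, is to run the irb that belongs to the current ruby instead of the first irb on the PATH. A hypothetical sketch of that lookup (not the code in the linked diff): the helper name is made up, and the assumption that a ruby built with a program suffix (for example ruby1.9) installs irb with the same suffix is labeled in the comments.

```ruby
require 'rbconfig'

# Hypothetical helper, not the linked patch: find the irb installed alongside
# the ruby executing this file, falling back to whatever "irb" is on the PATH.
def irb_for_current_ruby
  bindir = RbConfig::CONFIG["bindir"]
  exeext = RbConfig::CONFIG["EXEEXT"]
  # Assumption: a ruby built with a program suffix (e.g. ruby1.9) names irb the same way (irb1.9).
  suffix = RbConfig::CONFIG["ruby_install_name"].sub(/\Aruby/, "")
  candidates = ["irb#{suffix}#{exeext}", "irb#{exeext}"].map { |name| File.join(bindir, name) }
  candidates.find { |path| File.executable?(path) } || "irb"
end

puts irb_for_current_ruby
```

A console wrapper could then exec the returned path instead of a bare "irb", so the console always matches the interpreter that launched it.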
buddycat
2009-Apr-13 08:04 UTC
Re: invalid byte sequence utf-8 OR best option to sanitize content brought in with net::http? single non-utf character causes rails to crash
hector, a further update:

i was able to set both my internal and external encodings, thanks to hongli lai at phusion passenger. he helped me with a wrapper for my local ruby that uses the -E (encoding) option. i'm not suggesting this is his preferred method, but as far as i'm aware you can't pass ruby options any other way in passenger's apache config.

/usr/local/ruby1.9/bin/ruby_wrapper:

    #!/bin/bash
    exec /usr/local/ruby1.9/bin/ruby -E utf-8:utf-8 "$@"

then in apache2.conf:

    PassengerRuby /usr/local/ruby1.9/bin/ruby_wrapper

and restart apache. in a controller:

    raise "#{Encoding.default_external} #{Encoding.default_internal}"

results in:

    UTF-8 UTF-8

so all is good, for my app anyway. irb and script/console are a pain.

unfortunately, after all this, my asp pages still come back ASCII-8BIT encoded when brought in by net::http (even after adding every asp setting i can find to convince it to use utf-8). also, more unfortunately, your assertion that with the default encodings set right (particularly default_internal, which i now have) ruby would silently and faultlessly convert my page didn't hold. no joy: i got the same utf-8 encoding error i started with. so... i guess i am back to explicit encoding like you suggested, or going back to iconv.

all in all, i have to say that ruby 1.9 and rails 2.3 and encoding and irb and compiling your own ruby and... are still very rough.

...gg
Conrad Taylor
2009-Apr-13 09:05 UTC
Re: invalid byte sequence utf-8 OR best option to sanitize content brought in with net::http? single non-utf character causes rails to crash
2009/4/13 buddycat <buddy12lbcat-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> hector,
>
> further update:
>
> i was able to set both my internal and external encoding thanks to
> hongli lai at phusion passenger. he helped me with a wrapper for my
> local ruby that uses the -E encoding option. (not suggesting this is
> his preferred method, but as far as i'm aware there is no other way to
> pass ruby options in passenger's apache config.)
>
> /usr/local/ruby1.9/bin/ruby_wrapper:
>   #!/bin/bash
>   exec /usr/local/ruby1.9/bin/ruby -E utf-8:utf-8 "$@"
>
> then in apache2.conf:
>   PassengerRuby /usr/local/ruby1.9/bin/ruby_wrapper
>
> restart apache.
>
> in a controller:
>   raise "#{Encoding.default_external} #{Encoding.default_internal}"
>
> results in:
>   utf-8 utf-8
>
> so all is good, for my app anyway. irb and script/console are a pain.
>
> unfortunately, after all this, my asp pages still come in ascii-encoded
> through net::http (even after adding every asp setting i can find to
> make it use utf-8). worse, your assertion that with the default
> encodings set right (particularly default_internal, which i have now)
> ruby would silently and faultlessly convert my ascii page did not hold.
> no joy: same utf-8 encoding error i started with.
>
> so... i guess i am back to explicit encoding like you suggested, or
> going back to iconv.
>
> all in all i have to say that ruby1.9 and rails2.3 and encoding and
> irb and compiling your own ruby and... are still very rough.
>
> ...gg

Do you have a test case with which I can reproduce the issue you're seeing?

Thanks,

-Conrad
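Some background on why the wrapper alone did not help: -E utf-8:utf-8 sets Encoding.default_external and Encoding.default_internal, but Ruby only transcodes while reading from an IO whose external encoding differs from its internal one, and binary reads are never transcoded. A minimal plain-Ruby sketch, assuming the legacy bytes are Windows-1252 and using a temp file with hypothetical contents in place of the ASP page:

```ruby
require 'tempfile'

# Hypothetical legacy content: ASCII text plus byte 0x96, which
# Windows-1252 maps to an en dash (this is what chr(150) emits).
legacy_bytes = "long \x96 dash".dup.force_encoding(Encoding::ASCII_8BIT)

file = Tempfile.new('legacy')
file.binmode
file.write(legacy_bytes)
file.close

# 1. External UTF-8 and no separate internal encoding. An internal encoding
#    equal to the external one (as with -E utf-8:utf-8) is ignored, so the
#    bytes are only labelled UTF-8, never converted.
labelled = File.open(file.path, 'r:UTF-8') { |f| f.read }

# 2. Binary read: bytes come back as ASCII-8BIT and are never transcoded,
#    whatever Encoding.default_internal says.
binary = File.open(file.path, 'rb') { |f| f.read }

# 3. Real external encoding declared, UTF-8 internal: transcoded on read.
transcoded = File.open(file.path, 'r:Windows-1252:UTF-8') { |f| f.read }

file.unlink

puts labelled.valid_encoding?                 # false
puts binary.encoding == Encoding::ASCII_8BIT  # true
puts transcoded                               # long – dash
```

Only the third read yields valid UTF-8, because only there does Ruby know which encoding the bytes are really in.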
buddycat
2009-Apr-13 22:52 UTC
Re: invalid byte sequence utf-8 OR best option to sanitize content brought in with net::http? single non-utf character causes rails to crash
so i use a lib/asp.rb module as a mixin to get legacy asp content from internal win2k/iis5/asp (classic, not .net) servers, and require it in my application_controller.rb since i have many asp pages. i do it this way because it gives me a smooth incremental upgrade path from asp to rails: we replace page for page as we write better rails replacements. this way my routes are all rails and i just call asp_get_content when i have an asp page to wrap.

controller:

  def my_legacy_page
    asp_get_content
  end

lib/asp.rb:

  require 'net/http'

  module Asp
    def asp_get_content
      # host, port, method, path, data and headers are set elsewhere (not shown)
      @asp_response = Net::HTTP.start(host, port) do |http|
        http.read_timeout = 1200
        http.send_request(method, path, data, headers)
      end

      # return false on redirects so we can use custom renders like so:
      #   render :foo => :bar if asp_get_content
      # while still allowing a bare asp_get_content for standard pages
      case @asp_response
      when Net::HTTPRedirection
        redirect_to @asp_response['location']
        false
      else
        true
      end
    end
  end

view:

  <%= @asp_response.body %>

to reproduce the issue, just add <%= chr(150) %> to the asp page. rails will choke with "invalid byte sequence in utf-8" as soon as response.rb tries to parse @asp_response.body. see the earlier messages for the stack trace.

this is just my particular situation. i suspect any of the high, non-standard codes that windows likes (128-159) will do it; chr(150), which windows-1252 maps to an en dash, reliably reproduces the issue for me. my point is not about encoding per se; i just think rails should be a bit more fault-tolerant about encodings, because interop makes it almost certain that we will pull in content with bad encodings, just as we pull in malformed html. we cope well with the latter but now need to cope with the former. imho.
thanks...gg
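One possible fix for asp_get_content, sketched under stated assumptions: AspEncoding and to_utf8 are hypothetical names, and the helper assumes each legacy page is either valid UTF-8 or Windows-1252 (likely for text pasted from Word on western-locale Windows servers). It keeps valid UTF-8 as-is and otherwise transcodes from Windows-1252 with String#encode, replacing anything it cannot map instead of stripping tags:

```ruby
# Hypothetical helper for the Asp mixin (not part of the original module).
# Assumption: legacy IIS/ASP pages are either valid UTF-8 or Windows-1252.
module AspEncoding
  FALLBACK = 'Windows-1252'

  # Returns a valid UTF-8 copy of a raw (ASCII-8BIT) response body.
  # The argument itself is left untouched.
  def self.to_utf8(raw)
    utf8 = raw.dup.force_encoding(Encoding::UTF_8)
    return utf8 if utf8.valid_encoding?

    # Not valid UTF-8: reinterpret the bytes as Windows-1252 and transcode.
    # Any byte Ruby's Windows-1252 table leaves undefined becomes '?'.
    legacy = raw.dup.force_encoding(FALLBACK)
    legacy.encode(Encoding::UTF_8, invalid: :replace, undef: :replace, replace: '?')
  end
end
```

Inside asp_get_content this could be used as @asp_body = AspEncoding.to_utf8(@asp_response.body), with the view rendering @asp_body instead (@asp_body is also a hypothetical name). Trying UTF-8 first lets pages that IIS really does send as UTF-8 through unchanged; the trade-off is that a page mixing UTF-8 and Windows-1252 bytes is decoded entirely as Windows-1252, which garbles its UTF-8 parts.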
buddycat
2009-Apr-13 22:57 UTC
Re: invalid byte sequence utf-8 OR best option to sanitize content brought in with net::http? single non-utf character causes rails to crash
also, my particular case is with asp content; but i am sure that the problem can be reproduced with any web stack or even a static text file with these characters.
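The same failure can indeed be reproduced without ASP or Net::HTTP. A minimal sketch, assuming a static file containing the byte chr(150) produces (0x96) and a regexp test standing in for String#blank? (blankish? is a hypothetical helper; its regexp is modelled on what ActiveSupport 2.3's blank? is assumed to do, based on the blank.rb frame in the stack trace):

```ruby
require 'tempfile'

# Hypothetical static page containing byte 0x96, the byte chr(150) emits on a Windows box.
file = Tempfile.new('static_page')
file.binmode
file.write("<p>pasted from word \x96 here</p>".dup.force_encoding(Encoding::ASCII_8BIT))
file.close

# Read it back labelled UTF-8, as a plain read would under -E utf-8:utf-8.
page = File.open(file.path, 'r:UTF-8') { |f| f.read }
file.unlink

# Stand-in for String#blank?: any regexp match on an invalid UTF-8 string raises.
def blankish?(str)
  str !~ /\S/
end

error = begin
  blankish?(page)
  nil
rescue ArgumentError => e
  e.message
end

puts page.valid_encoding?  # false
puts error                 # invalid byte sequence in UTF-8

# Declaring the real encoding and transcoding makes the same check work.
fixed = page.dup.force_encoding('Windows-1252').encode(Encoding::UTF_8)
puts blankish?(fixed)      # false
```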
Hector Gomez
2009-Apr-14 21:01 UTC
Re: invalid byte sequence utf-8 OR best option to sanitize content brought in with net::http? single non-utf character causes rails to crash
Sorry for the late response. I took a dive into the Net::HTTP code and I have some bad news. It wraps the connection's socket in a BufferedIO, and when it reads from the socket it uses IO#sysread, the lowest-level read you can do in Ruby. That method always returns an ASCII-8BIT string, so you have to transcode or force_encoding the responses from Net::HTTP explicitly. Hector
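Hector's diagnosis can be checked without a network connection. The sketch below fakes a response body as an ASCII-8BIT string, the way IO#sysread returns data, with two hypothetical Windows-1252 smart-quote bytes (0x93 and 0x94), and contrasts relabelling with transcoding:

```ruby
# Simulated Net::HTTP body: raw bytes tagged ASCII-8BIT, as IO#sysread returns them.
body = "smart \x93quotes\x94".dup.force_encoding(Encoding::ASCII_8BIT)

# force_encoding only changes the label; the bytes stay the same.
relabelled = body.dup.force_encoding(Encoding::UTF_8)
puts relabelled.valid_encoding?            # false
puts relabelled.bytesize == body.bytesize  # true

# encode straight from ASCII-8BIT fails: Ruby cannot know what the high bytes mean.
begin
  body.encode(Encoding::UTF_8)
rescue Encoding::UndefinedConversionError => e
  puts e.class                             # Encoding::UndefinedConversionError
end

# Stating the real source encoding first makes transcoding possible.
converted = body.dup.force_encoding('Windows-1252').encode(Encoding::UTF_8)
puts converted                             # smart “quotes”
puts converted.bytesize                    # 18
```

force_encoding alone is enough only when the bytes already are valid UTF-8; otherwise the body has to be labelled with its real encoding and then encoded to UTF-8.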