I'm looking for feedback on this, so please read on.

I'm just fed up with Microsoft's "stupid quotes" feature (for the sake of later searchers, I'll add that it's usually known as "smart quotes", although as with anything Microsoft you're safe to substitute the word "stupid" anywhere they use the word "smart").

I just completed a nice application, and suddenly an external piece failed. It first uses xmlrpc to grab some data from the database and stick it in a yaml file. A couple of other programs read the yaml file and create various other files. Those programs were crapping out because they couldn't read the entire yaml file. It turns out that the problem was with people using stupid quotes.

Here's the sledgehammer that I applied in app/controllers/application.rb:

    before_filter :fix_stupid_quotes_in_params

    def fix_stupid_quotes_in_params
      dig_deep(@params) { |s| fix_stupid_quotes!(s) }
    end

    # Walk nested hashes and yield every String value to the block.
    def dig_deep(hash, &block)
      if hash.instance_of? String
        yield(hash)
      elsif hash.kind_of? Hash
        hash.each_key { |h| dig_deep(hash[h], &block) }
      end
    end

    # Replace Windows "smart" punctuation bytes with ASCII stand-ins, in place.
    def fix_stupid_quotes!(s)
      s.gsub!(/\x82/,',')
      s.gsub!(/\x84/,',,')
      s.gsub!(/\x85/,'...')
      s.gsub!(/\x88/,'^')
      s.gsub!(/\x89/,'o/oo')
      s.gsub!(/\x8b/,'<')
      s.gsub!(/\x8c/,'OE')
      s.gsub!(/\x91|\x92/,"'")
      s.gsub!(/\x93|\x94/,'"')
      s.gsub!(/\x95/,'*')
      s.gsub!(/\x96/,'-')
      s.gsub!(/\x97/,'--')
      s.gsub!(/\x98/,'~')
      s.gsub!(/\x99/,'TM')
      s.gsub!(/\x9b/,'>')
      s.gsub!(/\x9c/,'oe')
    end

If this is a bad idea, I'll have to implement it on one particular page instead. The fact is, though, that these characters are always invalid (in Latin/UTF-8 type char sets), so I see no reason to ever allow them through. I hate modifying the params, but again, these are just not valid characters, and I don't want to have to think about it in each model or controller. This is a sledgehammer approach, as it walks through params on every page and fixes the stupid quote characters.
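
A rough sketch of the same idea for Ruby 1.9 or later, where strings carry an encoding. This is not the code from the application above: the names `CP1252_TO_ASCII`, `fix_stupid_quotes` and `deep_fix` are made up for illustration, and it assumes the offending values arrive tagged as UTF-8 but containing raw Windows-1252 bytes. Strings that are already valid are left alone, because a genuine UTF-8 curly quote (bytes E2 80 99) contains bytes in the same 0x80-0x9F range and a byte-level substitution would mangle it. It also walks arrays, which `dig_deep` skips.

```ruby
# Hypothetical sketch: byte-to-ASCII table copied from the substitutions above.
CP1252_TO_ASCII = {
  0x82 => ",",    0x84 => ",,", 0x85 => "...", 0x88 => "^",
  0x89 => "o/oo", 0x8b => "<",  0x8c => "OE",  0x91 => "'",
  0x92 => "'",    0x93 => '"',  0x94 => '"',   0x95 => "*",
  0x96 => "-",    0x97 => "--", 0x98 => "~",   0x99 => "TM",
  0x9b => ">",    0x9c => "oe"
}.freeze

# Returns a cleaned copy of s. Strings that are valid in their own
# encoding (plain ASCII, proper UTF-8) are returned unchanged.
def fix_stupid_quotes(s)
  return s if s.valid_encoding?
  # Work on a binary copy so the regexp can match single bytes.
  fixed = s.b.gsub(/[\x80-\x9f]/n) { |byte| CP1252_TO_ASCII.fetch(byte.ord, byte) }
  fixed.force_encoding(s.encoding)
end

# Recursively applies fix_stupid_quotes to every String inside nested
# hashes and arrays, returning a new structure instead of mutating params.
def deep_fix(value)
  case value
  when String then fix_stupid_quotes(value)
  when Hash   then value.transform_values { |v| deep_fix(v) }
  when Array  then value.map { |v| deep_fix(v) }
  else value
  end
end

params = { "post" => { "title" => "\x93Hello\x94", "tags" => ["it\x92s", "caf\u00e9"] } }
p deep_fix(params)
```

Returning a copy rather than editing the params in place is a design choice of this sketch, not something the original filter does.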
I'm looking for any thoughts, suggestions, comments, etc. on the above code.

Thanks,
Michael
--
Michael Darrin Chaney
mdchaney-c1nKWHh82D8TjS1aD1bK6AC/G2K4zDHf@public.gmane.org
http://www.michaelchaney.com/

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
To unsubscribe from this group, send email to rubyonrails-talk-unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
For more options, visit this group at http://groups.google.com/group/rubyonrails-talk
-~----------~----~----~----~------~----~------~--~---
Michael Chaney wrote:
> I'm looking for feedback on this, read on, please...
> these characters are always invalid (in Latin/UTF-8 type char sets)

It may be hard to go back and recreate all this. I bumped into this page and thought I'd comment.

My question is whether Microsoft is reporting that the request is in UTF-8. It may say that the request is in some other code page. If you want UTF-8 and the request is not in UTF-8, then you need to convert it using something like iconv.

If the request says it is UTF-8 and, as you point out, it contains invalid UTF-8 byte sequences, then you have a great reason to bitch.

--
Posted via http://www.ruby-forum.com/.
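
As a rough sketch of that conversion step: on Ruby 1.9 and later the built-in `String#encode` can stand in for iconv. The helper name `to_utf8` and the choice of Windows-1252 as the fallback code page are assumptions for illustration; the right source encoding is whatever the request actually declares. Note that the validity check is only a heuristic, since a short Windows-1252 string can happen to be valid UTF-8 as well.

```ruby
# Hypothetical helper: returns a UTF-8 copy of raw.
# assumed_source is a guess, used only when the bytes are not valid UTF-8.
def to_utf8(raw, assumed_source = "Windows-1252")
  as_utf8 = raw.dup.force_encoding(Encoding::UTF_8)
  return as_utf8 if as_utf8.valid_encoding?

  # Reinterpret the same bytes in the assumed code page, then transcode.
  # Bytes with no mapping in that code page are replaced with "?".
  raw.dup.force_encoding(assumed_source)
     .encode(Encoding::UTF_8, invalid: :replace, undef: :replace, replace: "?")
end

puts to_utf8("\x93smart\x94 quotes")
```

Unlike a byte-to-ASCII substitution, this keeps the curly quotes as real Unicode characters, so nothing the user typed is lost.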
IIRC, these are multi-byte characters once they are encoded as UTF-8 (in Windows-1252 each one is a single byte). The problem is not so much in using them as in interpreting them.

For example, a user types stuff into Word, copies it, then pastes it into a textarea and hits send. The application obediently stores it in the database. The application displays the data and the s(mart|tupid) quotes are in place correctly.

Then the programmer gets a wild idea, like restoring the database from a backup. Because the backup is a text file, the multi-byte characters are misinterpreted as they are imported into the database. The result is improper display of these characters.

If you come up with a solution that works for these cute characters (whatever you call them) that everyone has in their word processing documents, let us all know.

Here are some references that sort of work:

demoroniser (Perl script)
www.fourmilab.ch/webtools/demoroniser

I can't attribute this second one, but it's a shell script. I tried it on a database dump and it left me with less cleanup work. Maybe it will provide some clues for you:

    #!/bin/sh
    # Convert each file argument to UTF-8 in place; recurse into directories.
    this_directory=`pwd`
    for x
    do
        printf 'converting %s: ' "$x"
        if test "$x" = runiconv.sh; then
            echo "not editing script itself!"
        elif [ -d "$x" ]; then
            # Copy the script into the subdirectory, run it there, then clean up.
            (cp runiconv.sh "$x"; cd "$x"; sh runiconv.sh *; rm -f runiconv.sh)
        elif test -s "$x"; then
            # euc-kr is the source encoding this script came with;
            # change it to match your data.
            iconv --from-code=euc-kr --to-code=UTF-8 < "$x" > "$this_directory/$x$$"
            if [ $? -eq 0 ]; then
                cp "$this_directory/$x$$" "$x"
            else
                printf 'iconv error '
            fi
            rm -f "$this_directory/$x$$"
            echo "done"
        else
            echo "original file is empty"
        fi
    done
    echo "all done"

On Nov 19, 2007, at 9:06 AM, Perry Smith wrote:
> Michael Chaney wrote:
>> I'm looking for feedback on this, read on, please...
>> these characters are always invalid (in Latin/UTF-8 type char sets)
>
> It may be hard to go back and recreate all this. I bumped into this
> page and thought I'd comment.
>
> My question is if Microsoft is reporting that the request is in UTF-8.
> It may say that the request is in some other code page. And, if you
> want UTF-8 and the request is not in UTF-8, then you need to convert it
> using something like iconv.
>
> If the request says it is utf-8 and, as you point out, it has invalid
> utf-8 code points, then you have a great reason to bitch.
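
For comparison, the per-file step of that shell script might look roughly like this in Ruby (1.9 or later), using `String#encode` in place of the iconv command. The method name `convert_file_to_utf8` is hypothetical, and the source encoding has to be supplied by the caller, since nothing in a plain-text dump says what it is. As in the script, the original file is only replaced when the whole conversion succeeds.

```ruby
# Hypothetical sketch: convert one file to UTF-8 in place.
# from is the encoding the file is assumed to be in (e.g. "Windows-1252").
# Returns true on success; returns false and leaves the file untouched
# if the bytes cannot be converted from that encoding.
def convert_file_to_utf8(path, from)
  converted = File.binread(path).force_encoding(from).encode(Encoding::UTF_8)

  # Write to a temporary name first (like $x$$ in the shell script),
  # then rename over the original.
  tmp = "#{path}.#{Process.pid}"
  File.binwrite(tmp, converted)
  File.rename(tmp, path)
  true
rescue EncodingError
  false
end
```

A call such as `convert_file_to_utf8("dump.sql", "Windows-1252")` would be the equivalent of one pass through the `iconv` branch; `dump.sql` is just a placeholder name.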