I have spent days trying to figure out how to get UTF-8 working with my site.

Here's my environment:

Linux version 2.6.16.29-xen_3.0.3.0
Ruby 1.8.4 (2005-12-24) [i386-linux]
Rails 1.2.3
mongrel (1.0.1)
mongrel_cluster (1.0.2, 0.2.1)
ferret (0.11.4)
acts_as_ferret stable plugin
Ferret DRb server

When I don't use an analyzer with my acts_as_ferret declaration, everything works fine. However, I can't expect users to enter "Álex Rodríguez" when searching. They're going to type "alex rodriguez" (or some variation of his name, which I handle using a fuzzy search).

So then I pass an analyzer in my acts_as_ferret declaration:

    acts_as_ferret({ :fields => { :first_name => { :store => :no },
                                  :last_name  => { :store => :no },
                                  :db_state   => { :index => :untokenized_omit_norms,
                                                   :term_vector => :no } },
                     :remote => true },
                   { :analyzer => UtfAnalyzer.new })

Here's the analyzer I'm using... pretty much taken from here:
http://ferret.davebalmain.com/api/classes/Ferret/Analysis/MappingFilter.html

-----
    class UtfAnalyzer < Ferret::Analysis::Analyzer
      include Ferret::Analysis

      CHARACTER_MAPPINGS = {
        ['?','?','?','?','?','?','?','?'] => 'a',
        '?' => 'ae',
        ['?','?'] => 'd',
        ['?','?','?','?','?'] => 'c',
        ['?','?','?','?','?','?','?','?','?'] => 'e',
        ['?'] => 'f',
        ['?','?','?','?'] => 'g',
        ['?','?'] => 'h',
        ['?','?','?','?','?','?','?','?'] => 'i',
        ['?','?','?','?'] => 'j',
        ['?','?'] => 'k',
        ['?','?','?','?','?'] => 'l',
        ['?','?','?','?','?','?'] => 'n',
        ['?','?','?','?','?','?','?','?','?','?'] => 'o',
        ['?'] => 'oek',
        ['?'] => 'q',
        ['?','?','?'] => 'r',
        ['?','?','?','?','?'] => 's',
        ['?','?','?','?'] => 't',
        ['?','?','?','?','?','?','?','?','?','?'] => 'u',
        ['?'] => 'w',
        ['?','?','?'] => 'y',
        ['?','?','?'] => 'z'
      }

      def token_stream(field, str)
        MappingFilter.new(StandardTokenizer.new(str), CHARACTER_MAPPINGS)
      end
    end

I think Ferret itself is working fine, because when I run some tests the mapping filter replaces the accented characters exactly as it should.

However, when something is persisted via the model (acts_as_ferret and the DRb server), I get unexpected behavior:

- Using a model with ONE field declared in acts_as_ferret, and a string with accented characters: I can search it as expected, with either accented or non-accented characters, and I get results returned. However, I don't get any results for the non-accented records. ONLY the accented records get returned when searching.

- Using a model with multiple fields defined (as in the Player model above): nothing gets returned, neither accented nor non-accented records, nor any combination.

My ferret_server.log file shows characters that are very different from the accented characters I'm trying to search on...

Search entered in form: Álex Rodríguez
ferret_server.log: ?lex rodr??guez

Not sure why this is occurring, but I've also redisplayed the submitted text on a web page and it displays correctly. This leads me to believe that Ruby/Rails is successfully getting the information, and that the HTML page encoding is correct, along with environment variables, etc. As I stated earlier, my Ferret test takes the string "Rodríguez" and returns token["Rodriguez":0:10:1], demonstrating that the UtfAnalyzer works fine outside of acts_as_ferret...

So any help here would be much appreciated.

Thanks,

Brandon
--
Posted via http://www.ruby-forum.com/.
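
For readers who want to see what the character mapping is meant to achieve, here is a minimal plain-Ruby sketch of accent folding. It does not use Ferret at all; `FOLD_GROUPS`, `FOLD_TABLE`, `FOLD_PATTERN` and `fold_accents` are hypothetical names, the character lists are a small illustrative subset rather than the poster's table, and the lowercasing step is an added assumption (the analyzer in the post does not lowercase). Unicode-aware `downcase` needs Ruby 2.4 or later, so this would not run on the Ruby 1.8.4 mentioned in the post.

```ruby
# Plain-Ruby illustration of accent folding; Ferret is not involved.
# The groups below are an illustrative subset, not the table from the post.
FOLD_GROUPS = {
  %w[à á â ã ä å] => 'a',
  %w[è é ê ë]     => 'e',
  %w[ì í î ï]     => 'i',
  %w[ò ó ô õ ö]   => 'o',
  %w[ù ú û ü]     => 'u',
  %w[ñ]           => 'n',
  %w[ç]           => 'c'
}

# Expand the grouped table into one replacement entry per character.
FOLD_TABLE = FOLD_GROUPS.each_with_object({}) { |(chars, replacement), table|
  chars.each { |char| table[char] = replacement }
}

# A single pattern that matches any character listed in the table.
FOLD_PATTERN = Regexp.union(FOLD_TABLE.keys)

# Lowercase the text (an assumption of this sketch), then replace every
# accented character that appears in the table with its plain counterpart.
def fold_accents(text)
  text.downcase.gsub(FOLD_PATTERN, FOLD_TABLE)
end
```

With this table, `fold_accents('Álex Rodríguez')` returns `"alex rodriguez"`. The same folding has to be applied both when records are indexed and when the query is parsed, otherwise the folded query terms cannot match the stored terms.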
Hi!

This is really strange - are you sure the DRb server runs in a proper utf8 environment, just as your testcases do?

Jens

On Thu, Sep 20, 2007 at 08:01:48PM +0200, Brandon Kelly wrote:
> I have spent days trying to figure out how to get UTF-8 working with my
> site.
> [...]

--
Jens Krämer
http://www.jkraemer.net/ - Blog
http://www.omdb.org/ - The new free film database
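
One way to act on this question is to print the locale settings from inside both processes (the DRb server and the test run) and compare them. The sketch below is only an illustration: `locale_report` and `misread_as_latin1` are hypothetical helper names, and the `Encoding` API exists only in Ruby 1.9 and later, so on the Ruby 1.8.4 from the original post the `$KCODE` setting would be the thing to inspect instead. The second helper shows why a single accented character can appear as two characters when UTF-8 bytes are read one byte at a time, which is consistent with the doubled characters in the log excerpt but is not a confirmed diagnosis.

```ruby
# Report the locale-related settings a process was started with.
# Pass a plain Hash instead of ENV to inspect a captured environment.
def locale_report(env = ENV)
  {
    'LANG'             => env['LANG'],
    'LC_ALL'           => env['LC_ALL'],
    'LC_CTYPE'         => env['LC_CTYPE'],
    'default_external' => Encoding.default_external.name
  }
end

# Reinterpret the UTF-8 bytes of a string as Latin-1, one character per byte,
# and convert the result back to UTF-8 so it can be displayed.
def misread_as_latin1(text)
  text.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end
```

Running `locale_report` in each process and comparing the two hashes shows whether they were started with the same settings. `misread_as_latin1('í').length` is 2, because "í" occupies two bytes in UTF-8.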
Thanks for the quick response Jens.

Okay -- my problem apparently is that I've been deploying new code (which stops and starts the ferret server), then I would go in and delete the index. So the index gets recreated, but the DRb server "remembers" the previous index, or settings, or whatever.

When I follow these steps, the index is created correctly, and the analyzer works fine...

1. deploy new code
2. script/ferret_stop
3. rm -rf index/production
4. script/ferret_start

The key for me to remember is to stop the DRb server BEFORE deleting the index.

I've created a simple capistrano recipe to handle this in the future.

Thanks again.

- Brandon
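
Steps 2 to 4 above can be sketched as a small Ruby helper that enforces the ordering. This is only a sketch under stated assumptions: `reset_ferret_index` and its `runner` parameter are hypothetical names, the script and index paths are the ones given in the post, and the syntax targets a current Ruby rather than 1.8.4. It is not the poster's Capistrano recipe, which is not shown in the thread.

```ruby
require 'fileutils'

# Stop the DRb server, delete the index for one environment, then start the
# server again. The index is only removed while the server is stopped.
# `runner` executes a shell command and returns true on success; it is
# injectable so the ordering can be checked without real scripts.
def reset_ferret_index(app_root, env: 'production', runner: ->(cmd) { system(cmd) })
  Dir.chdir(app_root) do
    raise 'script/ferret_stop failed' unless runner.call('script/ferret_stop')

    # Safe to delete now: no running server is holding the old index open.
    FileUtils.rm_rf(File.join('index', env))

    raise 'script/ferret_start failed' unless runner.call('script/ferret_start')
  end
end
```

A call such as `reset_ferret_index('/path/to/app')` would run after the new code is deployed (step 1). Raising when the stop script fails prevents the index from being deleted underneath a server that is still running.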
On Fri, Sep 21, 2007 at 02:31:04AM +0200, Brandon Kelly wrote:
> The key for me to remember is to stop the DRB server BEFORE deleting the
> index.
>
> I've created a simple capistrano recipe to handle this in the future.

Cool. I usually put the index directory into shared/ and symlink it into the current release during deployment. This saves you the index rebuild after deploying.

Cheers,
Jens

--
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de
Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa
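
The shared-index layout described here could look roughly like the following in plain Ruby. This is a hedged sketch, not Jens's actual setup: `link_shared_index` is a hypothetical helper, the `index` directory name is inferred from the `index/production` path earlier in the thread, and `shared_path` and `release_path` stand for the `shared/` directory and the freshly deployed release of a typical Capistrano layout.

```ruby
require 'fileutils'

# Keep the Ferret index in the shared directory and expose it inside a
# release through a symlink, so a new release reuses the existing index.
def link_shared_index(shared_path, release_path)
  shared_index  = File.join(shared_path, 'index')
  release_index = File.join(release_path, 'index')

  FileUtils.mkdir_p(shared_index)              # create the shared index on first deploy
  FileUtils.rm_rf(release_index)               # drop any index directory shipped with the release
  FileUtils.ln_s(shared_index, release_index)  # point the release at the shared index
  release_index
end
```

Because every release points at the same directory, the index survives a deploy. As the thread concludes, the DRb server should still be stopped before that shared directory is ever deleted.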
I do have the site set up this way.

My deploy script stops and starts the DRb server without touching the index (which is what I want most of the time).

My problem arose when I needed to delete the index. I'd deploy new code, DRb would restart with the old index in place... then I'd delete the old index (while the DRb server was running)... and watch it rebuild. The rebuilt index had problems. It took me a while to realize that I need to delete the index only when the DRb server isn't running (at least that works for me).

Thanks again.
On Fri, Sep 21, 2007 at 04:10:37PM +0200, Brandon Kelly wrote:
> My problem arose when I needed to delete the index. I'd deploy new
> code, DRB would restart with the old index in place... then I'd delete
> the old index (while DRB server was running)... and watch it rebuild.
> The rebuilt index had problems.

Yes, deleting the index while the server is running isn't a good idea. You may also run Model.rebuild_index from a script after deployment to rebuild the index, or even create a rebuild_index deployment recipe via Capistrano.

Cheers,
Jens