Alain Ravet
2007-Nov-13 12:47 UTC
[Ferret-talk] acts_as_ferret : cannot use a customized Analyzer (as indicated in the AdvancedUsageNotes)
Hi all,

I cannot make aaf (rev. 220) use my custom analyzer, despite following the instructions at
http://projects.jkraemer.net/acts_as_ferret/wiki/AdvancedUsage

To pinpoint the problem, I created a model and a simple analyzer with 2 stop words: "fax" and "gsm".

test 1: model.rebuild_index + model.find_by_contents("fax")   # fax is a stop word
  => I get a result when I should not.
  (note: I delete the index directory first, and I can see the index being recreated, index/develop)

test 2: insert a 'raise' in the token_stream() method
  => it is never raised.

test 3: use the standard analyzer to exclude the 2 stop words
  => same wrong result.

    class AccessPointKind2 < ActiveRecord::Base
      set_table_name "access_point_kinds2"

      acts_as_ferret(
        {:remote => true, :fields => { :name => {:store => :yes}} },
        {:analyzer => Ferret::Analysis::StandardAnalyzer.new(["fax","gsm"])}
      )
    end

Here are the model and the analyzer.

MODEL:

    class AccessPointKind2 < ActiveRecord::Base
      set_table_name "access_point_kinds2"

      acts_as_ferret(
        {:remote => true, :fields => { :name => {:store => :yes}} },
        {:analyzer => PlainAsciiAnalyzer.new}
      )
    end

ANALYZER (lib : plain_ascii_analyzer.rb):

    class PlainAsciiAnalyzer < ::Ferret::Analysis::Analyzer
      include ::Ferret::Analysis
      def token_stream(field, str)
        StopFilter.new(
          StandardTokenizer.new(str),
          ["fax", "gsm"]
        )
        # raise   <<<----- is never executed when uncommented !!
      end
    end

In the console, I rebuild the index and search for a stop word. I get results when I should not:

    >> reload!; AccessPointKind2.rebuild_index ; AccessPointKind2.find_by_contents("gsm").collect &:name
    Reloading...
    AccessPointKind2 Columns (0.002963)   SHOW FIELDS FROM access_point_kinds2
    Asked for a remote server ? true, ENV["FERRET_USE_LOCAL_INDEX"] is nil, looks like we are not the server
    Will use remote index server which should be available at druby://localhost:9010
    default field list: [:name]
    AccessPointKind2 Load (0.002706)   SELECT * FROM access_point_kinds2 WHERE (access_point_kinds2.id in ('7','12','13','8','2'))
    Query: gsm
    total hits: 5, results delivered: 5
    => ["gsm", "gsm", "gsm(werk)", "gsm(priv?)", "gsm(priv?)"]
    >>

I guess it's obvious, but I cannot see it.
Help.

Thanks in advance.

Alain
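For reference, the behaviour expected in test 1 can be sketched without Ferret at all. The snippet below is a minimal pure-Ruby illustration, not Ferret's or aaf's implementation: `SketchAnalyzer` and `SketchIndex` are hypothetical names, and the tokenizing rule (lowercase, split on non-alphanumeric characters) is an assumption made for the example. It shows that when stop words are dropped by the analyzer, a search for "gsm" should come back empty, so five hits suggest the analyzer is not being applied.

```ruby
require "set"

# Hypothetical stand-in for an analyzer: lowercases, splits on
# non-alphanumeric characters, and drops stop words.
# This is NOT Ferret's implementation.
class SketchAnalyzer
  def initialize(stop_words = [])
    @stop_words = stop_words.map(&:downcase).to_set
  end

  def tokens(text)
    text.downcase.scan(/[[:alnum:]]+/).reject { |t| @stop_words.include?(t) }
  end
end

# Hypothetical in-memory inverted index: term => list of record ids.
class SketchIndex
  def initialize(analyzer)
    @analyzer = analyzer
    @postings = Hash.new { |h, k| h[k] = [] }
  end

  # Index a record under every token that survives analysis.
  def add(id, text)
    @analyzer.tokens(text).each do |t|
      @postings[t] << id unless @postings[t].include?(id)
    end
  end

  # All query terms must match; a query made only of stop words finds nothing.
  def search(query)
    terms = @analyzer.tokens(query)
    return [] if terms.empty?
    terms.map { |t| @postings.fetch(t, []) }.reduce(:&)
  end
end

index = SketchIndex.new(SketchAnalyzer.new(%w[fax gsm]))
index.add(7, "gsm")
index.add(12, "gsm(werk)")
index.search("gsm")   # => []    the stop word is neither indexed nor queried
index.search("werk")  # => [12]  the non-stop token of "gsm(werk)" still matches
```

The same analyzer object is used for indexing and for querying here; that is a design choice of the sketch, made so that both sides agree on what counts as a term.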
Jens Kraemer
2007-Nov-14 09:25 UTC
[Ferret-talk] acts_as_ferret : cannot use a customized Analyzer (as indicated in the AdvancedUsageNotes)
Hi,

I just tried, and I'm afraid I couldn't reproduce your problem here (with aaf trunk). I just committed a test case using StandardAnalyzer with your stop word list, and it works as intended. I also tried with your analyzer class, with the same result.

Could you please try the latest aaf from trunk to see if it fixes your problem?

Cheers,
Jens

--
Jens Krämer
http://www.jkraemer.net/ - Blog
http://www.omdb.org/ - The new free film database
Alain Ravet
2007-Nov-14 21:51 UTC
[Ferret-talk] acts_as_ferret : cannot use a customized Analyzer (as indicated in the AdvancedUsageNotes)
Jens,

> I just tried and I'm afraid I couldn't reproduce your problem here (with aaf trunk).
...
> Could you please try the latest aaf from trunk to see if it fixes your problem?

Same problem after installing the latest version (262) of aaf: the custom analyzer I pass as an aaf parameter is not used.

As a quick test, I tried using the "No Stop Word" custom analyzer as documented at
http://projects.jkraemer.net/acts_as_ferret/wiki/AdvancedUsage
on a simple LUT table/model, to no avail. I tried the new syntax with the same wrong result.

Setup:
 * I've installed the latest trunk version of aaf (262)
 * killed + restarted a (new) DRb server
       $ ./script/ferret_server -e production start
 * checked the Ferret version:
       $ gem list ferret
       ==> ferret (0.11.4)

Test:
I created a record where the name is a default stop word:

    >> Country.find 11
    Country Load (0.000388)   SELECT * FROM countries WHERE (countries.`id` = 11)
    => #<Country id: 11, name: " the">

model, way 1:

    class Country < ActiveRecord::Base
      acts_as_ferret(
        { :fields => [:name] },
        { :analyzer => Ferret::Analysis::StandardAnalyzer.new([]) }
      )
    end

model, way 2:

    class Country < ActiveRecord::Base
      acts_as_ferret(
        :fields => [:name],
        :remote => true,
        :ferret => {:analyzer => Ferret::Analysis::StandardAnalyzer.new([]) }
      )
    end

PROBLEM: in both cases it doesn't find any record where the name is 'the':

    >> reload! ; Country.rebuild_index ; Country.find_by_contents(" the")

    >> reload! ; Country.rebuild_index ; Country.find_by_contents ("the")
    Reloading...
    Asked for a remote server ? true, ENV["FERRET_USE_LOCAL_INDEX"] is nil, looks like we are not the server
    Will use remote index server which should be available at druby://localhost:9010
    default field list: [:name]
    Query: the
    total hits: 0, results delivered: 0
    => #<ActsAsFerret::SearchResults:0x324ab3c @per_page=0, @current_page=nil, @total_hits=0, @results=[], @total_pages=0>

I tried with my custom analyzer (from the previous message), with the same wrong result.

So it looks like aaf is not using the custom analyzer I declared in the model.
It doesn't make any sense to me.

Alain Ravet
Alain Ravet
2007-Nov-14 21:58 UTC
[Ferret-talk] acts_as_ferret : cannot use a customized Analyzer (as indicated in the AdvancedUsageNotes)
Remark: some spaces were erroneously inserted before the word "the" when I formatted the email; they are not present in the real code. So

> => #<Country id: 11, name: " the">
> ..
> >> reload! ; Country.rebuild_index ; Country.find_by_contents(" the")

should read:

> => #<Country id: 11, name: "the">
> ..
> >> reload! ; Country.rebuild_index ; Country.find_by_contents("the")
Alain Ravet
2007-Nov-14 23:00 UTC
[Ferret-talk] acts_as_ferret : cannot use a customized Analyzer (as indicated in the AdvancedUsageNotes)
I'm one step further:
 - Good: I now know aaf knows about/received the custom analyzer,
but
 - Bad: the analyzer is not used by aaf (it stops on words it should not stop on).

New test: a "no stop word" analyzer, adapted from the German stemming analyzer at
http://projects.jkraemer.net/acts_as_ferret/wiki/AdvancedUsage

file: model/country.rb
----------------------
    class Test2Analyzer < ::Ferret::Analysis::Analyzer
      include Ferret::Analysis
      def initialize(stop_words = [])
        @stop_words = stop_words
      end
      def token_stream(field, str)
        StemFilter.new(StopFilter.new(LowerCaseFilter.new(
          StandardTokenizer.new(str)), @stop_words), 'de')
      end
    end

    class Country < ActiveRecord::Base
      acts_as_ferret(
        :fields => [:name],
        :remote => true,
        :ferret => {:analyzer => Test2Analyzer.new([]) }
      )
    end

0/ delete the ferret index directory

1/ restart the console and rebuild the index:

    ./script/console
    >> Country.rebuild_index
    Asked for a remote server ? true, ENV["FERRET_USE_LOCAL_INDEX"] is nil, looks like we are not the server
    Will use remote index server which should be available at druby://localhost:9010
    default field list: [:name]
    => nil

2/ confirm that aaf knows about my "no_stop_words" custom analyzer:

    >> puts Country.aaf_index.to_yaml
    --- !ruby/object:ActsAsFerret::RemoteIndex
    config:
      :fields:
      - :name
      :mysql_fast_batches: true
      :name: countries
      :class_name: Country
      :index_dir: /Users/aravet/aaprojets/newgids/newgids_machine/index/development/country
      :remote: druby://localhost:9010
      :reindex_batch_size: 1000
      :store_class_name: false
      :ferret_fields:
        :name:
          :store: :no
          :term_vector: :with_positions_offsets
          :boost: 1.0
          :index: :yes
          :highlight: :yes
      :single_index: false
      :ferret: &id001
        :key: :id
        :auto_flush: true
        :or_default: false
        :path: /Users/aravet/aaprojets/newgids/newgids_machine/index/development/country
        :create_if_missing: true
        :handle_parse_errors: true
        :analyzer: !ruby/object:Test2Analyzer       <<<<----------- Good
          stop_words: []                            <<<<----------- Good
        :default_field:
        - :name
      :enabled: true
    ferret_config: *id001
    server: !ruby/object:DRb::DRbObject
      ref:
      uri: druby://localhost:9010
    => nil

3/ confirm that there is a record with name == "the":

    >> Country.find_by_name "the"
    Country Load (0.000427)   SELECT * FROM countries WHERE (countries.`name` = 'the') LIMIT 1
    => #<Country id: 11, name: "the">

4/ try to find it through aaf with "t*"
   => DOES NOT WORK (does not find Country[:name => "the"])

    >> Country.find_by_contents "t*"
    Query: t*
    total hits: 0, results delivered: 0
    => #<ActsAsFerret::SearchResults:0x31ff754 @per_page=0, @current_page=nil, @total_hits=0, @results=[], @total_pages=0>

5/ do the same for "f*", which matches a non stop word
   => IT WORKS (finds Country[:name => "Frankrijk"])

    >> Country.find_by_contents "f*"
    Country Load (0.000420)   SELECT * FROM countries WHERE (countries.id in ('2'))
    Query: f*
    total hits: 1, results delivered: 1
    => #<ActsAsFerret::SearchResults:0x31fa4ac @per_page=1, @current_page=nil, @total_hits=1, @results=[#<Country id: 2, name: "Frankrijk">], @total_pages=1>

So, aaf (rev 262)
 * associates the right custom analyzer with the model,
 * but doesn't seem to use it when finding_by_contents (and perhaps when rebuilding the index?)

Alain
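One way to check whether token_stream is ever called, without relying on a raise reaching the console, is to record each call in a file. The log above says a remote index server at druby://localhost:9010 is used; that the analysis happens in that server process rather than in the console is an assumption, but a file-based probe is visible either way. The sketch below uses hypothetical names (`TokenStreamProbe`, `DemoAnalyzer`); `DemoAnalyzer` is a plain-Ruby stand-in, not a Ferret class, and `Module#prepend` requires Ruby 2.0 or later (on an older Ruby the `File.open` line could simply be pasted into token_stream).

```ruby
require "tmpdir"

# Hypothetical probe: appends one line per token_stream call to a log file,
# then hands over to the real method through super.
module TokenStreamProbe
  class << self
    attr_writer :log_path

    def log_path
      @log_path ||= File.join(Dir.tmpdir, "token_stream_probe.log")
    end
  end

  def token_stream(field, str)
    File.open(TokenStreamProbe.log_path, "a") do |f|
      f.puts("pid=#{Process.pid} class=#{self.class} field=#{field}")
    end
    super
  end
end

# Stand-in for a custom analyzer (NOT a Ferret class): returns plain tokens.
class DemoAnalyzer
  def token_stream(field, str)
    str.downcase.split
  end
end

# The probe wraps the original method; the analyzer's own code is unchanged.
DemoAnalyzer.prepend(TokenStreamProbe)
```

After a rebuild and a search, an empty log file would suggest the method is never reached in any process, while lines carrying a pid other than the console's would suggest the work is done elsewhere.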
Hongli Lai
2007-Nov-14 23:24 UTC
[Ferret-talk] acts_as_ferret : cannot use a customized Analyzer (as indicated in the AdvancedUsageNotes)
Alain Ravet wrote:
> class Country < ActiveRecord::Base
>   acts_as_ferret(
>     :fields => [:name],
>     :remote => true,
>     :ferret => {:analyzer => Test2Analyzer.new([]) }
>   )
> end

Try this:

    acts_as_ferret({ :fields => [:name], :remote => true },
                   { :analyzer => Test2Analyzer.new([]) })
Jens Kraemer
2007-Nov-15 09:07 UTC
[Ferret-talk] acts_as_ferret : cannot use a customized Analyzer (as indicated in the AdvancedUsageNotes)
On Thu, Nov 15, 2007 at 12:24:25AM +0100, Hongli Lai wrote:
> Try this:
>
> acts_as_ferret({ :fields => [:name], :remote => true },
>                { :analyzer => Test2Analyzer.new([]) })

This won't help; these are both valid ways to call acts_as_ferret. The :ferret syntax is the preferred one, however.

Jens

--
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa
Jens Kraemer
2007-Nov-15 09:13 UTC
[Ferret-talk] acts_as_ferret : cannot use a customized Analyzer (as indicated in the AdvancedUsageNotes)
Hi Alain,

could you please check the index created by aaf with plain Ferret and your custom analyzer, to see whether your queries deliver the expected results then? That way we should be able to find out whether the problem is with indexing or with searching through aaf.

Jens
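The indexing-versus-searching distinction can be illustrated in plain Ruby. The sketch below is not Ferret code: `build_terms` and `prefix_search` are hypothetical helpers, and it assumes that a wildcard query such as "t*" is matched against the stored terms without any stop-word filtering. Under that assumption, zero hits for "t*" would mean the term "the" never reached the index, which points at the indexing side.

```ruby
# Hypothetical helper: builds term => ids from {id => text},
# dropping stop words at index time.
def build_terms(records, stop_words)
  terms = Hash.new { |h, k| h[k] = [] }
  records.each do |id, text|
    text.downcase.scan(/[[:alnum:]]+/).each do |t|
      terms[t] << id unless stop_words.include?(t)
    end
  end
  terms
end

# Hypothetical helper: a prefix query scans the stored terms directly,
# with no stop-word filtering at query time (an assumption of this sketch).
def prefix_search(terms, prefix)
  terms.select { |t, _| t.start_with?(prefix) }.values.flatten.uniq.sort
end

records = { 2 => "Frankrijk", 11 => "the" }

with_default_stops = build_terms(records, %w[the a an and])
without_stops      = build_terms(records, [])

prefix_search(with_default_stops, "t")  # => []    "the" was dropped while indexing
prefix_search(without_stops, "t")       # => [11]  "the" was indexed and is found
prefix_search(with_default_stops, "f")  # => [2]   non stop words are unaffected
```

Listing the terms actually stored in the index, as suggested above with plain Ferret, would tell the two cases apart directly.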
syrius.ml at no-log.org
2008-Jan-29 16:04 UTC
[Ferret-talk] acts_as_ferret : cannot use a customized Analyzer (as indicated in the AdvancedUsageNotes)
Jens Kraemer <kraemer at webit.de> writes:

> this won't help, these are both valid ways to call acts_as_ferret. The
> :ferret syntax is the preferred one, however.

Just for information, I was using an old or bad syntax for aaf. I was using

    acts_as_ferret :fields => [], :analyzer => MyAnalyzer.new

and it wasn't working. (A raise in MyAnalyzer's initialize was raised, but one in token_stream was not.)

I'm now using

    :ferret => {:analyzer => MyAnalyzer}

and it works as expected.
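The three call shapes that appear in this thread (two hashes, a nested :ferret hash, and a flat hash) can be compared with a small sketch. This is not aaf's actual option handling: `split_options` and `ENGINE_KEYS` are hypothetical, and the list of engine-level keys is only an assumption drawn from the YAML dump earlier in the thread. It illustrates how an :analyzer given at the top level could be constructed (so a raise in initialize fires) and still never reach the index engine.

```ruby
# Hypothetical list of keys meant for the index engine, not for the plugin.
ENGINE_KEYS = [:analyzer, :or_default, :default_field].freeze

# Accepts (options, engine_options), or options containing a :ferret sub-hash.
# Returns [plugin_options, engine_options, misplaced_keys].
def split_options(options = {}, engine_options = {})
  plugin = options.dup
  engine = engine_options.merge(plugin.delete(:ferret) || {})
  misplaced = plugin.keys & ENGINE_KEYS
  [plugin, engine, misplaced]
end

analyzer = Object.new # placeholder for an analyzer instance

two_hashes = split_options({ :fields => [:name], :remote => true },
                           { :analyzer => analyzer })
nested     = split_options({ :fields => [:name], :remote => true,
                             :ferret => { :analyzer => analyzer } })
flat       = split_options({ :fields => [], :analyzer => analyzer })

two_hashes[1].key?(:analyzer)  # => true
nested[1].key?(:analyzer)      # => true
flat[1].key?(:analyzer)        # => false
flat[2]                        # => [:analyzer]  left among the plugin options
```

Reporting the misplaced keys instead of silently ignoring them is the design choice of this sketch; whether aaf warns in that situation is not stated in the thread.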