Max Williams
2009-Apr-09 11:45 UTC
[Ferret-talk] Weird analyzer issue with the word 'fly'

Hi all,

I'm using acts_as_ferret (a_a_f) in Rails with a StemmingAnalyzer, both in the index and in my search. I got the idea from this topic: http://www.ruby-forum.com/topic/80178

I'm having a problem with some search terms. I narrowed one of them down to the inclusion of the word 'fly'. Can anyone give me any clues as to what might be happening, or even how I can investigate?

My index is set up like this:

    acts_as_ferret({ :store_class_name => true,
                     :analyzer => Ferret::Analysis::StemmingAnalyzer.new,
                     :fields => {:name => { :boost => 2.0 },
                                 ...
                     }})

And this analyzer is defined in a module thus:

    module Ferret::Analysis
      class StemmingAnalyzer
        def token_stream(field, text)
          StemFilter.new(StandardTokenizer.new(text))
        end
      end
    end

Now, here's a search without using the analyzer:

    >> TeachingObject.find_with_ferret("flea fly", :per_page => 2000).size
    => 14

And with the analyzer:

    >> TeachingObject.find_with_ferret("flea fly", :per_page => 2000, :analyzer => Ferret::Analysis::StemmingAnalyzer.new).size
    => 0

Now, for other searches, the analyzer seems to be doing its job nicely. E.g. I have lots of resources with the word 'brass'. With the analyzer, a search for 'brasses' brings all these resources back, while without the analyzer I don't get any of them. That's all fine: it's working out that 'brasses' and 'brass' are equivalent searches.

So what's going on with the word 'fly'? It's definitely this word, because if I change one of the "flea fly" resources to be called "flea walk" then a search for 'flea walk' brings it back, as does a search for 'flea walks'.

I'm guessing that the analyzer takes a word and converts it into other terms, or some symbols or something, and searches with that combined set, and during this process the original word 'fly' gets lost somewhere. But I don't know where to look to monitor this process.

Any help/advice/clues very welcome...

thanks
max
Max Williams
2009-Apr-09 12:13 UTC
[Ferret-talk] Weird analyzer issue with the word 'fly'

Just a bit more info: I started to look at what's going on in the analyzer by putting a bit of logging in:

    module Ferret::Analysis
      class StemmingAnalyzer
        def token_stream(field, text)
          RAILS_DEFAULT_LOGGER.debug "SEARCHING, field = #{field}, text = #{text}"
          StemFilter.new(StandardTokenizer.new(text))
        end
      end
    end

And I see these results for a single search on "flea fly":

    SEARCHING, field = property_ancestor_names, text = flea
    SEARCHING, field = description, text = flea
    SEARCHING, field = name, text = flea
    SEARCHING, field = keyword_string, text = flea
    SEARCHING, field = property_ids_string, text = flea
    SEARCHING, field = property_names, text = flea
    SEARCHING, field = unaccented_name, text = flea
    SEARCHING, field = property_titles, text = flea
    SEARCHING, field = resource_id, text = flea

That's one call to token_stream for each of my indexed methods, but each call only receives the first word of the search! Now I'm even more confused...
Jens Kraemer
2009-Apr-09 12:40 UTC
[Ferret-talk] Weird analyzer issue with the word 'fly'

Hi Max!

On 09.04.2009, at 13:45, Max Williams wrote:
> I'm having a problem with some search terms - i narrowed one of them
> down to the inclusion of the word 'fly'. Can anyone give me any clues
> as to what might be happening, or even how i can investigate?

First of all I'd have a look at what the analyzer does to your query terms:

    ts = Ferret::Analysis::StemmingAnalyzer.new.token_stream nil, 'flea fly'
    while token = ts.next
      puts token
    end

For some reason the word 'fly' is turned into 'fli' by the analyzer. But that's OK, as long as it works the same way at indexing time.

Next, use the ferret_browser tool to inspect your index and check whether the term 'fli' really appears in it. I doubt it does, because if it did, everything would work as expected. So I guess we have a problem with the analysis at indexing time.

> My index is set up like this:
>
> acts_as_ferret({ :store_class_name => true,
>                  :analyzer => Ferret::Analysis::StemmingAnalyzer.new,
>                  :fields => {:name => { :boost => 2.0 },
>                              ...
>                  }})

Now that I look at this a second time, the problem seems quite obvious :-) The analyzer option needs to be given as part of a separate ferret options hash, like this:

    acts_as_ferret :store_class_name => true,
                   :ferret => { :analyzer => Ferret::Analysis::StemmingAnalyzer.new },
                   :fields => { ... }

Rebuild your index and everything should be working as expected.

Cheers,
Jens

--
Jens Krämer
Finkenlust 14, 06449 Aschersleben, Germany
VAT Id DE251962952
http://www.jkraemer.net/ - Blog
http://www.omdb.org/ - The new free film database
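The mismatch described above (query terms stemmed, index terms not) can be reproduced with a toy model in plain Ruby. This is only a sketch and does not use Ferret at all: toy_stem, tokenize, build_index and search are hypothetical helpers written for illustration. The stemmer imitates just two English stemming rules ('sses' becomes 'ss', and a trailing 'y' after a non-initial consonant becomes 'i'), which is enough to show 'fly' turning into 'fli'. The search assumes every query term must match (AND), which is an assumption chosen to mirror the zero-result behaviour reported in the thread.

```ruby
# Toy model of an index-time / query-time analyzer mismatch.
# NOT Ferret code: every helper here is a hypothetical stand-in.

VOWELS = %w[a e i o u y].freeze

# Drastically simplified stemmer covering only two rules.
def toy_stem(word)
  w = word.downcase
  # 'sses' -> 'ss'  (brasses -> brass)
  return w.sub(/sses\z/, "ss") if w.end_with?("sses")
  # trailing 'y' after a consonant that is not the first letter -> 'i'
  # (fly -> fli; 'by' and 'say' are left alone)
  if w.length > 2 && w.end_with?("y") && !VOWELS.include?(w[-2])
    return w[0..-2] + "i"
  end
  w
end

# Split text into lowercase alphanumeric tokens.
def tokenize(text)
  text.downcase.scan(/[a-z0-9]+/)
end

# Build an inverted index: term => array of document ids.
def build_index(docs, stem:)
  index = Hash.new { |hash, term| hash[term] = [] }
  docs.each do |id, text|
    tokenize(text).each do |token|
      term = stem ? toy_stem(token) : token
      index[term] << id unless index[term].include?(id)
    end
  end
  index
end

# AND-search: a document matches only if it contains every query term.
def search(index, query, stem:)
  terms = tokenize(query).map { |t| stem ? toy_stem(t) : t }
  terms.map { |t| index.fetch(t, []) }.reduce(:&) || []
end

docs = { 1 => "Flea Fly", 2 => "Brass band", 3 => "Flea walk" }

unstemmed = build_index(docs, stem: false) # analyzer not applied when indexing
stemmed   = build_index(docs, stem: true)  # analyzer applied when indexing

p search(unstemmed, "flea fly", stem: false) # => [1]
p search(unstemmed, "flea fly", stem: true)  # => []   ('fli' is not in the index)
p search(stemmed,   "flea fly", stem: true)  # => [1]
p search(stemmed,   "brasses",  stem: true)  # => [2]
```

The second search is the failing case from the thread: the query is stemmed to 'flea fli', but the index still holds the unstemmed term 'fly', so nothing matches until the index is rebuilt with the same analysis on both sides.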
Max Williams
2009-Apr-09 14:29 UTC
[Ferret-talk] Weird analyzer issue with the word 'fly'

2009/4/9 Jens Kraemer <jk at jkraemer.net>

> Hi Max!

Hi Jens, thanks for responding so quickly.

> For some reason the word 'fly' is turned into 'fli' by the analyzer.

Indeed it is:

    >> ts = Ferret::Analysis::StemmingAnalyzer.new.token_stream nil, 'flea fly'
    => #<Ferret::Analysis::StemFilter:0xb48b3b48>
    >> while token = ts.next
    >>   puts token
    >> end
    token["flea":0:4:1]
    token["fli":5:8:1]

> But that's ok, as long as it works the same way at indexing time. Next use
> the ferret_browser tool to inspect your index and check whether the term
> 'fli' really appears in your index

I've not seen this tool before, it sounds useful. Would you mind pointing me at some docs for it? I can find the class in the ferret rdoc but there's no explanation for it as far as I can see.

> acts_as_ferret :store_class_name => true,
>                :ferret => { :analyzer => Ferret::Analysis::StemmingAnalyzer.new },
>                :fields => { ... }
>
> rebuild your index and everything should be working as expected.

It is indeed! Thanks very much Jens, I really appreciate the support.

Hope you have a great Easter weekend!

cheers
max
Jens Kraemer
2009-Apr-09 19:20 UTC
[Ferret-talk] Weird analyzer issue with the word 'fly'

Hi!

On 09.04.2009, at 16:29, Max Williams wrote:
[..]
> I've not seen this tool before, it sounds useful - would you mind
> pointing me at some docs for it? I can find the class in the
> ferret rdoc but there's no explanation for it as far as i can see.

ferret_browser is a standalone web application that gets installed along with ferret. Just run it with

    ferret_browser path/to/index

and point your browser to the URL shown in the output. It should be pretty self-explanatory from there.

> > acts_as_ferret :store_class_name => true,
> >                :ferret => { :analyzer => Ferret::Analysis::StemmingAnalyzer.new },
> >                :fields => { ... }
> >
> > rebuild your index and everything should be working as expected.
>
> It is indeed! Thanks very much Jens, i really appreciate the
> support.
>
> Hope you have a great easter weekend!

Thank you, and the same to you!

Cheers,
Jens