I am unable to find results for models when one or more of the query terms are not indexed.

Let's suppose I index a User on the phrase "Ruby on Rails". If I then search using User.find_by_contents("Ruby on Rails") I get no results, since "on" is a common term and does not get indexed. Of course, User.find_by_contents("Ruby Rails") works just fine.

I would like to find a way to search for terms such as "Ruby on Rails" and have the query analyzer automatically ignore tokens (i.e., "on") that the indexer would normally skip. Any thoughts on how to go about solving this?

Rami
Jens Kraemer
2006-Aug-21 22:23 UTC
[Ferret-talk] missing terms in index causing search errors
On Sun, Aug 20, 2006 at 06:53:30AM +0200, Rami wrote:
> I am unable to find results for models when one or more of the terms are
> not being indexed.
>
> Let's suppose I index a User on the phrase "Ruby on Rails." If I then
> search using User.find_by_contents("Ruby on Rails") I get no results,
> since "on" is a common term and does not get indexed. Of course,
> User.find_by_contents("Ruby Rails") works just fine.

This shouldn't happen. Do you build your index through acts_as_ferret? The cause of your problem seems to be that a different analyzer is in use for query parsing than the one that was used for building the index. Queries should usually be analyzed the same way as the indexed content to avoid these problems.

> I would like to find a way to search for terms such as "Ruby on Rails"
> and have the query analyzer automatically ignore tokens (ie, "on") that
> the indexer would normally avoid. Any thoughts on how to go about
> solving this?

Try specifying an analyzer in your call to acts_as_ferret:

    acts_as_ferret(
      { :fields => [ .. field list, may be a hash, too ] },
      { :analyzer => Ferret::Analysis::StopAnalyzer.new }
    )

Please let me know if this helps.

Jens

--
webit! Gesellschaft für neue Medien mbH          www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer       kraemer at webit.de
Schnorrstraße 76                         Tel +49 351 46766  0
D-01069 Dresden                          Fax +49 351 46766 66
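The point about analyzing queries the same way as indexed content can be illustrated without Ferret at all. The following is a minimal sketch in plain Ruby, not Ferret's implementation: `TOY_STOP_WORDS`, `analyze`, and `ToyIndex` are hypothetical names made up for this illustration, and the stop list is an assumption rather than Ferret's real one. It shows that a query analyzed with the index's stop-word filter matches, while the same query analyzed without it requires a term that was never indexed.

```ruby
require 'set'

# Hypothetical stop list for this illustration only; Ferret's real list differs.
TOY_STOP_WORDS = Set.new(%w[a an and on or the to]).freeze

# Lowercase, split on non-letters, and optionally drop stop words.
def analyze(text, drop_stop_words: true)
  tokens = text.downcase.scan(/[a-z]+/)
  drop_stop_words ? tokens.reject { |t| TOY_STOP_WORDS.include?(t) } : tokens
end

# Toy inverted index in which every query term must match.
class ToyIndex
  def initialize
    @postings = Hash.new { |hash, term| hash[term] = Set.new }
  end

  # Index-time analysis always drops stop words.
  def add(doc_id, text)
    analyze(text).each { |term| @postings[term] << doc_id }
  end

  # query_drops_stop_words: false simulates a mismatched query analyzer.
  def search(query, query_drops_stop_words: true)
    terms = analyze(query, drop_stop_words: query_drops_stop_words)
    return [] if terms.empty?

    # Intersect the posting sets of all terms; a missing term yields no hits.
    terms.map { |term| @postings.fetch(term, Set.new) }.reduce(:&).to_a
  end
end

index = ToyIndex.new
index.add(1, 'Ruby on Rails')

matched    = index.search('Ruby on Rails')
mismatched = index.search('Ruby on Rails', query_drops_stop_words: false)
```

Here `matched` contains document 1, while `mismatched` is empty because the unfiltered query still demands the term "on", which the index never stored.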
Benjamin Krause
2006-Aug-22 07:34 UTC
>> Let's suppose I index a User on the phrase "Ruby on Rails." If I then
>> search using User.find_by_contents("Ruby on Rails") I get no results,
>> since "on" is a common term and does not get indexed. Of course,
>> User.find_by_contents("Ruby Rails") works just fine.
>
> this shouldn't happen. Do you build your index through acts_as_ferret ?

Hey,

I had the same problem, using Ferret directly, not acts_as_ferret. The stop words are described here:
http://ferret.davebalmain.com/api/classes/Ferret/Analysis/StopAnalyzer.html

What I do is remove all stop words from the query before searching:

    # Indexer::STOP_WORDS is the stop-word list used by my own indexer class.
    def self.filter_stop_words( q )
      query = q.split(" ")
      query.delete_if { |w| Indexer::STOP_WORDS.include?( w.downcase ) }.join(" ")
    end

Ben
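For readers who want to try the query-filtering workaround in isolation, here is a self-contained sketch of the same idea. `EXAMPLE_STOP_WORDS` is a hypothetical list standing in for the poster's own `Indexer::STOP_WORDS`, whose contents are not shown in the thread, and the method is written as a plain top-level function rather than a class method.

```ruby
# Hypothetical stop list; the thread's Indexer::STOP_WORDS is not shown.
EXAMPLE_STOP_WORDS = %w[a an and on or the to].freeze

# Drop stop words from a raw query string before handing it to the searcher.
# Matching is case-insensitive; the surviving words keep their original case.
def filter_stop_words(query, stop_words = EXAMPLE_STOP_WORDS)
  query.split.reject { |word| stop_words.include?(word.downcase) }.join(' ')
end

filtered = filter_stop_words('Ruby on Rails')
all_stop = filter_stop_words('to the')
```

With this list, `filtered` is `"Ruby Rails"` and `all_stop` is an empty string, so a caller would probably want to check for an empty result before searching. Because the comparison is case-insensitive, a filter like this would also remove an upper-case `OR` or `AND` that was meant as a query operator, which is one reason to prefer fixing the analyzer setup over rewriting the query.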
David Balmain
2006-Aug-22 18:30 UTC
On 8/22/06, Benjamin Krause <bk at benjaminkrause.com> wrote:
>
> >> Let's suppose I index a User on the phrase "Ruby on Rails." If I then
> >> search using User.find_by_contents("Ruby on Rails") I get no results,
> >> since "on" is a common term and does not get indexed. Of course,
> >> User.find_by_contents("Ruby Rails") works just fine.
> >
> > this shouldn't happen. Do you build your index through acts_as_ferret ?
>
> hey ..
>
> i had the same problem.. using ferret, not acts_as_ferret.. the stopwords
> are described here:
> http://ferret.davebalmain.com/api/classes/Ferret/Analysis/StopAnalyzer.html
>
> what I do is to remove all stopwords from the query before searching..
>
>     def self.filter_stop_words( q )
>       query = q.split(" ")
>       query.delete_if { |w| Indexer::STOP_WORDS.include?( w.downcase ) }.join(" ")
>     end
>
> Ben

Hi Ben,

This shouldn't be necessary. What Jens said is correct. If you use the same analyzer in your indexer as you use in your query parser then a search for "Ruby on Rails" should work. If you use the Index::Index class this will be handled for you.

Cheers,
Dave
Benjamin Krause
2006-Aug-22 19:16 UTC
Hey,

> This shouldn't be necessary. What Jens said is correct. If you use the
> same analyzer in your indexer as you use in your query parser then a
> search for "Ruby on Rails" should work. If you use the Index::Index
> class this will be handled for you.

I do not use any non-default analyzers yet, but I still got the problem. I even had the case where I wanted to search for a phrase that was built completely from stop words, and it did not find anything.

However, I will give it a second look with 0.10; maybe I did miss something.

Ben
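The case of a phrase made up entirely of stop words can be sketched in a few lines of plain Ruby. This is only an illustration under assumptions: `TOY_STOP_WORDS`, `analyze`, and `describe_query` are hypothetical names, the stop list is invented, and the thread does not establish that this is what happened in the poster's setup. The sketch shows that once every token is filtered out, there is simply nothing left to match against the index.

```ruby
# Hypothetical stop list for this illustration only.
TOY_STOP_WORDS = %w[a an and be not on or the to].freeze

# Lowercase, split on non-letters, and drop stop words.
def analyze(text)
  text.downcase.scan(/[a-z]+/).reject { |t| TOY_STOP_WORDS.include?(t) }
end

# Return the searchable terms, or a marker when the analyzer removed them all.
def describe_query(query)
  terms = analyze(query)
  terms.empty? ? :no_searchable_terms : terms
end

stop_only = describe_query('to be or not to be')
mixed     = describe_query('Move or shake')
```

In this toy model `stop_only` is `:no_searchable_terms` and `mixed` keeps the two terms "move" and "shake". An application that needs to find such phrases would have to index the field with an analyzer that keeps stop words.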
Jens Kraemer
2006-Aug-22 22:04 UTC
On Wed, Aug 23, 2006 at 03:30:46AM +0900, David Balmain wrote:
> On 8/22/06, Benjamin Krause <bk at benjaminkrause.com> wrote:
> >
> > >> Let's suppose I index a User on the phrase "Ruby on Rails." If I then
> > >> search using User.find_by_contents("Ruby on Rails") I get no results,
> > >> since "on" is a common term and does not get indexed. Of course,
> > >> User.find_by_contents("Ruby Rails") works just fine.
> > >
[..]
>
> This shouldn't be necessary. What Jens said is correct. If you use the
> same analyzer in your indexer as you use in your query parser then a
> search for "Ruby on Rails" should work. If you use the Index::Index
> class this will be handled for you.

As this problem seems to be fairly common recently, I did some tests and I think I found a common pattern that seems to lead to wrong query analyzing when using the Index::Index class:

    def test_stopwords
      i = Ferret::Index::Index.new(
        :occur_default => Ferret::Search::BooleanClause::Occur::MUST,
        :default_search_field => '*')
      d = Ferret::Document::Document.new

      # adding this additional field to the document leads to failure below
      # comment out this statement and all tests pass:
      d << Ferret::Document::Field.new('id', '1',
                                       Ferret::Document::Field::Store::YES,
                                       Ferret::Document::Field::Index::UNTOKENIZED)

      d << Ferret::Document::Field.new('content', 'Move or shake',
                                       Ferret::Document::Field::Store::NO,
                                       Ferret::Document::Field::Index::TOKENIZED,
                                       Ferret::Document::Field::TermVector::NO,
                                       false, 1.0)
      i << d
      hits = i.search 'move nothere shake'
      assert_equal 0, hits.size
      hits = i.search 'move shake'
      assert_equal 1, hits.size
      hits = i.search 'move or shake'
      assert_equal 1, hits.size # fails when id field is present
    end

The id field is constructed just like we do it in aaf. I tried some variations of the way the field is constructed (another name, other flags), but as soon as there is more than one field, the test doesn't work any more.

Setting the default_search_field to 'content' makes the tests pass, btw.

Dave, any suggestions?

Jens
David Balmain
2006-Aug-23 05:30 UTC
On 8/23/06, Jens Kraemer <kraemer at webit.de> wrote:
> On Wed, Aug 23, 2006 at 03:30:46AM +0900, David Balmain wrote:
> > On 8/22/06, Benjamin Krause <bk at benjaminkrause.com> wrote:
> > >
> > > >> Let's suppose I index a User on the phrase "Ruby on Rails." If I then
> > > >> search using User.find_by_contents("Ruby on Rails") I get no results,
> > > >> since "on" is a common term and does not get indexed. Of course,
> > > >> User.find_by_contents("Ruby Rails") works just fine.
> > > >
> [..]
> >
> > This shouldn't be necessary. What Jens said is correct. If you use the
> > same analyzer in your indexer as you use in your query parser then a
> > search for "Ruby on Rails" should work. If you use the Index::Index
> > class this will be handled for you.
>
> As this problem seems to be fairly common recently, I did some tests and
> I think I found a common pattern that seems to lead to wrong query
> analyzing when using the Index::Index class:
>
>     def test_stopwords
>       i = Ferret::Index::Index.new(
>         :occur_default => Ferret::Search::BooleanClause::Occur::MUST,
>         :default_search_field => '*')
>       d = Ferret::Document::Document.new
>
>       # adding this additional field to the document leads to failure below
>       # comment out this statement and all tests pass:
>       d << Ferret::Document::Field.new('id', '1',
>                                        Ferret::Document::Field::Store::YES,
>                                        Ferret::Document::Field::Index::UNTOKENIZED)
>
>       d << Ferret::Document::Field.new('content', 'Move or shake',
>                                        Ferret::Document::Field::Store::NO,
>                                        Ferret::Document::Field::Index::TOKENIZED,
>                                        Ferret::Document::Field::TermVector::NO,
>                                        false, 1.0)
>       i << d
>       hits = i.search 'move nothere shake'
>       assert_equal 0, hits.size
>       hits = i.search 'move shake'
>       assert_equal 1, hits.size
>       hits = i.search 'move or shake'
>       assert_equal 1, hits.size # fails when id field is present
>     end
>
> the id field is constructed just like we do it in aaf. I tried some
> variations of the way the field is constructed (another name, other
> flags), but as soon as there is more than one field, the test doesn't
> work any more.
>
> Setting the default_search_field to 'content' makes the tests pass, btw.
>
> Dave, any suggestions ?

Thanks Jens,

This was a bug after all, and it was very easy to find and fix with your example/bug report. Thanks. I've just put out a gem, version 0.9.6. This will be compatible with acts_as_ferret.

I'll try to find time to write a patch for acts_as_ferret to work with 0.10.0, but hopefully you'll beat me to it. The documentation is a little more thorough than in previous versions of Ferret, but it still requires a bit of work, especially considering there is no longer any Ruby source to work from. Let me know if you have any questions.

Cheers,
Dave
Jens Kraemer
2006-Aug-23 09:29 UTC
On Wed, Aug 23, 2006 at 02:30:56PM +0900, David Balmain wrote:
> On 8/23/06, Jens Kraemer <kraemer at webit.de> wrote:
[..]
> > Setting the default_search_field to 'content' makes the tests pass, btw.
> >
> > Dave, any suggestions ?
>
> Thanks Jens,
>
> This was a bug after all at it was very easy to find and fix with your
> example/bug-report. Thanks. I've just put out a gem; version 0.9.6.
> This will be compatible with acts_as_ferret. I'll try and find time to
> write a patch for acts_as_ferret to work with 0.10.0 but hopefully
> you'll beat me to it. The documentation is a little more thorough than
> previous versions of Ferret but it still requires a bit of work,
> especially considering there is no-longer any Ruby source to work
> from. Let me know if you have any questions.

Works great, thanks for the quick fix.

I'll start working on a 0.10.0-compatible version of aaf now, and I'll keep you up to date on my progress.

The latest (and last) aaf version to work with the Ferret 0.9.x series is 0.2.3, located at
svn://projects.jkraemer.net/acts_as_ferret/tags/0.2.3

Please note the changed base URL: I decided to leave out the 'plugin' directory below 'tags' from now on.

Jens