Alastair Moore
2006-Sep-05 19:51 UTC
[Ferret-talk] ferret finds 'tests' but not 'test'
Hello all,

Quick question (possibly!) - I've got a few records indexed, and a search for 'test' returns no hits even though I know the word 'tests' exists in the indexed field. A search for 'tests' produces a result. I would have thought that 'test' would match 'tests', but no such luck!

Thanks,

Alastair

-- Posted via http://www.ruby-forum.com/.
On 9/6/06, Alastair Moore <rubyonrails at transmogrify.co.uk> wrote:
> Quick question (possibly!) - I've got a few records indexed and doing a
> search for 'test' reports in no hits even though I know the word 'tests'
> exists in the indexed field. Doing a search for 'tests' produces a
> result. I would have thought that 'test' would match 'tests' but no such
> luck!

The default analyzer doesn't perform any stemming. You need to create your own analyzer with a stemmer. Something like this:

    require 'rubygems'
    require 'ferret'

    module Ferret::Analysis
      class MyAnalyzer
        # Wrap the standard tokenizer in a stem filter so that each
        # token is reduced to its stem before it is indexed or searched.
        def token_stream(field, text)
          StemFilter.new(StandardTokenizer.new(text))
        end
      end
    end

    # Pass the custom analyzer to the index when creating it.
    index = Ferret::I.new(:analyzer => Ferret::Analysis::MyAnalyzer.new)

    index << "test"
    index << "tests debate debater debating the for,"
    puts index.search("test").total_hits

Hope that helps,
Dave
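Why stemming changes the hit count can be seen without Ferret at all. The following is a minimal stand-alone sketch, not Ferret's implementation: ToyIndex, analyze and toy_stem are hypothetical names, and the crude suffix stripping only stands in for a real stemming algorithm. It assumes nothing more than an index that matches exact terms.

```ruby
# Toy illustration only: this is NOT Ferret's implementation or API.
# A crude suffix stripper stands in for a real stemming algorithm.
def toy_stem(token)
  token.sub(/(ing|ed|s)\z/, "")
end

# Split text into lower-case word tokens, optionally stemming each one.
def analyze(text, stem)
  tokens = text.downcase.scan(/[a-z]+/)
  stem ? tokens.map { |t| toy_stem(t) } : tokens
end

# Minimal inverted index: term => list of document ids.
class ToyIndex
  def initialize(stem)
    @stem = stem
    @postings = Hash.new { |hash, term| hash[term] = [] }
    @next_id = 0
  end

  # Add a document and record which terms it contains.
  def <<(text)
    id = @next_id
    @next_id += 1
    analyze(text, @stem).uniq.each { |term| @postings[term] << id }
    self
  end

  # Single-term search; the query goes through the same analysis
  # as the documents did.
  def search(word)
    term = analyze(word, @stem).first
    @postings.fetch(term, [])
  end
end

plain = ToyIndex.new(false)
plain << "test" << "tests debate debater debating"

stemmed = ToyIndex.new(true)
stemmed << "test" << "tests debate debater debating"

puts plain.search("test").size    # prints 1: only the exact term matches
puts stemmed.search("test").size  # prints 2: "tests" was indexed as "test"
```

Without stemming the index holds 'test' and 'tests' as two unrelated terms, which is the behaviour Alastair ran into.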
Alastair Moore
2006-Sep-06 12:36 UTC
[Ferret-talk] ferret finds 'tests' but not 'test'
David Balmain wrote:
> The default analyzer doesn't perform any stemming. You need to create
> your own analyzer with a stemmer.

Hi Dave,

Many thanks for the help, it does help! However, given the short timespan for this project, I think the users of the site will just have to be a bit more specific in their search terms :)

Cheers, and I will bookmark your reply for a later project.

Alastair
Hi there,

Thanks for this useful piece of information! What I'm wondering is how to do stemming on queries as well. My first try was:

    query = Ferret::QueryParser.new(:analyzer => Ferret::Analysis::StemmingAnalyzer.new).parse(query_string)
    index.search_each(query) { |doc, score| ... }

But this does not work the way I would expect, i.e., it seems to deliver empty results regardless of the input.

Does anybody have an idea what I'm doing wrong?

Cheers,

Albert
On 9/29/06, Albert <albert at mymail.nospam.com> wrote:
> Thanks for this useful piece of information! What I'm wondering is how
> do stemming on queries as well. My first try was:
>
> query = Ferret::QueryParser.new(:analyzer =>
> Ferret::Analysis::StemmingAnalyzer.new).parse(query_string)
>
> index.search_each(query) { |doc, score| ... }
>
> But this does not work the way I would expect it to work, i.e., it seems
> to deliver empty results independent of the input.

Hi Albert,

Could you show us your implementation of StemmingAnalyzer as well? Also, you need to be sure to use the same analyzer for both indexing and query parsing, although I think you already knew this.

Cheers,
Dave
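Dave's reminder about using one analyzer on both sides can be illustrated with a few lines of plain Ruby. This is a sketch under stated assumptions rather than Ferret code: toy_stem is a hypothetical stand-in for a real stemmer, and the "index" is just an array of stored terms.

```ruby
# Toy sketch; toy_stem is a hypothetical stand-in for a real stemmer.
def toy_stem(token)
  token.sub(/(ing|ed|s)\z/, "")
end

# Terms as they end up in the index when stemming is applied at index time.
indexed_terms = %w[tests debating].map { |t| toy_stem(t) }  # ["test", "debat"]

# Query analysed WITHOUT stemming: the raw term is not in the index.
puts indexed_terms.include?("debating")            # prints false

# Query analysed WITH the same stemmer: the terms line up again.
puts indexed_terms.include?(toy_stem("debating"))  # prints true
```

The stem itself ('debat') does not need to be a real word; it only needs to be produced identically at index time and at query time.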
Hi Dave,

Thanks for following up! The StemmingAnalyzer is actually just the MyAnalyzer from the example above:

    module Ferret::Analysis
      class StemmingAnalyzer
        def token_stream(field, text)
          StemFilter.new(StandardTokenizer.new(text))
        end
      end
    end

I've been trying to find the error, but with no success. The searching is done this way:

    i = Ferret::Index::Index.new(:path => index)
    qp = Ferret::QueryParser.new(:analyzer => Ferret::Analysis::StemmingAnalyzer.new)
    query = qp.parse(query_string)
    i.search_each(query) { |doc, score| ... }

What I don't get is that search_each(query) never returns a result, whereas when I pass the original query string, as in

    i = Ferret::Index::Index.new(:path => index)
    # qp = Ferret::QueryParser.new(:analyzer => Ferret::Analysis::StemmingAnalyzer.new)
    # query = qp.parse(query_string)
    i.search_each(query_string) { |doc, score| ... }

things work as expected (modulo the stemming, of course). So it may be that I fundamentally misunderstand something or am making a stupid mistake ...

Cheers,

Albert
On 9/30/06, Albert <albert at mymail.nospam.com> wrote:
> I've been trying to find the error but no success. The searching is
> done this way:
>
> i = Ferret::Index::Index.new(:path => index)
> qp = Ferret::QueryParser.new(:analyzer =>
> Ferret::Analysis::StemmingAnalyzer.new)
> query = qp.parse(query_string)
> i.search_each(query) { |doc, score| ... }
>
> What I don't get is that search_each(query) never returns a result

Sorry, I must have been tired last night. The problem is obvious to me now. You need to set the :fields parameter. The above query parser should work as long as you explicitly specify all fields in your query. For example:

    "content:(ruby rails) title:(ruby rails)"

But if you want to search all fields by default, then you need to tell the QueryParser what fields exist. The Index class will handle all of this for you, including using the same analyzer as is used during indexing. It looks like you are using the Index class for your searches, so why not just leave the query parsing to it? Otherwise you can get the fields from the reader.

    query = Ferret::QueryParser.new(
      :analyzer => Ferret::Analysis::StemmingAnalyzer.new,
      :fields => reader.fields,
      :tokenized_fields => reader.tokenized_fields
    ).parse(query_string)

    index.search_each(query) { |doc, score| ... }

Hope that helps,
Dave
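What "telling the QueryParser what fields exist" amounts to can be pictured with a small stand-alone sketch. The helper expand_default_fields is hypothetical and not part of Ferret; it only assumes that a query without a field prefix is expanded across whatever field list the parser was given, so that an empty list leaves nothing to search.

```ruby
# Hypothetical helper, NOT part of Ferret: shows what "search all fields
# by default" means for a query that has no explicit field prefix.
def expand_default_fields(query_string, fields)
  # Crude check: treat any query containing ":" as already field-qualified.
  return query_string if query_string.include?(":")

  # Repeat the unqualified query once per known field.
  fields.map { |field| "#{field}:(#{query_string})" }.join(" ")
end

puts expand_default_fields("ruby rails", [:content, :title])
# prints: content:(ruby rails) title:(ruby rails)

puts expand_default_fields("ruby rails", []).inspect
# prints: ""  (no known fields, so nothing is left to match against)
```

Under that assumption, a parser created without a field list would behave as Albert described: an unqualified query parses to something that matches nothing, while a fully qualified one still works.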
Hi Dave,

Wonderful! Thanks! I should have taken a deeper look at the documentation, indeed. Anyway, thanks for your patience!

Cheers,

Al.
Alastair Moore wrote:
> I would have thought that 'test' would match 'tests' but no such
> luck!

Alastair - if you only want to find the plural of something and not the full stem of words, then RoR has a pluralization capability. It will take "test" and bring back the plurals, or take "tests" and bring back the singular. You can then search on all of these words. It is not a full stemmer, but in some circumstances this may be all that you want to do.

One thing to watch, which caught us out, is that the standard pluralization of words ending in "ss" does not work properly. For example, "glass" would come back as "glas" from the pluralizer. There is a simple fix on the RoR forum that covers this.

I would only use the RoR pluralizer if all you are looking to do is bring back plurals of words and you are not interested in full stemming. For example, if you search on "tax", full stemming should also search on "taxes" and "taxation". Pluralization would not search on "taxation".

Hope this helps.

Clare
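The "glass" problem and the plural-only expansion Clare describes can be sketched in plain Ruby. The rules below are hypothetical and far simpler than the Rails inflector, which uses a longer, ordered rule table; the method names are made up for this illustration.

```ruby
# Hypothetical rules for illustration only; this is not the Rails inflector.

# Naive rule: strip a trailing "s". This is what mangles "glass".
def naive_singularize(word)
  word.sub(/s\z/, "")
end

# Guarded rule: leave words ending in "ss" alone and undo "-es" plurals.
def guarded_singularize(word)
  return word if word.end_with?("ss")
  return word.sub(/es\z/, "") if word =~ /(x|ch|sh|ss)es\z/
  word.sub(/s\z/, "")
end

# Very small pluralizer: "-es" after sibilant endings, otherwise "-s".
def naive_pluralize(word)
  word =~ /(x|ch|sh|s)\z/ ? word + "es" : word + "s"
end

# Expand a search term into the singular and plural forms to search on.
def search_variants(word)
  singular = guarded_singularize(word)
  [singular, naive_pluralize(singular)].uniq
end

puts naive_singularize("glass")        # prints glas
puts guarded_singularize("glass")      # prints glass
puts search_variants("tests").inspect  # prints ["test", "tests"]
puts search_variants("tax").inspect    # prints ["tax", "taxes"]
```

As in Clare's example, the expansion for "tax" covers "taxes" but never reaches "taxation"; only a stemmer would group those.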
Hi, if I use this stemming analyzer, where do I put it? In /lib/, and then require it in each model?

-Anrake
anrake wrote:
> Hi, if I use this stemming analyzer, where do I put it ? /lib/ and
> require it in each model?

Can someone give me an idiot's guide to implementing this custom stemming analyser? I do not know where to start.

Thanks for your patience.
On 26.10.2006, at 22:06, Ghost wrote:
> Can someone give me an idiots guide as to how to implement this custom
> stemming analyser. I do not know where to start.

1. Create the analyzer as David outlined it and name the file "my_analyzer.rb". If you put it in /app/models you don't need any require statements, since every .rb file in /app/models gets automagically 'required' by Rails.

    # file: app/models/my_analyzer.rb

    require 'rubygems'
    require 'ferret'

    module Ferret::Analysis
      class MyAnalyzer
        def token_stream(field, text)
          StemFilter.new(StandardTokenizer.new(text))
        end
      end
    end

2. When you create an Index instance, pass it your analyzer, like so:

    index = Ferret::I.new(:analyzer => Ferret::Analysis::MyAnalyzer.new)

3. Test your analyzer, e.g.

    index << "walking"
    index << "walked"
    index << "walks"
    index.search("walk").total_hits # -> 3

> Thanks for your patience.

You're welcome. And may I kindly ask you to use a valid email address and perhaps your real name for future posts?

Kind regards,
Andreas
Hi,

I'm still having trouble with this. Probably something stupid, but here goes. I'm using ferret version 0.13 and aaf. I created this file in my app/models directory:

    require 'ferret'
    include Ferret

    module Ferret::Analysis
      class MyAnalyzer
        def token_stream(field, text)
          StemFilter.new(StandardTokenizer.new(text))
        end
      end
    end

naming it my_analyzer.rb as directed, and then in my ferret model I have the following declaration:

    acts_as_ferret :fields=> ['short_description'],:analyzer => Ferret::Analysis::MyAnalyzer.new

I tried to rebuild my index, but it crashes out with the following error:

    >> VoObject.rebuild_index
    NameError: uninitialized constant MyAnalyzer
      from /usr/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:123:in `const_missing'
      from script/../config/../config/../app/models/vo_object.rb:14
      from /usr/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:140:in `load'
      from /usr/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:56:in `require_or_load'
      from /usr/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:30:in `depend_on'
      from /usr/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:85:in `require_dependency'
      from /usr/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:98:in `const_missing'
      from /usr/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:131:in `const_missing'
      from (irb):11
    >>

Nasty, eh? Any idea what is going on here? Why can't my VoObject model see the new analyzer? Thanks again.

> You're welcome. And may I kindly ask you to use a valid email address
> and perhaps your real name for future posts?

I used to post with a valid email address. But then the number of spam messages I received went from 1 or 2 a week to 50-60 a day. Ruby Forum used to print the email addresses on the page. Here's a compromise.

Regards
Caspar
Hi Caspar,

On 27.10.2006, at 11:58, Ghost wrote:
> I created this file in my app/models directory
> naming it my_analyzer.rb as directed.
>
> I tried to rebuild my index but it crashes out with the following
> error:
>
> >> VoObject.rebuild_index
> NameError: uninitialized constant MyAnalyzer

Sorry, I forgot to mention that the directory structure needs to resemble the module nesting, i.e. the file must go in app/models/ferret/analysis instead of just app/models.

Cheers,
Andy
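The naming convention Andy describes, a nested constant mapping to a nested path, can be sketched in plain Ruby. The helper expected_path is hypothetical and is a simplified re-implementation for illustration; Rails itself relies on ActiveSupport's String#underscore for this mapping.

```ruby
# Simplified illustration of the constant-name-to-file-path convention.
# expected_path is a hypothetical helper, not a Rails method.
def expected_path(constant_name)
  constant_name
    .gsub("::", "/")                        # module nesting becomes directories
    .gsub(/([A-Z]+)([A-Z][a-z])/, '\1_\2')  # split acronym runs: "XMLFile" -> "XML_File"
    .gsub(/([a-z\d])([A-Z])/, '\1_\2')      # split camel case: "MyAnalyzer" -> "My_Analyzer"
    .downcase + ".rb"
end

puts expected_path("Ferret::Analysis::MyAnalyzer")
# prints: ferret/analysis/my_analyzer.rb
```

Relative to a load path such as app/models, that gives app/models/ferret/analysis/my_analyzer.rb, which is the location Andy suggests.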
Andreas Korth wrote:
> Sorry, I forgot to mention that the directory structure needs to
> resemble the module nesting, i.e. the file must go in app/models/
> ferret/analysis instead of just app/models.

I've been trying to use the solution for stemming discussed in this thread and have run into a bit of trouble. I'm using this analyzer:

    module Ferret::Analysis
      class StemmingAnalyzer
        def token_stream(field, text)
          StemFilter.new(StandardTokenizer.new(text))
        end
      end
    end

I've configured aaf thusly:

    AAF_DEFAULT_FERRET_OPTIONS = {:analyzer => Ferret::Analysis::StemmingAnalyzer.new}
    acts_as_ferret({:store_class_name => true, :fields => {:description => {:store => :yes}}}.merge(AAF_DEFAULT_OPTIONS), AAF_DEFAULT_FERRET_OPTIONS)

The first time I search for something a new index is created in index, and it successfully returns a set of results. The second time I search, however, I get a strange error:

    uninitialized constant Ferret::Search
    #{RAILS_ROOT}/vendor/rails/activesupport/lib/active_support/dependencies.rb:264:in `load_missing_constant'
    #{RAILS_ROOT}/vendor/rails/activesupport/lib/active_support/dependencies.rb:453:in `const_missing'
    #{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/local_index.rb:160:in `query_for_record'
    #{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/local_index.rb:152:in `document_number'
    #{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/local_index.rb:135:in `highlight'
    /opt/local/lib/ruby/1.8/monitor.rb:238:in `synchronize'
    #{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/local_index.rb:134:in `highlight'
    #{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/instance_methods.rb:30:in `highlight'

Perhaps it has something to do with loading an already created index?

Thanks,
-Adam
This is just a postscript correction for this thread, in case anyone else browses to it (like I did) and gets sent down the slightly wrong track.

If you're going to include the :analyzer option in your call to acts_as_ferret, then it needs to live inside another option hash called :ferret. E.g., some of the examples above say to do this:

    acts_as_ferret :fields=> ['short_description'], :analyzer => Ferret::Analysis::MyAnalyzer.new

This won't work - it needs to be like this:

    acts_as_ferret :fields=> ['short_description'], :ferret => {:analyzer => Ferret::Analysis::MyAnalyzer.new}

Thanks to Jens for setting me straight on this :)