Hi, List,

as a long time user of Lucene in my Java projects I'm pretty delighted that there are ongoing efforts to port Lucene to Ruby. Thanks to "Brian's Waste of Time" http://kasparov.skife.org/blog/ I've found David Balmain's "Ferret" http://ferret.davebalmain.com/trac/ . Great, great, great! Since search is of such great importance in webapps I'm pretty sure that we all could help out testing... Thank you so much David!!!

best regards
Jan Prill
On 10/24/05, Jan Prill <JanPrill-sTn/vYlS8ieELgA04lAiVw@public.gmane.org> wrote:
> as a long time user of Lucene in my Java projects I'm pretty delighted
> that there are ongoing efforts to port Lucene to ruby. Thanks to
> "Brian's Waste of Time" http://kasparov.skife.org/blog/ I've found David
> Balmain's "Ferret" http://ferret.davebalmain.com/trac/ . Great, great,
> great! Since search is of such a great importance in webapps I'm pretty
> sure that we all could help out testing... Thank you so much David!!!

Thanks for the plug Jan. All feedback, good or bad, is most welcome.

Dave
_______________________________________________
Rails mailing list
Rails-1W37MKcQCpIf0INCOvqR/iCwEArCW2h5@public.gmane.org
http://lists.rubyonrails.org/mailman/listinfo/rails
On Oct 23, 2005, at 5:18 PM, David Balmain wrote:
> Thanks for the plug Jan. All feedback, good or bad, is most welcome.

Hmm, it rocks? I am poking at it with escalating levels of load and it does pretty nicely. Index rebuilding is the painful part, so I need to muck with Ferret <-> Java Lucene interop and let Java do the reindexing =)

Dave -- anything specific you want any help with? I'm more than happy to poke at code for this if you want it. Otherwise, I will beat on the API and hunt down bugs =)

-Brian
On 10/24/05, Brian McCallister <brianm-1oDqGaOF3Lkdnm+yROfE0A@public.gmane.org> wrote:
> Hmm, it rocks? I am poking at it with escalating levels of load and it
> does pretty nicely. Index rebuilding is the pain part, so need to muck with
> Ferret <-> Java Lucene interop, let java do the reindexing =)

This problem will be solved when I add my C indexer. =)

> Dave -- anything specific you want any help with? I'm more than happy to
> poke at code for this if you want it. Otherwise, will beat on the api and
> hunt down bugs =)

Definitely beat down on the API. That's what I need. Once I'm sure I won't need to change much in the Index module, I'll integrate my C indexer. I want to keep the native Ruby part of the indexer available though.

Also, I'd like to get the analyzers to support European languages at least. For example, currently my LetterTokenizer just uses /[A-Za-z]+/ to match tokens. I was hoping /[[:alpha:]]+/u would match letters with accents etc. but, alas, it doesn't. Any ideas how to do this?

Dave
David Balmain wrote:
> Also, I'd like to get the analyzers to support European languages at least.
> For example, currently my LetterTokenizer just uses /[A-Za-z]+/ to match
> tokens. I was hoping /[[:alpha:]]+/u would match letters with accents etc.
> but, alas, it doesn't. Any ideas how to do this?

If I might just chip in here, I'd suggest having a different LetterTokenizer per character set - and maybe even for different languages in a character set - and using user-supplied information to switch. Trying to do everything all at once sounds like a world of pain to me.

Apologies if this is stating the bleedin' obvious...

--
Alex
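Alex's suggestion of one tokenizer per language, selected by user-supplied information, could be sketched roughly as below. This is only an illustration: `TOKEN_PATTERNS` and `tokenize` are hypothetical names, not part of Ferret, and the sketch assumes a modern Ruby with UTF-8 source files rather than the Ruby 1.8 discussed in the thread.

```ruby
# Hypothetical sketch: one token pattern per language, chosen by a
# user-supplied language code. None of these names come from Ferret.
TOKEN_PATTERNS = {
  :en => /[A-Za-z]+/,          # plain ASCII letters
  :de => /[A-Za-zÄÖÜäöüß]+/    # ASCII letters plus German umlauts and eszett
}.freeze

# Splits text into tokens using the pattern registered for the language.
# Raises ArgumentError when no pattern is registered for that language.
def tokenize(text, language = :en)
  pattern = TOKEN_PATTERNS.fetch(language) do
    raise ArgumentError, "no tokenizer registered for #{language.inspect}"
  end
  text.scan(pattern)
end

p tokenize("Die Bären aßen Äpfel", :de)
p tokenize("Die Bären aßen Äpfel", :en)
```

With the English pattern the accented words are broken apart at every non-ASCII letter, which is the behaviour David describes for his current LetterTokenizer.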
Hi, David,

I did a little research on the umlauts, accents, etc. problem. Obviously this is a little off-topic on the Rails mailing list, but anyway:

First of all: I'm a newbie on Ruby as well as on Rails, so my assumptions may be wrong.

The problem domain of how "Unicode Makes a Mess of Things" is described on http://www.regular-expressions.info/unicode.html . In Java the cure would be '\\pL'. My first assumption (from a short test) is that the standard regex engine of Ruby won't understand things like \p{L} or \p{Letter}. My second assumption, after googling a little bit, is that Ruby's regex capabilities are a great thing but that Ruby's Unicode support isn't that great on the other hand.

Some further research got me to http://www.geocities.jp/kosako3/oniguruma/ . Oniguruma seems to be the next-generation regex engine that has already been incorporated in ruby-lang CVS. The library might be particularly interesting for your project because it also comes as a C library that you might have use for in cFerret. I've compiled the latest release and the tests ran without failures. Under "Latest release version 2.5.0" the patching of the Ruby sources to include this regex engine in Ruby is described. The 'Regular Expressions' document of the 3.8.9 release states that a [:alpha:] in 'Unicode Case' (whatever that is) should find characters out of the Letter or Mark category, which leads back to something similar to \p{Letter} \p{Mark}.

best regards
Jan Prill
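For comparison, here is a small sketch of the three patterns mentioned in this exchange. It assumes Ruby 1.9 or later, where an Oniguruma-derived engine is built in and strings carry a UTF-8 encoding; it does not apply to the Ruby 1.8 setup the thread is about, and the sample text is made up for illustration.

```ruby
# Assumes Ruby 1.9+ with UTF-8 strings (NOT the Ruby 1.8 of this thread).
text = "Crème brûlée für Zoë"

# ASCII-only class: accented letters split words into fragments.
ascii_tokens = text.scan(/[A-Za-z]+/)

# Unicode property: any character in the Letter category.
letter_tokens = text.scan(/\p{L}+/)

# POSIX bracket: Unicode-aware when the string is UTF-8.
posix_tokens = text.scan(/[[:alpha:]]+/)

p ascii_tokens
p letter_tokens
p posix_tokens
```

On such a Ruby, the last two patterns keep each accented word whole, while the ASCII class breaks the words at every accented character.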
Awesome, thanks Jan. I was hoping Oniguruma would solve some of my problems, so you've just saved me a lot of time. Now I just need to make it available for people in Ruby 1.8. :D
This is cool, I was looking for something like this for a little while now. I even tried searching Google for things like "Lucene for ruby", and every other combo, but never saw this. Yeah! OK, now to try it out and see what happens before I get too excited.

-Nick
Quick question David. What would you consider to be the best place to keep the index in a Rails app so that the whole app has access to it? (Not talking about files, but the actual 'index' object.)

It would seem that you can only really have one instance of 'Index::Index' open at a particular time for a particular index. Is this correct? More generally, how would one best use this for a multi-threaded app where more than one person/thread could be accessing it at a time?

So far I think this will do exactly what I want/need, I'm just looking not to do something completely wrong with the setup.

Thanks!
-Nick
Hi

I wasn't sure where else to put this, so I'm hoping that I'll get a reply here. I've been playing around with Ferret and seem to be getting two errors consistently. Now, odds are that I'm just doing something wrong, but if anyone could either point me in the right direction or confirm that they have the same problems I'd appreciate that.

I have an index created, and have been adding rows with the after_save callback like so:

index << { :key => self.id, :type => 'company', :content => "#{self.name} #{self.specialising}" }

Now, the following works:

index.search_each('content:(test co)') {|doc, score| puts "Document #{doc} with a score of #{score} & id #{index[doc]['key']}"}
Document 0 with a score of 0.105099913427182 & id 3490
Document 1 with a score of 0.105099913427182 & id 3490

However, when I try to do the following it gives me an error:

index.search_each('key:(3489)') {|doc, score| puts "Document #{doc} with a score of #{score} & id #{index[doc]['key']}"}
NoMethodError: undefined method `weight' for nil:NilClass
        from /usr/lib/ruby/gems/1.8/gems/ferret-0.1.1/lib/ferret/search/index_searcher.rb:92:in `search'
        from /usr/lib/ruby/gems/1.8/gems/ferret-0.1.1/lib/ferret/index/index.rb:115:in `search'
        from /usr/lib/ruby/gems/1.8/gems/ferret-0.1.1/lib/ferret/index/index.rb:127:in `search_each'
        from (irb):8
        from /usr/lib/ruby/site_ruby/1.8/rubygems/specification.rb:241

and when I surround the search terms with "" and there is only one term I get the following error:

index.search_each('content:"test"') {|doc, score| puts "Document #{doc} with a score of #{score} & id #{index[doc]['key']}"}
NameError: undefined local variable or method `words' for #<Ferret::QueryParser:0xb7862950>
        from lib/ferret/query_parser/query_parser.y:374:in `get_phrase_query'
        from lib/ferret/query_parser/query_parser.y:93:in `_reduce_24'
        from (irb):15:in `_racc_do_parse_c'
        from /usr/lib/ruby/1.8/racc/parser.rb:102:in `do_parse'
        from lib/ferret/query_parser/query_parser.y:190:in `parse'
        from /usr/lib/ruby/gems/1.8/gems/ferret-0.1.1/lib/ferret/index/index.rb:111:in `search'
        from /usr/lib/ruby/gems/1.8/gems/ferret-0.1.1/lib/ferret/index/index.rb:127:in `search_each'
        from (irb):15
        from /usr/lib/ruby/site_ruby/1.8/rubygems/specification.rb:241

What exactly is going on here?

TIA

Luke
Hi, Luke,

I'm trying something like that right now. Regarding your first error: Are you sure that you have a document with the key of 3489 in your index? It looks as if no doc comes back from your index (so the result is nil), and obviously you can't call the weight method on nil to output the score...

best regards
Jan Prill
Hi Luke,

You're the best tester yet. That's two bugs from two, one that I'd already found and one that I hadn't. I'm just about to put out another minor release now so grab 0.1.2 and keep testing. ;-)

Regards,
Dave
You're not the only one Luke. I'm seeing the exact same behavior here. Searching on strings seems to work fine: say I do index.search("content:z"), it works fine (doesn't return any results or anything). But if I do index.search("content:4") it blows up, giving the nil error message that is shown below. So, searching on integers appears to be busted? It's not the single-character search, as shown above.

Looking through the code it seems as though you are performing checks that the query is a String object, which seems unnecessary to me, or at least misunderstood. Why not be able to search on numbers?

On 10/24/05, Luke Randall <luke.randall-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> However, when I try to do the following it gives me an error:
>
> index.search_each('key:(3489)') {|doc, score| puts "Document #{doc}
> with a score of #{score} & id #{index[doc]['key']}"}
> NoMethodError: undefined method `weight' for nil:NilClass
> Are you sure that you have a document with the key of 3489 in your
> index?

Hi Jan

Yes, I figured that the error regarding the weight method is a symptom of it not returning any results. However, that's where I'm stumped because I know that I DO have a document with a key of 3489. Accessing it directly as index[0]['key'] returns 3489, so I know it's there.

I'm wondering if maybe having a straight integer as the field value changes things somehow, although this doesn't seem obvious from the docs so I doubt it.
On 10/25/05, Luke Randall <luke.randall-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> I'm wondering if maybe having a straight integer as the field value
> changes things somehow, although this doesn't seem obvious from the
> docs so I doubt it.

The problem was that the query parser was using a different analyzer from the one used by the indexer. I've fixed that now.

By the way, if you want to try out the query parser without having to create indexes and all that jazz, just try:

ruby /usr/lib/ruby/gems/1.8/gems/ferret-0.1.2/lib/ferret/query_parser/query_parser.tab.rb

Or wherever it happens to be on your machine. You can do the same with the standard analyzer to see what happens to your strings when they get tokenized:

ruby /usr/lib/ruby/gems/1.8/gems/ferret-0.1.2/lib/ferret/analysis/standard_tokenizer.rb

Enjoy,
Dave
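To illustrate why an analyzer mismatch between indexing and query parsing can make an existing document unfindable, here is a minimal, self-contained sketch. The two analyzers and the `lookup` helper are hypothetical and are not Ferret classes; the thread does not say which analyzers Ferret actually used, so this only shows one way such a mismatch could produce the symptom Luke saw. It assumes Ruby 1.9.2 or later.

```ruby
# Hypothetical analyzers, NOT Ferret classes: they only illustrate the mismatch.
index_analyzer = lambda { |text| text.downcase.scan(/[a-z0-9]+/) }  # keeps digits
query_analyzer = lambda { |text| text.downcase.scan(/[a-z]+/) }     # drops digits

# Build a tiny inverted index: term => list of document ids.
documents = { 0 => "3489", 1 => "Test Co 3490" }
inverted = Hash.new { |hash, term| hash[term] = [] }
documents.each do |doc_id, text|
  index_analyzer.call(text).each { |term| inverted[term] << doc_id }
end

# Analyzes the query and collects the ids of all documents containing a term.
def lookup(inverted, analyzer, query)
  analyzer.call(query).flat_map { |term| inverted.fetch(term, []) }.uniq
end

p lookup(inverted, query_analyzer, "3489")  # mismatched analyzer: query has no terms
p lookup(inverted, index_analyzer, "3489")  # same analyzer as the indexer
```

When the query side produces no terms at all, there is nothing to search for, even though the document is in the index. Using the same analyzer on both sides avoids that.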
On 10/24/05, Nick Stuart <nicholas.stuart-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Quick question David. What would you consider to be the best place to
> keep the index in a rails app so that whole app has access to it? (not
> talking about files, but the actual 'index' object)
>
> It would seem that you can only really have one instance of the
> 'Index::Index' open at a particular time for a particular index. Is
> this correct? More generally, how would best use this for a
> multi-threaded app where more than one person/thread could be
> accessing it at a time?

I'm not really sure about this just yet. I need to look at how the Java guys are doing it. I'll be sure to put a howto up when I know. Basically, I think you'll have to use the lower level IndexReader and IndexWriter. Anyway, I'll make this high priority and try to get back to you as soon as I know.

Dave
Luke, as I mentioned in my previous e-mail, I believe this is exactly what is causing the issue. It's throwing an exception because the query it tries to run is nil, not because it didn't return anything. The query is nil because the query parser seems not to like straight integers. Funny thing is, if you do a range, "> 3 & <5", you get back what you expected (the document with id of 4). Seems something is amiss here.

Debugging through the QueryParser, it looks like it's actually parsing it out right, but somewhere between the QueryParser and the Index Searcher the query is getting set to nil. The only place I couldn't follow the code was the call to do_parse in QueryParser.parse. Spitting the contents of @q out right before that call I get something like:

-- WORDid -- :: -- WORD4 -- false$

Which looks fine to me; it just seems to go away after this though.
I had the same thoughts on the IndexReader and IndexWriter. Almost all of the people using the site will simply need access to the reader and will not have/need any access to the writer. I was looking through the docs, but couldn't find a way to make a new separate reader without going through Index::Index (not really what I want, since only one Index::Index can be active at a time due to write locks).

Thanks for looking into this for me! Look forward to hearing something back.

-Nick
Just an FYI on Java Lucene: it's no problem to index from multiple threads, but only a _single_ IndexWriter instance may be used. This locking prevents the index from getting out of sync.

It is possible to have multiple IndexSearchers open at the same time against an index in Java Lucene, but it is not really recommended, and certainly not necessary. A single instance of IndexSearcher is the recommended approach, kept cached over multiple requests so that the term dictionary and sorting keys can be cached in RAM to help with successive searches.

Erik

On 24 Oct 2005, at 13:25, Nick Stuart wrote:
> I had the same thoughts on the IndexReader and IndexWriter. Almost all
> of the people using the site will simply need access to the reader and
> will not have or need any access to the writer. I was looking through the
> docs, but couldn't find a way to make a new separate reader without
> going through Index::Index (not really what I want, since only one
> Index::Index can be active at a time due to write-locks).
>
> Thanks for looking into this for me! Look forward to hearing
> something back.
>
> -Nick
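For anyone following along in Ruby, here is a rough sketch of the pattern Erik describes: one writer shared by every thread and one searcher that is cached between requests. It is only an illustration under stated assumptions. ToyWriter, ToySearcher and SharedIndex are made-up stand-ins, not Ferret or Lucene classes, and the "index" is just an array of strings.

```ruby
# Hypothetical stand-in for an index writer (NOT Ferret's API).
class ToyWriter
  attr_reader :docs

  def initialize
    @docs = []
  end

  def add(doc)
    @docs << doc
  end
end

# Hypothetical stand-in for a searcher: it works on a snapshot of the docs.
class ToySearcher
  def initialize(docs)
    @docs = docs.dup
  end

  def search(term)
    @docs.select { |doc| doc.include?(term) }
  end
end

# One writer and one cached searcher for the whole process.
class SharedIndex
  def initialize
    @lock = Mutex.new
    @writer = ToyWriter.new
    @searcher = nil
  end

  # All threads funnel their writes through the single writer.
  def add(doc)
    @lock.synchronize do
      @writer.add(doc)
      @searcher = nil # the cached searcher is now stale, so drop it
    end
  end

  # Reuse the cached searcher until a write invalidates it.
  def searcher
    @lock.synchronize do
      @searcher ||= ToySearcher.new(@writer.docs)
    end
  end
end

index = SharedIndex.new
threads = 4.times.map do |i|
  Thread.new { 25.times { |j| index.add("doc #{i}-#{j}") } }
end
threads.each(&:join)

first = index.searcher
second = index.searcher
puts first.equal?(second)          # same cached object for both calls
puts first.search("doc 0-").size   # documents written by thread 0
```

Dropping the searcher on every write is the simplest choice and throws away the caching benefit on a write-heavy site; reopening it on a timer or after a batch of writes would be the obvious refinement.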
> You're the best tester yet. That's two bugs from two, one that I'd already found
> and one that I hadn't. I'm just about to put out another minor release now so grab
> 0.1.2 and keep testing. ;-)

Hey, well I'm happy that I was able to help. Thanks for the quick response. I've downloaded 0.1.2 and so far it's working great!

> The problem was that a different analyzer was being used by the query
> parser to the one being used by the indexer. I've fixed that now. By the
> way, if you want to try out the query parser without having to create
> indexes and all that jazz, just try:
>
> ruby /usr/lib/ruby/gems/1.8/gems/ferret-0.1.2/lib/ferret/query_parser/query_parser.tab.rb

Thanks, this will help me out a lot.

Regards
Luke
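To show why an analyzer mismatch like the one Dave mentions can make a search come back empty, here is a toy illustration. Everything in it is an assumption made for the example: LOWERCASING, WHITESPACE, build_index and search are invented names, not Ferret classes or methods, and this is not a reconstruction of the actual bug.

```ruby
# Two toy analyzers (hypothetical stand-ins, not Ferret classes).
LOWERCASING = ->(text) { text.downcase.scan(/\w+/) } # lowercases every token
WHITESPACE  = ->(text) { text.split }                # keeps case as typed

# Build a tiny inverted index: term => list of document ids.
def build_index(docs, analyzer)
  index = Hash.new { |hash, term| hash[term] = [] }
  docs.each_with_index do |text, id|
    analyzer.call(text).uniq.each { |term| index[term] << id }
  end
  index
end

# Return the ids of documents containing every term of the query.
def search(index, query, analyzer)
  postings = analyzer.call(query).map { |term| index.fetch(term, []) }
  postings.reduce(:&) || []
end

docs = ["Ferret is a Ruby port of Lucene", "Lucene in Action"]
index = build_index(docs, LOWERCASING)

p search(index, "Lucene", LOWERCASING) # same analyzer on both sides: hits
p search(index, "Lucene", WHITESPACE)  # mismatched analyzer: no hits
```

The index only contains the term "lucene", so a query analyzed into "Lucene" finds nothing even though both documents plainly contain the word.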
Makes sense, and nice to see a familiar face around here, Erik (I know you from several Java lists). Looking through the Index.rb file for Ferret, it looks like you (David) only allow for one reader OR one writer to be open, but not both. This would seem to work fine for a single-threaded environment, but not so well otherwise. I can see issues where you open a reader, but then someone comes through and tries to write, which erases the reader.

In any case, it looks as though the 'lower' level API through the direct Reader/Writer classes will work out better for this type of multi-threaded setup. Not a problem in my book, but it should be known to others as well.

On 10/24/05, Erik Hatcher <erik-LIifS8st6VgJvtFkdXX2HpqQE7yCjDx5@public.gmane.org> wrote:
> Just an FYI on Java Lucene: it's no problem to index from multiple
> threads, but only a _single_ IndexWriter instance may be used. This
> locking prevents the index from getting out of sync.
>
> It is possible to have multiple IndexSearchers open at the same time
> against an index in Java Lucene, but it is not really recommended,
> and certainly not necessary. A single instance of IndexSearcher is
> the recommended approach, kept cached over multiple requests so that
> the term dictionary and sorting keys can be cached in RAM to help
> with successive searches.
>
> Erik
On 10/25/05, Nick Stuart <nicholas.stuart-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Makes sense, and nice to see a familiar face around here Erik (know
> you from several Java lists). Looking through the Index.rb file for
> ferret it looks like you (david) only allow for one reader OR one
> writer to be open, but not both. This would seem to work fine for
> single threaded environment, but not so well otherwise. Can see issues
> where you'll open a reader, but then someone might come through and
> try and write and then erase the reader.

Actually, I'll need to synchronize a few of those methods, but apart from that it should be fine. What Ferret::Index::Index does is flush the writer whenever you do any read or search, so you'll always be searching on the latest index. So in the multi-threaded environment you mentioned (once I fix the aforementioned synch problem) the reader will get updated the next time you try to read. (This doesn't apply to the reader, writer or searcher methods, and I'll probably make them private.) This is fine if performance isn't a concern.

Otherwise, you should do what Erik said. I recommend reading his book "Lucene in Action". Most of it also applies to Ferret, and one of us will probably port the example code over to Ruby some time soon.

> In any case, it looks as though the 'lower' level API through the
> direct Reader/Writer classes will work out better for this type of
> multi-threaded setup. Not a problem in my book, but should be known to
> others as well.
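The flush-before-read behaviour described above can be sketched in a few lines of plain Ruby. This is a minimal illustration, assuming an in-memory list of strings instead of a real index; FlushingIndex is a hypothetical class and says nothing about how Ferret::Index::Index is actually implemented. The monitor stands in for the synchronization Dave says still needs to be added.

```ruby
require 'monitor'

# Hypothetical sketch: writes are buffered and flushed before every read,
# and every public method is synchronized so threads cannot interleave.
class FlushingIndex
  include MonitorMixin

  def initialize
    super()          # sets up the monitor provided by MonitorMixin
    @committed = []  # documents visible to searches
    @pending = []    # buffered writes that have not been flushed yet
  end

  # Buffer a document; returns self so calls can be chained.
  def <<(doc)
    synchronize { @pending << doc }
    self
  end

  # Flush any buffered writes first, so the search sees the latest state.
  def search(term)
    synchronize do
      flush
      @committed.select { |doc| doc.include?(term) }
    end
  end

  private

  # Move buffered documents into the searchable list.
  def flush
    return if @pending.empty?
    @committed.concat(@pending)
    @pending.clear
  end
end

index = FlushingIndex.new
index << "ferret" << "lucene"
p index.search("ferret") # the buffered write is flushed before the search runs
```

Flushing on every read keeps results fresh at the cost of doing that work on the read path, which matches the caveat that this is fine only if performance isn't a concern.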
Great! And ya, I understood what the Index class was doing, but saw that it wasn't particularly safe for web apps. If you are going to change that, well then I would have no reason not to use it. :) I'll at least use it for the first phase of the app and see what happens with performance. I don't expect a whole lot of traffic on the site, and even less searching, so we'll see.

-Nick

On 10/24/05, David Balmain <dbalmain.ml-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Actually, I'll need to synchronize a few of those methods but apart from
> that it should be fine. What Ferret::Index::Index does is flush the writer
> whenever you do any read or search so you'll be searching on the latest
> index. So in the multithreaded environment you mentioned (once I fix the
> aforementioned synch problem) the reader will get updated next time you try
> to read. (This doesn't apply to the reader, writer or searcher methods and
> I'll probably make them private.) This is fine if performance isn't a
> concern.
>
> Otherwise, you should do what Erik said. I recommend reading his book
> "Lucene in Action". Most of it also applies to Ferret and one of us will
> probably port the example code over to Ruby some time soon.