Hi all,

I'm new on the list, and glad to participate. I would like to ask some questions about the Ferret project:

- Is http://ferret.davebalmain.com/ the official page of the project? (I'm always getting 502 Bad Gateway.)
- Where can I find the road map of the project?
- At http://rubyforge.org/projects/ferret/ I see the last release was on November 28, 2007. Is that true?
- Is Ferret discontinued?

Please don't take these questions as offensive; I would really like to know how reliable Ferret is for a long-lived product. Here at my company we are planning to build a big product with an indexing engine, and I would like to know if Ferret is "alive".

Thanks for the answers!

--
Atenciosamente - Best regards,
Fernando Luiz Parisotto
I've been using Ferret in a project still under development, and it works pretty well. As far as I can tell, the project is dying, if not already dead. David Balmain is still the only listed developer, and he seems to have moved on to other things. However, since the software is still meeting my project's needs, I am not terribly bothered by that. I suppose that eventually (in a few years?) something will change enough that Ferret will stop working, and then we'll have to find something else.

If you can find an alternative that has active development, I would recommend you go with that. (And if you find one, please post about it.) But, if you can't, Ferret will probably be good enough for a while.

On Tue, Aug 19, 2008 at 3:24 PM, Fernando Parisotto <fernando.parisotto at gmail.com> wrote:
> - Is ferret discontinued?

--
Paul Lynch
Aquilent, Inc.
National Library of Medicine (Contractor)
I would also be interested in Ferret alternatives for IR in Ruby. A simple search on RubyForge returned mainly a bunch of projects that look to be abandoned:

- Rise (does not appear to be actively developed)
- rubylucene (looks to be a dead project)
- Ruby Simple Indexer (also looks dead)
- Ruby Odeum (simple Ruby bindings for a fast inverted index)

If anyone knows of any Ruby IR projects which are mature and are being actively developed, I would love to hear about them.

Thanks -- Eric

On Wednesday, August 27, at 10:29, Paul Lynch wrote:
> If you can find an alternative that has active development, I would
> recommend you go with that. (And if you find one, please post about
> it.)

-- schulte
On Aug 27, 2008, at 8:20 AM, Eric Schulte wrote:
> If anyone knows of any ruby IR projects which are mature, and are
> being actively developed I would love to hear about them.

FWIW, I recently finished porting all module code in KinoSearch to C. If we write binding code and port the test suite, it will be usable from Ruby.

KinoSearch is sort of a sister project to Ferret. The dev branch implements many of the ideas that Dave Balmain and I designed together for the Lucy project.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
Reformatted excerpts from Eric Schulte's message of 2008-08-27:
> If anyone knows of any ruby IR projects which are mature, and are
> being actively developed I would love to hear about them.

sphinxsearch.com

Much less usable API than Ferret, and you have to run it as a separate server process, but it's fast, stable, and actively maintained.

--
William <wmorgan-ferret at masanjin.net>
How about Sphinx?

On Wed, Aug 27, 2008 at 11:20 AM, Eric Schulte <schulte.eric at gmail.com> wrote:
> If anyone knows of any ruby IR projects which are mature, and are
> being actively developed I would love to hear about them.
As far as I know, Sphinx can only index tables that have a unique numeric id (e.g. an auto-incrementing int). I looked at using it, but we use md5 hashes for the id/primary key on the tables I want to index, so we were out of luck.

For what it's worth, I use Ferret 0.11.6 and love it. I re-index about 90 million rows (and growing) worth of "stuff" (title, description, author, etc.) every night, and it works like a champ. Searching is fast (provided you don't want to sort on something other than relevance) and accurate.

On Wed, Aug 27, 2008 at 9:57 AM, arvind gautam <arvindsg at gmail.com> wrote:
> How bout Sphinx?
Thanks for all the info. I just found a very good related discussion on ruby-forum which I thought I'd share:

http://www.ruby-forum.com/topic/137629

On Wednesday, August 27, at 11:57, arvind gautam wrote:
> How bout Sphinx?

-- schulte
On Wednesday, August 27, at 08:34, Marvin Humphrey wrote:
> KinoSearch is sort of a sister project to Ferret. The dev branch
> implements many of the ideas that Dave Balmain and I designed together
> for the Lucy project.

What is the status of the Lucy project? A Ruby API to the venerable Lucene library seems to be the obvious first step towards developing a truly stable, effective IR solution for Ruby. The last update on the Lucy webpage http://lucene.apache.org/lucy/ seems to be from 2006.

Also, I may be missing something obvious here, but I don't understand why there is no Ruby API directly to the Lucene Java library. Why would the only Lucene/Ruby API be to the C port of Lucene?

Much Thanks -- Eric

-- schulte
On Aug 27, 2008, at 11:36 AM, Eric Schulte wrote:
> What is the status of the Lucy project?

The dev branch of KinoSearch is basically Lucy.

When Dave became unavailable, I didn't really have anyone else to bounce ideas off of for Lucy (since it was a from-scratch project without a community), so I returned to the established KS community, but took the code base in the direction that Dave and I had worked out.

My current plan is to make an official KinoSearch release for Perl, write some experimental bindings for other languages, achieve stability, then make KinoSearch the "maint" branch and Lucy the "dev" branch.

> Also, I may be missing something obvious here, but I don't understand
> why there is no ruby API directly to the Lucene Java library,

If you want to use Lucene, just go with Solr.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
On Aug 27, 2008, at 11:20 AM, Eric Schulte wrote:
> If anyone knows of any ruby IR projects which are mature, and are
> being actively developed I would love to hear about them.

Disclaimer: highly opinionated response follows... :)

Solr is the way to go for Ruby projects*. solr-ruby, if I do say so myself, ain't half bad. It's downright beautiful to interact with Solr via Ruby: <http://wiki.apache.org/solr/solr-ruby>. I have plenty of wishes for where solr-ruby could still evolve, so it's not done yet.

* Pragmatically I realize that another moving piece, especially a JVM, isn't a good fit for many current production deployment environments. See below for my answer to that...

Ferret is awesome, let me be clear about that! I have always loved its power, even beyond Lucene Java in some cases. But I've stuck with Lucene through the tough times and it's always been good to me. Solr's goodness on top of Lucene Java makes it extremely compelling for every environment, be it Ruby, Python, Java itself, what have you. I've always been fonder of the JVM than native C stuff, and when Ferret went that direction I stuck with Java.

acts_as_solr, however, hasn't yet reached its potential - and my little hack that kick-started it wasn't really beneficial to the community, my apologies - since I basically "abandoned" it. But it ain't half bad either thanks to Thiago's hard work, and it makes easy work of RDBMS <-> Solr, whereas it takes something this ugly to do it in Java: <http://wiki.apache.org/solr/DataImportHandler> (oh Ruby how I love you!).

Solr is incredibly powerful, beyond the features I think almost all of the other open source search engines offer. Its scalability evolves almost daily, as does its pluggability.

And for those JRuby folks out there... well, I guess there aren't (m)any of those on the ferret list, but think about the possibilities... SolrJRuby! Wow.

Erik
On Wed, 2008-08-27 at 08:34 -0700, Marvin Humphrey wrote:
> FWIW, I recently finished porting all module code in KinoSearch to C.
> If we write binding code and port the test suite, it will be usable
> from Ruby.
>
> KinoSearch is sort of a sister project to Ferret. The dev branch
> implements many of the ideas that Dave Balmain and I designed together
> for the Lucy project.

Hi Marvin,

In my experience the Ruby community is crying out for a "drop-in" replacement for Ferret. Sphinx is great, but different. Xapian looks good but doesn't have the Ruby maturity of Ferret yet (especially considering acts_as_ferret). I keep coming across people using Ferret successfully but with little niggles here and there.

Is KinoSearch something that could be a Ferret replacement? Or the foundations of a Ferret replacement? What are the differences between it and Ferret?

Out of interest, what are the differences between it and the planned Lucy project? (It would be good to hear more about what your plans were for Lucy. Maybe it'll inspire somebody else?)

Do you happen to know if Dave is likely to work on Ferret again someday? I think we've seen some commits from him recentlyish but no word that I've seen. Hope all is well.

Thanks,

John.
--
http://johnleach.co.uk
Hi,

On 28 Aug 08, at 12:11, John Leach wrote:
> In my experience the Ruby community is crying out for a "drop-in"
> replacement for Ferret. Sphinx is great, but different. Xapian looks
> good but doesn't have the Ruby maturity of Ferret yet (especially
> considering acts_as_ferret). I keep coming across people using Ferret
> successfully but have little niggles here and there.

The best would probably be to have some of us dig into Ferret and help to fix the remaining bugs!

I'd like to experiment with beanstalkd (http://xph.us/software/beanstalkd/) which, I've been told, is a better alternative to DRb for background indexing.

I'm still using Ferret on many websites, and it's so simple to use. Why use something else?
On Aug 27, 2008, at 3:28 PM, Marvin Humphrey wrote:
> On Aug 27, 2008, at 11:36 AM, Eric Schulte wrote:
>> Also, I may be missing something obvious here, but I don't understand
>> why there is no ruby API directly to the Lucene Java library,

Mainly because Ruby has been too slow to have something pure. Ferret is about as close as it gets to Lucene Java compatibility, and really only diverged from the file format for wise practical reasons.

> If you want to use Lucene, just go with Solr.

+1

Solr is great in Ruby environments too. Really it is. Sure, there's this JVM beast, and deployment issues, and all that, but they generally aren't that painful. And the benefits are totally worth it.

Erik
Hi!

On 27.08.2008, at 20:20, Eric Schulte wrote:
> Thanks for all the info, I just found a very good related discussion
> from ruby-forum which I thought I'd share
>
> http://www.ruby-forum.com/topic/137629

Well, in this discussion there are (besides some useful information) some pretty biased statements from several people who obviously must have had a frustrating time with Ferret, or just didn't get it working right out of the box and decided it was cheaper to make their clients switch search technology (and possibly lose features) than to fix their deployment. I never had somebody from Engine Yard contact me regarding their massive Ferret deployment problems, so I'm not sure how hard they really tried to get over them. IMHO it's not very likely that it's Ferret's fault that, while all around the world people are running Ferret-based apps fine, *every* client of Engine Yard experiences the same set of problems...

So here's my very own biased opinion just to complete the picture :)

I use Ferret in several production projects with several customers, and also choose it for new projects like the soon-to-be-released new full text search for the German selfhtml.org portal or the search feature at www.fahrrad-xxl.de, which tightly integrates aaf (acts_as_ferret) with rdig (shameless plug: selfhtml.org search will be powered by Stellr [1] ;-).

I have absolutely no problem with Ferret not being very actively maintained, because it works for me just like it is. Honestly, I *never* had Ferret segfault in any one of my own production apps. (But I admit I saw it segfault in other places; maybe I just don't do the right things to make it crash...)

So why do I stick to Ferret while others declare it a 'dead' project?

Ferret's flexibility and feature set plus the level of Rails integration it offers by means of aaf is very unlikely to be reached by any other combination of search engine lib + Rails plugin in the near future. Having said that, I'm really interested in how the KinoSearch/Lucy stuff will go on...

Solr, while being an interesting project without doubt, won't ever reach the level of Rails integration that's possible with acts_as_ferret, simply because its server doesn't run in the context of the Rails app with model classes and all that stuff. It's an independent server indexing whatever you throw over the fence via http+xml. That framework independence is a great plus under some circumstances (and my Stellr project scratches exactly that itch in a much more lightweight and undoubtedly less scalable manner), but sometimes it's also a bad thing.

How to use a custom analyzer with Solr? You have to code it in Java (or you do your analysis before feeding the data into Java land, which I wouldn't consider good app design). But even if you do that then you have

a) half a Java project (I don't want that) and
b) no way to use your existing Rails classes in that custom analyzer (I *have* analyzers using Rails models to retrieve synonyms and narrower terms for thesaurus-based query expansion).

Not to speak of Sphinx here, which offers even less integration with your Rails application because it's tied directly to the database and doesn't support stuff like real incremental indexing. It's easy to be several times faster when you leave out most of the features...

Of course there are lots of use cases where Sphinx or Solr are perfectly valid choices, because their feature set suits the requirements and/or you're comfortable with running a servlet container in your production env and spreading your application logic across several languages.

Here's what I would do *if* I experienced severe problems with Ferret in any of my projects:

Take aaf, replace Ferret with Lucene or even make it modular to decide at run time which one to use, run the DRb server (or the whole app, that depends) under JRuby and call it acts_as_lucene :-) Et voila - great Rails integration plus Lucene's maturity. But as long as Ferret's working fine for me that's really unlikely to happen... Unless somebody wants to sponsor that project, of course ;)

Cheers,
Jens

[1] http://rubyforge.org/projects/stellr

--
Jens Krämer
Finkenlust 14, 06449 Aschersleben, Germany
VAT Id DE251962952
http://www.jkraemer.net/ - Blog
http://www.omdb.org/ - The new free film database
On Aug 28, 2008, at 9:52 AM, Jens Kraemer wrote:
> So here's my very own biased opinion just to complete the picture :)

Hey, software should be opinionated! That's totally fair :)

> (shameless plug: selfhtml.org search will be powered by Stellr
> [1] ;-).

Stellr - great name. Interesting... that's pretty sweet.

> Solr, while being an interesting project without doubt, won't ever
> reach the level of Rails integration that's possible with
> acts_as_ferret, simply because it's server doesn't run in the
> context of the rails app with model classes and all that stuff.

What advantage does Ferret have in terms of ActiveRecord integration that Solr wouldn't have? If you're talking about custom analyzers being in Ruby, more on that below.

> It's an independent server indexing whatever you throw over the
> fence via http+xml.

Solr can now index CSV as well as a relational database directly (with the new DataImportHandler). It also responds with a Ruby hash structure (just add &wt=ruby to the URLs, or use solr-ruby, which does that automatically and hides all server communication from you anyway).

> How to use a custom analyzer with solr? You have to code it in Java
> (or you do your analysis before feeding the data into java land,
> which I wouldn't consider good app design).

Most users would not need to write a custom analyzer. Many of the built-in ones are quite configurable. Yes, Solr does require schema configuration via an XML file, but there have been acts_as_solr variants (the good and bad thing about this git craze) that generate that for you automatically from an AR model.

> But even if you do that then you have
> a) half a java project (I don't want that)

That's totally fair, and really the primary compelling reason for Ferret over Solr for pure Ruby/Rails projects. I dig that. But isn't Ferret like 60k lines of C code too?!

> and b) no way to use your existing rails classes in that custom
> analyzer (I *have* analyzers using rails models to retrieve synonyms
> and narrower terms for thesaurus based query expansion)

You could leverage client-side query expansion with Solr: just take the user's query, massage it, and send whatever query you like to Solr. Solr also has synonym and stop word capability.

However, there is also no reason (and I have this on my copious-free-time-TODO-list) that JRuby couldn't be used behind the scenes of a Solr analyzer/tokenizer/filter or even request handler... and do all the cool Ruby stuff you like right there. Heck, you could even send the Ruby code over to Solr to execute there if you like ;)

> Here's what I would do *if* I experienced severe problems with
> Ferret in any of my projects:
>
> Take aaf, replace Ferret with Lucene or even make it modular to
> decide at run time which one to use, run the DRb server (or the
> whole app, that depends) under JRuby and call it acts_as_lucene :-)
> Et voila - great Rails integration plus Lucene's maturity. But as
> long as Ferret's working fine for me that's really unlikely to
> happen... Unless somebody wants to sponsor that project, of course ;)

Just using Solr and fixing up acts_as_solr to meet your needs (if it doesn't) would be even easier than all that :) Solr really is a better starting point than Lucene directly, for caching, scalability, replication, faceting, etc.

I'd be curious to see scalability comparisons between Ferret and Solr - or perhaps more properly between Stellr and Solr - as it boils down to number of documents, queries per second, and faceting and highlighting speed. I'm betting on Solr myself (by being so into it and basing my professional life on it).

Erik
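Editorial note: the client-side query expansion Erik describes might look roughly like the sketch below. It is only an illustration under stated assumptions: the SYNONYMS table, the expand_query and solr_select_url helpers, and the localhost:8983 address are hypothetical and are not part of solr-ruby or acts_as_solr. The only detail taken from the thread is the wt=ruby parameter.

```ruby
require "uri"

# Hypothetical synonym table; in a Rails app this could be loaded from models.
SYNONYMS = {
  "bike" => ["bicycle", "cycle"],
  "film" => ["movie"]
}.freeze

# Expand each term of the user's query into an OR group with its synonyms.
def expand_query(user_query, synonyms = SYNONYMS)
  user_query.downcase.split.map do |term|
    alternatives = [term] + synonyms.fetch(term, [])
    alternatives.size > 1 ? "(#{alternatives.join(' OR ')})" : term
  end.join(" ")
end

# Build a select URL that asks for a Ruby-formatted response (wt=ruby).
# The base URL is an assumed local development address.
def solr_select_url(query, base = "http://localhost:8983/solr/select")
  "#{base}?#{URI.encode_www_form(q: query, wt: 'ruby')}"
end

expanded = expand_query("Bike repair")
puts expanded                   # => (bike OR bicycle OR cycle) repair
puts solr_select_url(expanded)
```

Because the expansion happens before the request is built, the synonym lookup can use whatever Ruby objects the application already has, which is the point Erik is making in reply to Jens.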
That is one awesome rebuttal, Jens.

I read that forum topic below, and while I have great respect for Ezra (from his fine book Deploying Rails Applications), I must say I disagree with him with respect to the Ferret/AAF combination. We run Ferret/AAF as a DRb server in production and on our staging servers and I've never seen a Ferret segfault. That said, we're not under high search load like Google, but even when hit with heavy load testing, I haven't experienced a Ferret segfault, nor corrupt indexes.

Now, corrupt indexes in development is another issue. In development, you are not running a DRb server. Each mongrel is hitting the index directly. You typically have only one mongrel running in development. But if you open an interactive script/console session and play with your models alongside a running mongrel, you WILL corrupt your Ferret index. That's because both the mongrel and the script/console will be writers to the same index, something that Ferret doesn't support. Heck, running a rake db:migrate alongside a running mongrel will cause index corruption, for the same reason: multiple writers.

I'm wondering if that's why so many people experience Ferret indexing problems in development? It's not always immediately obvious that you're in a multiple-writer scenario.

For now, I'm sticking with the Ferret/AAF combination until one or the other falls over completely.

Sheldon Maloff
Developer
http://ideas.veer.com

On 08-Aug-28, at 7:52 AM, Jens Kraemer wrote:
> Hi!
>
> On 27.08.2008, at 20:20, Eric Schulte wrote:
>> Thanks for all the info, I just found a very good related discussion from ruby-forum which I thought I'd share
>>
>> http://www.ruby-forum.com/topic/137629
>
> well, in this discussion there's (besides some useful information) some pretty biased statements from several people who obviously must have had a frustrating time with Ferret, or just didn't get it working right out of the box and decided it was cheaper to make their clients switch search technology (and possibly losing features) than to fix their deployment. I never had somebody from engine yard contact me regarding their massive ferret deployment problems, not sure how hard they really tried to get over them.
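As an illustration of the multiple-writer problem Sheldon describes, one generic way to keep two processes from writing to the same on-disk resource is an advisory lock file. The sketch below is plain Ruby using `File#flock`; it is not how Ferret or acts_as_ferret handle locking, and the helper name `with_index_writer_lock` and the lock file path are assumptions made up for this example.

```ruby
require "tmpdir"

# Run the block only while holding an exclusive advisory lock on lock_path.
# LOCK_NB makes flock return false right away instead of blocking when
# another process already holds the lock.
def with_index_writer_lock(lock_path)
  File.open(lock_path, File::RDWR | File::CREAT, 0o644) do |f|
    unless f.flock(File::LOCK_EX | File::LOCK_NB)
      raise "another process is already writing to the index"
    end
    begin
      yield
    ensure
      f.flock(File::LOCK_UN)
    end
  end
end

# Hypothetical lock file location, used only for this demonstration.
lock_path = File.join(Dir.tmpdir, "ferret_index_demo.lock")
with_index_writer_lock(lock_path) { puts "safe to write to the index here" }
```

A script/console session or rake task wrapped this way would fail fast instead of silently becoming a second writer. Advisory locks only help if every writer cooperates, which is one reason the DRb server setup (a single writer process) is the simpler answer.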
On Aug 28, 2008, at 3:11 AM, John Leach wrote:
> Is KinoSearch something that could be a Ferret replacement?

Yes. The projects are roughly comparable. I'd be happier if Ferret's ultimate successor was named "Lucy", though, because then more credit would flow to Dave.

> What are the differences between it and Ferret?

From a high level, they're pretty similar. Analyzer, QueryParser, IndexReader, and all that.

There are superficial differences in the implementations of individual classes. For instance, Ferret provides several different Tokenizer classes; KinoSearch provides one, based on a regex pattern matching one token.

    # KinoSearch version of WhiteSpaceTokenizer
    tokenizer = Tokenizer.new(:pattern => "\\S+")

At a low level, things start to diverge. For instance, all metadata in the KinoSearch index file format is encoded as JSON, so it's human-readable for easy spelunking and debugging. Also, it's easier to override methods in KinoSearch, so you can do things like implement SearchServer/SearchClient or MockScorer or KSx::Highlight::Summarizer in pure Perl; I believe the mechanism will work similarly with Ruby bindings.

> what are the differences between it and the planned Lucy project

Personally, I think of them as the same project. KinoSearch is at version 0.x and will soon become version 1.0. Lucy will be version 2 -- KinoSearch's successor.

Lucy has never had a high-level API -- the work Dave and I did was all on the low-level core. That core has now been fully implemented in the KinoSearch dev branch. What happens between version 1 and 2 depends on how the rollout of version 1 goes.

> Do you happen to know if Dave is likely to work on Ferret again someday?

I know he would like to. However, I hope to persuade him to return to his work on Lucy. :)

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
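To make the "one tokenizer driven by a regex that matches a single token" idea concrete, here is a rough plain-Ruby sketch. It does not use KinoSearch or Ferret; the `Token` struct and the `tokenize` helper are hypothetical names, and the offsets are character offsets into the input string.

```ruby
# A token with its text and its character offsets in the source string.
Token = Struct.new(:text, :start_offset, :end_offset)

# Every match of `pattern` becomes one token. The default pattern, \S+,
# behaves like a whitespace tokenizer.
def tokenize(text, pattern = /\S+/)
  tokens = []
  text.scan(pattern) do
    match = Regexp.last_match
    tokens << Token.new(match[0], match.begin(0), match.end(0))
  end
  tokens
end

tokenize("quick brown fox").each do |t|
  puts "#{t.text} [#{t.start_offset}, #{t.end_offset})"
end
```

Swapping the pattern (for example `/\w+/`) changes what counts as a token without needing a separate tokenizer class, which is the design trade-off Marvin is pointing at.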
On 28.08.2008, at 17:17, Erik Hatcher wrote:
> On Aug 28, 2008, at 9:52 AM, Jens Kraemer wrote:
>>
>> Solr, while being an interesting project without doubt, won't ever reach the level of Rails integration that's possible with acts_as_ferret, simply because its server doesn't run in the context of the rails app with model classes and all that stuff.
>
> What advantage does Ferret have in terms of ActiveRecord integration that Solr wouldn't have?
>
> If you're talking about custom analyzers being in Ruby, more on that below.

It's not only custom analyzers, but the fact that acts_as_ferret's DRb server runs with the full Rails application loaded, so e.g. to bulk index a number of records aaf just hands the server the ids and class name of the records to index, and the server does the rest. It's debatable if one approach is better than the other; in terms of index server load it might even be better to do as much as possible on the client side, but still it's a much tighter coupling than you get with the application agnostic interfaces of solr or stellr.

I must admit that I have a hard time coming up with another example besides my synonym/thesaurus analysis stuff where this might be useful, but I think there are more use cases where such a tight integration might come in handy.

>> It's an independent server indexing whatever you throw over the fence via http+xml.
>
> Solr can now index CSV as well as a relational database directly (with the new DataImportHandler).
>
> It also responds with Ruby hash structure (just add &wt=ruby to the URLs, or use solr-ruby which does that automatically and hides all server communication from you anyway).

Yeah, I know, but anyway there is a strict line between your application and Solr, which doesn't know a thing about the application using it.

>> How to use a custom analyzer with solr? You have to code it in Java (or you do your analysis before feeding the data into java land, which I wouldn't consider good app design).
>
> Most users would not need to write a custom analyzer. Many of the built-in ones are quite configurable. Yes, Solr does require schema configuration via an XML file, but there have been acts_as_solr variants (good and bad thing about this git craze) that generate that for you automatically from an AR model.

Glad you mentioned this ;) I don't want to configure an analyzer via xml when I can throw my own together with 4 or 5 lines of easy to read ruby code. Same for index structure. Philosophical mismatch between the Java and Ruby worlds I think :)

>> But even if you do that then you have
>> a) half a java project (I don't want that)
>
> That's totally fair, and really the primary compelling reason for Ferret over Solr for pure Ruby/Rails projects. I dig that.
>
> But isn't Ferret like 60k lines of C code too?!

true, but I don't have to compile that every time I deploy my app...

>> and b) no way to use your existing rails classes in that custom analyzer (I *have* analyzers using rails models to retrieve synonyms and narrower terms for thesaurus based query expansion)
>
> You could leverage client-side query expansion with Solr... just take the user's query, massage it, and send whatever query you like to Solr. Solr also has synonym and stop word capability.

yeah, I could do that. But that's moving analysis stuff into my application, which is quite contrary to the purpose of analyzers - encapsulate this logic and make it pluggable into the search engine library. So less style points for this solution...

> However, there is also no reason (and I have this on my copious-free-time-TODO-list) that JRuby couldn't be used behind the scenes of a Solr analyzer/tokenizer/filter or even request handler... and do all the cool Ruby stuff you like right there. Heck, you could even send the Ruby code over to Solr to execute there if you like ;)

that sounds sexy ;)

>> Here's what I would do *if* I experienced severe problems with Ferret in any of my projects:
>>
>> Take aaf, replace Ferret with Lucene or even make it modular to decide at run time which one to use, run the DRb server (or the whole app, that depends) under JRuby and call it acts_as_lucene :-) Et voila - great Rails integration plus Lucene's maturity. But as long as Ferret's working fine for me that's really unlikely to happen... Unless somebody wants to sponsor that project, of course ;)
>
> Just using Solr and fixing up acts_as_solr to meet your needs (if it doesn't) would be even easier than all that :) Solr really is a better starting point than Lucene directly, for caching, scalability, replication, faceting, etc.

Depends on whether you need these features or not. From my experience, lots of projects don't need these things anyway, because they're running on a single host and nearly every other part of the application is slower than search... Maybe it's because I'm quite involved with the topic and am familiar with lucene's API, but to me Solr looks like an additional layer of abstraction and complexity which I only want to have when it really gives me a feature I need. Plus the last time I checked Lucene didn't need xml configuration files ;)

In development environments and especially when it comes to automated tests / CI it's also quite comfortable not having to run a separate server but using the short cut directly to the index, which isn't possible with Solr.

> I'd be curious to see scalability comparisons between Ferret and Solr - or perhaps more properly between Stellr and Solr - as it boils down to number of documents, queries per second, and faceting and highlighting speed. I'm betting on Solr myself (by being so into it and basing my professional life on it).

This would be interesting, but I wouldn't be that disappointed with Stellr ending up second given the little amount of time I've spent building it so far. Just out of curiosity, do you have some kind of performance testing suite for Solr which I could throw at Stellr?

Cheers,
Jens

--
Jens Krämer
Finkenlust 14, 06449 Aschersleben, Germany
VAT Id DE251962952
http://www.jkraemer.net/ - Blog
http://www.omdb.org/ - The new free film database
Hi!

On 28.08.2008, at 18:24, Marvin Humphrey wrote:
[..]
> There are superficial differences in the implementations of individual classes. For instance, Ferret provides several different Tokenizer classes; KinoSearch provides one, based on a regex pattern matching one token.
>
>     # KinoSearch version of WhiteSpaceTokenizer
>     tokenizer = Tokenizer.new(:pattern => "\\S+")

That's pretty simple ;)

With Ferret I can use custom tokenizers to inject additional terms at the same offset (i.e., synonyms), is there another way to achieve that with KinoSearch?

[..]
>> Do you happen to know if Dave is likely to work on Ferret again someday?
>
> I know he would like to. However, I hope to persuade him to return to his work on Lucy. :)

whatever, as long as it's as powerful and easy to use as Ferret and has ruby bindings I'm all for it :)

Cheers,
Jens

--
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49351467660 | Telefax +493514676666
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold
On Aug 28, 2008, at 10:10 AM, Jens Krämer wrote:
> With Ferret I can use custom tokenizers to inject additional terms at the same offset (i.e., synonyms), is there another way to achieve that with KinoSearch?

Synonym support isn't part of the public API right now, but since the basic principle is the same in KinoSearch as it is in Ferret and Lucene, it shouldn't be hard to add.

I don't think we'd do this by extending Tokenizer; I think we'd want SynonymFilter/SynonymMap classes akin to the ones provided by Solr.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
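The "basic principle" shared by Lucene-style engines is that a synonym is emitted as an extra token with a position increment of 0, so it occupies the same position as the original term. A minimal plain-Ruby sketch of such a filter follows; it uses none of the Ferret, KinoSearch or Solr classes, and `PositionedToken` and `add_synonyms` are hypothetical names chosen for the example.

```ruby
# A token plus its position increment relative to the previous token.
# An increment of 0 means "same position as the previous token".
PositionedToken = Struct.new(:text, :position_increment)

# Emit each original token, followed by its synonyms at the same position.
def add_synonyms(tokens, synonym_map)
  tokens.flat_map do |token|
    extras = synonym_map.fetch(token.text, []).map do |synonym|
      PositionedToken.new(synonym, 0)
    end
    [token, *extras]
  end
end

stream = [PositionedToken.new("fast", 1), PositionedToken.new("car", 1)]
add_synonyms(stream, "car" => ["automobile"]).each do |t|
  puts "#{t.text} (+#{t.position_increment})"
end
```

Because "automobile" shares a position with "car", a phrase query such as "fast automobile" can still match the original text.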
On Aug 28, 2008, at 1:02 PM, Jens Kraemer wrote:
>> What advantage does Ferret have in terms of ActiveRecord integration that Solr wouldn't have?
>>
>> If you're talking about custom analyzers being in Ruby, more on that below.
>
> It's not only custom analyzers, but the fact that acts_as_ferret's DRb server runs with the full Rails application loaded, so e.g. to bulk index a number of records aaf just hands the server the ids and class name of the records to index, and the server does the rest.

Gotcha. Meaning the search server is pulling from the DB directly. That's what the DataImportHandler in Solr does as well. It'd be a simple single HTTP request to Solr (once the DB stuff is configured, of course) to have it do full or incremental DB indexing.

>>> How to use a custom analyzer with solr? You have to code it in Java (or you do your analysis before feeding the data into java land, which I wouldn't consider good app design).
>>
>> Most users would not need to write a custom analyzer. Many of the built-in ones are quite configurable. Yes, Solr does require schema configuration via an XML file, but there have been acts_as_solr variants (good and bad thing about this git craze) that generate that for you automatically from an AR model.
>
> Glad you mentioned this ;) I don't want to configure an analyzer via xml when I can throw my own together with 4 or 5 lines of easy to read ruby code. Same for index structure. Philosophical mismatch between the Java and Ruby worlds I think :)

Don't get me wrong... I'm a Ruby fanatic myself! XML makes me ill, generally speaking (it has its uses, but for configuration it is just plain wrong). For using the built-in tokenizer/filters, a smarter acts_as_solr could generate the right config based on a model specifying parameters for analysis.

>>> But even if you do that then you have
>>> a) half a java project (I don't want that)
>>
>> That's totally fair, and really the primary compelling reason for Ferret over Solr for pure Ruby/Rails projects. I dig that.
>>
>> But isn't Ferret like 60k lines of C code too?!
>
> true, but I don't have to compile that every time I deploy my app...

My point was that Ferret isn't just Ruby, just a counterpoint to your "half a java project". No one has to recompile Solr either.

>>> and b) no way to use your existing rails classes in that custom analyzer (I *have* analyzers using rails models to retrieve synonyms and narrower terms for thesaurus based query expansion)
>>
>> You could leverage client-side query expansion with Solr... just take the user's query, massage it, and send whatever query you like to Solr. Solr also has synonym and stop word capability.
>
> yeah, I could do that. But that's moving analysis stuff into my application, which is quite contrary to the purpose of analyzers - encapsulate this logic and make it pluggable into the search engine library. So less style points for this solution...

I was just saying :) It's debatable exactly where in the client-server spectrum synonym expansion belongs... and it really depends on the needs of the project. Nothing wrong with a client doing some user input massaging before a query hits the search server.

>> However, there is also no reason (and I have this on my copious-free-time-TODO-list) that JRuby couldn't be used behind the scenes of a Solr analyzer/tokenizer/filter or even request handler... and do all the cool Ruby stuff you like right there. Heck, you could even send the Ruby code over to Solr to execute there if you like ;)
>
> that sounds sexy ;)

Should be fairly trivial to wire JRuby in. The DataImportHandler already has scripting language support for data transformation: <http://wiki.apache.org/solr/DataImportHandler#head-27fcc2794bd71f7d727104ffc6b99e194bdb6ff9> (shield your eyes from the XML wrapping it!), so I believe JRuby should already work in that context. This is sort of like the Mapper stuff I built into solr-ruby, transforming data from domain to search engine "documents".

>>> Here's what I would do *if* I experienced severe problems with Ferret in any of my projects:
>>>
>>> Take aaf, replace Ferret with Lucene or even make it modular to decide at run time which one to use, run the DRb server (or the whole app, that depends) under JRuby and call it acts_as_lucene :-) Et voila - great Rails integration plus Lucene's maturity. But as long as Ferret's working fine for me that's really unlikely to happen... Unless somebody wants to sponsor that project, of course ;)
>>
>> Just using Solr and fixing up acts_as_solr to meet your needs (if it doesn't) would be even easier than all that :) Solr really is a better starting point than Lucene directly, for caching, scalability, replication, faceting, etc.
>
> Depends on whether you need these features or not. From my experience, lots of projects don't need these things anyway, because they're running on a single host and nearly every other part of the application is slower than search... Maybe it's because I'm quite involved with the topic and am familiar with lucene's API, but to me Solr looks like an additional layer of abstraction and complexity which I only want to have when it really gives me a feature I need. Plus the last time I checked Lucene didn't need xml configuration files ;)

I hear ya about the XML config files. And always to be fair to Solr here, you really only need to set things up from a basic example configuration that covers most scenarios already - so it really isn't necessary to even touch XML config except for tweaking little things.

But Solr's advantages over just Lucene are built out of experiences that most Lucene projects eventually build anyway. Caching - really important for faceting, which every project I touch these days needs. Replication - really really important for scalability of massive querying load. It's really not such a big chunk over Lucene to bite off... and in almost all respects it is even simpler to use Solr than Lucene anyway.

> In development environments and especially when it comes to automated tests / CI it's also quite comfortable not having to run a separate server but using the short cut directly to the index, which isn't possible with Solr.

Not true. Solr can work embedded. There is a base SolrServer abstraction, with an implementation that runs embedded (inside the same JVM) versus over HTTP. Exactly the same interface for both operations, using a very simple API (SolrJ, much like Lucene's basic API actually).

>> I'd be curious to see scalability comparisons between Ferret and Solr - or perhaps more properly between Stellr and Solr - as it boils down to number of documents, queries per second, and faceting and highlighting speed. I'm betting on Solr myself (by being so into it and basing my professional life on it).
>
> This would be interesting, but I wouldn't be that disappointed with Stellr ending up second given the little amount of time I've spent building it so far. Just out of curiosity, do you have some kind of performance testing suite for Solr which I could throw at Stellr?

No, I don't have those kinds of tests myself. While I can speak to Solr's performance based on what I hear from our clients and the reports in the mailing lists, I don't consider myself a performance-savvy person.

I'm curious - what are the numbers of documents being put into Ferret indexes out there? millions? hundreds of millions? billions? And are folks doing faceting? Does Ferret have faceting support?

Erik
On 28.08.2008, at 20:03, Erik Hatcher wrote:
>
> On Aug 28, 2008, at 1:02 PM, Jens Kraemer wrote:
>>> What advantage does Ferret have in terms of ActiveRecord integration that Solr wouldn't have?
>>>
>>> If you're talking about custom analyzers being in Ruby, more on that below.
>>
>> It's not only custom analyzers, but the fact that acts_as_ferret's DRb server runs with the full Rails application loaded, so e.g. to bulk index a number of records aaf just hands the server the ids and class name of the records to index, and the server does the rest.
>
> Gotcha. Meaning the search server is pulling from the DB directly. That's what the DataImportHandler in Solr does as well. It'd be a simple single HTTP request to Solr (once the DB stuff is configured, of course) to have it do full or incremental DB indexing.

With the slight difference that custom model logic defined in the rails model class is still involved to preprocess data, index values calculated at indexing time, or even have certain records refuse to be indexed based on their current state. Having per document boosts depending on some value from the database (e.g. record popularity) is also a classic... Aaf never just pulls data from the db, it always uses rails model objects. Doesn't make indexing faster of course...

[..]
> XML makes me ill, generally speaking (it has its uses, but for configuration it is just plain wrong).

FULL ACK :)

> For using the built-in tokenizer/filters, a smarter acts_as_solr could generate the right config based on a model specifying parameters for analysis.
>
>>>> But even if you do that then you have
>>>> a) half a java project (I don't want that)
>>>
>>> That's totally fair, and really the primary compelling reason for Ferret over Solr for pure Ruby/Rails projects. I dig that.
>>>
>>> But isn't Ferret like 60k lines of C code too?!
>>
>> true, but I don't have to compile that every time I deploy my app...
>
> My point was that Ferret isn't just Ruby, just a counterpoint to your "half a java project". No one has to recompile Solr either.

but the custom analyzer implemented in Java... By saying 'half a java project' I didn't mean solr, but the parts of my application logic that have to be implemented in Java in order to be plugged into solr. But the JRuby route looks promising here of course.

>>>> and b) no way to use your existing rails classes in that custom analyzer (I *have* analyzers using rails models to retrieve synonyms and narrower terms for thesaurus based query expansion)
>>>
>>> You could leverage client-side query expansion with Solr... just take the user's query, massage it, and send whatever query you like to Solr. Solr also has synonym and stop word capability.
>>
>> yeah, I could do that. But that's moving analysis stuff into my application, which is quite contrary to the purpose of analyzers - encapsulate this logic and make it pluggable into the search engine library. So less style points for this solution...
>
> I was just saying :) It's debatable exactly where in the client-server spectrum synonym expansion belongs... and it really depends on the needs of the project. Nothing wrong with a client doing some user input massaging before a query hits the search server.

[..]
>>>> Here's what I would do *if* I experienced severe problems with Ferret in any of my projects:
>>>>
>>>> Take aaf, replace Ferret with Lucene or even make it modular to decide at run time which one to use, run the DRb server (or the whole app, that depends) under JRuby and call it acts_as_lucene :-) Et voila - great Rails integration plus Lucene's maturity. But as long as Ferret's working fine for me that's really unlikely to happen... Unless somebody wants to sponsor that project, of course ;)
>>>
>>> Just using Solr and fixing up acts_as_solr to meet your needs (if it doesn't) would be even easier than all that :) Solr really is a better starting point than Lucene directly, for caching, scalability, replication, faceting, etc.
>>
>> Depends on whether you need these features or not. From my experience, lots of projects don't need these things anyway, because they're running on a single host and nearly every other part of the application is slower than search... Maybe it's because I'm quite involved with the topic and am familiar with lucene's API, but to me Solr looks like an additional layer of abstraction and complexity which I only want to have when it really gives me a feature I need. Plus the last time I checked Lucene didn't need xml configuration files ;)
>
> I hear ya about the XML config files. And always to be fair to Solr here, you really only need to set things up from a basic example configuration that covers most scenarios already - so it really isn't necessary to even touch XML config except for tweaking little things.

But I still have to read it in order to see if it fits my needs. Okay, I'll stop whining about that xml now ;)

[..]
>> In development environments and especially when it comes to automated tests / CI it's also quite comfortable not having to run a separate server but using the short cut directly to the index, which isn't possible with Solr.
>
> Not true. Solr can work embedded. There is a base SolrServer abstraction, with an implementation that runs embedded (inside the same JVM) versus over HTTP. Exactly the same interface for both operations, using a very simple API (SolrJ, much like Lucene's basic API actually).

cool, but that won't work for Rails projects running on MRI and accessing solr via solr-ruby.

>>> I'd be curious to see scalability comparisons between Ferret and Solr - or perhaps more properly between Stellr and Solr - as it boils down to number of documents, queries per second, and faceting and highlighting speed. I'm betting on Solr myself (by being so into it and basing my professional life on it).
>>
>> This would be interesting, but I wouldn't be that disappointed with Stellr ending up second given the little amount of time I've spent building it so far. Just out of curiosity, do you have some kind of performance testing suite for Solr which I could throw at Stellr?
>
> No, I don't have those kinds of tests myself. While I can speak to Solr's performance based on what I hear from our clients and the reports in the mailing lists, I don't consider myself a performance-savvy person.
>
> I'm curious - what are the numbers of documents being put into Ferret indexes out there? millions? hundreds of millions? billions? And are folks doing faceting? Does Ferret have faceting support?

not sure about the billions, but afair an earlier message in this thread stated an index size of 90 million documents with aaf. Altlaw.org has reported an index size of > 4GB with around 700k documents last fall. The selfhtml.org index has approximately 1 million forum entries indexed, index size around 2GB. Stellr doesn't ever use more than around 50MB of RAM during indexing and searching this index. I know RAM is cheap and all, but RAM size still has a quite large influence on the price of the server you rent for your app, at least here in Germany.

Without doubt Solr has much more references in the area of such large installations than ferret/aaf. I for myself never saw aaf as a drop-in solution for indexes of this size, but more as an easy to use out of the box solution for the average rails app with maybe several thousand or tens of thousands of records, but I'm happy to see it still works in larger scale setups.

Heck, it all began with a simple full text search for my blog ;)

Regarding the faceting - it's not built into ferret, and aaf doesn't support it either since I didn't need it yet, and nobody else requested this feature so far. All in all I think the average usage scenarios of solr and aaf are quite different atm...

I'll try to find the time to benchmark the selfhtml.org data set with solr and stellr. I'll report my findings here.

Cheers,
Jens

--
Jens Krämer
Finkenlust 14, 06449 Aschersleben, Germany
VAT Id DE251962952
http://www.jkraemer.net/ - Blog
http://www.omdb.org/ - The new free film database
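Jens's point about model logic taking part in indexing (preprocessed values, records that refuse to be indexed, popularity-based boosts) can be pictured with a small plain-Ruby sketch. This is not the acts_as_ferret API: the `Article` struct, `indexable?`, `index_boost`, `to_index_doc` and `docs_for_indexing` are all hypothetical names, and the boost formula is an arbitrary choice for illustration.

```ruby
# Stand-in for a Rails model; a Struct keeps the sketch self-contained.
Article = Struct.new(:id, :title, :published, :views) do
  # The record itself decides whether it may be indexed.
  def indexable?
    published
  end

  # Per-document boost derived from popularity (arbitrary formula).
  def index_boost
    1.0 + Math.log10(1 + views)
  end

  # Preprocess the record into the document handed to the search engine.
  def to_index_doc
    { id: id, title: title.strip.downcase, boost: index_boost }
  end
end

# The indexing side only receives model objects and asks them what to do.
def docs_for_indexing(records)
  records.select(&:indexable?).map(&:to_index_doc)
end

records = [
  Article.new(1, "  Ferret Basics ", true, 9),
  Article.new(2, "Unpublished Draft", false, 100)
]
docs_for_indexing(records).each { |doc| puts doc.inspect }
```

A server that pulls rows straight from the database would have to reimplement `indexable?` and `index_boost` on its side, which is the coupling trade-off being debated here.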
On Aug 28, 2008, at 3:02 PM, Jens Kraemer wrote:
>> Gotcha. Meaning the search server is pulling from the DB directly. That's what the DataImportHandler in Solr does as well. It'd be a simple single HTTP request to Solr (once the DB stuff is configured, of course) to have it do full or incremental DB indexing.
>
> With the slight difference that custom model logic defined in the rails model class is still involved to preprocess data, index values calculated at indexing time, or even have certain records refuse to be indexed based on their current state. Having per document boosts depending on some value from the database (e.g. record popularity) is also a classic... Aaf never just pulls data from the db, it always uses rails model objects. Doesn't make indexing faster of course...

All great points. ActiveRecord is much more pleasant than any other database access that I've ever worked with. I don't generally work with databases personally, though. The bulk of my full-text searching experiences don't involve databases at all. I suppose the Java counterpart would be Hibernate Search - surely involving a lot more hideous XML and @annotations - ewww.

>>> In development environments and especially when it comes to automated tests / CI it's also quite comfortable not having to run a separate server but using the short cut directly to the index, which isn't possible with Solr.
>>
>> Not true. Solr can work embedded. There is a base SolrServer abstraction, with an implementation that runs embedded (inside the same JVM) versus over HTTP. Exactly the same interface for both operations, using a very simple API (SolrJ, much like Lucene's basic API actually).
>
> cool, but that won't work for Rails projects running on MRI and accessing solr via solr-ruby.

Fair point. Again, the answer comes back to JRuby ;) Forget MRI.

Good point about solr-ruby - it is specifically designed for Solr over HTTP. It wouldn't take much to refactor it to work with embedded Solr via JRuby though. But if JRuby is a given, it'd be just as easy to work with SolrJ's API directly.

Though for testing purposes, solr-ruby is easily mocked. solr-ruby touts great (98% or something like that) code coverage with unit tests; many of those tests are against solr-ruby's API with Solr itself mocked. And there are tests that fire up Solr in the background and test that way too for full functional tests. So for unit testing purposes, having Solr running isn't needed, but it launches plenty fast enough for testing end-to-end if desired.

>> I'm curious - what are the numbers of documents being put into Ferret indexes out there? millions? hundreds of millions? billions? And are folks doing faceting? Does Ferret have faceting support?
>
> not sure about the billions, but afair an earlier message in this thread stated an index size of 90 million documents with aaf. Altlaw.org has reported an index size of > 4GB with around 700k documents last fall. The selfhtml.org index has approximately 1 million forum entries indexed, index size around 2GB. Stellr doesn't ever use more than around 50MB of RAM during indexing and searching this index. I know RAM is cheap and all, but RAM size still has a quite large influence on the price of the server you rent for your app, at least here in Germany.

90 million is impressive for sure.

RAM - well, when Ferret/Stellr does faceting we'll revisit that discussion :) Solr loves RAM! It still can run in modest environments, but the more RAM you can give it to use for caches (depending on your needs) the better it is.

> Without doubt Solr has much more references in the area of such large installations than ferret/aaf. I for myself never saw aaf as a drop-in solution for indexes of this size, but more as an easy to use out of the box solution for the average rails app with maybe several thousand or tens of thousands of records, but I'm happy to see it still works in larger scale setups.

Indeed! ferret: +1 - no question!

> Heck, it all began with a simple full text search for my blog ;)

Same for me (though I abandoned it when I realized that regular blogging and server maintenance weren't for me).

> Regarding the faceting - it's not built into ferret, and aaf doesn't support it either since I didn't need it yet, and nobody else requested this feature so far. All in all I think the average usage scenarios of solr and aaf are quite different atm...

I'm really surprised by that. Faceting is the major feature that attracts folks to Solr. It's critical for all of our customers.

But yeah, no question that Lucene/Solr and Ferret/Stellr can happily coexist and aren't necessarily competition for every project. But there definitely are those areas of overlap where a project could go with either solution. And I would definitely not try to shoehorn Solr into a project where it didn't fit and Ferret worked fine. I'm pragmatic like that.

> I'll try to find the time to benchmark the selfhtml.org data set with solr and stellr. I'll report my findings here.

Awesome. If you have the data in some easily digestible format, I'd be happy to toss it into Solr and report back numbers from my development machine. Drop me a line offline if you'd like.

Erik
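For anyone unfamiliar with the term, faceting means returning, alongside the hits, a count of how many matching documents fall under each value of a field (category, author, year, and so on). The naive plain-Ruby sketch below only illustrates the idea over an in-memory list of hits; it is not how Solr computes facets (Solr works from the index and leans on caches, as discussed above), and `facet_counts` and the sample data are made up for the example.

```ruby
# Count how many hits carry each value of the given field.
def facet_counts(hits, field)
  hits.each_with_object(Hash.new(0)) do |doc, counts|
    counts[doc[field]] += 1
  end
end

# Hypothetical search results represented as plain hashes.
hits = [
  { title: "Ferret in production", category: "ruby" },
  { title: "Solr replication",     category: "java" },
  { title: "acts_as_ferret tips",  category: "ruby" }
]

facet_counts(hits, :category).each do |value, count|
  puts "#{value}: #{count}"
end
```

Doing this per query over millions of matching documents is where the memory and caching concerns in this thread come from.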