thr3ads.net - Ferret talk - [Ferret-talk] Road map of ferret [Aug 2008]

Fernando Parisotto

2008-Aug-19 19:24 UTC

[Ferret-talk] Road map of ferret

Hi all,

I'm new on the list, and glad to participate.
I would like to ask some questions about the Ferret project...
- Is http://ferret.davebalmain.com/ the official page of the project? (I'm
always getting 502 Bad Gateway.)
- Where can I find the road map of the project?
- On http://rubyforge.org/projects/ferret/ I see that the last release was on
November 28, 2007. Is that true?
- Is Ferret discontinued?

Please don't take these questions as offensive; I'd really like to know
how reliable Ferret is for a long-lived product.
Here at my company we are planning to build a big product with an indexing
engine, and I would like to know if Ferret is "alive".
Thanks for the answers!

-- 
Best regards,

Fernando Luiz Parisotto

Paul Lynch

2008-Aug-27 14:29 UTC

I've been using Ferret in a project still under development, and it
works pretty well.  As far as I can tell, the project is dying, if not
already dead.  David Balmain is still the only listed developer, and
he seems to have moved on to other things.  However, since the
software is still meeting my project's needs, I am not terribly
bothered by that.  I suppose that eventually (in a few years?)
something will change enough that Ferret will stop working, and then
we'll have to find something else.

If you can find an alternative that has active development, I would
recommend you go with that.  (And if you find one, please post about
it.)  But, if you can't, Ferret will probably be good enough for a
while.

-- 
Paul Lynch
Aquilent, Inc.
National Library of Medicine (Contractor)

Eric Schulte

2008-Aug-27 15:20 UTC

I would also be interested in Ferret alternatives for IR in Ruby.  A
simple search on RubyForge returned mainly a bunch of projects that
look to be abandoned:

- Rise (does not appear to be actively developed)
- rubylucene (looks to be a dead project)
- Ruby Simple Indexer (also looks dead)
- Ruby Odeum (simple ruby-bindings for a fast inverted index)

If anyone knows of any ruby IR projects which are mature, and are
being actively developed I would love to hear about them.

Thanks -- Eric

-- 
schulte

Marvin Humphrey

2008-Aug-27 15:34 UTC

On Aug 27, 2008, at 8:20 AM, Eric Schulte wrote:
> If anyone knows of any ruby IR projects which are mature, and are
> being actively developed I would love to hear about them.
FWIW, I recently finished porting all module code in KinoSearch to C.   
If we write binding code and port the test suite, it will be usable  
from Ruby.

KinoSearch is sort of a sister project to Ferret.  The dev branch  
implements many of the ideas that Dave Balmain and I designed together  
for the Lucy project.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

William Morgan

2008-Aug-27 15:35 UTC

Reformatted excerpts from Eric Schulte's message of 2008-08-27:
> If anyone knows of any ruby IR projects which are mature, and are
> being actively developed I would love to hear about them.

sphinxsearch.com

Much less usable API than Ferret, and you have to run it as a separate
server process, but it's fast, stable, and actively maintained.
-- 
William <wmorgan-ferret at masanjin.net>

arvind gautam

2008-Aug-27 15:57 UTC

How bout Sphinx?

Jeremy Hopple

2008-Aug-27 16:21 UTC

As far as I know, Sphinx can only index tables that have a unique
numeric id (e.g. an auto-incrementing int)....  I looked at using it, but
we use md5 hashes for the id/primary key on the tables I want to index... so
we were out of luck.

For what it's worth, I use Ferret 0.11.6 and love it.  I re-index about 90
million rows (and growing) worth of "stuff" (title, description, author,
etc...) every night...  works like a champ.  Searching is fast (provided you
don't want to sort on something other than relevance) and accurate.
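
A hedged aside on the numeric-id limitation described above: one possible workaround (an assumption here, not something anyone in this thread reports trying) is to keep a side table that maps each md5 key to a sequential integer and hand that integer to the engine as its document id. The `NumericIdMap` class below is hypothetical and purely in-memory; a real setup would have to persist the mapping, e.g. as an extra integer column.

```ruby
require "digest/md5"

# Hypothetical helper: assigns each md5 primary key a stable sequential
# integer so an engine that needs numeric document ids can index the row.
class NumericIdMap
  def initialize
    @to_int = {}  # md5 hex string => Integer id
    @to_key = []  # position (id - 1) => md5 hex string
  end

  # Returns the existing integer for +key+, or assigns the next one.
  def id_for(key)
    @to_int[key] ||= begin
      @to_key << key
      @to_key.size
    end
  end

  # Reverse lookup: turns a numeric id from a search hit back into the md5 key.
  def key_for(id)
    return nil unless id.is_a?(Integer) && id >= 1
    @to_key[id - 1]
  end
end

map = NumericIdMap.new
a = Digest::MD5.hexdigest("row a")
b = Digest::MD5.hexdigest("row b")
puts map.id_for(a) # → 1
puts map.id_for(b) # → 2
puts map.id_for(a) # → 1 (same key, same id)
```

Sequential assignment avoids the collisions you would risk by truncating a 128-bit md5 to fit an integer column.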

Eric Schulte

2008-Aug-27 18:20 UTC

Thanks for all the info.  I just found a very good related discussion
on ruby-forum which I thought I'd share:

http://www.ruby-forum.com/topic/137629

-- 
schulte

Eric Schulte

2008-Aug-27 18:36 UTC

On Wednesday, August 27, at 08:34, Marvin Humphrey wrote:
 > KinoSearch is sort of a sister project to Ferret.  The dev branch  
 > implements many of the ideas that Dave Balmain and I designed together  
 > for the Lucy project.

What is the status of the Lucy project?  A Ruby API into the venerable
Lucene library seems to be the obvious first step towards developing a
truly stable, effective IR solution for Ruby.  The last update on the
Lucy webpage http://lucene.apache.org/lucy/ seems to be from 2006.

Also, I may be missing something obvious here, but I don't understand
why there is no Ruby API directly to the Lucene Java library.  Why
would the only Lucene/Ruby API be to the C port of Lucene?

Much Thanks -- Eric

-- 
schulte

Marvin Humphrey

2008-Aug-27 19:28 UTC

On Aug 27, 2008, at 11:36 AM, Eric Schulte wrote:
> What is the status of the Lucy project?

The dev branch of KinoSearch is basically Lucy.  When Dave became
unavailable, I didn't really have anyone else to bounce ideas off of
for Lucy (since it was a from-scratch project without a community), so
I returned to the established KS community -- but took the code base
in the direction that Dave and I had worked out.

My current plan is to make an official KinoSearch release for Perl,
write some experimental bindings for other languages, achieve
stability, then make KinoSearch the "maint" branch and Lucy the "dev"
branch.

> Also, I may be missing something obvious here, but I don't understand
> why there is no ruby API directly to the Lucene Java library,

If you want to use Lucene, just go with Solr.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

Erik Hatcher

2008-Aug-28 08:54 UTC

On Aug 27, 2008, at 11:20 AM, Eric Schulte wrote:
> If anyone knows of any ruby IR projects which are mature, and are
> being actively developed I would love to hear about them.

disclaimer: highly opinionated response follows.... :)

Solr is the way to go for Ruby projects*.  solr-ruby, if I do say so
myself, ain't half bad.  It's downright beautiful to interact with
Solr via Ruby: <http://wiki.apache.org/solr/solr-ruby>.  I have plenty
of wishes for where solr-ruby could still evolve, so it's not done
yet.

* Pragmatically, I realize that another moving piece, especially
a JVM, isn't a good fit for many current production deployment
environments.  See below for my answer to that...

Ferret is awesome, let me be clear about that!  I have always loved
its power, even beyond Lucene Java in some cases.  But I've stuck
with Lucene through the tough times and it's always been good to me.
Solr's goodness on top of Lucene Java makes it extremely compelling for
every environment, be it Ruby, Python, Java itself, what have you.
I've always been fonder of the JVM than of native C stuff, and when
Ferret went that direction I stuck with Java.

acts_as_solr, however, hasn't yet reached its potential - and my
little hack that kick-started it wasn't really beneficial to the
community, my apologies - since I basically "abandoned" it.  But it
ain't half bad either, thanks to Thiago's hard work, and it does make
RDBMS <-> Solr integration a piece of cake, whereas it takes something
this ugly to do it in Java: <http://wiki.apache.org/solr/DataImportHandler>
(oh Ruby, how I love you!).

Solr is incredibly powerful, beyond the features I think almost all of
the other open source search engines offer.  Its scalability evolves
almost daily, as do its pluggability capabilities.

And for those JRuby folks out there.... well, I guess there aren''t
(m)any of those on the ferret list, but think about the
possibilities... SolrJRuby! Wow.

Erik

John Leach

2008-Aug-28 10:11 UTC

On Wed, 2008-08-27 at 08:34 -0700, Marvin Humphrey wrote:
> FWIW, I recently finished porting all module code in KinoSearch to C.   
> If we write binding code and port the test suite, it will be usable  
> from Ruby.
> 
> KinoSearch is sort of a sister project to Ferret.  The dev branch  
> implements many of the ideas that Dave Balmain and I designed together  
> for the Lucy project.

Hi Marvin,

In my experience the Ruby community is crying out for a "drop-in"
replacement for Ferret.  Sphinx is great, but different.  Xapian looks
good but doesn't have the Ruby maturity of Ferret yet (especially
considering acts_as_ferret).  I keep coming across people using Ferret
successfully but have little niggles here and there.

Is KinoSearch something that could be a Ferret replacement?  Or the
foundations of a Ferret replacement?  What are the differences between
it and Ferret?

Out of interest, what are the differences between it and the planned
Lucy project?  (It would be good to hear more about what your plans were
for Lucy.  Maybe it'll inspire somebody else?)

Do you happen to know if Dave is likely to work on Ferret again someday?
I think we've seen some commits from him recently-ish, but I've seen no
word from him.  Hope all is well.

Thanks,

John.
-- 
http://johnleach.co.uk

lists.jc.michel at symetrie.com

2008-Aug-28 10:26 UTC

Hi,

On 28 Aug 08, at 12:11, John Leach wrote:
> In my experience the Ruby community is crying out for a "drop-in"
> replacement for Ferret.  Sphinx is great, but different.  Xapian looks
> good but doesn't have the Ruby maturity of Ferret yet (especially
> considering acts_as_ferret).  I keep coming across people using Ferret
> successfully but have little niggles here and there.

The best would probably be to have some of us dig into Ferret and
help fix the remaining bugs!

I'd like to experiment with beanstalkd
http://xph.us/software/beanstalkd/
which - I've been told - is a better alternative to DRb for
background indexing.
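
A minimal sketch of the background-indexing pattern mentioned above, assuming Ruby's built-in `Queue` as a stand-in for an external queue such as beanstalkd (beanstalkd's own client API is not shown): request handlers only enqueue work, and a worker thread drains the queue and updates the index. The hash-based `index` and the `:shutdown` sentinel are hypothetical.

```ruby
# Ruby's core Queue stands in for an external job queue like beanstalkd.
jobs  = Queue.new
index = {}        # hypothetical index: document id => list of terms
mutex = Mutex.new

# Worker thread: pops jobs until it sees the :shutdown sentinel.
worker = Thread.new do
  while (job = jobs.pop) != :shutdown
    id, text = job
    # Stand-in for the real (slow) indexing call.
    mutex.synchronize { index[id] = text.downcase.split }
  end
end

# Request handlers would just enqueue and return immediately.
jobs << [1, "Ferret and acts_as_ferret"]
jobs << [2, "beanstalkd work queue"]
jobs << :shutdown
worker.join

p index.keys # → [1, 2]
```

The design point is that a slow or crashing indexer no longer blocks the web request; with an external queue the worker can also live in a separate process.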

I'm still using Ferret on many websites, and it's so simple to use; why
use something else?

Erik Hatcher

2008-Aug-28 10:47 UTC

On Aug 27, 2008, at 3:28 PM, Marvin Humphrey wrote:
> On Aug 27, 2008, at 11:36 AM, Eric Schulte wrote:
>> Also, I may be missing something obvious here, but I don't understand
>> why there is no ruby API directly to the Lucene Java library,

Mainly because Ruby has been too slow to have something pure.  Ferret
is about as close as it gets to Lucene Java compatibility, and really
only diverged from the file format for wise, practical reasons.

> If you want to use Lucene, just go with Solr.

+1

Solr is great in Ruby environments too.  Really it is.  Sure, there's
this JVM beast, and deployment issues, and all that, but they
generally aren't that painful.  And the benefits are totally worth it.

	Erik

Jens Kraemer

2008-Aug-28 13:52 UTC

Hi!

On 27.08.2008, at 20:20, Eric Schulte wrote:
> Thanks for all the info, I just found a very good related discussion
> from ruby-forum which I thought I'd share
>
> http://www.ruby-forum.com/topic/137629

Well, in this discussion there are (besides some useful information)
some pretty biased statements from several people who obviously must
have had a frustrating time with Ferret, or just didn't get it working
right out of the box and decided it was cheaper to make their clients
switch search technology (and possibly lose features) than to fix
their deployment.  Nobody from Engine Yard ever contacted me
regarding their massive Ferret deployment problems, so I'm not sure how
hard they really tried to get over them.

IMHO it's not very likely that it's Ferret's fault that, while all
around the world people are running Ferret-based apps fine, *every*
client of Engine Yard experiences the same set of problems...

So here''s my very own biased opinion just to complete the picture :)

I use Ferret in several production projects with several customers,
and also chose it for new projects like the soon-to-be-released new
full-text search for the German selfhtml.org portal, or the search
feature at www.fahrrad-xxl.de, which tightly integrates aaf (acts_as_ferret)
with rdig (shameless plug: selfhtml.org search will be powered by Stellr [1] ;-).

I have absolutely no problem with Ferret not being very actively
maintained, because it works for me just as it is.  Honestly, I have
*never* had Ferret segfault in any one of my own production apps.
(But I admit I have seen it segfault in other places; maybe I just
don't do the right things to make it crash...)

So why do I stick with Ferret while others declare it a 'dead' project?
Ferret's flexibility and feature set, plus the level of Rails
integration it offers by means of aaf, are very unlikely to be matched
by any other combination of search engine lib + Rails plugin in the
near future.
That said, I'm really interested in how the KinoSearch/Lucy stuff
will go on...

Solr, while being an interesting project without doubt, won't ever
reach the level of Rails integration that's possible with
acts_as_ferret, simply because its server doesn't run in the context
of the Rails app with model classes and all that stuff.  It's an
independent server indexing whatever you throw over the fence via
HTTP+XML.  That framework independence is a great plus under some
circumstances (and my Stellr project scratches exactly that itch in a
much more lightweight and undoubtedly less scalable manner), but
sometimes it's also a bad thing.

How do you use a custom analyzer with Solr?  You have to code it in Java
(or do your analysis before feeding the data into Java land, which I
wouldn't consider good app design).  But even if you do that, you have
a) half a Java project (I don't want that), and
b) no way to use your existing Rails classes in that custom analyzer
(I *have* analyzers using Rails models to retrieve synonyms and
narrower terms for thesaurus-based query expansion).

Not to speak of Sphinx here, which offers even less integration with
your Rails application because it's tied directly to the database and
doesn't support stuff like real incremental indexing.  It's easy to be
several times faster when you leave out most of the features...

Of course there are lots of use cases where Sphinx or Solr are
perfectly valid choices, because their feature set suits the
requirements and/or you're comfortable with running a servlet
container in your production env and spreading your application logic
across several languages.

Here's what I would do *if* I experienced severe problems with Ferret
in any of my projects:

Take aaf, replace Ferret with Lucene or even make it modular to decide
at run time which one to use, run the DRb server (or the whole app,
that depends) under JRuby and call it acts_as_lucene :-)
Et voila - great Rails integration plus Lucene's maturity.  But as long
as Ferret's working fine for me, that's really unlikely to happen...
Unless somebody wants to sponsor that project, of course ;)
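
The idea of deciding at run time which engine to use can be sketched as a small registry of interchangeable backends. Everything below (`SearchBackends`, `MemoryBackend`, the `add`/`search` methods) is hypothetical and is not acts_as_ferret's API; a Ferret-backed or JRuby/Lucene-backed class would register under its own name and expose the same two methods.

```ruby
# Hypothetical registry: model code asks for a backend by name and only
# relies on the add/search duck type, so engines can be swapped by config.
module SearchBackends
  REGISTRY = {}

  def self.register(name, klass)
    REGISTRY[name.to_sym] = klass
  end

  def self.build(name)
    klass = REGISTRY.fetch(name.to_sym) { raise ArgumentError, "unknown backend: #{name}" }
    klass.new
  end
end

# Toy in-memory backend standing in for a Ferret- or Lucene-backed one.
class MemoryBackend
  def initialize
    @docs = {}
  end

  def add(id, text)
    @docs[id] = text.downcase
  end

  # Returns ids of documents containing every query term.
  def search(query)
    terms = query.downcase.split
    @docs.select { |_, text| terms.all? { |t| text.split.include?(t) } }.keys
  end
end

SearchBackends.register(:memory, MemoryBackend)

# In an app, the backend name would come from configuration.
backend = SearchBackends.build(:memory)
backend.add(1, "Ferret is a Ruby search library")
backend.add(2, "Solr runs on the JVM")
p backend.search("ruby search") # → [1]
```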

Cheers,
Jens

[1] http://rubyforge.org/projects/stellr

--
Jens Krämer
Finkenlust 14, 06449 Aschersleben, Germany
VAT Id DE251962952
http://www.jkraemer.net/ - Blog
http://www.omdb.org/     - The new free film database

Erik Hatcher

2008-Aug-28 15:17 UTC

On Aug 28, 2008, at 9:52 AM, Jens Kraemer wrote:
> So here's my very own biased opinion just to complete the picture :)

Hey, software should be opinionated!   That's totally fair :)

> (shameless plug: selfhtml.org search will be powered by Stellr
> [1] ;-).

Stellr - great name.  Interesting... that's pretty sweet.

> Solr, while being an interesting project without doubt, won't ever
> reach the level of Rails integration that's possible with
> acts_as_ferret, simply because its server doesn't run in the
> context of the rails app with model classes and all that stuff.

What advantage does Ferret have in terms of ActiveRecord integration
that Solr wouldn't have?

If you're talking about custom analyzers being in Ruby, more on that
below.

> It's an independent server indexing whatever you throw over the
> fence via http+xml.

Solr can now index CSV, and even a relational database directly (with
the new DataImportHandler).

It also responds with Ruby hash structure (just add &wt=ruby to the  
URLs, or use solr-ruby which does that automatically and hides all  
server communication from you anyway).
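
To make the `wt=ruby` point concrete, here is a hedged sketch that only builds the request URL with Ruby's standard library; it does not contact a server. The base URL `http://localhost:8983/solr` is an assumption (Solr's example default), and `q`, `wt`, and `rows` are standard Solr select parameters. With `wt=ruby`, the response body comes back as a Ruby hash literal instead of XML.

```ruby
require "uri"

# Builds a Solr select URL that requests the Ruby response writer (wt=ruby).
# +base+ is assumed to point at a Solr instance, e.g. the example default.
def solr_select_url(base, query, rows: 10)
  params = { "q" => query, "wt" => "ruby", "rows" => rows.to_s }
  "#{base}/select?#{URI.encode_www_form(params)}"
end

puts solr_select_url("http://localhost:8983/solr", "title:ferret")
# → http://localhost:8983/solr/select?q=title%3Aferret&wt=ruby&rows=10
```

`URI.encode_www_form` takes care of escaping, so query syntax like `title:ferret` or spaces is safe to pass through.
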
> How to use a custom analyzer with solr? You have to code it in Java
> (or you do your analysis before feeding the data into java land,
> which I wouldn't consider good app design).

Most users would not need to write a custom analyzer.  Many of the
built-in ones are quite configurable.  Yes, Solr does require schema
configuration via an XML file, but there have been acts_as_solr
variants (good and bad thing about this git craze) that generate that
for you automatically from an AR model.

> But even if you do that then you have
> a) half a java project (I don't want that)

That's totally fair, and really the primary compelling reason for
choosing Ferret over Solr for pure Ruby/Rails projects.  I dig that.

But isn't Ferret like 60k lines of C code too?!

> and b) no way to use your existing rails classes in that custom
> analyzer (I *have* analyzers using rails models to retrieve synonyms
> and narrower terms for thesaurus based query expansion)

You could do the query expansion client-side with Solr... just take
the user's query, massage it, and send whatever query you like to
Solr.  Solr also has synonym and stop word capabilities.
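
The client-side approach can be sketched as follows: each user term is expanded with its synonyms before the query string is built, so the lookup stays in Ruby no matter which engine receives the query. `THESAURUS` is a hypothetical stand-in for a synonym lookup that might come from a Rails model, and the OR/parenthesis syntax assumes a Lucene-style query parser. The sketch does not escape query-syntax characters.

```ruby
# Hypothetical synonym table; in a Rails app this might be a model lookup.
THESAURUS = {
  "bike" => ["bicycle", "cycle"],
  "car"  => ["automobile"]
}.freeze

# Expands each term of +user_query+ into "(term OR syn1 OR syn2)" when
# synonyms exist, leaving other terms untouched (Lucene-style syntax).
def expand_query(user_query, thesaurus = THESAURUS)
  expanded = user_query.downcase.split.map do |term|
    synonyms = thesaurus.fetch(term, [])
    if synonyms.empty?
      term
    else
      "(" + ([term] + synonyms).join(" OR ") + ")"
    end
  end
  expanded.join(" ")
end

puts expand_query("red bike") # → red (bike OR bicycle OR cycle)
```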

However, there is also no reason (and I have this on my
copious-free-time-TODO-list) that JRuby couldn't be used behind the
scenes of a Solr analyzer/tokenizer/filter or even request handler...
and do all the cool Ruby stuff you like right there.  Heck, you could
even send the Ruby code over to Solr to execute there if you like ;)
> Here's what I would do *if* I experienced severe problems with
> Ferret in any of my projects:
>
> Take aaf, replace Ferret with Lucene or even make it modular to
> decide at run time which one to use, run the DRb server (or the
> whole app, that depends) under JRuby and call it acts_as_lucene :-)
> Et voila - great Rails integration plus Lucene's maturity. But as
> long as Ferret's working fine for me that's really unlikely to
> happen... Unless somebody wants to sponsor that project, of course ;)
Just using Solr and fixing up acts_as_solr to meet your needs (if it  
doesn't) would be even easier than all that :)  Solr really is a
better starting point than Lucene directly, for caching, scalability,  
replication, faceting, etc.

I'd be curious to see scalability comparisons between Ferret and Solr
- or perhaps more properly between Stellr and Solr - as it boils down
to number of documents, queries per second, and faceting and
highlighting speed.  I'm betting on Solr myself (by being so into it
and basing my professional life on it).

	Erik

Sheldon Maloff

2008-Aug-28 15:19 UTC

[Ferret-talk] Road map of ferret

That is one awesome rebuttal, Jens. I read that forum topic below, and
while I have great respect for Ezra (from his fine book Deploying
Rails Applications), I must say I disagree with him about the
Ferret/AAF combination.

We run Ferret/AAF as a DRb server in production and on our staging
servers and I've never seen a Ferret segfault. That said, we're not
under a Google-scale search load, but even when hit with heavy load
testing, I haven't experienced a Ferret segfault, nor corrupt indexes.

Now, corrupt indexes in development are another issue. In development,
you are not running a DRb server. Each mongrel is hitting the index
directly. You typically have only one mongrel running in development.
But if you open an interactive script/console session, and play with
your models side-by-side with a running mongrel, you WILL corrupt your
Ferret index. That's because both the mongrel and the script/console
will be writers to the same index, something that Ferret doesn't
support. Heck, running a rake db:migrate alongside a running mongrel
will cause index corruption, for the same reason: multiple writers.
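If the multiple-writer explanation is right, one generic way to make the problem fail loudly instead of silently is an advisory lock around every index write. This is a sketch under that assumption, not something Ferret or aaf is claimed to do; `with_index_write_lock` and the lock-file path are hypothetical, and an advisory `flock` only helps if every writer (mongrel, script/console, rake tasks) goes through the same helper.

```ruby
require "tmpdir"

# Hypothetical guard (not part of Ferret or aaf): take an exclusive,
# non-blocking lock on a lock file before writing to the index, so a
# second writer fails with an error instead of writing concurrently.
def with_index_write_lock(lock_path)
  File.open(lock_path, File::RDWR | File::CREAT, 0o644) do |f|
    # With LOCK_NB, flock returns false instead of blocking when another
    # open file already holds the exclusive lock.
    unless f.flock(File::LOCK_EX | File::LOCK_NB)
      raise IOError, "index is locked by another writer: #{lock_path}"
    end
    begin
      yield
    ensure
      f.flock(File::LOCK_UN)
    end
  end
end

# Usage: a second writer attempting to write while the first holds the lock.
Dir.mktmpdir do |dir|
  lock = File.join(dir, "index.lock")
  with_index_write_lock(lock) do
    begin
      with_index_write_lock(lock) { :never_runs }
    rescue IOError => e
      puts e.message
    end
  end
end
```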

I'm wondering if that's why so many people experience Ferret indexing
problems in development? It's not always immediately obvious that
you're in a multiple-writer scenario.

For now, I''m sticking with the Ferret/AAF combination until one or the
other falls over completely.

Sheldon Maloff
Developer
http://ideas.veer.com

On 08-Aug-28, at 7:52 AM, Jens Kraemer wrote:
> Hi!
>
> On 27.08.2008, at 20:20, Eric Schulte wrote:
>> Thanks for all the info, I just found a very good related discussion
>> from ruby-forum which I thought I'd share
>>
>> http://www.ruby-forum.com/topic/137629
>
> well, in this discussion there are (besides some useful information)
> some pretty biased statements from several people who obviously must
> have had a frustrating time with Ferret, or just didn't get it
> working right out of the box and decided it was cheaper to make
> their clients switch search technology (and possibly lose
> features) than to fix their deployment. I never had somebody from
> Engine Yard contact me regarding their massive ferret deployment
> problems, not sure how hard they really tried to get over them.

Marvin Humphrey

2008-Aug-28 16:24 UTC

[Ferret-talk] Road map of ferret

On Aug 28, 2008, at 3:11 AM, John Leach wrote:
> Is KinoSearch something that could be a Ferret replacement?
Yes.  The projects are roughly comparable.

I'd be happier if Ferret's ultimate successor were named "Lucy",
though, because then more credit would flow to Dave.
> What are the differences between it and Ferret?
From a high level, they're pretty similar.  Analyzer, QueryParser,
IndexReader, and all that.

There are superficial differences in the implementations of individual  
classes.  For instance, Ferret provides several different Tokenizer  
classes; KinoSearch provides one, based on a regex pattern matching  
one token.

     # KinoSearch version of WhiteSpaceTokenizer
     tokenizer = Tokenizer.new(:pattern => "\\S+")

At a low level, things start to diverge.  For instance, all metadata
in the KinoSearch index file format is encoded as JSON, so it's
human-readable for easy spelunking and debugging.  Also, it's easier to
override methods in KinoSearch, so you can do things like implement  
SearchServer/SearchClient or MockScorer or KSx::Highlight::Summarizer  
in pure Perl; I believe the mechanism will work similarly with Ruby  
bindings.
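For readers who haven't used a regex-driven tokenizer like the `Tokenizer.new(:pattern => "\\S+")` example: one pattern describes what a single token looks like, and every match becomes a token. Below is a pure-Ruby illustration using `String#scan`; it is not KinoSearch's API, and `regex_tokenize` is a hypothetical name.

```ruby
# Pure-Ruby illustration (not KinoSearch's API): a tokenizer defined by one
# regex that matches a single token, like :pattern => "\\S+".
def regex_tokenize(text, pattern = /\S+/)
  tokens = []
  # In the block form of String#scan, Regexp.last_match is the current
  # match, which also gives the character offsets of each token.
  text.scan(pattern) do
    m = Regexp.last_match
    tokens << { text: m[0], start: m.begin(0), end: m.end(0) }
  end
  tokens
end

regex_tokenize("Road map  of ferret").map { |t| t[:text] }
# => ["Road", "map", "of", "ferret"]
```

Swapping the pattern swaps the tokenizer: for example `/\w+/` also splits on punctuation such as hyphens, so `"foo-bar baz"` yields `foo`, `bar`, `baz`.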
> what are the differences between it and the planned Lucy project
Personally, I think of them as the same project.  KinoSearch is at  
version 0.x and will soon become version 1.0.  Lucy will be version 2  
-- KinoSearch's successor.

Lucy has never had a high-level API -- the work Dave and I did was all  
on the low-level core.  That core has now been fully implemented in  
the KinoSearch dev branch.

What happens between version 1 and 2 depends on how the rollout of  
version 1 goes.
> Do you happen to know if Dave is likely to work on Ferret again  
> someday?
I know he would like to.  However, I hope to persuade him to return to  
his work on Lucy.  :)

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

Jens Kraemer

2008-Aug-28 17:02 UTC

[Ferret-talk] Road map of ferret

On 28.08.2008, at 17:17, Erik Hatcher wrote:
> On Aug 28, 2008, at 9:52 AM, Jens Kraemer wrote:
>>
>> Solr, while being an interesting project without doubt, won't ever
>> reach the level of Rails integration that's possible with
>> acts_as_ferret, simply because its server doesn't run in the
>> context of the rails app with model classes and all that stuff.
>
> What advantage does Ferret have in terms of ActiveRecord integration
> that Solr wouldn't have?
>
> If you're talking about custom analyzers being in Ruby, more on that
> below.
It's not only custom analyzers, but the fact that acts_as_ferret's DRb
server runs with the full Rails application loaded, so e.g. to bulk
index a number of records aaf just hands the server the ids and class
name of the records to index, and the server does the rest. It's
debatable whether one approach is better than the other; in terms of
index server load it might even be better to do as much as possible on
the client side, but still it's a much tighter coupling than you get
with the application-agnostic interfaces of solr or stellr.
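The "hand the server a class name and some ids" pattern can be sketched without DRb or Rails. In the sketch below `Article` is a hypothetical stand-in for an ActiveRecord model and `IndexServer#bulk_index` is a made-up name, not aaf's API; in acts_as_ferret the server object would be exposed over DRb, which is omitted here so the example runs on its own.

```ruby
# Hypothetical stand-in for an ActiveRecord model; in a real Rails app the
# class would be loaded in the server process and find would hit the DB.
class Article
  Record = Struct.new(:id, :title)
  ROWS = { 1 => "Road map of ferret", 2 => "Solr vs Ferret" }.freeze

  def self.find(ids)
    ids.map { |id| Record.new(id, ROWS.fetch(id)) }
  end
end

# Hypothetical index server: the client sends only a class name and ids.
# The server resolves the class itself and builds the documents, so the
# model logic runs on the server side.
class IndexServer
  attr_reader :documents

  def initialize
    @documents = []
  end

  def bulk_index(class_name, ids)
    model = Object.const_get(class_name)
    model.find(ids).each do |record|
      @documents << { id: "#{class_name}-#{record.id}", title: record.title }
    end
    @documents.size
  end
end

server = IndexServer.new
server.bulk_index("Article", [1, 2])  # => 2
```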

I must admit that I have a hard time coming up with another example
besides my synonym/thesaurus analysis stuff where this might be useful,
but I think there are more use cases where such a tight integration
might come in handy.
>> It's an independent server indexing whatever you throw over the
>> fence via http+xml.
>
> Solr can index CSV as well, and now a relational database directly
> (with the new DataImportHandler).
>
> It also responds with Ruby hash structure (just add &wt=ruby to the
> URLs, or use solr-ruby which does that automatically and hides all
> server communication from you anyway).
Yeah, I know, but anyway there is a strict line between your
application and Solr, which doesn't know a thing about the application
using it.
>> How to use a custom analyzer with solr? You have to code it in Java  
>> (or you do your analysis before feeding the data into java land,  
>> which I wouldn't consider good app design).
>
> Most users would not need to write a custom analyzer.  Many of the  
> built-in ones are quite configurable.  Yes, Solr does require schema  
> configuration via an XML file, but there have been acts_as_solr  
> variants (good and bad thing about this git craze) that generate  
> that for you automatically from an AR model.
Glad you mentioned this ;) I don't want to configure an analyzer via
xml when I can throw my own together with 4 or 5 lines of easy to read
ruby code. Same for index structure. Philosophical mismatch between
the Java and Ruby worlds I think :)
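As an illustration of how little code such an analyzer takes, here is a pure-Ruby sketch that composes a tokenizer with two filters. It deliberately avoids Ferret's own classes so it runs without the gem; in Ferret the same chain would typically be built inside a custom analyzer's `token_stream` method (treat that as an assumption and check Ferret's docs). The constant names are hypothetical.

```ruby
# Pure-Ruby illustration of "an analyzer in a few lines of Ruby". These are
# NOT Ferret's classes: here an analyzer is simply a tokenizer lambda
# composed with filter lambdas using Proc#>> (Ruby 2.6+).
STOP_WORDS = %w[a an and of the to].freeze

TOKENIZE        = ->(text)   { text.scan(/\w+/) }
DOWNCASE        = ->(tokens) { tokens.map(&:downcase) }
DROP_STOP_WORDS = ->(tokens) { tokens.reject { |t| STOP_WORDS.include?(t) } }

# Proc#>> composes left to right: tokenize, then downcase, then drop stop words.
ANALYZER = TOKENIZE >> DOWNCASE >> DROP_STOP_WORDS

ANALYZER.call("The Road Map of Ferret")  # => ["road", "map", "ferret"]
```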
>> But even if you do that then you have
>> a) half a java project (I don't want that)
>
> That's totally fair, and really the primary compelling reason for
> Ferret over Solr for pure Ruby/Rails projects.  I dig that.
>
> But isn't Ferret like 60k lines of C code too?!
true, but I don't have to compile that every time I deploy my app...
>> and b) no way to use your existing rails classes in that custom  
>> analyzer (I *have* analyzers using rails models to retrieve  
>> synonyms and narrower terms for thesaurus based query expansion)
>
> You could leverage client-side query expansion with Solr... just  
> take the user's query, massage it, and send whatever query you like
> to Solr. Solr has synonym and stop word capability too.
yeah, I could do that. But that's moving analysis stuff into my
application, which is quite contrary to the purpose of analyzers -
encapsulate this logic and make it pluggable into the search engine
library. So fewer style points for this solution...
> However, there is also no reason (and I have this on my
> copious-free-time-TODO-list) that JRuby couldn't be used behind the
> scenes of a Solr analyzer/tokenizer/filter or even request handler...
> and do all the cool Ruby stuff you like right there.  Heck, you could
> even send the Ruby code over to Solr to execute there if you like ;)
that sounds sexy ;)
>> Here's what I would do *if* I experienced severe problems with
>> Ferret in any of my projects:
>>
>> Take aaf, replace Ferret with Lucene or even make it modular to
>> decide at run time which one to use, run the DRb server (or the
>> whole app, that depends) under JRuby and call it acts_as_lucene :-)
>> Et voila - great Rails integration plus Lucene's maturity. But as
>> long as Ferret's working fine for me that's really unlikely to
>> happen... Unless somebody wants to sponsor that project, of course ;)
>
> Just using Solr and fixing up acts_as_solr to meet your needs (if it
> doesn't) would be even easier than all that :)  Solr really is a
> better starting point than Lucene directly, for caching,
> scalability, replication, faceting, etc.
Depends on whether you need these features or not. From my experience,
lots of projects don't need these things anyway, because they're
running on a single host and nearly every other part of the
application is slower than search... Maybe it's because I'm quite
involved with the topic and am familiar with lucene's API, but to me
Solr looks like an additional layer of abstraction and complexity
which I only want to have when it really gives me a feature I need.
Plus the last time I checked Lucene didn't need xml configuration
files ;)

In development environments and especially when it comes to automated
tests / CI it's also quite comfortable not having to run a separate
server but using the short cut directly to the index, which isn't
possible with Solr.
> I'd be curious to see scalability comparisons between Ferret and
> Solr - or perhaps more properly between Stellr and Solr - as it
> boils down to number of documents, queries per second, and faceting
> and highlighting speed.  I'm betting on Solr myself (by being so
> into it and basing my professional life on it).
This would be interesting, but I wouldn't be that disappointed with
Stellr ending up second given the little amount of time I've spent
building it so far. Just out of curiosity, do you have some kind of
performance testing suite for Solr which I could throw at Stellr?


Cheers,
Jens

--
Jens Krämer
Finkenlust 14, 06449 Aschersleben, Germany
VAT Id DE251962952
http://www.jkraemer.net/ - Blog
http://www.omdb.org/     - The new free film database

Jens Krämer

2008-Aug-28 17:10 UTC

[Ferret-talk] Road map of ferret

Hi!

On 28.08.2008, at 18:24, Marvin Humphrey wrote:
[..]
> There are superficial differences in the implementations of
> individual classes.  For instance, Ferret provides several different  
> Tokenizer classes; KinoSearch provides one, based on a regex pattern  
> matching one token.
>
>    # KinoSearch version of WhiteSpaceTokenizer
>    tokenizer = Tokenizer.new(:pattern => "\\S+")
That's pretty simple ;) With Ferret I can use custom tokenizers to
inject additional terms at the same offset (i.e., synonyms), is there  
another way to achieve that with KinoSearch?

[..]
>> Do you happen to know if Dave is likely to work on Ferret again
>> someday?
>
> I know he would like to.  However, I hope to persuade him to return  
> to his work on Lucy.  :)
whatever, as long as it's as powerful and easy to use as Ferret and
has ruby bindings I'm all for it :)

Cheers,
Jens

--
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49351467660 | Telefax +493514676666
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold

Marvin Humphrey

2008-Aug-28 17:56 UTC

[Ferret-talk] Road map of ferret

On Aug 28, 2008, at 10:10 AM, Jens Krämer wrote:
> With Ferret I can use custom tokenizers to inject additional terms
> at the same offset (i.e., synonyms), is there another way to achieve
> that with KinoSearch?
Synonym support isn't part of the public API right now, but since the
basic principle is the same in KinoSearch as it is in Ferret and
Lucene, it shouldn't be hard to add.

I don't think we'd do this by extending Tokenizer; I think we'd want
SynonymFilter/SynonymMap classes akin to the ones provided by Solr.
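The "same offset" trick discussed here is usually done with position increments: an injected synonym gets an increment of 0, so it sits at the same position as the original term and phrase queries still line up. The pure-Ruby sketch below shows only that idea; `Token`, `SYNONYM_MAP`, `synonym_filter` and `positions` are hypothetical names, not KinoSearch, Ferret or Solr classes.

```ruby
# Pure-Ruby sketch of the idea behind a synonym filter (hypothetical names,
# not any library's API): each token carries a position increment; injected
# synonyms get increment 0 so they share the original term's position.
Token = Struct.new(:text, :pos_inc)

SYNONYM_MAP = { "movie" => ["film"], "car" => ["automobile"] }.freeze

def synonym_filter(tokens, map = SYNONYM_MAP)
  tokens.flat_map do |tok|
    extra = map.fetch(tok.text, []).map { |syn| Token.new(syn, 0) }
    [tok, *extra]
  end
end

# Turn increments into absolute positions to make the overlap visible.
def positions(tokens)
  pos = -1
  tokens.map { |t| pos += t.pos_inc; [t.text, pos] }
end

input = %w[old movie theater].map { |w| Token.new(w, 1) }
positions(synonym_filter(input))
# => [["old", 0], ["movie", 1], ["film", 1], ["theater", 2]]
```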

Marvin Humphrey

Erik Hatcher

2008-Aug-28 18:03 UTC

[Ferret-talk] Road map of ferret

On Aug 28, 2008, at 1:02 PM, Jens Kraemer wrote:
>> What advantage does Ferret have in terms of ActiveRecord
>> integration that Solr wouldn't have?
>>
>> If you're talking about custom analyzers being in Ruby, more on
>> that below.
>
> It's not only custom analyzers, but the fact that acts_as_ferret's
> DRb server runs with the full Rails application loaded, so e.g. to
> bulk index a number of records aaf just hands the server the ids and
> class name of the records to index, and the server does the rest.
Gotcha.  Meaning the search server is pulling from the DB directly.
That's what the DataImportHandler in Solr does as well.  It'd be a
simple single HTTP request to Solr (once the DB stuff is configured,
of course) to have it do full or incremental DB indexing.
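For reference, the DataImportHandler is triggered with a `command` parameter (`full-import` or `delta-import`). The sketch below only builds the URL; it assumes the handler is registered at `/dataimport` on `localhost:8983` (as in common example configs) and that the database side is already configured. `dataimport_uri` is a hypothetical helper.

```ruby
require "uri"

# Hypothetical helper: build the DataImportHandler URL for a full or
# incremental (delta) import. Assumes the handler is registered at
# /dataimport on a Solr instance at localhost:8983.
DIH_COMMANDS = { full: "full-import", delta: "delta-import" }.freeze

def dataimport_uri(mode, base: "http://localhost:8983/solr")
  command = DIH_COMMANDS.fetch(mode) # raises KeyError for unknown modes
  URI("#{base}/dataimport?#{URI.encode_www_form({ command: command })}")
end

dataimport_uri(:delta).to_s
# => "http://localhost:8983/solr/dataimport?command=delta-import"

# Sending it is then a single GET, e.g. Net::HTTP.get(dataimport_uri(:full))
# after require "net/http", which of course needs a running Solr with DIH.
```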
>>>
>>> How to use a custom analyzer with solr? You have to code it in  
>>> Java (or you do your analysis before feeding the data into java  
>>> land, which I wouldn't consider good app design).
>>
>> Most users would not need to write a custom analyzer.  Many of the  
>> built-in ones are quite configurable.  Yes, Solr does require  
>> schema configuration via an XML file, but there have been  
>> acts_as_solr variants (good and bad thing about this git craze)  
>> that generate that for you automatically from an AR model.
>
> Glad you mentioned this ;) I don't want to configure an analyzer via
> xml when I can throw my own together with 4 or 5 lines of easy to
> read ruby code. Same for index structure. Philosophical mismatch
> between the Java and Ruby worlds I think :)
Don't get me wrong... I'm a Ruby fanatic myself!   XML makes me ill,
generally speaking (it has its uses, but for configuration it is just
plain wrong).

For using the built-in tokenizer/filters, a smarter acts_as_solr could  
generate the right config based on a model specifying parameters for  
analysis.
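A "smarter acts_as_solr" could, for example, turn analysis settings declared in Ruby into a schema `<fieldType>`. The factory class names below are standard Solr ones; the Ruby-side method `solr_field_type` and its arguments are assumptions invented for this sketch.

```ruby
# Hypothetical sketch: generate a Solr schema <fieldType> from analysis
# settings declared in Ruby. The factory names are standard Solr classes;
# the method name and its argument format are made up for illustration.
def solr_field_type(name, tokenizer:, filters: [])
  lines = []
  lines << %(<fieldType name="#{name}" class="solr.TextField">)
  lines << "  <analyzer>"
  lines << %(    <tokenizer class="solr.#{tokenizer}"/>)
  filters.each { |f| lines << %(    <filter class="solr.#{f}"/>) }
  lines << "  </analyzer>"
  lines << "</fieldType>"
  lines.join("\n")
end

xml = solr_field_type("text_en",
                      tokenizer: "WhitespaceTokenizerFactory",
                      filters: %w[LowerCaseFilterFactory PorterStemFilterFactory])
puts xml
```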
>>> But even if you do that then you have
>>> a) half a java project (I don't want that)
>>
>> That's totally fair, and really the primary compelling reason for
>> Ferret over Solr for pure Ruby/Rails projects.  I dig that.
>>
>> But isn't Ferret like 60k lines of C code too?!
>
> true, but I don't have to compile that every time I deploy my app...
My point was that Ferret isn't just Ruby, just a counterpoint to your
"half a java project".  No one has to recompile Solr either.
>>> and b) no way to use your existing rails classes in that custom  
>>> analyzer (I *have* analyzers using rails models to retrieve  
>>> synonyms and narrower terms for thesaurus based query expansion)
>>
>> You could leverage client-side query expansion with Solr... just  
>> take the user's query, massage it, and send whatever query you like
>> to Solr. Solr has synonym and stop word capability too.
>
> yeah, I could do that. But that's moving analysis stuff into my
> application, which is quite contrary to the purpose of analyzers -
> encapsulate this logic and make it pluggable into the search engine
> library. So fewer style points for this solution...
I was just saying :)   It's debatable exactly where in the
client-server spectrum synonym expansion belongs... and it really
depends on the needs of the project.  Nothing wrong with a client doing
some user input massaging before a query hits the search server.
>> However, there is also no reason (and I have this on my
>> copious-free-time-TODO-list) that JRuby couldn't be used behind the
>> scenes of a Solr analyzer/tokenizer/filter or even request handler...
>> and do all the cool Ruby stuff you like right there.  Heck, you could
>> even send the Ruby code over to Solr to execute there if you like ;)
>
> that sounds sexy ;)
Should be fairly trivial to wire JRuby in.  The DataImportHandler
already has scripting language support for data transformation:
<http://wiki.apache.org/solr/DataImportHandler#head-27fcc2794bd71f7d727104ffc6b99e194bdb6ff9>
(shield your eyes from the XML wrapping it!), so I believe JRuby
should already work in that context.  This is sort of like the Mapper
stuff I built into solr-ruby, transforming data from domain to search
engine "documents".
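The domain-to-document mapping idea can be sketched generically: each index field is filled either from a source attribute or from a small Ruby proc. This is NOT solr-ruby's actual Mapper API; `DocumentMapper` and the field names are hypothetical.

```ruby
# Hypothetical mapper in the spirit of the "domain -> search document"
# idea (NOT solr-ruby's Mapper API): each index field is filled either from
# a source key (a Symbol) or by a lambda that receives the whole record.
class DocumentMapper
  def initialize(mapping)
    @mapping = mapping
  end

  def map(record)
    @mapping.each_with_object({}) do |(field, source), doc|
      doc[field] = source.respond_to?(:call) ? source.call(record) : record.fetch(source)
    end
  end
end

mapper = DocumentMapper.new({
  id:           :isbn,
  title_text:   :title,
  author_facet: ->(r) { r[:authors].map(&:strip) }
})

doc = mapper.map({ isbn: "123", title: "Sample Title", authors: [" Jane Doe ", "John Roe"] })
doc[:author_facet]  # => ["Jane Doe", "John Roe"]
```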
>>> Here's what I would do *if* I experienced severe problems with
>>> Ferret in any of my projects:
>>>
>>> Take aaf, replace Ferret with Lucene or even make it modular to
>>> decide at run time which one to use, run the DRb server (or the
>>> whole app, that depends) under JRuby and call it acts_as_lucene :-)
>>> Et voila - great Rails integration plus Lucene's maturity. But as
>>> long as Ferret's working fine for me that's really unlikely to
>>> happen... Unless somebody wants to sponsor that project, of
>>> course ;)
>>
>> Just using Solr and fixing up acts_as_solr to meet your needs (if
>> it doesn't) would be even easier than all that :)  Solr really is a
>> better starting point than Lucene directly, for caching,
>> scalability, replication, faceting, etc.
>
> Depends on whether you need these features or not. From my
> experience, lots of projects don't need these things anyway, because
> they're running on a single host and nearly every other part of the
> application is slower than search... Maybe it's because I'm quite
> involved with the topic and am familiar with lucene's API, but to me
> Solr looks like an additional layer of abstraction and complexity
> which I only want to have when it really gives me a feature I need.
> Plus the last time I checked Lucene didn't need xml configuration
> files ;)
I hear ya about the XML config files.  And, always to be fair to Solr
here, you really only need to set things up from a basic example
configuration that covers most scenarios already - so it really isn't
necessary to even touch XML config except for tweaking little things.

But Solr's advantages over just Lucene are built out of experience,
covering things most Lucene projects eventually build anyway.  Caching
- really important for faceting, which every project I touch these
days needs.  Replication - really really important for scalability
under massive query load.   It's really not such a big chunk over
Lucene to bite off... and in almost all respects it is even simpler
to use Solr than Lucene anyway.
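To illustrate why caching matters for faceting: facet counts are computed by walking every matching document, so repeating the same facet request is wasted work unless the result is memoized. The sketch below is hypothetical pure Ruby, not how Solr implements its caches; `FacetCounter` assumes the caller supplies a key that uniquely identifies each query.

```ruby
# Hypothetical sketch (not Solr's implementation): counting facet values
# means scanning every matching document, so results are memoized per
# (query key, field). The key must uniquely identify the query in the block.
class FacetCounter
  attr_reader :computations

  def initialize(docs)
    @docs = docs
    @cache = {}
    @computations = 0
  end

  def counts(query_key, field, &matches)
    @cache[[query_key, field]] ||= begin
      @computations += 1
      @docs.select(&matches).each_with_object(Hash.new(0)) do |doc, acc|
        acc[doc[field]] += 1
      end
    end
  end
end

docs = [
  { title: "Ferret talk", lang: "ruby" },
  { title: "Solr talk",   lang: "java" },
  { title: "aaf intro",   lang: "ruby" }
]
fc = FacetCounter.new(docs)
talk = ->(d) { d[:title].include?("talk") }
fc.counts("talk", :lang, &talk)["ruby"]  # => 1
fc.counts("talk", :lang, &talk)          # second call is served from the cache
fc.computations                          # => 1
```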
> In development environments and especially when it comes to  
> automated tests / CI it's also quite comfortable not having to run a
> separate server but using the short cut directly to the index, which
> isn't possible with Solr.
Not true.  Solr can work embedded.  There is a base SolrServer
abstraction, with an implementation that runs embedded (inside the
same JVM) versus over HTTP.  Exactly the same interface for both
operations, using a very simple API (SolrJ, much like Lucene's basic
API actually).
>> I'd be curious to see scalability comparisons between Ferret and
>> Solr - or perhaps more properly between Stellr and Solr - as it
>> boils down to number of documents, queries per second, and faceting
>> and highlighting speed.  I'm betting on Solr myself (by being so
>> into it and basing my professional life on it).
>
> This would be interesting, but I wouldn't be that disappointed with
> Stellr ending up second given the little amount of time I've spent
> building it so far. Just out of curiosity, do you have some kind of
> performance testing suite for Solr which I could throw at Stellr?
No, I don't have those kinds of tests myself.   While I can speak to
Solr's performance based on what I hear from our clients and the
reports in the mailing lists, I don't consider myself a
performance-savvy person.

I'm curious - what are the numbers of documents being put into Ferret
indexes out there?   millions?   hundreds of millions?  billions?  And
are folks doing faceting?  Does Ferret have faceting support?

	Erik

Jens Kraemer

2008-Aug-28 19:02 UTC

[Ferret-talk] Road map of ferret

On 28.08.2008, at 20:03, Erik Hatcher wrote:
>
> On Aug 28, 2008, at 1:02 PM, Jens Kraemer wrote:
>>> What advantage does Ferret have in terms of ActiveRecord
>>> integration that Solr wouldn't have?
>>>
>>> If you're talking about custom analyzers being in Ruby, more on
>>> that below.
>>
>> It's not only custom analyzers, but the fact that acts_as_ferret's
>> DRb server runs with the full Rails application loaded, so e.g. to
>> bulk index a number of records aaf just hands the server the ids and
>> class name of the records to index, and the server does the rest.
>
> Gotcha.  Meaning the search server is pulling from the DB directly.
> That's what the DataImportHandler in Solr does as well.  It'd be a
> simple single HTTP request to Solr (once the DB stuff is configured,
> of course) to have it do full or incremental DB indexing.
With the slight difference that custom model logic defined in the
rails model class is still involved to preprocess data, index values
calculated at indexing time or even have certain records refuse to be
indexed based on their current state. Having per-document boosts
depending on some value from the database (e.g. record popularity) is
also a classic... Aaf never just pulls data from the db, it always
uses rails model objects. Doesn't make indexing faster of course...
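The kind of model-side logic described here (skip records by state, compute values and per-document boosts at indexing time) might look like the sketch below. The method names `indexable?`, `index_boost` and `to_index_document` are made up for illustration and are not acts_as_ferret's API; the log-based boost formula is likewise just one assumed choice.

```ruby
# Hypothetical model-side indexing hooks (made-up names, not aaf's API).
class Post
  attr_reader :id, :title, :state, :views

  def initialize(id:, title:, state:, views:)
    @id, @title, @state, @views = id, title, state, views
  end

  # Records can refuse to be indexed based on their current state.
  def indexable?
    state == "published"
  end

  # Per-document boost derived from a database value (popularity),
  # damped with a logarithm so very popular posts don't dominate.
  def index_boost
    1.0 + Math.log10(views + 1)
  end

  # Values calculated at indexing time, or nil if the record opts out.
  def to_index_document
    return nil unless indexable?
    { id: id, title: title.downcase, boost: index_boost }
  end
end

posts = [
  Post.new(id: 1, title: "Road Map", state: "published", views: 99),
  Post.new(id: 2, title: "Draft",    state: "draft",     views: 5)
]
docs = posts.map(&:to_index_document).compact
docs.size           # => 1
docs.first[:boost]  # => 3.0
```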

[..]
> XML makes me ill, generally speaking (it has its uses, but for
> configuration it is just plain wrong).
FULL ACK :)
> For using the built-in tokenizer/filters, a smarter acts_as_solr  
> could generate the right config based on a model specifying  
> parameters for analysis.
>
>>>> But even if you do that then you have
>>>> a) half a java project (I don't want that)
>>>
>>> That's totally fair, and really the primary compelling reason for
>>> Ferret over Solr for pure Ruby/Rails projects.  I dig that.
>>>
>>> But isn't Ferret like 60k lines of C code too?!
>>
>> true, but I don't have to compile that every time I deploy my app...
>
> My point was that Ferret isn't just Ruby, just a counterpoint to
> your "half a java project".  No one has to recompile Solr either.
but the custom analyzer implemented in Java... By saying 'half a java
project' I didn't mean solr, but the parts of my application logic
that have to be implemented in Java in order to be plugged into solr.
But the JRuby route looks promising here of course.
>>>> and b) no way to use your existing rails classes in that custom
>>>> analyzer (I *have* analyzers using rails models to retrieve
>>>> synonyms and narrower terms for thesaurus based query expansion)
>>>
>>> You could leverage client-side query expansion with Solr... just
>>> take the user's query, massage it, and send whatever query you like
>>> to Solr. Solr has synonym and stop word capability too.
>>
>> yeah, I could do that. But that's moving analysis stuff into my
>> application, which is quite contrary to the purpose of analyzers -
>> encapsulate this logic and make it pluggable into the search engine
>> library. So fewer style points for this solution...
>
> I was just saying :)   It's debatable exactly where in the
> client-server spectrum synonym expansion belongs... and it really
> depends on the needs of the project.  Nothing wrong with a client
> doing some user input massaging before a query hits the search server.
[..]
>>>> Here's what I would do *if* I experienced severe problems with
>>>> Ferret in any of my projects:
>>>>
>>>> Take aaf, replace Ferret with Lucene or even make it modular to
>>>> decide at run time which one to use, run the DRb server (or the
>>>> whole app, that depends) under JRuby and call it acts_as_lucene :-)
>>>> Et voila - great Rails integration plus Lucene's maturity. But as
>>>> long as Ferret's working fine for me that's really unlikely to
>>>> happen... Unless somebody wants to sponsor that project, of
>>>> course ;)
>>>
>>> Just using Solr and fixing up acts_as_solr to meet your needs (if
>>> it doesn't) would be even easier than all that :)  Solr really is
>>> a better starting point than Lucene directly, for caching,
>>> scalability, replication, faceting, etc.
>>
>> Depends on whether you need these features or not. From my
>> experience, lots of projects don't need these things anyway,
>> because they're running on a single host and nearly every other
>> part of the application is slower than search... Maybe it's because
>> I'm quite involved with the topic and am familiar with lucene's
>> API, but to me Solr looks like an additional layer of abstraction
>> and complexity which I only want to have when it really gives me a
>> feature I need. Plus the last time I checked Lucene didn't need xml
>> configuration files ;)
>
> I hear ya about the XML config files.  And always to be fair to Solr
> here, you really only need to set things up from a basic example
> configuration that covers most scenarios already - so it really
> isn't necessary to even touch XML config except for tweaking little
> things.
But I still have to read it in order to see if it fits my needs. Okay,
I'll stop whining about that xml now ;)

[..]
>> In development environments and especially when it comes to
>> automated tests / CI it's also quite comfortable not having to run
>> a separate server but using the short cut directly to the index,
>> which isn't possible with Solr.
>
> Not true.  Solr can work embedded.  There is a base SolrServer
> abstraction, with an implementation that runs embedded (inside the
> same JVM) versus over HTTP.  Exactly the same interface for both
> operations, using a very simple API (SolrJ, much like Lucene's basic
> API actually).
cool, but that won't work for Rails projects running on MRI and
accessing solr via solr-ruby.
>>> I'd be curious to see scalability comparisons between Ferret and
>>> Solr - or perhaps more properly between Stellr and Solr - as it
>>> boils down to number of documents, queries per second, and
>>> faceting and highlighting speed.  I'm betting on Solr myself (by
>>> being so into it and basing my professional life on it).
>>
>> This would be interesting, but I wouldn't be that disappointed with
>> Stellr ending up second given the little amount of time I've spent
>> building it so far. Just out of curiosity, do you have some kind of
>> performance testing suite for Solr which I could throw at Stellr?
>
> No, I don't have those kinds of tests myself.   While I can speak to
> Solr's performance based on what I hear from our clients and the
> reports in the mailing lists, I don't consider myself a
> performance-savvy person.
>
> I'm curious - what are the numbers of documents being put into
> Ferret indexes out there?   millions?   hundreds of millions?
> billions?  And are folks doing faceting?  Does Ferret have faceting
> support?
not sure about the billions, but afair an earlier message in this
thread stated an index size of 90 million documents with aaf.
Altlaw.org has reported an index size of > 4GB with around 700k
documents last fall. The selfhtml.org index has approximately 1
million forum entries indexed, index size around 2GB. Stellr doesn't
ever use more than around 50MB of RAM during indexing and searching
this index. I know RAM is cheap and all, but RAM size still has a
quite large influence on the price of the server you rent for your
app, at least here in Germany.
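A quick back-of-the-envelope check on these figures, assuming "GB" and "MB" mean 2**30 and 2**20 bytes. The Altlaw figure is a lower bound, since the index is "> 4GB"; `bytes_per_doc` is just a throwaway helper.

```ruby
# Back-of-the-envelope arithmetic on the figures quoted above.
# Assumption: "GB" and "MB" are read as 2**30 and 2**20 bytes.
GIB = 2**30
MIB = 2**20

def bytes_per_doc(index_gib, doc_count)
  (index_gib * GIB).fdiv(doc_count)
end

bytes_per_doc(4, 700_000).round    # => 6136, i.e. at least ~6 KB per Altlaw document
bytes_per_doc(2, 1_000_000).round  # => 2147, i.e. ~2 KB per selfhtml.org forum entry

# Stellr's ~50MB RAM ceiling relative to the ~2GB selfhtml.org index:
(50 * MIB).fdiv(2 * GIB)           # => 0.0244140625, about 2.4%
```

In other words, the reported memory footprint is a small fraction of the on-disk index, which is the point being made about server cost.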

Without doubt Solr has many more references in the area of such large
installations than ferret/aaf. I myself never saw aaf as a drop-in
solution for indexes of this size, but more as an easy-to-use, out of
the box solution for the average rails app with maybe several
thousand or tens of thousands of records, but I'm happy to see it still
works in larger scale setups.

Heck, it all began with a simple full text search for my blog ;)

Regarding the faceting - it's not built into ferret, and aaf doesn't
support it either since I haven't needed it yet and nobody else has
requested this feature so far. All in all I think the average usage
scenarios of solr and aaf are quite different atm...

I'll try to find the time to benchmark the selfhtml.org data set with
solr and stellr. I'll report my findings here.

Cheers,
Jens

Erik Hatcher

2008-Aug-28 20:07 UTC

[Ferret-talk] Road map of ferret

On Aug 28, 2008, at 3:02 PM, Jens Kraemer wrote:
>> Gotcha.  Meaning the search server is pulling from the DB
>> directly.  That's what the DataImportHandler in Solr does as well.
>> It'd be a simple single HTTP request to Solr (once the DB stuff is
>> configured, of course) to have it do full or incremental DB indexing.
>
> With the slight difference that custom model logic defined in the
> rails model class is still involved to preprocess data, index values
> calculated at indexing time or even have certain records refuse to
> be indexed based on their current state. Having per-document
> boosts depending on some value from the database (e.g. record
> popularity) is also a classic... Aaf never just pulls data from the
> db, it always uses rails model objects. Doesn't make indexing faster
> of course...
All great points.  ActiveRecord is much more pleasant than any other
database access that I've ever worked with.  I don't generally work
with databases personally, though.  The bulk of my full-text searching
experience doesn't involve databases at all.

I suppose the Java counterpart would be Hibernate Search - surely  
involving a lot more hideous XML and @annotations - ewww.

>>>
>>> In development environments and especially when it comes to
>>> automated tests / CI it's also quite comfortable not having to run
>>> a separate server but using the short cut directly to the index,
>>> which isn't possible with Solr.
>>
>> Not true.  Solr can work embedded.  There is a base SolrServer  
>> abstraction, with an implementation that runs embedded (inside the  
>> same JVM) versus over HTTP.  Exactly the same interface for both  
>> operations, using a very simple API (SolrJ, much like Lucene's
>> basic API actually).
>
> cool, but that won't work for Rails projects running on MRI and
> accessing solr via solr-ruby.

Fair point.

Again, the answer comes back to JRuby ;)  Forget MRI.  Good point
about solr-ruby - it is specifically designed for Solr over HTTP.  It
wouldn't take much to refactor it to work with embedded Solr via JRuby,
though.  But if JRuby is a given, it'd be just as easy to work with
SolrJ's API directly.

Though for testing purposes, solr-ruby is easily mocked.  solr-ruby
touts great code coverage (98% or something like that) with unit
tests; many of those tests exercise solr-ruby's API with Solr itself
mocked.  There are also tests that fire up Solr in the background for
full functional testing.  So for unit testing purposes, having Solr
running isn't needed, but it launches plenty fast enough for
end-to-end testing if desired.

>> I'm curious - what are the numbers of documents being put into
>> Ferret indexes out there?   millions?   hundreds of millions?
>> billions?  And are folks doing faceting?  Does Ferret have faceting
>> support?
>
> not sure about the billions, but afair an earlier message in this  
> thread stated an index size of 90 million documents with aaf.   
> Altlaw.org has reported an index size of > 4GB with around 700k  
> documents last fall. The selfhtml.org index has approximately 1  
> million forum entries indexed, index size around 2GB. Stellr doesn't
> ever use more than around 50MB of RAM during indexing and searching
> this index. I know RAM is cheap and all, but RAM size still has a
> quite large influence on the price of the server you rent for your
> app, at least here in Germany.

90 million is impressive for sure.

RAM - well, when Ferret/Stellr does faceting we'll revisit that
discussion :)  Solr loves RAM!  It can still run in modest
environments, but the more RAM you can give it for caches
(depending on your needs), the better it performs.

> Without doubt Solr has much more references in the area of such
> large installations than ferret/aaf. I for myself never saw aaf as a
> drop-in solution for indexes of this size, but more as an easy to
> use out of the box solution for the average rails app with maybe
> several thousands or tens of thousands records, but I'm happy to
> see it still works in larger scale setups.

Indeed!  ferret: +1 - no question!

> Heck, it all began with a simple full text search for my blog ;)

Same for me (though I abandoned it when I realized that regular
blogging and server maintenance weren't for me).

> Regarding the faceting - it's not built into ferret, and aaf doesn't
> support it either since I didn't need it yet, and nobody else
> requested this feature so far. All in all I think the average usage
> scenarios of solr and aaf are quite different atm...

I'm really surprised by that.  Faceting is the major feature that
attracts folks to Solr.  It's critical for all of our customers.

But yeah, no question that Lucene/Solr and Ferret/Stellr can happily
coexist and aren't necessarily competition for every project.  But
there definitely are those areas of overlap where a project could go
with either solution.  And I would definitely not try to shoehorn Solr
into a project where it didn't fit and Ferret worked fine.  I'm
pragmatic like that.

> I'll try to find the time to benchmark the selfhtml.org data set
> with solr and stellr. I'll report my findings here.

Awesome.  If you have the data in some easily digestible format, I'd
be happy to toss it into Solr and report back numbers from my
development machine.  Drop me a line offline if you'd like.

	Erik
