thr3ads.net - Ferret talk - [Ferret-talk] Ferret/AAF Stability? [Nov 2007]

Sam Smoot

2007-Nov-15 14:37 UTC

[Ferret-talk] Ferret/AAF Stability?

Hello. I'm the author of DataMapper (http://datamapper.org), and am
trying to choose what Full-Text-Indexing engine/plugin I want to
include by default. I was hoping you guys could help. :-)

Sphinx comes highly recommended, but without live index updates, it
just doesn't seem practical for most of my work.

I'm most experienced with Solr, but the whole HTTP::Request and
general complexity of it is off-putting.

I haven't used Ferret in an application yet, but I love what I see so
far. The ability to have an in-process server in development, and the
clean Ruby API are big wins for me. But I've heard a lot of scary
things about corrupted indexes, even when using the DRb server. Is
this just FUD? Are there any unresolved issues revolving around
corrupted indexes? Can I afford to use Ferret in big applications for
Fortune-500 clients? (I know that sounds... pompous really, but it's a
genuine concern.)

Any advice you could offer would be greatly appreciated.

I've also read a few messages about serializing index requests/updates
to Ferret through message-queues. Are there any decent
guides/blog-posts on this topic?

Thanks, -Sam

Erik Morton

2007-Nov-15 15:44 UTC

We have several 3GB indexes with approximately 1 million documents in
each of them. Here are some quick notes; feel free to reach out with
other questions:

* no corruption problems that weren't our fault.
* there was an issue with large index files (> ~2GB) that was patched,
but I'm honestly not sure if it is in the trunk, as the ferret trac/svn
is frequently MIA (which is a concern of course)
* the code is clear and fairly easy to follow. AAF is very easy to
follow.
* I've been very happy with performance of the actual
indexing/searching, however you need to watch out for the processes that
are actually doing the synchronization for writes. DRb is a bottleneck
for us right now, though our volume isn't high enough that I'd call it a
real problem yet.
* for moderately high-volume sites you'll want to consider batching
index updates "offline", though for large indexes make sure that you
have enough IO capacity to optimize the index. We host on EC2 and the
$0.10/hour instances simply do not have anywhere near the IO capacity to
optimize a large index without having _every other process_ waiting
for IO. I haven't tested the larger instance types yet.
* we love how easy and efficient it is to combine many indexes into
one. We index tens of thousands of websites in parallel and then
combine 100 or so indexes into one index very quickly.
* the mailing list is great. Jens is on top of things, very receptive
to new ideas and takes *very* good care of AAF. Haven't seen Dave
Balmain in a while.
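
The batching of index updates mentioned in the notes above could be
sketched roughly as follows. This is only an illustration under stated
assumptions, not Ferret or acts_as_ferret code: FakeIndex,
BatchingIndexer and batch_size are hypothetical names, and an in-memory
hash stands in for a real index writer. The idea is that request
threads only enqueue work, while a single writer thread applies it in
batches.

```ruby
# Hypothetical stand-in for a real index writer (not a Ferret class).
# It only records the documents and how many times it "committed".
class FakeIndex
  attr_reader :docs, :commits

  def initialize
    @docs = {}
    @commits = 0
  end

  # Apply a whole batch of [id, text] pairs, then commit once.
  def write_batch(batch)
    batch.each { |id, text| @docs[id] = text }
    @commits += 1
  end
end

# Serializes all index updates through one writer thread and groups
# them into batches so the index is committed less often.
class BatchingIndexer
  def initialize(index, batch_size: 100)
    @index = index
    @batch_size = batch_size
    @queue = Queue.new
    @worker = Thread.new { run }
  end

  # Safe to call from any thread; it never touches the index directly.
  def enqueue(id, text)
    @queue << [id, text]
  end

  # Stop accepting work, flush what is left, and wait for the writer.
  def shutdown
    @queue.close
    @worker.join
  end

  private

  def run
    batch = []
    # Queue#pop returns nil once the queue is closed and drained.
    while (item = @queue.pop)
      batch << item
      flush(batch) if batch.size >= @batch_size
    end
    flush(batch)
  end

  def flush(batch)
    return if batch.empty?
    @index.write_batch(batch)
    batch.clear
  end
end
```

A caller would create one BatchingIndexer per index, call enqueue from
request code, and call shutdown on exit. A larger batch_size means fewer
commits, at the cost of updates becoming searchable later.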

Overall we are happy. There are times when search accuracy questions
come up, and frequently the problem is that we are not effectively
parsing queries or using the right analyzer for the problem at hand,
so RTFM (http://www.oreilly.com/catalog/9780596527853/).

That's all I can think of now...

Erik

Benjamin Krause

2007-Nov-15 18:41 UTC

Hey ..

> I haven't used Ferret in an application yet, but I love what I see so
> far. The ability to have an in-process server in development, and the
> clean Ruby API are big wins for me. But I've heard a lot of scary
> things about corrupted indexes, even when using the DRb server. Is
> this just FUD? Are there any unresolved issues revolving around
> corrupted indexes? Can I afford to use Ferret in big applications for
> Fortune-500 clients? (I know that sounds... pompous really, but it's a
> genuine concern.)

We've been using ferret on omdb.org for 14 months without any problems.
There are a few things you might want to work around (Erik pointed
some out). If you expect a huge amount of index updates, you need
to think about a few infrastructural problems, because right now, AAF
does not allow you to cluster indexing servers. But I know there is a
solution for that :)

If you just have a huge number of search queries, there is no need
to worry .. I would not suggest using AAF's ferret server for
searching, though .. but it's quite easy to do the searching in each
mongrel, so no concern here either.

I guess we need more information about the data you want to index
to give more detailed advice.

> I've also read a few messages about serializing index requests/updates
> to Ferret through message-queues. Are there any decent
> guides/blog-posts on this topic?

Yes, that's currently being worked on .. so there will be some guides
later on :)

Cheers
  Ben
---
Benjamin Krause
http://www.omdb.org/
bk at benjaminkrause.com

John Bachir

2007-Nov-15 19:00 UTC

On Nov 15, 2007, at 1:41 PM, Benjamin Krause wrote:
> I would not suggest using AAF's ferret server for searching,
> though .. but it's quite easy to do the searching in each mongrel, so
> no concern here either.

I'm confused... what does "searching" mean in this context? :)

John

Benjamin Krause

2007-Nov-15 20:04 UTC

John,
> On Nov 15, 2007, at 1:41 PM, Benjamin Krause wrote:
>> I would not suggest using AAF's ferret server for searching,
>> though .. but it's quite easy to do the searching in each mongrel, so
>> no concern here either.
>
> I'm confused... what does "searching" mean in this context? :)

If you're using AAF, you should use the ferret DRb server to index
your objects. However, using the ferret server means that whenever
someone searches (if you're using Model.find_by_contents),
the search will be forwarded to the ferret server.

The ferret server will process the search request and send
the response back to the mongrel. This overhead isn't
necessary, as the mongrel could use a local index to do the
search. There is no need to bother the ferret server.

So, indexing (aka updating, creating, saving, whatever) should
use the ferret server, but searching (using find_by_contents)
will also use the ferret server if you're using standard AAF, even
though it's not really necessary and could result in a bottleneck.

Don't get me wrong. It is totally fine to use standard AAF, unless
you're having huge amounts of searches or livesearches. I would
not recommend using a custom ferret solution, unless you
expect a problem or already have one :)
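
The split described above (writes through the one index server,
searches answered inside each app process) can be pictured with a small
toy model. Everything here is an assumption made for illustration:
IndexServer, LocalReader and SearchGateway are hypothetical names, not
acts_as_ferret classes, and a shared hash stands in for index files on
disk.

```ruby
# Hypothetical stand-ins; none of these classes exist in Ferret or AAF.

# Plays the role of the single index server that owns all writes.
class IndexServer
  attr_reader :store

  def initialize
    @store = {}
    @lock = Mutex.new
  end

  # All updates are funnelled through one lock, i.e. one writer.
  def add(id, text)
    @lock.synchronize { @store[id] = text }
  end
end

# Plays the role of a read-only index opened inside each app process.
# Here it reads the shared hash directly, so it sees writes at once;
# a real on-disk reader would need to be re-opened to see them.
class LocalReader
  def initialize(store)
    @store = store
  end

  def search(term)
    @store.select { |_id, text| text.include?(term) }.keys
  end
end

# Routes each operation to the side that should handle it.
class SearchGateway
  def initialize(server:, reader:)
    @server = server
    @reader = reader
  end

  # Writes always go to the single index server.
  def index(id, text)
    @server.add(id, text)
  end

  # Reads are answered locally, with no round trip to the server.
  def search(term)
    @reader.search(term)
  end
end
```

With this shape, search volume scales with the number of app processes,
while the index server only has to keep up with updates.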

Cheers
  Ben

---
Benjamin Krause
http://www.omdb.org/
bk at benjaminkrause.com

Stuart Sierra

2007-Nov-16 17:19 UTC

On Nov 15, 2007 9:37 AM, Sam Smoot <ssmoot at gmail.com> wrote:
> Hello. I'm the author of DataMapper (http://datamapper.org), and am
> trying to choose what Full-Text-Indexing engine/plugin I want to
> include by default. I was hoping you guys could help. :-)
>
> Sphinx comes highly recommended, but without live index updates, it
> just doesn't seem practical for most of my work.
>
> I'm most experienced with Solr, but the whole HTTP::Request and
> general complexity of it is off-putting.

For a different perspective: I'm in the middle of switching from
Ferret to Solr.  I like Ferret a lot, and still use it on several
sites, but I had some problems with one large site:

1. the patches for large-index support are still in development;
2. each update to Ferret requires rebuilding the index;
3. Ferret doesn't yet support compressed indexes.

My other reason for switching is that Rails' ActiveRecord is not
well-suited to storing large documents, which made acts_as_ferret less
compelling.

I was nervous about tackling Solr, but I've found it quite easy to
use, and the built-in caching and multithreading make it fast.

I think Ferret is adequate for most search tasks, but if (like me)
you're building a dedicated search engine, Solr is currently a
stronger candidate.

-Stuart Sierra

Jens Kraemer

2007-Nov-17 12:39 UTC

Hi!

On Fri, Nov 16, 2007 at 12:19:10PM -0500, Stuart Sierra wrote:
[..]
> For a different perspective: I'm in the middle of switching from
> Ferret to Solr.  I like Ferret a lot, and still use it on several
> sites, but I had some problems with one large site:
>
> 1. the patches for large-index support are still in development;

Let's hope Dave reads this ;-) However, there are several sites I know
of with index sizes > several GB, so they seem to be working well enough.

> 2. each update to Ferret requires rebuilding the index;

This for sure is annoying, but I'd consider this normal for a library
that has developed that fast. I think Dave has had very good reasons for
each of the changes he made to the index format. Plus I don't think
*every* release had a new index format ;-)

> 3. Ferret doesn't yet support compressed indexes.

At least from the docs it looks like it does, see
http://ferret.davebalmain.com/api/classes/Ferret/Index/FieldInfo.html .
I never tried this out, however.

> My other reason for switching is that Rails' ActiveRecord is not
> well-suited to storing large documents, which made acts_as_ferret less
> compelling.

That's a good point, and we plan to make aaf independent from
active_record in the future.

> I was nervous about tackling Solr, but I've found it quite easy to
> use, and the built-in caching and multithreading make it fast.

numbers, please :-)

> I think Ferret is adequate for most search tasks, but if (like me)
> you're building a dedicated search engine, Solr is currently a
> stronger candidate.

Well, as Solr uses Lucene internally, the mechanics and performance
characteristics naturally can't be that different from Ferret. Maybe
Ferret has a bug or two and non-working inter-process locking (which
doesn't matter when you think about building a dedicated search server
like Solr is, since it's only one process), but the general internal
handling of the index is the same, i.e. you can also only have one
Writer open to a Lucene index at a time, and Searchers won't see index
changes until re-opened, too.

Having said that, if my application's main concern were search, I
most probably wouldn't choose any pre-cooked solution like aaf or Solr,
but build exactly the thing I need from scratch, basing it either on
Lucene or Ferret. But maybe that's just me ;-)


Cheers,
Jens


-- 
Jens Krämer
http://www.jkraemer.net/ - Blog
http://www.omdb.org/     - The new free film database

Erik Hatcher

2007-Nov-18 10:24 UTC

On Nov 17, 2007, at 7:39 AM, Jens Kraemer wrote:
>> I think Ferret is adequate for most search tasks, but if (like me)
>> you're building a dedicated search engine, Solr is currently a
>> stronger candidate.
>
> Well, As Solr uses Lucene internally, the mechanics and performance
> characteristics naturally can't be that different from Ferret. Maybe
> Ferret has a bug or two and a non-working inter-process locking (which
> doesn't matter when you think about building a dedicated search server
> like Solr is, since it's only one process), but the general internal
> handling of the index is the same, i.e. you can also only have one
> Writer open to a Lucene index at a time, and Searchers won't see index
> changes until re-opened, too.

That's all true.  However, Solr manages all the IndexWriter/IndexSearcher
stuff for you quite transparently (which I guess is comparable to
Ferret + DRb, eh?).  Because it is a single point of access to the
index, it takes care of the single writer situation, and also handles
warming IndexSearchers before coming online so that caches are built
and a search on an updated index is as fast as it was before being
updated.
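
The two behaviours discussed here, searchers that do not see index
changes until re-opened and warming a new searcher before it comes
online, can be illustrated with a toy model. This is a sketch under
assumptions, not Solr, Lucene or Ferret code: SnapshotSearcher,
SearchService and warm_terms are hypothetical names, and a hash stands
in for the index.

```ruby
# Hypothetical model; none of these classes come from Solr or Ferret.

# Immutable view of the index at one point in time, like a searcher
# that does not see later writes until it is re-opened.
class SnapshotSearcher
  def initialize(docs)
    @docs = docs.dup.freeze
    @cache = {}
  end

  # Results are cached per term; an empty result is cached as well.
  def search(term)
    @cache[term] ||= @docs.select { |_id, text| text.include?(term) }.keys
  end

  # Run common queries once so their results are cached before going live.
  def warm(terms)
    terms.each { |term| search(term) }
    self
  end

  def cached?(term)
    @cache.key?(term)
  end
end

# Single point of access: one write lock, one current searcher.
class SearchService
  attr_reader :searcher

  def initialize(warm_terms: [])
    @docs = {}
    @warm_terms = warm_terms
    @write_lock = Mutex.new
    @searcher = SnapshotSearcher.new(@docs)
  end

  # Writes are serialized and invisible to the current searcher.
  def add(id, text)
    @write_lock.synchronize { @docs[id] = text }
  end

  def search(term)
    @searcher.search(term)
  end

  # Build and warm the new searcher first, then swap it in, so queries
  # keep hitting the old (warm) searcher until the new one is ready.
  def reopen
    fresh = @write_lock.synchronize { SnapshotSearcher.new(@docs) }
    fresh.warm(@warm_terms)
    @searcher = fresh
  end
end
```

In this model a document added with add only becomes searchable after
reopen, and the swap happens only once the warm_terms queries have
already been run against the new snapshot.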
> Having said that, if my application's main concern were search, I
> most probably wouldn't choose any pre-cooked solution like aaf or
> Solr, but build exactly the thing I need from scratch, basing it
> either on Lucene or Ferret. But maybe that's just me ;-)

You'd be reinventing a lot of wheels doing that, with IndexWriter
synchronization, IndexSearcher warming, caching, and much more.

	Erik

Stuart Sierra

2007-Nov-19 02:59 UTC

On Nov 17, 2007 7:39 AM, Jens Kraemer <jk at jkraemer.net> wrote:
> > 3. Ferret doesn't yet support compressed indexes.
>
> At least from the docs it looks like it does, see
> http://ferret.davebalmain.com/api/classes/Ferret/Index/FieldInfo.html .
> I never tried this out, however.

Yes, it's in the API, but there's no code for it yet.

> > I was nervous about tackling Solr, but I've found it quite easy to
> > use, and the built-in caching and multithreading make it fast.
>
> numbers, please :-)

I make no claim that it's faster than Ferret, but it's fast enough.

> Having said that, if my application's main concern were search, I
> most probably wouldn't choose any pre-cooked solution like aaf or Solr,
> but build exactly the thing I need from scratch, basing it either on
> Lucene or Ferret. But maybe that's just me ;-)

I'd like to do that, but I lack sufficient time and skill. :)  In the
meantime, I'm hoping Solr will let me offer an open search API to my
users without too much extra effort on my part.  We'll see how it
goes; I may end up back on Ferret at some point.

-Stuart
