I have an index with a few hundred thousand records. The index is generally very fast, with sub-100ms responses. However, as soon as I start adding records it gets extremely slow, with queries taking over 2 seconds. It stays slow even after indexing has stopped, until I optimize the index.

To work around this, I index in bulk and immediately optimize. This is not ideal for the performance of my site. Unfortunately, contrary to what Dave Balmain seems to say here: http://osdir.com/ml/lang.ruby.ferret.general/2006-08/msg00037.html , the index seems to be locked for reading during optimization.

So I have two questions:

1) Why does the performance degrade so badly after adding just a few records, unless I optimize the index? Can I avoid this?

2) Can I keep a second index so that it doesn't get locked during optimization and then switch to the optimized index? Perhaps the index is not really locked and it is just using all the CPU? (I am using a single-CPU server.)

Thanks for any help.

-Alex
Jens Kraemer
2007-Nov-05 10:29 UTC
[Ferret-talk] Performance before and after optimization
On Sat, Nov 03, 2007 at 08:49:17PM +0800, Alex Neth wrote:
[..]
> 2) Can I keep a second index so that it doesn't get locked during
> optimization and then switch to the optimized index? Perhaps the index
> is not really locked and it is just using all the CPU? (I am using a
> single CPU server)?

If you're already indexing in batches, keeping a second read-only index for searching is a good idea. rsync is useful to keep the search-index up to date in this case.

To check if CPU usage is a problem, try lowering the optimizing process' priority and see how it goes.

Jens

--
Jens Krämer
http://www.jkraemer.net/ - Blog
http://www.omdb.org/ - The new free film database
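One way to try the priority suggestion from Ruby itself is to run the optimize step in a forked child with a raised nice value. This is only a sketch using the standard library; `run_niced` is a made-up helper name, and the block you pass in is assumed to open the index and optimize it.

```ruby
# Hypothetical helper: runs the given block in a forked child process at
# reduced CPU priority and returns true if the child exited cleanly.
def run_niced(niceness = 19)
  pid = fork do
    # A "who" of 0 means the calling process, here the forked child.
    Process.setpriority(Process::PRIO_PROCESS, 0, niceness)
    yield
  end
  Process.wait(pid)
  $?.success?
end
```

Calling `run_niced { ... }` with the optimize call inside the block keeps the web process at its normal priority. If searches are still blocked while the niced child runs, CPU contention is probably not the cause.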
> From: Jens Kraemer <jk at jkraemer.net>
> Subject: Re: [Ferret-talk] Performance before and after optimization
>
> On Sat, Nov 03, 2007 at 08:49:17PM +0800, Alex Neth wrote:
> [..]
>> 2) Can I keep a second index so that it doesn't get locked during
>> optimization and then switch to the optimized index? Perhaps the index
>> is not really locked and it is just using all the CPU? (I am using a
>> single CPU server)?
>
> If you're already indexing in batches, keeping a second read-only index for
> searching is a good idea. rsync is useful to keep the search-index up to
> date in this case.
>
> To check if CPU usage is a problem, try lowering the optimizing process'
> priority and see how it goes.

Thanks Jens.

Any suggestion on how to get a two-index solution working with acts_as_ferret? I could not find an easy way to change the index location dynamically. I would love to have a "read-only" index. It seems like using rsync might be problematic, though, as the index might not be in a consistent state throughout the sync.

It's not the CPU. The index is definitely locked for reading during optimization.

With cheap disk space, I would rather use two indexes: add new records to the "off" index, optimize it, then switch indexes, and go back and forth like that.
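The back-and-forth switch described above could be sketched at the filesystem level with a symlink that is swapped atomically, which also avoids exposing a half-synced directory to searches. This is an illustration using only Ruby's standard library, not an acts_as_ferret feature: the `IndexSwitcher` class, the `a`/`b` directory names and the `current` link are all assumptions, and the actual indexing and optimizing is left to the block you pass in.

```ruby
require 'fileutils'

# Hypothetical layout: <root>/a and <root>/b hold the two index copies, and
# <root>/current is a symlink to whichever copy searches should read.
class IndexSwitcher
  SLOTS = %w[a b].freeze

  def initialize(root)
    @root = root
    @link = File.join(root, 'current')
    SLOTS.each { |slot| FileUtils.mkdir_p(File.join(root, slot)) }
    point_to(SLOTS.first) unless File.symlink?(@link)
  end

  # Directory that searches should read from.
  def live_dir
    File.join(@root, File.readlink(@link))
  end

  # Directory that is safe to add records to and optimize.
  def off_dir
    File.join(@root, (SLOTS - [File.readlink(@link)]).first)
  end

  # Yields the off directory for indexing and optimizing, then makes it live.
  def rebuild
    target = off_dir
    yield target
    point_to(File.basename(target))
  end

  private

  # Create the new symlink under a temporary name, then rename it over the
  # old one. rename(2) is atomic on POSIX filesystems, so a reader resolving
  # the link sees either the old target or the new one, never a missing link.
  def point_to(slot)
    tmp = "#{@link}.tmp.#{Process.pid}"
    File.symlink(slot, tmp)
    File.rename(tmp, @link)
  end
end
```

Usage would look like `IndexSwitcher.new('/path/to/indexes').rebuild { |dir| ... }`, with the block adding the new batch to the index in `dir` and optimizing it. Two caveats: the block has to bring the off copy fully up to date, including records that were only added to the other copy in the previous round, and a search process that already has the old index open will most likely need to reopen it to see the switch.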