Hi,

I'm currently trying to set up a solution involving multiple servers using the same index over NFS. The problem is that, from what I have seen, Ferret doesn't support multiple processes writing to the same index. Using a DRb service is not an option, since this would create a single point of failure.

I tried using Ferret::Store::FSDirectory to create a write lock on the index directory, with code somewhat like this:

[...]
dir = Ferret::Store::FSDirectory.new(INDEX_PATH)
write_lock = dir.make_lock("lock")
write_lock.obtain
index << {:id => id, :type => 'create_test_type'}
index.flush
write_lock.release
[...]

but it makes the processes freeze or raise a Ferret::Store::Lock::LockError in my different attempts. I tried playing with IndexWriter options like max_merge_docs, merge_factor... but without success. Maybe there is a way to merge all the compound files every couple of writes instead of doing it on the fly.

Is there a way to achieve my goal? Dave, please tell me you have an idea :-P

Thanks,
Seb

-- 
Sebastien Pahl - Rift Technologies
spahl at rift.fr

-- 
Posted via http://www.ruby-forum.com/.
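[Editor's note: the obtain/write/release pattern above can be illustrated without Ferret installed. The sketch below is a hypothetical stand-in, not Ferret's actual lock class: it uses an exclusive lock file created atomically with File::CREAT | File::EXCL, which is the classic mechanism behind this kind of write lock. All names here (FileLock, LockError) are made up for illustration.]

```ruby
require 'tmpdir'

# Hypothetical stand-in for a directory write lock: only one process can
# atomically create the lock file; everyone else gets a LockError.
# Note: O_EXCL creation is NOT reliable over older NFS versions, which is
# part of why NFS makes this whole topic painful.
class FileLock
  class LockError < StandardError; end

  def initialize(path)
    @path = path
  end

  def obtain
    # CREAT | EXCL makes "create only if absent" a single atomic step.
    File.open(@path, File::WRONLY | File::CREAT | File::EXCL) do |f|
      f.write(Process.pid.to_s)
    end
  rescue Errno::EEXIST
    raise LockError, "lock already held: #{@path}"
  end

  def release
    File.delete(@path) if File.exist?(@path)
  end
end

lock = FileLock.new(File.join(Dir.tmpdir, "ferret-demo.lock"))
lock.obtain
# ... write to the index here ...
lock.release
```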
On Mar 23, 2007, at 9:12 AM, Sebastien Pahl wrote:

> Dave please tell me you have an idea :-P

Dave, I recently more-or-less solved the NFS problem in KinoSearch. The gist of the solution is to implement read-locking on IndexReaders via lock files, but leave it off by default -- so that only people who put their indexes on NFS need turn it on. More info in the "Read-locking on shared volumes" section here:

http://xrl.us/vfs2 (Link to www.rectangular.com)

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
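[Editor's note: the read-locking scheme Marvin describes can be sketched roughly as follows. This is pure Ruby with made-up names, not KinoSearch's actual API: each reader drops a uniquely named lock file recording which segments file it has open, and a writer consults those files before deleting anything.]

```ruby
require 'socket'
require 'tmpdir'

LOCK_DIR = Dir.mktmpdir("index-demo-locks")

# Each reader advertises which segments file it has open, under a
# host-pid name that stays unique across machines sharing the volume.
def acquire_read_lock(segments_file)
  path = File.join(LOCK_DIR, "read-#{Socket.gethostname}-#{Process.pid}.lock")
  File.write(path, segments_file)
  path
end

# A writer may only delete files that no live read lock still references.
def deletable?(file)
  Dir.glob(File.join(LOCK_DIR, "read-*.lock")).none? { |l| File.read(l) == file }
end

lock = acquire_read_lock("segments_3")
deletable?("segments_3")  # false while a reader holds its lock
deletable?("segments_2")  # true: nothing references it
File.delete(lock)         # released when the reader closes
```

The cost to NFS users is the extra bookkeeping; everyone on a local filesystem can leave it off, which matches the "off by default" choice described above.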
I personally would love some built-in support for multi-threaded write locking. It's pretty easy these days to set up a multithreaded Rails/Ferret server using Mongrel and Lighttpd.

It'd also be nice if the docs gave a special warning for this case. It came pretty unexpectedly.

Schnitz

On 3/23/07, Marvin Humphrey <marvin at rectangular.com> wrote:
> Dave, I recently more-or-less solved the NFS problem in KinoSearch.
> The gist of the solution is to implement read-locking on IndexReaders
> via lock files, but leave it off by default -- so that only people
> who put their indexes on NFS need turn it on.
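[Editor's note: within a single multithreaded process such as one Mongrel, writes can already be serialized with a plain Mutex wrapper. The sketch below is hypothetical (SynchronizedIndex is not a Ferret class); it just shows the technique of funneling all writes through one lock.]

```ruby
# Hypothetical wrapper: serialize all writes to a shared writer object so
# concurrent Rails threads never call it simultaneously.
class SynchronizedIndex
  def initialize(index)
    @index = index
    @mutex = Mutex.new
  end

  def <<(doc)
    @mutex.synchronize { @index << doc }
  end
end

docs = []                              # stands in for the real index
index = SynchronizedIndex.new(docs)
threads = 10.times.map do |i|
  Thread.new { index << { :id => i } }
end
threads.each(&:join)
docs.size  # 10: no write was lost
```

This only helps threads inside one process; it does nothing for separate processes or machines, which is why the lock-file discussion below still matters.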
On Mar 23, 2007, at 10:46 AM, Matt Schnitz wrote:

> I personally would love some built-in support for multi-threaded write
> locking. It's pretty easy these days to set up a multithreaded
> Rails/Ferret server using Mongrel and Lighttpd.

I'm not sure whether Dave has solved a problem that neither Lucene nor KinoSearch has solved, but I'd say it's difficult to outright impossible to allow more than one write process access to the index at any given moment under the segmented, write-once model used by all of us.

What is possible is to manage access to an index on a shared volume so that an active write process causes all other attempts to open a write process to fail, including those from other machines. The key is to put the write.lock file in the index directory, rather than in the temp directory -- since the temp directory is per-machine, no other machine knows about another machine's lock files, and write processes may stomp each other.

I believe the default location of the lock directory was changed in Lucene 2.1 (if not, the change is in svn trunk). It changed in KinoSearch as of 0.20_01, though with a twist that makes things more convenient for everyone else at a minor cost to NFS users:

    Concurrency

    Only one InvIndexer may write to an invindex at a time. If a write
    lock cannot be secured, new() will throw an exception.

    If your index is located on a shared volume, each writer application
    must identify itself by passing a LockFactory to InvIndexer's
    constructor, or index corruption will occur.

Imposing that condition means that stale lock files associated with dead pids can be zapped automatically by default.

In earlier versions of Lucene, it's possible to specify a global lock dir location -- putting it on the shared volume, for example, and allowing multiple machines to become aware of each other's lock files. It wouldn't surprise me if Dave had duplicated that in Ferret.

> It'd also be nice if the docs gave a special warning for this case.
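[Editor's note: the "zap stale locks for dead pids" idea can be sketched like this. Pure Ruby with made-up names, not KinoSearch's or Lucene's implementation. Note that the pid liveness check only works for locks created on the same machine, which is exactly why writers on a shared volume must identify themselves explicitly.]

```ruby
require 'socket'
require 'tmpdir'

HOST = Socket.gethostname

# A lock file records "host:pid". Staleness can only be judged for pids
# on the same machine; for foreign hosts we must assume the writer is alive.
def write_lock_stale?(path)
  host, pid = File.read(path).split(":")
  return false unless host == HOST   # can't judge another machine's pid
  Process.kill(0, pid.to_i)          # signal 0: existence check, no signal sent
  false
rescue Errno::ESRCH
  true                               # no such process: the lock is stale
rescue Errno::EPERM
  false                              # process exists but isn't ours: alive
end

lock = File.join(Dir.tmpdir, "write-demo.lock")
File.write(lock, "#{HOST}:#{Process.pid}")
write_lock_stale?(lock)  # false: we are alive

File.write(lock, "#{HOST}:999999999")  # a pid that is almost surely dead
write_lock_stale?(lock)  # true: safe to zap
File.delete(lock)
```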
> It came pretty unexpectedly.

NFS is a bleedin' PITA to support, because it doesn't do "delete-on-last-close" and flock/fcntl locking is unreliable on so many operating systems. What I'd really like to do is detect NFS somehow and throw errors at construction time, but since that's not realistic, there are moderately prominent warnings now in the KS docs. It's not an ideal set-up, because inevitably some fraction of users will get burned when they move their indexes to NFS without taking stock of the warnings -- but without getting into the gory details, I'll just say that's hard to avoid.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
That is exactly what I tried with Ferret, but it makes the processes freeze or raise a Ferret::Store::Lock::LockError.

Marvin Humphrey wrote:
> What is possible is to manage access to an index on a shared volume
> so that an active write process causes all other attempts to open a
> write process to fail, including those from other machines. The key
> is to put the write.lock file in the index directory, rather than in
> the temp directory -- since the temp directory is per-machine, no
> other machine knows about another machine's lock files and write
> processes may stomp each other.

-- 
Sebastien Pahl - Rift Technologies
spahl at rift.fr
On Mar 23, 2007, at 1:54 PM, Sebastien Pahl wrote:

> That is exactly what I tried with Ferret, but it makes the processes
> freeze or raise a Ferret::Store::Lock::LockError.

I'm less than completely familiar with how Ferret handles this, but in KS you'll get a lock error after the timeout is exceeded and it stops retrying. A freeze sounds wrong.

I suspect the only way to make this work is to catch the LockError and retry. Creating a queue for writers trying to access an NFS index, so that each new process starts immediately after an old process releases a lock... that would be great, but I don't know how you'd pull it off with lock files. Creating shared read locks was hard enough!

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
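[Editor's note: the catch-and-retry approach Marvin suggests might look like the sketch below. The LockError class here is a self-contained stand-in for Ferret::Store::Lock::LockError, and with_lock_retry is a hypothetical helper, not part of Ferret.]

```ruby
# Stand-in for Ferret::Store::Lock::LockError so the sketch runs on its own.
class LockError < StandardError; end

# Hypothetical helper: retry the block when the lock is contended, backing
# off a little longer (plus jitter) on each attempt so competing writers
# don't retry in lockstep.
def with_lock_retry(max_tries = 5, base_delay = 0.01)
  tries = 0
  begin
    yield
  rescue LockError
    tries += 1
    raise if tries >= max_tries    # give up after max_tries attempts
    sleep(base_delay * tries + rand * base_delay)
    retry
  end
end

attempts = 0
result = with_lock_retry do
  attempts += 1
  raise LockError, "index busy" if attempts < 3  # simulated contention
  "wrote document"
end
result  # "wrote document", succeeding on the third attempt
```

This gives up eventually rather than queueing fairly, which matches Marvin's point: a true writer queue is hard to build out of lock files alone.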
On Mar 23, 2007, at 5:12 PM, Sebastien Pahl wrote:

> I'm currently trying to set up a solution involving multiple servers
> using the same index over NFS. The problem is that, from what I have
> seen, Ferret doesn't support multiple processes writing to the same
> index. Using a DRb service is not an option, since this would create
> a single point of failure.

Did I miss something, or is your NFS volume exactly that: a single point of failure? I think you ruled out the DRb solution too quickly. Shared resources on NFS volumes are always prone to failure. Plus, it doesn't scale well, because too many processes accessing the index directory will inevitably lead to poor performance or a complete deadlock.

I've come to the conclusion that the "share nothing" approach works best and SOAs are the way to go. I prefer talking to a single index server and not worrying about the details. I don't care whether it is a single server or a load-balanced cluster that services my request.

-- 
Andy
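[Editor's note: the "single index server" approach can be sketched with Ruby's standard-library DRb. The service class and its methods below are hypothetical, not a Ferret API; the point is that exactly one process owns the index and serializes writes, while any number of front ends talk to it remotely.]

```ruby
require 'drb/drb'

# One process owns the "index" (here just an array) and serializes writes;
# clients call it over DRb instead of touching index files themselves.
class IndexService
  def initialize
    @docs  = []
    @mutex = Mutex.new
  end

  def add_document(doc)
    @mutex.synchronize { @docs << doc }
    true
  end

  def search(field, value)
    @mutex.synchronize { @docs.select { |d| d[field] == value } }
  end
end

# Server side. Port 0 picks a free port for this demo; a real deployment
# would use a fixed, known address.
DRb.start_service('druby://127.0.0.1:0', IndexService.new)

# Client side: any number of front-end processes would share this writer.
index = DRbObject.new_with_uri(DRb.uri)
index.add_document(:id => 1, :type => 'create_test_type')
index.search(:type, 'create_test_type')  # the one matching document comes back
```

For Andreas's point about the single point of failure: nothing stops you from putting several such servers behind a load balancer, as he describes, as long as writes are routed consistently.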
On Mon, Mar 26, 2007 at 05:20:47PM +0200, Andreas Korth wrote:

> I've come to the conclusion that the "share nothing" approach works
> best and SOAs are the way to go. I prefer talking to a single index
> server and not worrying about the details. I don't care whether it is
> a single server or a load-balanced cluster that services my request.

Full ack :-)

I don't know how big you expect your index to grow, or how critical it is that it's always up to date, but wouldn't it be sufficient to have a backup system with a nightly snapshot of the index that could jump in in case the production server fails? You could even run continuous rebuilds on that backup server to keep the index fairly in sync...

Jens

-- 
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de
Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa
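[Editor's note: the nightly-snapshot idea Jens describes can be sketched as a small shell script. Paths here use mktemp so the example is self-contained; a real setup would use fixed index/backup paths and something like rsync over the network, and would only copy while no writer holds the lock.]

```shell
# Stand-ins for the real index and backup locations (hypothetical paths).
INDEX=$(mktemp -d)
BACKUP=$(mktemp -d)

echo "segment data" > "$INDEX/segments"

# Copy into a staging directory first, then swap it into place, so a
# failed copy never clobbers the last good snapshot.
STAGING="$BACKUP/incoming.$$"
mkdir "$STAGING"
cp -R "$INDEX/." "$STAGING/"
rm -rf "$BACKUP/current"
mv "$STAGING" "$BACKUP/current"

ls "$BACKUP/current"
```

Run nightly from cron, this gives the backup server a consistent copy to serve from if the production server fails.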