Hi,

I'm currently trying to set up a solution involving multiple servers using the same index over NFS. The problem is that, from what I have seen, Ferret doesn't support multiple processes writing to the same index. Using a DRb service is not an option, since this would create a single point of failure.

I tried using Ferret::Store::FSDirectory to create a write lock on the index directory, with code somewhat like this:

[...]
dir = Ferret::Store::FSDirectory.new(INDEX_PATH)
write_lock = dir.make_lock("lock")
write_lock.obtain
index << {:id => id, :type => 'create_test_type'}
index.flush
write_lock.release
[...]

but it makes the processes freeze or raise a Ferret::Store::Lock::LockError in my different attempts. I tried playing with IndexWriter options like max_merge_docs, merge_factor... but without success. Maybe there is a way to merge all the compound files every couple of writes instead of doing it on the fly.

Is there a way to achieve my goal? Dave, please tell me you have an idea :-P

Thanks,
Seb

-- 
Sebastien Pahl - Rift Technologies
spahl at rift.fr

-- 
Posted via http://www.ruby-forum.com/.
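[Editor's note: the obtain/write/release pattern above can be illustrated without Ferret installed. The sketch below is a hypothetical stand-in, not Ferret's actual lock class: it uses an exclusive lock file created atomically with File::CREAT | File::EXCL, which is the classic mechanism behind this kind of write lock. All names here (FileLock, LockError) are made up for illustration.]

```ruby
require 'tmpdir'

# Hypothetical stand-in for a directory write lock: only one process can
# atomically create the lock file; everyone else gets a LockError.
# Note: O_EXCL creation is NOT reliable over older NFS versions, which is
# part of why NFS makes this whole topic painful.
class FileLock
  class LockError < StandardError; end

  def initialize(path)
    @path = path
  end

  def obtain
    # CREAT | EXCL makes "create only if absent" a single atomic step.
    File.open(@path, File::WRONLY | File::CREAT | File::EXCL) do |f|
      f.write(Process.pid.to_s)
    end
  rescue Errno::EEXIST
    raise LockError, "lock already held: #{@path}"
  end

  def release
    File.delete(@path) if File.exist?(@path)
  end
end

lock = FileLock.new(File.join(Dir.tmpdir, "ferret-demo.lock"))
lock.obtain
# ... write to the index here ...
lock.release
```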
On Mar 23, 2007, at 9:12 AM, Sebastien Pahl wrote:

> Dave please tell me you have an idea :-P

Dave, I recently more-or-less solved the NFS problem in KinoSearch. The gist of the solution is to implement read-locking on IndexReaders via lock files, but leave it off by default -- so that only people who put their indexes on NFS need turn it on. More info in the "Read-locking on shared volumes" section here:

http://xrl.us/vfs2 (Link to www.rectangular.com)

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
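[Editor's note: the read-locking scheme Marvin describes can be sketched roughly as follows. This is pure Ruby with made-up names, not KinoSearch's actual API: each reader drops a uniquely named lock file recording which segments file it has open, and a writer consults those files before deleting anything.]

```ruby
require 'socket'
require 'tmpdir'

LOCK_DIR = Dir.mktmpdir("index-demo-locks")

# Each reader advertises which segments file it has open, under a
# host-pid name that stays unique across machines sharing the volume.
def acquire_read_lock(segments_file)
  path = File.join(LOCK_DIR, "read-#{Socket.gethostname}-#{Process.pid}.lock")
  File.write(path, segments_file)
  path
end

# A writer may only delete files that no live read lock still references.
def deletable?(file)
  Dir.glob(File.join(LOCK_DIR, "read-*.lock")).none? { |l| File.read(l) == file }
end

lock = acquire_read_lock("segments_3")
deletable?("segments_3")  # false while a reader holds its lock
deletable?("segments_2")  # true: nothing references it
File.delete(lock)         # released when the reader closes
```

The cost to NFS users is the extra bookkeeping; everyone on a local filesystem can leave it off, which matches the "off by default" choice described above.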
I personally would love some built-in support for multi-threaded write locking. It's pretty easy these days to set up a multithreaded Rails/Ferret server using Mongrel and Lighttpd.

It'd also be nice if the docs gave a special warning for this case. It came pretty unexpectedly.

Schnitz

On 3/23/07, Marvin Humphrey <marvin at rectangular.com> wrote:
> Dave, I recently more-or-less solved the NFS problem in KinoSearch.
> The gist of the solution is to implement read-locking on IndexReaders
> via lock files, but leave it off by default -- so that only people
> who put their indexes on NFS need turn it on.
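[Editor's note: within a single multithreaded process such as one Mongrel, writes can already be serialized with a plain Mutex wrapper. The sketch below is hypothetical (SynchronizedIndex is not a Ferret class); it just shows the technique of funneling all writes through one lock.]

```ruby
# Hypothetical wrapper: serialize all writes to a shared writer object so
# concurrent Rails threads never call it simultaneously.
class SynchronizedIndex
  def initialize(index)
    @index = index
    @mutex = Mutex.new
  end

  def <<(doc)
    @mutex.synchronize { @index << doc }
  end
end

docs = []                              # stands in for the real index
index = SynchronizedIndex.new(docs)
threads = 10.times.map do |i|
  Thread.new { index << { :id => i } }
end
threads.each(&:join)
docs.size  # 10: no write was lost
```

This only helps threads inside one process; it does nothing for separate processes or machines, which is why the lock-file discussion below still matters.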
On Mar 23, 2007, at 10:46 AM, Matt Schnitz wrote:

> I personally would love some built-in support for multi-threaded write
> locking. It's pretty easy these days to set up a multithreaded
> Rails/Ferret server using Mongrel and Lighttpd.

I'm not sure whether Dave has solved a problem that neither Lucene nor KinoSearch has solved, but I'd say it's difficult to outright impossible to allow more than one write process access to the index at any given moment under the segmented, write-once model used by all of us.

What is possible is to manage access to an index on a shared volume so that an active write process causes all other attempts to open a write process to fail, including those from other machines. The key is to put the write.lock file in the index directory, rather than in the temp directory -- since the temp directory is per-machine, no other machine knows about another machine's lock files, and write processes may stomp each other.

I believe the default location of the lock directory was changed in Lucene 2.1 (if not, the change is in svn trunk). It changed in KinoSearch as of 0.20_01, though with a twist that makes things more convenient for everyone else at a minor cost to NFS users:

    Concurrency

    Only one InvIndexer may write to an invindex at a time. If a write
    lock cannot be secured, new() will throw an exception.

    If your index is located on a shared volume, each writer application
    must identify itself by passing a LockFactory to InvIndexer's
    constructor, or index corruption will occur.

Imposing that condition means that stale lock files associated with dead pids can be zapped automatically by default.

In earlier versions of Lucene, it's possible to specify a global lock dir location -- putting it on the shared volume, for example, and allowing multiple machines to become aware of each other's lock files. It wouldn't surprise me if Dave had duplicated that in Ferret.

> It'd also be nice if the docs gave a special warning for this case.
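[Editor's note: the "zap stale locks for dead pids" idea can be sketched like this. Pure Ruby with made-up names, not KinoSearch's or Lucene's implementation. Note that the pid liveness check only works for locks created on the same machine, which is exactly why writers on a shared volume must identify themselves explicitly.]

```ruby
require 'socket'
require 'tmpdir'

HOST = Socket.gethostname

# A lock file records "host:pid". Staleness can only be judged for pids
# on the same machine; for foreign hosts we must assume the writer is alive.
def write_lock_stale?(path)
  host, pid = File.read(path).split(":")
  return false unless host == HOST   # can't judge another machine's pid
  Process.kill(0, pid.to_i)          # signal 0: existence check, no signal sent
  false
rescue Errno::ESRCH
  true                               # no such process: the lock is stale
rescue Errno::EPERM
  false                              # process exists but isn't ours: alive
end

lock = File.join(Dir.tmpdir, "write-demo.lock")
File.write(lock, "#{HOST}:#{Process.pid}")
write_lock_stale?(lock)  # false: we are alive

File.write(lock, "#{HOST}:999999999")  # a pid that is almost surely dead
write_lock_stale?(lock)  # true: safe to zap
File.delete(lock)
```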
> It came pretty unexpectedly.

NFS is a bleedin' PITA to support, because it doesn't do "delete-on-last-close" and flock/fcntl locking is unreliable on so many operating systems. What I'd really like to do is detect NFS somehow and throw errors at construction time, but since that's not realistic, there are moderately prominent warnings now in the KS docs. It's not an ideal set-up, because inevitably some fraction of users will get burned when they move their indexes to NFS without taking stock of the warnings -- but without getting into the gory details, I'll just say that's hard to avoid.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
That is exactly what I tried with Ferret, but it makes the processes freeze or raise a Ferret::Store::Lock::LockError.

Marvin Humphrey wrote:
> What is possible is to manage access to an index on a shared volume
> so that an active write process causes all other attempts to open a
> write process to fail, including those from other machines. The key
> is to put the write.lock file in the index directory, rather than in
> the temp directory -- since the temp directory is per-machine, no
> other machine knows about another machine's lock files and write
> processes may stomp each other.

-- 
Sebastien Pahl - Rift Technologies
spahl at rift.fr
On Mar 23, 2007, at 1:54 PM, Sebastien Pahl wrote:

> That is exactly what I tried with Ferret, but it makes the processes
> freeze or raise a Ferret::Store::Lock::LockError.

I'm less than completely familiar with how Ferret handles this, but in KS you'll get a lock error after the timeout is exceeded and it stops retrying. A freeze sounds wrong.

I suspect the only way to make this work is to catch the LockError and retry. Creating a queue for writers trying to access an NFS index, so that each new process starts immediately after an old process releases a lock... that would be great, but I don't know how you'd pull it off with lock files. Creating shared read locks was hard enough!

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
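[Editor's note: the catch-and-retry approach Marvin suggests might look like the sketch below. The LockError class here is a self-contained stand-in for Ferret::Store::Lock::LockError, and with_lock_retry is a hypothetical helper, not part of Ferret.]

```ruby
# Stand-in for Ferret::Store::Lock::LockError so the sketch runs on its own.
class LockError < StandardError; end

# Hypothetical helper: retry the block when the lock is contended, backing
# off a little longer (plus jitter) on each attempt so competing writers
# don't retry in lockstep.
def with_lock_retry(max_tries = 5, base_delay = 0.01)
  tries = 0
  begin
    yield
  rescue LockError
    tries += 1
    raise if tries >= max_tries    # give up after max_tries attempts
    sleep(base_delay * tries + rand * base_delay)
    retry
  end
end

attempts = 0
result = with_lock_retry do
  attempts += 1
  raise LockError, "index busy" if attempts < 3  # simulated contention
  "wrote document"
end
result  # "wrote document", succeeding on the third attempt
```

This gives up eventually rather than queueing fairly, which matches Marvin's point: a true writer queue is hard to build out of lock files alone.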
On Mar 23, 2007, at 5:12 PM, Sebastien Pahl wrote:

> I'm currently trying to set up a solution involving multiple servers
> using the same index over NFS. The problem is that, from what I have
> seen, Ferret doesn't support multiple processes writing to the same
> index. Using a DRb service is not an option, since this would create
> a single point of failure.

Did I miss something, or is your NFS volume exactly that: a single point of failure? I think you ruled out the DRb solution too quickly. Shared resources on NFS volumes are always prone to failure. Plus, it doesn't scale well, because too many processes accessing the index directory will inevitably lead to poor performance or a complete deadlock.

I've come to the conclusion that the "share nothing" approach works best and SOAs are the way to go. I prefer talking to a single index server and not worrying about the details. I don't care whether it is a single server or a load-balanced cluster that services my request.

-- 
Andy
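[Editor's note: the "single index server" approach can be sketched with Ruby's standard-library DRb. The service class and its methods below are hypothetical, not a Ferret API; the point is that exactly one process owns the index and serializes writes, while any number of front ends talk to it remotely.]

```ruby
require 'drb/drb'

# One process owns the "index" (here just an array) and serializes writes;
# clients call it over DRb instead of touching index files themselves.
class IndexService
  def initialize
    @docs  = []
    @mutex = Mutex.new
  end

  def add_document(doc)
    @mutex.synchronize { @docs << doc }
    true
  end

  def search(field, value)
    @mutex.synchronize { @docs.select { |d| d[field] == value } }
  end
end

# Server side. Port 0 picks a free port for this demo; a real deployment
# would use a fixed, known address.
DRb.start_service('druby://127.0.0.1:0', IndexService.new)

# Client side: any number of front-end processes would share this writer.
index = DRbObject.new_with_uri(DRb.uri)
index.add_document(:id => 1, :type => 'create_test_type')
index.search(:type, 'create_test_type')  # the one matching document comes back
```

For Andreas's point about the single point of failure: nothing stops you from putting several such servers behind a load balancer, as he describes, as long as writes are routed consistently.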
On Mon, Mar 26, 2007 at 05:20:47PM +0200, Andreas Korth wrote:

> I've come to the conclusion that the "share nothing" approach works
> best and SOAs are the way to go. I prefer talking to a single index
> server and not worrying about the details. I don't care whether it is
> a single server or a load-balanced cluster that services my request.

Full ack :-)

I don't know how big you expect your index to grow, or how critical it is that it's always up to date, but wouldn't it be sufficient to have a backup system with a nightly snapshot of the index that could jump in in case the production server fails? You could even run continuous rebuilds on that backup server to keep the index fairly in sync...

Jens

-- 
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de
Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa
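[Editor's note: the nightly-snapshot idea Jens describes can be sketched as a small shell script. Paths here use mktemp so the example is self-contained; a real setup would use fixed index/backup paths and something like rsync over the network, and would only copy while no writer holds the lock.]

```shell
# Stand-ins for the real index and backup locations (hypothetical paths).
INDEX=$(mktemp -d)
BACKUP=$(mktemp -d)

echo "segment data" > "$INDEX/segments"

# Copy into a staging directory first, then swap it into place, so a
# failed copy never clobbers the last good snapshot.
STAGING="$BACKUP/incoming.$$"
mkdir "$STAGING"
cp -R "$INDEX/." "$STAGING/"
rm -rf "$BACKUP/current"
mv "$STAGING" "$BACKUP/current"

ls "$BACKUP/current"
```

Run nightly from cron, this gives the backup server a consistent copy to serve from if the production server fails.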