On Jul 20, 2010, at 12:17 PM, Richard de Vries wrote:
> Dear All,
> Is it advisable to increase the number of lock servers to ensure that
> when the first node in the subvolume fails the other node can continue
> to work correctly even if the first node comes back?
No. You don't need to increase the number of lock servers just to deal with
this case. As explained below, the second node can still write even if the
first node fails while holding a lock.
Increasing the number of lock servers is advised if you have more than one
client writing to the same region of the same file. Having more than one lock
server eliminates a (small) race in which the following happens:
1) Server 1 goes down.
2) Client 2 acquires a lock on the (only remaining) server 2 and starts writing.
3) Server 1 comes back up.
4) Client 1 acquires a lock on server 1 (the logic is that the lock server(s)
are chosen starting from the "first subvolume that is up" and continuing from
there).
5) Client 1 also writes (because it has acquired a lock).
The danger here is that the writes in (2) and (5) can be applied in different
orders on servers 1 and 2, because the two clients hold their locks on
different servers and so do not block each other. That leaves the two servers
inconsistent with each other. As you can see, the window is quite small, and it
only comes into play if you have two clients writing to the same region of the
same file.
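
To make the race concrete, here is a toy Python model of steps 1-4. It is only
a sketch, not GlusterFS code: the names LockServer, try_lock and scenario are
made up for illustration, and it assumes a client locks the first N subvolumes
that are up, all-or-nothing.

```python
# Toy model of the race above. Hypothetical names; not the GlusterFS API.
class LockServer:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.holder = None  # client currently holding the region lock


def try_lock(client, servers, lock_server_count):
    """Lock the first `lock_server_count` servers that are up (assumed policy).

    The attempt is all-or-nothing: if any target is held by another
    client, nothing is locked and False is returned.
    """
    targets = [s for s in servers if s.up][:lock_server_count]
    if any(s.holder not in (None, client) for s in targets):
        return False
    for s in targets:
        s.holder = client
    return True


def scenario(lock_server_count):
    """Replay steps 1-4 and report whether each client got its lock."""
    s1, s2 = LockServer("server1"), LockServer("server2")
    servers = [s1, s2]
    s1.up = False                                            # 1) server 1 down
    got2 = try_lock("client2", servers, lock_server_count)   # 2) client 2 locks
    s1.up = True                                             # 3) server 1 back
    got1 = try_lock("client1", servers, lock_server_count)   # 4) client 1 locks
    return got2, got1


print(scenario(1))  # (True, True): both clients hold a lock, so both write
print(scenario(2))  # (True, False): client 1 is blocked by the lock on server 2
```

With one lock server, both clients end up holding a lock at the same time, which
is the race. With two, client 1 also has to lock server 2, where client 2's lock
is still held, so it is blocked until client 2 finishes.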
> 2nd question:
> What happens if the first node holds a lock on a file and fails (power
> down or kernel panic)? What if the second node now wants to modify
> this file?
The lock will automatically be released when the first client fails. The second
node can then acquire the lock and continue with its write.
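
As a rough illustration of that behaviour, the Python sketch below drops all of
a client's locks when the server notices the client is gone. The LockTable class
and its method names are hypothetical, and tying the release to a disconnect
event is an assumption about the mechanism, not a description of the actual
implementation.

```python
# Toy lock table; hypothetical names, not the GlusterFS implementation.
class LockTable:
    def __init__(self):
        self.owner = {}  # region -> client currently holding the lock

    def acquire(self, client, region):
        """Grant the lock if the region is free or already held by `client`."""
        if self.owner.get(region, client) != client:
            return False
        self.owner[region] = client
        return True

    def on_disconnect(self, client):
        """Assumed hook: release every lock held by a client that went away."""
        self.owner = {r: c for r, c in self.owner.items() if c != client}


table = LockTable()
first = table.acquire("node1", "file:0-4096")           # node 1 takes the lock
blocked = table.acquire("node2", "file:0-4096")         # node 2 must wait
table.on_disconnect("node1")                            # node 1 loses power
after_failure = table.acquire("node2", "file:0-4096")   # node 2 can proceed
print(first, blocked, after_failure)  # True False True
```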
------------------------------
Vikas Gorur
Engineer - Gluster, Inc.
------------------------------