On 2009-11-19, at 14:49, Arifa Nisar wrote:
> I have a question regarding the implementation of server-based extent
> locking in Lustre. I have a situation where two processes are
> concurrently accessing one I/O server, writing one stripe at a
> time. Both processes are writing alternate stripes stored on
> that server. I want to understand how the extent-based locking
> protocol will work in this situation.
>
> I understand the first process will be given locks on all the stripes.
> What will happen when the second process sends a lock request? Will the
> I/O server revoke all the (unused/unasked-for) locks from
> process 0, or will it revoke only the locks on the required
> stripe(s)?
Partly it depends on how large the regions S1 and S2 are, and whether
they reside on the same OST or not.
> Please explain if P0 and P1 request locks on stripes S0 through S8 in
> this order.
>
> P0 S0
> P1 S1
> P0 S2
> P1 S3
> P0 S4
> P1 S5
> P0 S6
> P1 S7
> P0 S8
For example, if the stripe size = 1MB (so S_even is on OST0 and S_odd
are on OST1), and the IO size is also 1MB from each client, then P0
will get an exclusive lock on OST0's object and P1 will get an
exclusive lock on OST1's object, and there is no contention.
Note that the Lustre DLM locks are held by nodes and not processes.
If P0 and P1 are on the same node, then that node will get all of the
locks and there is also no contention (writes are serialized by the
local kernel inode->i_mutex).
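To make the no-contention case concrete, here is a toy sketch (plain
Python, not Lustre code) of the round-robin stripe-to-OST mapping it
relies on; the helper name and the 2-OST layout are my own illustration:

```python
# Toy model: with a round-robin layout, stripe i of a file
# lives on OST (i % stripe_count).
def ost_for_stripe(stripe_index, stripe_count):
    return stripe_index % stripe_count

# Two OSTs, as in the example above: P0 writes the even stripes
# S0,S2,...,S8 and P1 writes the odd stripes S1,S3,...,S7.
p0_osts = {ost_for_stripe(i, 2) for i in range(0, 9, 2)}
p1_osts = {ost_for_stripe(i, 2) for i in range(1, 8, 2)}

# The two processes touch disjoint OST objects, so their
# object extent locks never conflict.
assert p0_osts == {0}
assert p1_osts == {1}
```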
Now, with those cases aside, an interesting situation arises when
there is only a single stripe involved (or there are more processes
than stripes, or the IO is not stripe aligned), and there are two
different client nodes involved.
In that case, the extent lock will only be grown to match the largest
uncontended extent on the object. Unfortunately, with 2 nodes
contending, the lock holder will only have a "lower" extent held, and
that still means that the next lock requester will get the "higher"
extent, all the way to ~0ULL.
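A toy sketch of that growth behaviour (again my own illustration, not
the actual LDLM code): the server grows a requested extent to the
largest range that does not overlap any already-granted lock, so the
second requester's lock runs upward to ~0ULL:

```python
MAX_OFF = (1 << 64) - 1   # ~0ULL
MB = 1 << 20

def grow_extent(req_start, req_end, granted):
    """Toy model: grow [req_start, req_end] to the largest extent
    that does not overlap any already-granted lock."""
    lo, hi = 0, MAX_OFF
    for g_lo, g_hi in granted:
        if g_hi < req_start:          # conflict entirely below request
            lo = max(lo, g_hi + 1)
        elif g_lo > req_end:          # conflict entirely above request
            hi = min(hi, g_lo - 1)
    return (lo, hi)

# Node A asks for [0, 1MB); nothing is granted yet, so it
# receives the whole object, [0, ~0ULL].
a = grow_extent(0, MB - 1, [])
assert a == (0, MAX_OFF)

# Node B asks for [1MB, 2MB). After the callback, suppose A keeps
# only its "lower" extent [0, 1MB); B's lock then grows upward
# all the way to ~0ULL, setting up the ping-pong.
b = grow_extent(MB, 2 * MB - 1, [(0, MB - 1)])
assert b == (MB, MAX_OFF)
```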
We've discussed changing this at times to accumulate the number of
conflicts for some short time, so that it can detect the 2-node ping-
pong case and not bounce the lock back and forth.
> Does the algorithm remain the same if the number of processes
> increases beyond two?
No, if there are more clients contending for the lock the heuristic
also changes. In the case of > 4 clients contending for locks, the
lock will not be grown downward, only upward. With > 32 clients
contending for the lock, the locks will not be grown to more than 32MB
in size (if the lock request is smaller than 32MB).
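A toy model of those thresholds (my own sketch of the heuristic as
described, not the real implementation):

```python
MAX_OFF = (1 << 64) - 1   # ~0ULL
MB = 1 << 20

def grown_extent(req_start, req_end, num_contending):
    """Toy model of the contention heuristic: with > 4 contending
    clients the lock only grows upward; with > 32 clients growth is
    capped at 32MB (when the request itself is smaller than 32MB)."""
    lo = req_start if num_contending > 4 else 0
    hi = MAX_OFF
    if num_contending > 32 and (req_end - req_start + 1) < 32 * MB:
        hi = min(hi, req_start + 32 * MB - 1)
    return (lo, hi)

# Few clients: the lock grows in both directions.
assert grown_extent(4 * MB, 5 * MB - 1, 2) == (0, MAX_OFF)
# > 4 clients: no downward growth.
assert grown_extent(4 * MB, 5 * MB - 1, 8) == (4 * MB, MAX_OFF)
# > 32 clients: growth capped at 32MB above the request start.
assert grown_extent(4 * MB, 5 * MB - 1, 40) == (4 * MB, 36 * MB - 1)
```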
Also, if a lock is highly contended it is possible to force the
clients into "nolock" mode, so that the OST is doing the locking on
behalf of the client, in order to avoid lock ping-pong. Tunables for
this are:
/proc/fs/lustre/ldlm/{OST}/contended_locks
- number of lock conflicts before a lock is contended (default 4)
/proc/fs/lustre/ldlm/{OST}/contention_seconds
- seconds to stay in "contended" state before normal locking resumes (default 2s)
/proc/fs/lustre/ldlm/{OST}/max_nolock_bytes
- largest enqueue to return conflicts on (default = 0 = off)
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.