Jeremy Filizetti
2010-Apr-30 02:59 UTC
[Lustre-devel] question about ldlm_server_glimpse_ast
In our Lustre WAN environment, a few times we've had a link drop for an extended period of time, which causes problems on systems accessing data in the same directory as the remote system that becomes unavailable. Our OSSes seem to be stuck in a loop of ptlrpc_queue_wait called from ldlm_server_glimpse_ast. The remote site is accessed through an LNet router which is still available. The OSS successfully resends requests to the router every 7 seconds, but each attempt subsequently times out, which causes it to loop in ptlrpc_queue_wait.

Looking over the ldlm_server_blocking_ast and ldlm_server_completion_ast functions I see they set rq_no_resend = 1, but ldlm_server_glimpse_ast does not. I'm not familiar with the locking in Lustre; is there a reason that ldlm_server_glimpse_ast doesn't set rq_no_resend = 1? This would get rid of the loop ptlrpc_queue_wait is stuck in until the client comes back, but I'm not sure whether it would have other unexpected consequences.

Jeremy
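To make the failure mode concrete, here is a toy model of the loop described above. Everything in it is illustrative rather than actual Lustre code; only the rq_no_resend semantics and the 7-second resend cadence come from this thread. With resends allowed, the caller stays blocked for as long as the peer is down; with no_resend set, it fails fast after a single timeout:

#include <stdbool.h>
#include <stdio.h>

struct toy_request {
        bool no_resend;         /* models ptlrpc_request.rq_no_resend */
        int  resend_interval;   /* seconds between attempts */
};

/* Models ptlrpc_queue_wait: retry until the peer answers or, if
 * no_resend is set, give up after the first timed-out attempt. */
static int toy_queue_wait(struct toy_request *req, int peer_down_secs)
{
        int waited = 0;

        for (;;) {
                waited += req->resend_interval;
                if (waited >= peer_down_secs)
                        return 0;       /* peer came back and answered */
                if (req->no_resend)
                        return -1;      /* timed out, no retry */
                /* otherwise: resend and keep the caller blocked here */
        }
}

int main(void)
{
        struct toy_request resend   = { .no_resend = false, .resend_interval = 7 };
        struct toy_request one_shot = { .no_resend = true,  .resend_interval = 7 };

        /* Peer unreachable for 600 (simulated) seconds: the resending
         * request spins for the whole outage, the one-shot fails fast. */
        printf("resend:   rc = %d\n", toy_queue_wait(&resend, 600));
        printf("one-shot: rc = %d\n", toy_queue_wait(&one_shot, 600));
        return 0;
}

The trade-off, as the follow-ups below show, is what the server does with that fast failure: treating it as a dead client leads straight to eviction.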
On 04/29/2010 09:59 PM, Jeremy Filizetti wrote:
> In our Lustre WAN environment, a few times we've had a link drop for an
> extended period of time, which causes problems on systems accessing
> data in the same directory as the remote system that becomes
> unavailable. Our OSSes seem to be stuck in a loop of ptlrpc_queue_wait
> called from ldlm_server_glimpse_ast. The remote site is accessed
> through an LNet router which is still available. The OSS successfully
> resends requests to the router every 7 seconds, but each attempt
> subsequently times out, which causes it to loop in ptlrpc_queue_wait.
>
> Looking over the ldlm_server_blocking_ast and ldlm_server_completion_ast
> functions I see they set rq_no_resend = 1, but ldlm_server_glimpse_ast
> does not. I'm not familiar with the locking in Lustre; is there a
> reason that ldlm_server_glimpse_ast doesn't set rq_no_resend = 1? This
> would get rid of the loop ptlrpc_queue_wait is stuck in until the
> client comes back, but I'm not sure whether it would have other
> unexpected consequences.

We have the same issue at TACC, and there is a bugzilla entry:

https://bugzilla.lustre.org/show_bug.cgi?id=21937

I tested a patch which set rq_no_resend = 1 for glimpses, and found that clients only had about 6 seconds to reply before eviction. Since eviction creates the possibility of data loss, a 6 second timeout was deemed too short for production. (With the patch applied, it was easy for me to create cases where data was indeed lost.) I was also able to observe some file consistency issues which lasted for a few seconds after eviction, as well as a failure of the file operations on the evicted client to return an error.

See also: https://bugzilla.lustre.org/show_bug.cgi?id=22360

-John

--
John L. Hammond, Ph.D.
ICES, The University of Texas at Austin
jhammond at ices.utexas.edu (512) 471-9304
Hello!

On Apr 30, 2010, at 9:00 AM, John Hammond wrote:
> I tested a patch which set rq_no_resend = 1 for glimpses, and found that
> clients only had about 6 seconds to reply before eviction. Since
> eviction creates the possibility of data loss, a 6 second timeout was
> deemed too short for production. (With the patch applied, it was easy
> for me to create cases where data was indeed lost.) I was also able to

Please note that the 6 second timeout is in fact the common ldlm_timeout, and it's not just glimpses that are bound by this value. Any ldlm callbacks are required to reply within this time, so if your network can have delays of more than this, you need to consider increasing the ldlm_timeout value (/proc/sys/lustre/ldlm_timeout).

On the other hand, if you have a packet loss issue, then even if resending of glimpse ASTs were present, we don't currently resend other ASTs, so the situation still has the potential for evictions with subsequent possible data loss.

Bye,
Oleg
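For anyone scripting the tuning Oleg describes, here is a trivial user-space sketch, equivalent to echoing a value into the proc file he names. The default of 40 below is an arbitrary example, not a recommendation; pick something larger than your worst-case round-trip delay:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
        /* Proc path from Oleg's mail; run as root on the server. */
        const char *path = "/proc/sys/lustre/ldlm_timeout";
        const char *val  = argc > 1 ? argv[1] : "40";   /* example value */
        FILE *f = fopen(path, "w");

        if (f == NULL) {
                perror(path);
                return EXIT_FAILURE;
        }
        fprintf(f, "%s\n", val);
        if (fclose(f) != 0) {
                perror(path);
                return EXIT_FAILURE;
        }
        printf("ldlm_timeout set to %s seconds\n", val);
        return EXIT_SUCCESS;
}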
Hello.

Increasing ldlm_timeout has no effect whatsoever if adaptive timeouts are enabled. See bug 22569. I suggest that you tune up at_min instead.

Thanks,
-Cory

Oleg Drokin wrote:
> Hello!
>
> On Apr 30, 2010, at 9:00 AM, John Hammond wrote:
>> I tested a patch which set rq_no_resend = 1 for glimpses, and found that
>> clients only had about 6 seconds to reply before eviction. Since
>> eviction creates the possibility of data loss, a 6 second timeout was
>> deemed too short for production. (With the patch applied, it was easy
>> for me to create cases where data was indeed lost.) I was also able to
>
> Please note that the 6 second timeout is in fact the common ldlm_timeout,
> and it's not just glimpses that are bound by this value. Any ldlm
> callbacks are required to reply within this time, so if your network
> can have delays of more than this, you need to consider increasing the
> ldlm_timeout value (/proc/sys/lustre/ldlm_timeout).
> On the other hand, if you have a packet loss issue, then even if
> resending of glimpse ASTs were present, we don't currently resend other
> ASTs, so the situation still has the potential for evictions with
> subsequent possible data loss.
>
> Bye,
> Oleg
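To make Cory's point concrete, a small sketch that checks which knob is actually in effect before tuning. The at_min advice comes from his mail; the assumptions that the AT tunables live alongside ldlm_timeout as /proc/sys/lustre/at_max and /proc/sys/lustre/at_min, and that at_max == 0 means adaptive timeouts are disabled, are mine and worth verifying against your version:

#include <stdio.h>

/* Read a single unsigned integer from a proc tunable. */
static int read_tunable(const char *path, unsigned int *val)
{
        FILE *f = fopen(path, "r");
        int rc;

        if (f == NULL)
                return -1;
        rc = (fscanf(f, "%u", val) == 1) ? 0 : -1;
        fclose(f);
        return rc;
}

int main(void)
{
        unsigned int at_max;

        if (read_tunable("/proc/sys/lustre/at_max", &at_max) != 0) {
                perror("/proc/sys/lustre/at_max");
                return 1;
        }
        if (at_max == 0)
                printf("AT disabled: /proc/sys/lustre/ldlm_timeout is in effect\n");
        else
                printf("AT enabled: tune /proc/sys/lustre/at_min instead (bug 22569)\n");
        return 0;
}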
On 04/30/2010 01:44 PM, Oleg Drokin wrote:
> Hello!
>
> On Apr 30, 2010, at 9:00 AM, John Hammond wrote:
>> I tested a patch which set rq_no_resend = 1 for glimpses, and found that
>> clients only had about 6 seconds to reply before eviction. Since
>> eviction creates the possibility of data loss, a 6 second timeout was
>> deemed too short for production. (With the patch applied, it was easy
>> for me to create cases where data was indeed lost.) I was also able to
>
> Please note that the 6 second timeout is in fact the common ldlm_timeout,
> and it's not just glimpses that are bound by this value. Any ldlm
> callbacks are required to reply within this time, so if your network
> can have delays of more than this, you need to consider increasing the
> ldlm_timeout value (/proc/sys/lustre/ldlm_timeout).
> On the other hand, if you have a packet loss issue, then even if
> resending of glimpse ASTs were present, we don't currently resend other
> ASTs, so the situation still has the potential for evictions with
> subsequent possible data loss.

Are there any nonobvious ramifications of changing ldlm_timeout? I noticed that it was set to 20 seconds (except for MDSes?) in 1.8.2. Also, there is some suspect-looking logic in obd_config.c and elsewhere to keep it from being set too high relative to obd_timeout:

        if (ldlm_timeout >= obd_timeout)
                ldlm_timeout = max(obd_timeout / 3, 1U);

Does this mean that ldlm_timeout should not exceed 1/3 of obd_timeout?

Thanks,
-John

--
John L. Hammond, Ph.D.
ICES, The University of Texas at Austin
jhammond at ices.utexas.edu (512) 471-9304
Hello!

On Apr 30, 2010, at 5:07 PM, John Hammond wrote:
> Are there any nonobvious ramifications of changing ldlm_timeout? I
> noticed that it was set to 20 seconds (except for MDSes?) in 1.8.2.
> Also, there is some suspect-looking logic in obd_config.c and elsewhere
> to keep it from being set too high relative to obd_timeout:
>
>         if (ldlm_timeout >= obd_timeout)
>                 ldlm_timeout = max(obd_timeout / 3, 1U);
>
> Does this mean that ldlm_timeout should not exceed 1/3 of obd_timeout?

ldlm_timeout should not be set too high, because if a client that holds a lock dies, that is how long nobody will be able to get a conflicting lock. Of course, if your network might delay packets (round trip) for more than ldlm_timeout, then you need to lift the limit.

The 1/3 is there so that, if your network delay is potentially that big (and you do not use AT), there is still enough time to do some processing and then send a reply to the client (obd_timeout is what the client uses to determine when the reply should come) before the client times out that request.

Also see the comment from Cory: if you use AT, it is all now controlled by the at_min setting instead, and is then dynamically adjusted as the system detects your network latency.

Bye,
Oleg
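One detail worth spelling out from the snippet John quoted: the clamp only fires when ldlm_timeout >= obd_timeout, so a value between obd_timeout/3 and obd_timeout passes through untouched; the reset to 1/3 is a fallback for out-of-range settings rather than a hard ceiling. A self-contained restatement of the quoted logic, with the kernel max() spelled out by hand and purely illustrative values (obd_timeout = 100 is just an example):

#include <stdio.h>

/* The check John quoted from obd_config.c, restated verbatim in logic. */
static unsigned int clamp_ldlm_timeout(unsigned int ldlm_timeout,
                                       unsigned int obd_timeout)
{
        if (ldlm_timeout >= obd_timeout)
                ldlm_timeout = (obd_timeout / 3 > 1U) ? obd_timeout / 3 : 1U;
        return ldlm_timeout;
}

int main(void)
{
        printf("%u\n", clamp_ldlm_timeout(20, 100));    /* 20: untouched      */
        printf("%u\n", clamp_ldlm_timeout(90, 100));    /* 90: also untouched */
        printf("%u\n", clamp_ldlm_timeout(150, 100));   /* 33: reset to 1/3   */
        printf("%u\n", clamp_ldlm_timeout(5, 2));       /* 1:  max(2/3, 1)    */
        return 0;
}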