Looking at the error handling logic that I'll need: the characteristics of the transport are such that once I send short messages there's no way to call them back, so the best I can do there is wait for them all to complete (which might involve a failure code) and then clean them up. For the dma ops, there isn't a good (i.e., free of race conditions) way to tell what the instantaneous state of the transfer is, so the best I can do is invalidate the state of the various buffer descriptors and wait for them to trickle out, presumably with errors. That's true for both tx and rx ops.

The one bit of good news is that we're pretty sure that, in the case of a node failure or something similar, we can arrange to know within a small bounded time (seconds) that all pending traffic is done. In the case of a link failure (which stops clocking the dma engine for that link) we can ensure that nothing is happening there, which means it's safe to reset the engine, which in turn means it's safe to yank out the pending buffer descriptors without the engine scribbling all over the memory later.

Given all that, I think my error-recovery plan is approximately as follows:

For short messages, ignore them; once they're gone, they're gone. Ignore those which are received from peers which aren't in "good" state, except of course for hellos.

For rdma transmits, if I get a notification that the peer is down, invalidate the descriptors which correspond to the buffers, and wait for them all to return. If they come back "ok", presumably that means I lost the race, and they get finalized as normal. All the ones that come back with errors get finalized with some kind (what kind?) of error. Is it safe to assume that in that case lnet and friends will deal with whatever kind of recovery is necessary? For instance, if an alternate OSS comes on line, do they DTRT about replaying any pending messages to the new peer?
For rdma receives, if I get a notification that the peer is down, I do the same sort of thing: invalidate the descriptors. In addition, if it's a link failure, I must do the under-the-hood work to guarantee that the engine is stopped, then signal completion on the relevant buffers by hand. After that I think the rest of the logic about finalizing applies.

Does that sound roughly right? Anything else I should be taking into account?
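The recovery plan above can be sketched as a small state machine. This is a toy illustration under stated assumptions, not the real LND's code: `rdma_desc`, `invalidate_peer_descs`, and `complete_desc` are hypothetical names, and the point is only the race described above (a descriptor invalidated after a peer-down notification may still complete "ok" if we lost the race, in which case it is finalized normally).

```c
#include <errno.h>

/* Hypothetical descriptor states for the sketch. */
enum desc_state { DESC_ACTIVE, DESC_INVALID, DESC_DONE };

struct rdma_desc {
    enum desc_state state;
    int hw_status;      /* completion code reported by the dma engine */
    int final_status;   /* status we would hand to lnet_finalize()    */
};

/* Peer went down: flag every pending descriptor, but don't yank it;
 * each one trickles out through the normal completion path later. */
static void invalidate_peer_descs(struct rdma_desc *d, int n)
{
    for (int i = 0; i < n; i++)
        if (d[i].state == DESC_ACTIVE)
            d[i].state = DESC_INVALID;
}

/* Completion trickles out: "ok" means we lost the race and finalize
 * normally; anything else is finalized with a non-zero status. */
static void complete_desc(struct rdma_desc *d)
{
    d->final_status = (d->hw_status == 0) ? 0 : -EIO;
    d->state = DESC_DONE;
    /* ...the real LND would now call lnet_finalize(ni, msg,
     * d->final_status), in thread context, with no locks held. */
}
```

The engine reset on link failure is what makes the "wait for them to trickle out" step safe: once the engine is known stopped, completing the stuck descriptors by hand cannot race with a late DMA write.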
John,

> Given all that, I think my error-recovery plan is approx as follows:
>
> For short-messages, ignore them, once they're gone, they're gone.
> Ignore those which are received from peers which aren't in "good"
> state, except of course for hellos.

Fine.

> For rdma transmits, if I get a notification that the peer is down,
> invalidate the descriptors which correspond to the buffers, and wait
> for them all to return. If they come back "ok", presumably that
> means I lost the race, and they get finalized as normal. All the
> ones that come back with errors get finalized with some kind (what
> kind?) of error.

Pass any non-zero completion status to flag an error - e.g.
lnet_finalize(ni, msg, -EIO).

> Is it safe to assume that in that case lnet and friends will deal
> with whatever kind of recovery is necessary, for instance if an
> alternate OSS comes on line, do they DTRT about replaying any
> pending messages to the new peer?

Yes - that's not LNET's or the LND's concern at all.

> For rdma receives, if I get a notification that the peer is down, I
> do the same sort of thing; invalidate the descriptors. In addition,
> if it's a link failure, I must do the under-the-hood stuff to
> guarantee that the engine is stopped, then by hand signal completion
> on the relevant buffers. After that I think the rest of the logic
> about finalizing applies.

Sounds fine.

> Does that sound roughly right? Anything else I should be taking
> into account?

The guiding principles for completion are...

1. If you return success from lnd_send or lnd_recv, you must call
   lnet_finalize() within finite time.

2. You may only call lnet_finalize() when there is no longer any
   chance that the underlying network can touch (read or write) the
   payload buffer.

3. The completion status on sends isn't critical. Lustre only really
   needs to know that sending is over; knowing whether the send was
   good or not is really just icing on the cake (e.g. so that it
   doesn't have to wait for a full timeout for an RPC reply if
   sending the request failed).

4. The completion status on receives is completely critical. You may
   only return success if the sink buffer has been filled correctly.

Cheers,
Eric
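Rules 3 and 4 above can be condensed into a pair of status-mapping helpers. This is a hedged sketch: the helper names, parameter names, and the choice of -EIO are mine for illustration, not part of the LNET API; the source only requires "any non-zero completion status" on errors.

```c
#include <errno.h>

/* Rule 3: send status is non-critical icing.  Report an error if we
 * happen to know about one, otherwise success. */
static int send_completion_status(int hw_status)
{
    return hw_status == 0 ? 0 : -EIO;
}

/* Rule 4: receive status is completely critical.  Success is only
 * allowed when the sink buffer is known to have been filled
 * correctly - right length and a clean hardware completion. */
static int recv_completion_status(int hw_status, int nob_filled,
                                  int nob_expected)
{
    if (hw_status != 0 || nob_filled != nob_expected)
        return -EIO;
    return 0;
}
```

The asymmetry is the whole point: a falsely "successful" send only costs Lustre an RPC timeout, while a falsely "successful" receive hands corrupt data upward.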
On Dec 11, 2006, at 6:06 AM, Eric Barton wrote:

> John,
>
>> Does that sound roughly right? Anything else I should be taking
>> into account?
>
> The guiding principles for completion are...
>
> 1. If you return success from lnd_send or lnd_recv, you must call
>    lnet_finalize() within finite time.
>
> 2. You may only call lnet_finalize() when there is no longer any
>    chance that the underlying network can touch (read or write) the
>    payload buffer.
>
> 3. The completion status on sends isn't critical. Lustre only really
>    needs to know that sending is over; knowing whether the send was
>    good or not is really just icing on the cake (e.g. so that it
>    doesn't have to wait for a full timeout for an RPC reply if
>    sending the request failed).
>
> 4. The completion status on receives is completely critical. You may
>    only return success if the sink buffer has been filled correctly.
>
> Cheers,
> Eric

Two other comments:

1) Do not hold any locks when calling any lnet_ functions.

2) Make sure you are _completely_ done with your buffer before
calling lnet_finalize(). I ran into a race condition where I called
lnet_finalize() then placed the rx or tx descriptor on my idle
queue. :-)
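Scott's second comment is an ordering bug, and the fix is just to reverse the two steps. A toy trace of the safe ordering, with entirely hypothetical names (`recycle_desc`, `fake_lnet_finalize`); the real code would of course be returning a descriptor to a shared idle list:

```c
/* Trace the ordering fix for Scott's race: finish ALL of the LND's
 * own bookkeeping on the rx/tx descriptor - including returning it
 * to the idle queue - before calling lnet_finalize(), because
 * lnet_finalize() can call straight back into the LND and race with
 * anything still pending after it. */
enum { EV_NONE, EV_RECYCLED, EV_FINALIZED };

static int trace[2];
static int ntrace;

static void recycle_desc(void)       { trace[ntrace++] = EV_RECYCLED; }
static void fake_lnet_finalize(void) { trace[ntrace++] = EV_FINALIZED; }

/* Completion path in the order Scott's comment requires. */
static void complete_rx(void)
{
    recycle_desc();        /* completely done with our descriptor... */
    fake_lnet_finalize();  /* ...and only then tell LNET             */
}
```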
From: "Eric Barton" <eeb@bartonsoftware.com>
Date: Mon, 11 Dec 2006 11:06:23 -0000

[...]

> The guiding principles for completion are...
>
> 1. If you return success from lnd_send or lnd_recv, you must call
>    lnet_finalize() within finite time.

Right, I got that part.

> 2. You may only call lnet_finalize() when there is no longer any
>    chance that the underlying network can touch (read or write) the
>    payload buffer.

Yes. Not surprisingly, that's the trickiest part. But it's all stuff
that we control, so it can be done.

> 3. The completion status on sends isn't critical. Lustre only really
>    needs to know that sending is over; knowing whether the send was
>    good or not is really just icing on the cake (e.g. so that it
>    doesn't have to wait for a full timeout for an RPC reply if
>    sending the request failed).

Ok.

> 4. The completion status on receives is completely critical. You may
>    only return success if the sink buffer has been filled correctly.

Of course.

From: Scott Atchley <atchley@myri.com>
Date: Mon, 11 Dec 2006 07:21:38 -0500

> Two other comments:
>
> 1) Do not hold any locks when calling any lnet_ functions.

Yikes. Yes. I'm pretty sure I wasn't, but good to keep in mind.

Does that really mean no locks at all, or no locks that could turn
into recursive lock attempts due to lnet calling back in? Are the
lnet things (which get called into by lnd) all non-blocking?

> 2) Make sure you are _completely_ done with your buffer before
> calling lnet_finalize(). I ran into a race condition where I called
> lnet_finalize() then placed the rx or tx descriptor on my idle
> queue. :-)

Yes, that would probably be a Bad Thing (tm).

Thanks...
> From: Scott Atchley <atchley@myri.com>
> Date: Mon, 11 Dec 2006 07:21:38 -0500
>
> Two other comments:
>
> 1) Do not hold any locks when calling any lnet_ functions.

Indeed. And I forgot to mention that you have to be in thread
context too!

> Yikes. Yes. I'm pretty sure I wasn't, but good to keep in mind.
>
> Does that really mean no locks at all, or no locks that could turn
> into recursive lock attempts due to lnet calling back in? Are the
> lnet things (which get called into by lnd) all non-blocking?

LNET will not block when you call lnet_finalize() unless it is to
allocate memory. But it can call you back (e.g. to send an ACK when
you finalize a PUT). All in all, it's best not to be holding any
locks at all.

Cheers,
Eric

---------------------------------------------------
|Eric Barton      Barton Software                  |
|9 York Gardens   Tel:    +44 (117) 330 1575       |
|Clifton          Mobile: +44 (7909) 680 356       |
|Bristol BS8 4LL  Fax:    call first               |
|United Kingdom   E-Mail: eeb@bartonsoftware.com   |
---------------------------------------------------
From: "Eric Barton" <eeb@bartonsoftware.com>
Date: Mon, 11 Dec 2006 15:07:16 -0000

> > Two other comments:
> >
> > 1) Do not hold any locks when calling any lnet_ functions.
>
> Indeed. And I forgot to mention that you have to be in thread
> context too!

Yes, I'd already figured that one out. I'm really doing very little
in interrupt context; I foisted all that off on the service thread.

> > Yikes. Yes. I'm pretty sure I wasn't, but good to keep in mind.
> >
> > Does that really mean no locks at all, or no locks that could turn
> > into recursive lock attempts due to lnet calling back in? Are the
> > lnet things (which get called into by lnd) all non-blocking?
>
> LNET will not block when you call lnet_finalize() unless it is to
> allocate memory. But it can call you back (e.g. to send an ACK when
> you finalize a PUT).

Sure, leading to the obvious class of deadlocks. I just wondered
whether there was something I hadn't thought of which might be
causing lnet to block.

> All in all, it's best not to be holding any locks at all.

Of course. I don't actually believe there's any reason for me to be
holding any locks at that time. I'm trying to make sure everything
is reentrant all over the place, because that's the only way it's
going to run at speed on a 6-cpu smp.
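One common way to satisfy both constraints at once (thread context, no locks held, while lnet_finalize() may re-enter the LND) is the batching pattern the service-thread approach suggests: unlink finished descriptors from the shared lists under the driver lock, drop the lock, then finalize the batch. A sketch with made-up names, using a plain flag to stand in for the spinlock so the deadlock hazard is visible:

```c
#include <assert.h>

/* Drain completions in thread context: collect under the (simulated)
 * driver lock, drop it, then call lnet_finalize() with no locks
 * held, since LNET may call back into the LND (e.g. to send an ACK
 * when a PUT is finalized).  All names here are illustrative. */
static int lock_held;   /* stands in for spin_lock(&dev->lock) */

static void fake_lnet_finalize(int status)
{
    (void)status;
    assert(!lock_held);  /* holding our lock here invites deadlock */
}

static void service_thread_pass(const int *done, int n)
{
    int batch[8];
    int m = 0;

    lock_held = 1;                     /* spin_lock(&dev->lock)   */
    for (int i = 0; i < n && m < 8; i++)
        batch[m++] = done[i];          /* unlink from shared lists */
    lock_held = 0;                     /* spin_unlock(&dev->lock) */

    for (int i = 0; i < m; i++)
        fake_lnet_finalize(batch[i]);  /* lock-free, thread context */
}
```

This also keeps the lock hold time short, which matters for the reentrancy-everywhere goal on a 6-cpu smp.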