thr3ads.net - Lustre devel - [Lustre-devel] Version based recovery [Jun 2008]

Peter Braam

2008-Jun-10 17:51 UTC

[Lustre-devel] Version based recovery

I quickly reviewed the HLD and read Mike's response.  Here are a few
questions:

1. Why do you wait for timeout+x after seeing a gap?  Why not x, timeout, or
y?

2. How do you avoid infinite accumulation of new exports?

3. If a VBR recovery operation happens, what transaction number is assigned
to it?

4. Please discuss what happens if multiple gaps are encountered.

5. Can we draw some pictures of the original transaction sequence and how
its slots are refilled (in what order, with what new transaction number etc)
if multiple clients are involved?


I believe that you might have the right algorithms, but the explanations in
the HLD are too short to be confident.

- Peter

Mikhail Pershin

2008-Jun-11 14:05 UTC

[Lustre-devel] Version based recovery

Thanks for the review. I have put short answers below and will update the
HLD with more details on the questions you asked.

On Tue, 10 Jun 2008 21:51:23 +0400, Peter Braam <Peter.Braam at Sun.COM>  
wrote:
>
>
> I quickly reviewed the HLD and read Mike's response.  Here are a few
> questions:
>
> 1. Why do you wait for timeout+x after seeing a gap?  Why not x,timeout,
> or y?
That sentence is wrong. The server waits for RECOVERY_TIMEOUT seconds
since the last reconnect.
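
A minimal sketch of that rule, assuming the recovery window is simply restarted on every client reconnect. The class name, the method names and the 300-second value are hypothetical and are not taken from the Lustre source or the HLD:

```python
RECOVERY_TIMEOUT = 300  # seconds; illustrative value only, not from the HLD


class RecoveryTimer:
    """Tracks the time of the last reconnect seen during recovery."""

    def __init__(self, now):
        # Recovery starts counting from the moment it begins.
        self.last_reconnect = now

    def on_reconnect(self, now):
        # Every reconnect restarts the RECOVERY_TIMEOUT window.
        self.last_reconnect = now

    def expired(self, now):
        # The server stops waiting once RECOVERY_TIMEOUT seconds have
        # passed since the last reconnect.
        return now - self.last_reconnect >= RECOVERY_TIMEOUT
```

With this model a reconnect at t=100 moves the deadline from t=300 to t=400, so the wait is measured from the most recent reconnect rather than from the first gap.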
>
> 2. How do you avoid infinite accumulation of new exports?
>
New clients are not allowed to connect during recovery, and the number of
existing exports is finite.
> 3. If a VBR recovery operation happens, what transaction number is
> assigned to this?
The same as during the original operation, i.e. the transno from the replay
request. Since we introduce the per-export last_committed value (section
2.2.3 of the HLD), the transno may be the same as the old one.
>
> 4. Please discuss what happens if multiple gaps are encountered?
>
When the first gap is encountered (a client misses recovery), the server
starts using version checking for replays, and all clients that have not
connected are marked as 'delayed'. The number of recoverable clients is
decreased, so check_for_next_transno will not stop on the gap, because the
number of queued requests is equal to the number of clients in recovery.
You are right, this use case is missing from the HLD.
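
The gap handling described above can be sketched roughly as follows. This is an illustrative model only: `RecoveryState`, `mark_delayed` and `can_replay_next` are hypothetical names, it is not the real check_for_next_transno, and it assumes each client in recovery has at most one replay request queued at a time.

```python
from dataclasses import dataclass, field


@dataclass
class RecoveryState:
    next_transno: int                 # next transno expected in sequence
    recoverable_clients: int          # clients still expected to replay
    queue: list = field(default_factory=list)  # transnos of queued replays
    vbr: bool = False                 # True once version checking is on

    def mark_delayed(self, missing):
        # Clients that missed recovery are marked 'delayed': they no longer
        # count as recoverable, and replays are version-checked from now on.
        self.recoverable_clients -= missing
        self.vbr = True

    def can_replay_next(self):
        if not self.queue:
            return False
        lowest = min(self.queue)
        if lowest == self.next_transno:
            return True               # no gap, ordinary sequence replay
        if len(self.queue) == self.recoverable_clients:
            # Every client in recovery already has a request queued, so
            # nobody can fill the gap: skip it instead of stopping.
            self.next_transno = lowest
            return True
        return False                  # keep waiting for the missing transno
```

For example, with three expected clients, `next_transno` of 5 and queued transnos 6 and 8, the check waits. Once the third client is marked delayed, the queue length equals the number of clients in recovery and replay continues from transno 6.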
> 5. Can we draw some pictures of the original transaction sequence and how
> its slots are refilled (in what order, with what new transaction number
> etc) if multiple clients are involved?
>
I will do that, sure.
>
> I believe that you might have the right algorithms, but the explanations
> in the HLD are too short to be confident.
>
> - Peter
>
>
> _______________________________________________
> Lustre-devel mailing list
> Lustre-devel at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-devel


-- 
Mikhail Pershin
Staff Engineer
Lustre Group
Sun Microsystems, Inc.

Peter Braam

2008-Jun-11 14:24 UTC

[Lustre-devel] Version based recovery

Mike -

This deserves some pretty serious thinking and I should not be the only one
to discuss this with you, because it is so complex.

OK - so the thing to be very clear about is whether, after encountering a
gap, the recovery can ever switch back to non-VBR recovery.

Also, it isn't clear what happens if the servers saw a few gaps, but then
power down and back up.  It is possible that even when clients reconnect,
they don't know anymore that there were gaps, yet it can affect sequence
number recovery.

If all clients and servers power off you can restart normal recovery.

This raises the question of whether there is a point in keeping sequence
recovery if we have version recovery, because one missing client appears to
kick you into VBR mode forever.  If you want to retain it, you'd have to
record the gaps and track how they are getting filled with VBR operations
and may close.

Regards,

Peter




Mikhail Pershin

2008-Jun-17 06:58 UTC

[Lustre-devel] Version based recovery

Hello, Peter

On Wed, 11 Jun 2008 18:24:15 +0400, Peter Braam <Peter.Braam at Sun.COM>  
wrote:
> Mike -
>
> This deserves some pretty serious thinking and I should not be the only
> one to discuss this with you, because it is so complex.
Is it worth adding a CC to the lustre-recovery@ mailing list? It is exactly
about recovery issues.
>
> OK - so the thing to try to be very clear about is if after encountering
> a gap the recovery can ever switch back to non-VBR recovery.
>
> Also, it isn't clear what happens if the servers saw a few gaps, but
> power down and back up.  It is possible that even when clients reconnect,
> they don't know anymore that there were gaps, yet it can affect sequence
> number recovery.
Please see the attached file with an investigation of this use case. I am
going to add it to the HLD after discussion. There is a problem with
switching back to ordinary recovery from VBR if recovery was interrupted and
started again.
>
> If all clients and servers power off you can restart normal recovery.
>
> This raises the question if there is a point in keeping sequence  
> recovery if we have version recovery,
Sequence recovery is a simple and proven way to go, which is why it is
better to use it together with VBR for additional checks.
Using only VBR leads to the following problems:
1) Replays may be done out of order, so version checking can fail even
when all clients are present. E.g. client1 makes changes with version 1 and
client2 with version 2. If client2 joins recovery before client1, a version
mismatch will occur. The obvious solution is to wait for RECOVERY_TIMEOUT
if the version is less than needed, i.e. some other client will probably
set it. This gives us a new problem:
2) Waiting for the needed version to arrive leads to multithreaded recovery.
E.g. client1 waits for version N of object K while, at the same time,
client2 needs version A of object H. Therefore there can be multiple
replays waiting for the needed versions of different objects, and we should
handle that somehow. During sequence recovery we wait for the needed
transaction in the per-server sequence, but with VBR multiple requests can
be waiting for needed versions, because version sequences are per-object.
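
Problem 1 can be made concrete with a small sketch. It assumes each replay carries the object version the client saw before its change (`pre_version`) and the version it produced (`new_version`). Those field names, the function names and the three verdict strings are assumptions made for illustration, not the HLD's interface:

```python
def check_replay(current_version, pre_version):
    """Compare the object's current version with the one the replay expects."""
    if current_version == pre_version:
        return "apply"      # object is in the state the client saw
    if current_version < pre_version:
        return "wait"       # another client's replay may still set it
    return "mismatch"       # object already moved past the expected version


def replay(versions, obj, pre_version, new_version):
    """Apply the replay to the per-object version table if the check passes."""
    verdict = check_replay(versions.get(obj, 0), pre_version)
    if verdict == "apply":
        versions[obj] = new_version
    return verdict
```

Starting from `versions = {"K": 0}`, replaying client2's change first (`replay(versions, "K", 1, 2)`) returns "wait", because client1's change (`replay(versions, "K", 0, 1)`) has not been applied yet. After client1's replay, the same client2 request returns "apply". Since the table is per object, several replays can be waiting on different objects at once, which is problem 2.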

This is worth discussing as a possible future approach to recovery, but
using sequence recovery is a simple way to start.
> because one missing client appears to kick you
> into VBR mode forever.  If you want to retain it, you'd have to record
> the gaps and track how they are getting filled with VBR operations and
> may close.
Epochs (boot cycle counters) are used to track which clients should
participate in recovery. A missed client affects recovery only once, at
the time it is missed. After that, the epoch in its last_rcvd client_data
will be less than the server's last epoch, and it will not be included in
the main recovery. If all clients with an epoch equal to the server's last
epoch are connected, then ordinary recovery can be used. See section 3.1 in
the VBR HLD for details about epoch management.
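
The epoch rule above might be sketched like this, assuming each client's epoch is available as a plain integer keyed by client name. The function names and the dictionary layout are hypothetical and do not reflect the on-disk last_rcvd format:

```python
def clients_for_recovery(server_epoch, client_epochs):
    """Clients whose recorded epoch equals the server's last epoch."""
    return {name for name, epoch in client_epochs.items()
            if epoch == server_epoch}


def can_use_ordinary_recovery(server_epoch, client_epochs, connected):
    """True if every client from the server's last epoch has reconnected."""
    expected = clients_for_recovery(server_epoch, client_epochs)
    return expected <= set(connected)
```

For instance, with `client_epochs = {"c1": 3, "c2": 3, "c3": 2}` and a server epoch of 3, only c1 and c2 take part in the main recovery. The stale client c3 is left out, so its absence does not force VBR mode again.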

-- 
Mikhail Pershin
Staff Engineer
Lustre Group
Sun Microsystems, Inc.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: vbr_transactions.pdf
Type: application/pdf
Size: 60921 bytes
Desc: not available
Url :
http://lists.lustre.org/pipermail/lustre-devel/attachments/20080617/46830568/attachment-0001.pdf
