Yes - and having this "stop the client" principle will make for something
that can be used in future upgrade scenarios as well.
Note that I have copied lustre-devel as this is of general interest.
Peter
On 9/24/08 10:12 AM, "Huang Hua" <H.Huang at Sun.COM> wrote:
> Hello All,
>
> This is what I propose (it is mentioned in the revised HLD: see bug
> 11824, but I'd like to enhance it as follows)
>
>
> --------------------------------
> Upgrade is a special fail-over, invoked and controlled by the administrator.
> We can try to put the whole of Lustre into a "Quiescent" state and block
> any update operations.
> This is similar to what we do when taking a snapshot of a file system.
> Clients block any incoming update operations (maybe all operations
> except sys_statfs()) and sync all pending operations. By this, all
> transactions on the client side and the server side are committed. There are
> only some "open" requests in the replay queue. These open requests are
> already committed on the server side. They are still in the replay queue
> because the files are not closed yet.
>
> In this "Quiescent" state, all read-only operations, such as getattr,
> lookup, and statfs, can pass through.
> Maybe only statfs() can pass through. The wire protocol for statfs() does
> not change from 1.8 to 2.0.
> This lets users run the "df" command in this state.
>
> This idea is similar to super_operations->write_super_lockfs() in a local
> file system.
>
> By this mechanism, we can avoid reformatting for all requests except
> open+create enqueue.
> Since the open+create enqueue itself is committed by the server at the time
> of upgrade, the server only needs to open the newly created file.
> The new file, created by the 1.8 MDS server, can be opened by the 2.0 MDS
> server during replay.
>
> The clients will leave this "Quiescent" state once the upgrade is done.
>
> This will tremendously simplify the upgrade.
> In particular, it simplifies the reformatting of all resend/replay/delayed
> requests, the handling of replay during an upgrade, and the testing of
> all possible upgrade cases.
> --------------------------------
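
To make the proposal above concrete, here is a minimal sketch of the client-side gate in Python. It is a toy model only, not Lustre code: the class QuiescentClient, its methods, and the operation names are all hypothetical, and it assumes the more conservative variant in which only statfs passes through while quiescent.

```python
import threading

# Hypothetical operation names. Only "statfs" may pass while quiescent,
# following the more conservative option in the proposal.
PASSTHROUGH_OPS = {"statfs"}


class QuiescentClient:
    """Toy model of a client that is quiesced around a server upgrade."""

    def __init__(self):
        self._cond = threading.Condition()
        self._quiescent = False
        self.pending = []       # operations accepted but not yet committed
        self.replay_queue = []  # committed opens whose files are still open

    def submit(self, op, timeout=None):
        """Accept op, blocking while quiescent unless op may pass through.

        Returns True if the operation was accepted, False on timeout.
        """
        with self._cond:
            if op in PASSTHROUGH_OPS:
                return True
            # Update operations wait here until resume() is called.
            if not self._cond.wait_for(lambda: not self._quiescent, timeout):
                return False
            self.pending.append(op)
            return True

    def quiesce(self):
        """Enter the quiescent state and sync all pending operations."""
        with self._cond:
            self._quiescent = True
            # Stand-in for a real sync: treat every pending operation as
            # committed. Committed opens stay queued for replay because
            # the files have not been closed yet.
            self.replay_queue.extend(op for op in self.pending if op == "open")
            self.pending.clear()

    def resume(self):
        """Leave the quiescent state once the upgrade is done."""
        with self._cond:
            self._quiescent = False
            self._cond.notify_all()
```

In this model, after quiesce() the pending list is empty and only the open requests remain in the replay queue, which is the state the proposal relies on. A real client would be driven by a server notification rather than direct method calls.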
>
> What's your comment?
>
> Thanks,
> Huang Hua
>
>
>
> Andreas Dilger wrote:
>> On Sep 23, 2008 08:33 +0800, Peter J. Braam wrote:
>>
>>> I understood from Huang Hua that a considerable degree of perfection is
>>> being pursued with the interoperability of 1.8 clients and 1.8/2.0 servers.
>>>
>>> In particular I was quite worried when I heard what Huang Hua has been asked
>>> to do. It seems excessive to me to make replay/resend/version recovery all
>>> work in a failover situation from 1.8 to 2.0. This requires incredibly
>>> detailed testing of every RPC that might be rolled back or in transit across
>>> such an upgrade, something that is not too easy to automate I think. Quite
>>> apart from this, it might not be transparent to user applications if during
>>> 1.8(client)-2.0(server) the same fids are not allocated to the client (I am
>>> not sure if this would be the case).
>>>
>>
>> Minor note - IGIF will ensure that client-visible identifiers remain the
>> same over a 1.8->2.0 upgrade. This will NOT be true in the case of a
>> 2.0->1.8 downgrade (which will require client eviction), but that should
>> only happen if there are already serious problems with 2.0.
>>
>>
>>> It would be much better, to dramatically reduce the hassles with protocol
>>> interoperability, to have a mechanism to tell a client to wait for
>>> completion of its requests and block new ones while the server failover is
>>> in progress. This would be organized through the configuration lock. This
>>> would lead to a situation where no state in the protocol needs to be
>>> recovered.
>>>
>>> Why is this not being pursued?
>>>
>>> Peter
>>>
>>
>> Cheers, Andreas
>> --
>> Andreas Dilger
>> Sr. Staff Engineer, Lustre Group
>> Sun Microsystems of Canada, Inc.
>>
>>
>