Hi all,

I am evaluating Lustre with DRBD failover, and I am seeing about 2 minutes of OSS failover time to switch to the secondary node. Has anyone made a similar observation (so that we can conclude this is to be expected), or are there parameters I should tune to reduce that time?

I have a simple setup: the MDS and OSS0 are hosted on server1, and OSS1 is hosted on server2. OSS0 and OSS1 are the primary nodes for OST0 and OST1, respectively, and each OST is replicated to the other machine using DRBD (protocol C). The two OSTs are about 73GB each. I am running Lustre 1.6 + DRBD 8 + Heartbeat v2 (but using the v1 configuration).

From the HA logs, it looks like Heartbeat noticed the node was down within 10 seconds (which is consistent with the deadtime of 6 seconds). Where does the secondary node spend the remaining 100-110 seconds? There was a post (http://groups.google.com/group/lustre-discuss-list/msg/bbbeac047df678ca?dmode=source) attributing MDS failover time to fsck. Could that also be the cause of my problem?

Thanks,

-Tao
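P.S. To try to narrow down where the time goes, I could run the takeover steps by hand on the surviving node and time each of them. This is only a rough sketch; the DRBD resource name, device, and mount point below are placeholders for whatever the real configuration uses:

    # On the surviving node, with Heartbeat stopped, roughly reproduce the
    # takeover and time each step (ost0, /dev/drbd0, /mnt/ost0 are placeholders).
    time drbdadm primary ost0                   # promote the DRBD replica
    time mount -t lustre /dev/drbd0 /mnt/ost0   # start the OST service

Note that the Lustre mount returns before client recovery finishes, so if both steps come back quickly, the remaining time is presumably Lustre recovery rather than DRBD or Heartbeat.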
On Tue, 2009-07-14 at 17:54 +0200, tao.a.wu at nokia.com wrote:
> Hi, all,
>
> I am evaluating Lustre with DRBD failover, and experiencing about 2
> minutes in OSS failover time to switch to the secondary node.

What is this 2 minutes including? Just the time for the second OSS to mount the disk and start recovery, or is it 2 minutes to detect the primary failure and have the secondary complete recovery so that the clients are fully functional again? If the latter, then you are doing quite well.

Recovery is not an instantaneous process. Much work needs to be done to ensure coherency between what is on the disk of the failed-over OST and what the clients think is on disk. Getting to this state requires that all clients synchronize with the OST, and getting/waiting for many clients to do this can, currently, take many minutes, as each client has to first notice the primary is dead and then sync up with the failover. Some clients might not even be available to sync, in which case you have to wait for a timeout.

So if you are talking about 2 minutes from failure to full recovery, you are not likely going to put much of a dent in this. Lustre 1.8 has adaptive timeouts enabled and that should help in optimal situations, but it will still take time to do a full recovery.

b.
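P.S. To see how much of that window is spent waiting for clients, the recovery status on the failover OSS can be watched directly. This is a sketch based on the 1.6-era /proc layout as I remember it; exact field names may differ between minor versions:

    # Run on the OSS that took over the OST; recovery ends when the known
    # clients have reconnected and replayed, or when the window expires.
    watch -n 5 'cat /proc/fs/lustre/obdfilter/*/recovery_status'
    # Fields of interest typically include: status (RECOVERING/COMPLETE),
    # connected_clients, completed_clients, time_remaining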
tao.a.wu at nokia.com wrote:
> I am evaluating Lustre with DRBD failover, and experiencing about 2
> minutes in OSS failover time to switch to the secondary node.
> [...]
> From HA logs, it looks that Heartbeat noticed a node is down within 10
> seconds. Where does the secondary node spend the remaining 100-110 seconds?

As Brian mentioned, Lustre servers go through a recovery process. You need to examine the system logs on the OSS - if Lustre is in recovery, there will be messages in the logs explaining this.

cliffw
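P.S. Something like the following on the failover OSS will show whether Lustre was in recovery and roughly when it completed. The grep patterns are approximate, not exact console message formats, which vary by version:

    # On the OSS that took over the OST, after the failover:
    grep -i 'recovery' /var/log/messages | tail -n 20
    dmesg | grep -iE 'lustre.*(recover|complete)' | tail -n 20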
Yes, it is the latter... Thanks for the info.

A related but different question: Lustre 2.0 will have replication. Under 2.0 (with replication), what would happen if the primary node goes down? Would the backup node be able to take over the load in a shorter period of time? Or is the replication feature for something else?

Thanks,

-Tao

-----Original Message-----
From: lustre-discuss-bounces at lists.lustre.org [mailto:lustre-discuss-bounces at lists.lustre.org] On Behalf Of ext Brian J. Murrell
Sent: Tuesday, July 14, 2009 12:10 PM
To: lustre-discuss at lists.lustre.org
Subject: Re: [Lustre-discuss] Lustre DRBD failover time

> What is this 2 minutes including? Just the time for the second OSS to
> mount the disk and start recovery or is it 2 minutes to detect the
> primary failure and have the secondary complete recovery so that the
> clients are fully functional again? If the latter, then you are doing
> quite well.
> [...]
> Some clients might not even be available to sync, in which case you
> have to wait for a timeout.
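On the timeout point: as far as I understand, in 1.6 both the time clients take to notice the dead primary and the length of the recovery window are tied to the single static obd timeout, so that is the main knob available before 1.8's adaptive timeouts. A rough sketch of checking and setting it; the filesystem name "testfs" and the value 60 are placeholders rather than recommendations, and setting the timeout too low risks spurious evictions on a loaded cluster:

    # Check the current obd timeout on a server or client (1.6-era /proc path):
    cat /proc/sys/lustre/timeout
    # Set it filesystem-wide from the node running the MGS
    # ("testfs" and 60 are example values only):
    lctl conf_param testfs.sys.timeout=60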
On Jul 14, 2009 21:05 +0200, tao.a.wu at nokia.com wrote:
> A related but different question, Lustre 2.0 will have replication.
> Under 2.0 (with replication), what would happen if the primary node
> goes down? Would the backup node be able to take over the load in
> shorter period of time? Or is the replication feature for something else?

The "replication" feature has nothing to do with what you are thinking.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.