thr3ads.net - Lustre discuss - [Lustre-discuss] About MDS failover [Jan 2009]

Jeffrey Alan Bennett

2009-Jan-15 04:52 UTC

[Lustre-discuss] About MDS failover

Hi,

What software are people using for MDS failover? 

I have been using Heartbeat from Linux-HA but I am not absolutely happy with its
performance.

Is there anything better out there?

Thanks,

Jeffrey Bennett
HPC Data Engineer
San Diego Supercomputer Center
858.822.0936 http://users.sdsc.edu/~jab

Wojciech Turek

2009-Jan-15 07:27 UTC

I like Linux-HA; however, if you are looking for alternatives, have a look
at Red Hat Cluster Suite:
http://www.redhat.com/docs/manuals/csgfs/browse/rh-cs-en/

Jeffrey Alan Bennett wrote:
> Hi,
>
> What software are people using for MDS failover?
>
> I have been using Heartbeat from Linux-HA but I am not absolutely happy
> with its performance.
>
> Is there anything better out there?
>
> Thanks,
>
> Jeffrey Bennett
> HPC Data Engineer
> San Diego Supercomputer Center
> 858.822.0936 http://users.sdsc.edu/~jab

Cliff White

2009-Jan-15 15:01 UTC

Jeffrey Alan Bennett wrote:
> Hi,
>
> What software are people using for MDS failover?
>
> I have been using Heartbeat from Linux-HA but I am not absolutely happy
> with its performance.
>
> Is there anything better out there?

Are you using heartbeat V1 or V2?

I would like to hear more about the issues you are experiencing.
We have had some people use the Red Hat cluster tools.

cliffw
> 
> Thanks,
> 
> Jeffrey Bennett
> HPC Data Engineer
> San Diego Supercomputer Center
> 858.822.0936 http://users.sdsc.edu/~jab

Jeffrey Alan Bennett

2009-Jan-15 19:38 UTC

Hi Cliff,

> Are you using heartbeat V1 or V2?

I am using heartbeat V2. It works as expected; I just had to tune some
timeouts, but it still takes around 3 minutes to fully move the MGS/MDS
services to the other system. I guess having the MGS and MDS on separate
systems would help reduce this time. MMP also seems to add to this time, but
MMP is necessary for failover.

My biggest concern is that I can't handle the situation in which the HBA
connectivity with the storage system is lost, i.e., I pull the cables from
the HBAs on the MGS/MDS and nothing happens: the MDS and MGS services keep
running, they are still mounted, and therefore heartbeat does nothing. From
the heartbeat "documentation" it does not seem that this can be handled, at
least not easily. I read something about HBA ping, and it seems it requires
HBAAPI, which does not work with Brocade HBAs...

Any help will be greatly appreciated.

> I would like to hear more about the issues you are experiencing.
> We have had some people use the Red Hat cluster tools.

I will try Red Hat cluster tools.

Thanks,

Jeff

Andreas Dilger

2009-Jan-16 01:19 UTC

On Jan 15, 2009  11:38 -0800, Jeffrey Alan Bennett wrote:
> I am using heartbeat V2. It works as expected, I just had to tune some
> time outs, but it still takes around 3 minutes to totally move the MGS/MDS
> services to the other system.

This is largely an issue of the Lustre failover itself, and not the HA
software.  The problem today is that under heavy load the clients may
have to wait a long time for any requests sent to the server to complete
(100s of seconds in some cases), so it is difficult for the clients to
distinguish between server death (unlikely) and heavy server load (common).

In the case where a server dies and fails over, the clients have to wait
for their requests to time out, then they resend and wait again (in the
common case the server is just overloaded), then finally they try to contact
any other server listed as failover for that node.
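
The sequence above can be turned into rough arithmetic. The following is
only a sketch of that reasoning, not how Lustre computes anything: the
function name and all three parameters are hypothetical, and the numbers in
the usage line are illustrative values, not measurements from this thread.

```python
def worst_case_failover_wait(request_timeout_s: float,
                             resends: int,
                             reconnect_s: float) -> float:
    """Estimate how long a client waits before it reaches the backup server.

    Assumed model (taken from the description above, simplified):
    the original request times out once, each resend times out again,
    and only then does the client connect to the failover server.
    """
    if resends < 0:
        raise ValueError("resends must be non-negative")
    # One timeout for the original request plus one per resend.
    timeouts = request_timeout_s * (1 + resends)
    return timeouts + reconnect_s


# Illustrative values only: 100 s timeout, one resend, 5 s to reconnect.
print(worst_case_failover_wait(100, 1, 5))  # 205
```

Under this model the request timeout dominates, which is why tuning the HA
software alone does not shorten the total failover time by much.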

What we are looking to do for improving failover speed is to have the
backup server broadcast to the clients that it has taken over the OST/MDT
when it has started.  Then the clients will be able to do failover to
the new server as soon as it is ready, instead of waiting for the original
requests to time out.

> My biggest concern is that I can't control the situation in which
> the HBA connectivity with the storage system is damaged, ie: I pull the
> cables from the HBAs on the MGS/MDS and nothing happens, the MDS and MGS
> services keep running, they are still mounted and therefore heartbeat
> does nothing. From the heartbeat "documentation" it does not seem that
> this can be done, at least easily?. I read something about HBA ping and
> it seems it requires HBAAPI which does not work with Brocade HBAs...

You can use HBA multi-pathing to avoid this problem, if your hardware
supports it.  You can also use /proc/fs/lustre/health_check to check
if the filesystems have encountered errors and are marked "unhealthy".
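
A monitoring script for the HA software could poll that file along the
following lines. This is a minimal sketch, not an official tool: the
function names are made up for illustration, and it assumes that a healthy
node reports exactly the single word "healthy" and that anything else
(including an unreadable file) should be treated as unhealthy.

```python
from pathlib import Path

# Path named in this thread; everything else below is an assumption.
HEALTH_CHECK = Path("/proc/fs/lustre/health_check")


def is_healthy(report: str) -> bool:
    """Return True only if the report is exactly the word 'healthy'."""
    return report.strip() == "healthy"


def node_is_healthy(path: Path = HEALTH_CHECK) -> bool:
    """Read the health file; treat a missing or unreadable file as unhealthy."""
    try:
        return is_healthy(path.read_text())
    except OSError:
        return False
```

Treating every unexpected string as unhealthy is a deliberately conservative
choice: a false alarm triggers a failover, while a missed failure leaves the
services running on a node that cannot reach its storage.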

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

Jeffrey Alan Bennett

2009-Jan-16 01:51 UTC

Thanks Andreas,

I understand that this is a common issue with failover, as mentioned in the
Lustre documentation.
> 
> You can use HBA multi-pathing to avoid this problem, if your 
> hardware supports it.  You can also use 
> /proc/fs/lustre/health_check to check if the filesystems have 
> encountered errors and are marked "unhealthy".
> 
We use multipath in all our configurations. However, will Lustre be able to
detect that connectivity to the storage has been totally lost (i.e., no
available path) and report this in /proc/fs/lustre/health_check?

Thanks,

Jeff

Andreas Dilger

2009-Jan-18 00:46 UTC

On Jan 15, 2009  17:51 -0800, Jeffrey Alan Bennett wrote:
> > You can use HBA multi-pathing to avoid this problem, if your 
> > hardware supports it.  You can also use 
> > /proc/fs/lustre/health_check to check if the filesystems have 
> > encountered errors and are marked "unhealthy".
> 
> We use multipath in all our configurations. However, will Lustre
> be able to detect if the connectivity to the storage has been
> totally lost ( ie. no available path ) and display accordingly on
> /proc/fs/lustre/health_check?
Yes, but it can currently only do this "reactively" instead of
"proactively".  If you are using MMP then it should detect the
IO error and mark the filesystem read-only within a second or
so (depending on how quickly the SCSI layer returns the error vs.
retrying), which will in turn cause health_check to return
"unhealthy".

However, if there is other filesystem IO going on, that IO will also
generate an IO error, which will be returned to the client.
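
Besides health_check, a monitor could also look for the read-only remount
described above. The sketch below only parses text in the standard
/proc/mounts layout (device, mount point, type, options, ...). Whether a
Lustre target that hit an IO error actually shows "ro" there is an
assumption that should be verified on a test system, and the function name
and the default filesystem type are illustrative.

```python
from typing import List


def read_only_mounts(mounts_text: str, fstype: str = "lustre") -> List[str]:
    """Return mount points of the given type whose options include 'ro'."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fs, options = fields[:4]
        # Options are comma-separated; match 'ro' as a whole option only.
        if fs == fstype and "ro" in options.split(","):
            found.append(mount_point)
    return found
```

On a server this could be called as
read_only_mounts(open("/proc/mounts").read()) and a non-empty result
treated the same way as an unhealthy health_check.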

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
