Hi, What software are people using for MDS failover? I have been using Heartbeat from Linux-HA but I am not absolutely happy with its performance. Is there anything better out there? Thanks, Jeffrey Bennett HPC Data Engineer San Diego Supercomputer Center 858.822.0936 http://users.sdsc.edu/~jab
I like Linux-HA; however, if you are looking for alternatives, have a look at Red Hat Cluster Suite: http://www.redhat.com/docs/manuals/csgfs/browse/rh-cs-en/ _______________________________________________ Lustre-discuss mailing list Lustre-discuss at lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Jeffrey Alan Bennett wrote: > Is there anything better out there? Are you using heartbeat V1 or V2? I would like to hear more about the issues you are experiencing. We have had some people use the Red Hat cluster tools. cliffw
Hi Cliff, > Are you using heartbeat V1 or V2? I am using Heartbeat V2. It works as expected; I just had to tune some timeouts, but it still takes around 3 minutes to fully move the MGS/MDS services to the other system. I guess having the MGS and MDS on separate systems would help reduce this time. MMP also adds somewhat to this time, but MMP is necessary for failover. My biggest concern is that I can't handle the case where HBA connectivity to the storage system is lost, i.e., if I pull the cables from the HBAs on the MGS/MDS, nothing happens: the MDS and MGS services keep running, they are still mounted, and therefore Heartbeat does nothing. From the Heartbeat "documentation" it does not seem that this can be done, at least not easily. I read something about HBA ping, but it seems to require HBAAPI, which does not work with Brocade HBAs... Any help will be greatly appreciated. > I would like to hear more about the issues you are experiencing. > We have had some people use the Red Hat cluster tools. I will try the Red Hat cluster tools. Thanks, Jeff
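One possible way to notice pulled HBA cables without HBAAPI is to read the link state the kernel already exposes. The sketch below assumes the HBA driver registers with the Linux FC transport class, which publishes a `port_state` attribute (values such as "Online" or "Linkdown") under `/sys/class/fc_host/hostN/`; whether a given Brocade driver does this is an assumption to verify on your own system. The function name `fc_links_up` is hypothetical. Note that it only sees the local link to the fabric, not a failure further along the path to the array.

```python
import glob
import os


def fc_links_up(sysfs_root="/sys/class/fc_host"):
    """Return True if at least one FC host port reports "Online".

    Assumes the HBA driver registers with the Linux FC transport class,
    which exposes <sysfs_root>/hostN/port_state (e.g. "Online", "Linkdown").
    """
    for path in glob.glob(os.path.join(sysfs_root, "host*", "port_state")):
        try:
            with open(path) as f:
                if f.read().strip() == "Online":
                    return True
        except OSError:
            # An unreadable attribute is treated as "not online".
            continue
    # No ports found, or none of them online: assume storage is unreachable.
    return False
```

A monitor could call `fc_links_up()` periodically and report a failure when it returns False; how that is wired into Heartbeat V2 (for example from a resource agent's monitor action) is left open here.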
On Jan 15, 2009 11:38 -0800, Jeffrey Alan Bennett wrote: > I am using heartbeat V2. It works as expected, I just had to tune some > time outs, but it still takes around 3 minutes to totally move the MGS/MDS > services to the other system. This is largely an issue of Lustre failover itself, not of the HA software. The problem today is that under heavy load, clients may have to wait a long time for any request sent to the server to complete (hundreds of seconds in some cases), so it is difficult for them to distinguish between server death (unlikely) and heavy server load (common). When a server dies and fails over, the clients have to wait for their requests to time out, then they resend and wait again (in the common case the server is just overloaded), and only then do they try to contact any other server listed as a failover node for that server. To improve failover speed, we are looking at having the backup server broadcast to the clients that it has taken over the OST/MDT once it has started. The clients will then be able to fail over to the new server as soon as it is ready, instead of waiting for the original requests to time out. > My biggest concern is that I can't control the situation in which > the HBA connectivity with the storage system is damaged, ie: I pull the > cables from the HBAs on the MGS/MDS and nothing happens, the MDS and MGS > services keep running, they are still mounted and therefore heartbeat > does nothing. From the heartbeat "documentation" it does not seem that > this can be done, at least easily?. I read something about HBA ping and > it seems it requires HBAAPI which does not work with Brocade HBAs... You can use HBA multipathing to avoid this problem, if your hardware supports it. You can also use /proc/fs/lustre/health_check to check whether the filesystems have encountered errors and are marked "unhealthy". Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc.
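The failover sequence described above can be illustrated with a deliberately simplified timing model. This is a sketch, not Lustre's actual timeout logic: it assumes a fixed per-request timeout, a fixed number of resends to the dead server, and a backup that becomes ready at a known time. The function name and all parameters are hypothetical.

```python
def failover_delay(backup_ready_s, request_timeout_s, resends=1, broadcast=False):
    """Seconds from server death until a client reconnects to the backup.

    Simplified model; all inputs are hypothetical and the real Lustre
    timeout behaviour is more complex (adaptive timeouts, recovery, etc.).
    """
    if broadcast:
        # Proposed scheme: the backup announces itself as soon as it is ready.
        return backup_ready_s
    # Current scheme: the original request times out, each resend to the
    # same server times out too, and only then does the client try the
    # failover node, which may itself still be starting up.
    give_up_s = request_timeout_s * (1 + resends)
    return max(give_up_s, backup_ready_s)
```

For example, with a hypothetical 100 s request timeout, one resend and a backup ready after 60 s, `failover_delay(60, 100)` gives 200 s, while `failover_delay(60, 100, broadcast=True)` gives 60 s. The point of the model is that in the timeout-driven case the client's wait is bounded below by the timeouts, not by how fast the backup starts.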
Thanks, Andreas. I understand that this is a common issue with failover, as mentioned in the Lustre documentation. > > You can use HBA multi-pathing to avoid this problem, if your > hardware supports it. You can also use > /proc/fs/lustre/health_check to check if the filesystems have > encountered errors and are marked "unhealthy". We use multipath in all our configurations. However, will Lustre be able to detect if connectivity to the storage has been totally lost (i.e., no available path) and report it accordingly in /proc/fs/lustre/health_check? Thanks, Jeff
On Jan 15, 2009 17:51 -0800, Jeffrey Alan Bennett wrote: > > You can use HBA multi-pathing to avoid this problem, if your > > hardware supports it. You can also use > > /proc/fs/lustre/health_check to check if the filesystems have > > encountered errors and are marked "unhealthy". > > We use multipath in all our configurations. However, will Lustre > be able to detect if the connectivity to the storage has been > totally lost (i.e., no available path) and display accordingly on > /proc/fs/lustre/health_check? Yes, but currently it can only do this "reactively" rather than "proactively". If you are using MMP, it should detect the IO error and mark the filesystem read-only within a second or so (depending on how quickly the SCSI layer returns the error rather than retrying), which will in turn cause health_check to return "unhealthy". However, if other filesystem IO is going on, that IO will also hit errors, which will be returned to the client. Cheers, Andreas
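Since health_check only changes after an IO error has been seen (for example from an MMP write), a monitor has to poll it. The sketch below assumes the file reads exactly "healthy" when all is well and treats any other content, or a missing/unreadable file, as a failure; that parsing rule is an assumption, not a documented format. The function names and the consecutive-failure threshold are hypothetical choices meant to avoid failing over on a single transient read.

```python
import time


def lustre_healthy(path="/proc/fs/lustre/health_check"):
    """Return True only if the health_check file reads exactly "healthy".

    Assumption: any other content (e.g. text mentioning "unhealthy"),
    or a missing/unreadable file, is treated as a failure.
    """
    try:
        with open(path) as f:
            status = f.read().strip()
    except OSError:
        return False
    return status == "healthy"


def wait_until_unhealthy(path="/proc/fs/lustre/health_check",
                         interval_s=5.0, failures_needed=2, sleep=time.sleep):
    """Block until health_check fails `failures_needed` times in a row."""
    consecutive = 0
    while True:
        if lustre_healthy(path):
            consecutive = 0
        else:
            consecutive += 1
            if consecutive >= failures_needed:
                return
        sleep(interval_s)
```

The `sleep` parameter is injectable so the loop can be tested without waiting. Because detection is reactive, the overall reaction time is roughly the MMP detection time plus up to `failures_needed * interval_s` of polling.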