Daniel Kulinski
2009-Jul-13 20:50 UTC
[Lustre-discuss] failover software - heartbeat (Lundgren, Andrew)
Andrew,

I was able to get ipfail to work on my heartbeat 2.1.3 installation. Make sure the following line is uncommented in /etc/ha.d/ha.cf:

    respawn hacluster /usr/lib64/heartbeat/ipfail

Along with that, you must have a ping line listing each host, separated by spaces.

We have tested this and it works perfectly. We have 3 ethernet networks to each OSS and MDS pair.

I have no idea what pingd is or how it relates to heartbeat.

Dan Kulinski

> Were you able to get monitoring working to detect network failures? (pingd?)
>
> I have it configured, but haven't been able to get it to trigger a failover
> when an MDS cannot ping the network. (I tried with 1.0 and 2.0 conf files; I
> am currently using 2.0.) I have a ticket open with the pacemaker project (no
> ticket system for the HA stuff...) but no resolution. I am considering
> writing a script to down the node when the ping fails, but don't like the idea.
>
> I would also like to get the hpingd functioning to detect a fiber failure,
> but there was less available on that solution.
>
> --
> Andrew
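
For reference, the two ha.cf settings described in the reply might look roughly like the fragment below. This is only a sketch: the addresses on the ping line are placeholders (documentation-range IPs), not the hosts used in the setup described, and the rest of ha.cf (node entries, heartbeat media, and so on) is left out.

```
# /etc/ha.d/ha.cf (fragment only; node entries and heartbeat media omitted)

# Ping nodes, separated by spaces. These addresses are placeholders;
# replace them with hosts reachable on your own networks.
ping 192.0.2.1 192.0.2.2

# Run ipfail as the hacluster user (path as given in the reply).
respawn hacluster /usr/lib64/heartbeat/ipfail
```

The same fragment would normally need to be present on both nodes of a failover pair, and heartbeat has to be restarted for the change to take effect.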