Daire.Byrne at framestore.com
2009-Jun-23 11:52 UTC
[Lustre-discuss] Redhat cluster failover
Hi,

I know that heartbeat is the preferred failover application for Lustre, but I want to evaluate Redhat's cluster suite again. It used to be pretty ropey in the RHEL4 days, but I'm led to believe it is much improved in RHEL5. I was wondering if anyone is currently using this with Lustre and, if so, could you share your init.d script to help get me started? Any other advice or thoughts gratefully accepted.

Regards,
Daire
On Tue, 2009-06-23 at 12:52 +0100, Daire.Byrne at framestore.com wrote:
> I was wondering if anyone is currently using this with Lustre and if so
> could you share your init.d script to help get me started?

Hi!

I'm using RHCS on RHEL 5.3 in a test environment (VMware virtual machines, nothing special) to fail over an MGS, an MDT and four OSTs across two VMs. It works pretty well. I only needed to modify the original fs.sh resource agent script and disable almost every check; the only surviving check, for now, is "mounted/not mounted". I would like to rewrite the RA script to make it work better (with some effective check to see whether a target is really working as it should), but I haven't had time yet.

I attach the RA script. It's ugly, and some of the comments may be complete nonsense or out of place. And my English may often read as funny (let's say funny).

I'm using LVM-HA to ensure no device gets mounted twice, but I think it would be an unbearable overhead in a true production environment. Maybe Lustre's MMP (multiple mount protection) is enough.

Bye!
Giacomo

-- 
Giacomo Montagner
Senior System Engineer & RHCE
SORINT.LAB S.R.L. (http://www.sorintlab.com/)

-------------- next part --------------
A non-text attachment was scrubbed...
Name: lustrefs.sh
Type: application/x-shellscript
Size: 27638 bytes
Desc: not available
Url : http://lists.lustre.org/pipermail/lustre-discuss/attachments/20090623/7799de37/attachment-0001.bin
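[Editor's sketch] The lustrefs.sh attachment is not preserved in the archive, so the following is not Giacomo's script. It is only a minimal illustration of the kind of "mounted/not mounted" status check he describes, assuming that a mounted Lustre target shows up in /proc/mounts with filesystem type "lustre". The function name and the optional second argument are hypothetical.

```sh
#!/bin/sh
# Sketch only: NOT the scrubbed lustrefs.sh resource agent.
# is_lustre_mounted <mountpoint> [mounts-file]
# Exit status 0 if <mountpoint> is listed with fstype "lustre", 1 otherwise.
# The optional mounts-file argument (a hypothetical convenience) defaults to
# /proc/mounts and exists so the check can be tried without a Lustre server.
is_lustre_mounted() {
    mnt=$1
    mounts=${2:-/proc/mounts}
    # /proc/mounts fields: device mountpoint fstype options dump pass
    awk -v m="$mnt" '$2 == m && $3 == "lustre" { found = 1 }
                     END { exit !found }' "$mounts"
}
```

A resource agent's status action could return the exit code of `is_lustre_mounted /mnt/ost0` directly. As noted in the message, this says nothing about whether the target is actually healthy.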
Daire.Byrne at framestore.com
2009-Jun-24 10:45 UTC
[Lustre-discuss] Redhat cluster failover
Giacomo,

I had not considered using RHCS's mount filesystem plugin "fs.sh". I was thinking of just using the "script" plugin with mount/umount commands in it. As far as I can tell, the main advantage of this is that it is trivial to add checks to the "status" return to notify RHCS when an OST has had a failure (e.g. /proc/fs/lustre/health_check). I have included a quick proof of concept (untested).

My idea is to create symlinks to this script named after the OST devices (e.g. delta-OST0000 -> lustre.init) and then add them as script services in RHCS. Are there more rigorous checks that people use to verify the health of a Lustre mount, other than just checking /proc/fs/lustre/health_check?

Daire

----- "Giacomo Montagner" <gmontagner at sorint.it> wrote:
> I would like to rewrite the RA script to make it work better (with some
> effective check to see if a target is really working as it should) but
> I hadn't time yet.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: lustre.init
Type: application/octet-stream
Size: 1504 bytes
Desc: not available
Url : http://lists.lustre.org/pipermail/lustre-discuss/attachments/20090624/90ea3677/attachment.obj
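[Editor's sketch] The lustre.init attachment was scrubbed, so what follows is not Daire's proof of concept. It is a rough illustration of the approach he outlines: start/stop/status actions keyed on a target name, with status combining a mount check and /proc/fs/lustre/health_check. The /dev/lustre and /mnt/lustre locations, the LUSTRE_* variables and the function names are all hypothetical, and the sketch assumes health_check reads exactly "healthy" on a healthy node.

```sh
#!/bin/sh
# Sketch only: NOT the scrubbed lustre.init. Paths and LUSTRE_* variables
# below are hypothetical defaults chosen for illustration.
LUSTRE_DEV_DIR=${LUSTRE_DEV_DIR:-/dev/lustre}
LUSTRE_MNT_DIR=${LUSTRE_MNT_DIR:-/mnt/lustre}
LUSTRE_HEALTH=${LUSTRE_HEALTH:-/proc/fs/lustre/health_check}
LUSTRE_MOUNTS=${LUSTRE_MOUNTS:-/proc/mounts}

# Succeeds only if the health file is readable and says exactly "healthy".
health_ok() {
    [ -r "$1" ] && [ "$(cat "$1")" = "healthy" ]
}

# Succeeds if the given mount point appears in the mounts file.
target_mounted() {
    awk -v m="$1" '$2 == m { found = 1 } END { exit !found }' "$LUSTRE_MOUNTS"
}

# lustre_target <target-name> <start|stop|status>
lustre_target() {
    dev=$LUSTRE_DEV_DIR/$1
    mnt=$LUSTRE_MNT_DIR/$1
    case "$2" in
        start)
            # Mount only if not already mounted.
            if ! target_mounted "$mnt"; then
                mkdir -p "$mnt" && mount -t lustre "$dev" "$mnt"
            fi ;;
        stop)
            if target_mounted "$mnt"; then umount "$mnt"; fi ;;
        status)
            # Non-zero tells the cluster manager the service has failed.
            target_mounted "$mnt" && health_ok "$LUSTRE_HEALTH" ;;
        *)
            echo "usage: lustre_target <name> {start|stop|status}" >&2
            return 2 ;;
    esac
}
```

To use this through symlinks named after the targets, the script would end with a line such as `lustre_target "$(basename "$0")" "$1"`, so that invoking delta-OST0000 with the argument status checks that target.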
Hi Daire,

It seems good. Did you try it? You might also check for the /proc/fs/lustre/obdfilter/<OST> entry, to see whether the OST is mounted and working well.

Bye,
Giacomo

On Wed, 2009-06-24 at 11:45 +0100, Daire.Byrne at framestore.com wrote:
> Are there more rigorous checks that people do to check the health of a
> lustre mount other than just checking /proc/fs/lustre/health_check ?
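[Editor's sketch] The additional check suggested above, testing for a /proc/fs/lustre/obdfilter/<OST> entry, might look like the following. The function name and the LUSTRE_PROC override are hypothetical; the override exists only so the check can be exercised on a machine without Lustre.

```sh
#!/bin/sh
# Sketch only. LUSTRE_PROC is a hypothetical override for testing; it
# defaults to the real /proc/fs/lustre tree.

# ost_registered <OST-name>
# Succeeds if an obdfilter entry for the OST exists on this node. This shows
# the OST is set up here; it does not prove it is serving I/O correctly.
ost_registered() {
    [ -n "$1" ] && [ -d "${LUSTRE_PROC:-/proc/fs/lustre}/obdfilter/$1" ]
}
```

In a status action this could be combined with the health_check test, for example `ost_registered delta-OST0000 && grep -qx healthy /proc/fs/lustre/health_check`.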