Tim Richards
2017-Feb-05 23:25 UTC
[Nut-upsuser] NUT configuration complicated by Stonith/Fencing cabling
Hello List,

Any suggestions to solve the following would be most appreciated.

Setup: Active/Passive two-node cluster. Two UPSes (APC Smart-UPS 1500 C) with USB communication cables cross connected (i.e. UPS-webserver1 monitored by webserver2, and vice versa) to allow for stonith/fencing.

OS: OpenSuse Leap 42.2
NUT version: 2.7.1-2.41-x86_64
Fencing agent: external/nut

Problem: When power fails to a single UPS, both nodes are shut down. The node with the still powered UPS comes back up, but requires manual intervention to keep it providing services. I would like only the node with the "On Battery" UPS to shut down.

The resupply of services problem seems to be that NUT on the node that comes back up will not restart until the other node restarts.

Stonith and my upssched-cmd script both use

    upscmd -u ups-webserver2-master -p mypassword ups-webserver2@webserver1 shutdown.reboot

or

    upscmd -u ups-webserver1-master -p mypassword ups-webserver1@webserver2 shutdown.reboot

as appropriate.

When the cluster software (Pacemaker/Corosync) uses one of the above commands as part of a fencing operation, only the target node is shut down, and its UPS's outlets are power-cycled.

When NUT, via my upssched-cmd script, issues one of the above commands, both nodes shut down and both of their UPSes' outlets power-cycle.

This problem should be very rare, but it would be better to cover it than not.

Power failure and resupply to both UPSes (the most common problem for me) works well. I use upssched to set the same timers after power failure on each system. They receive simultaneous shutdown commands, which they obey. When power returns they both come back up.

Stonith/fencing via the external/nut stonith resource agent works.

Thanks, Tim.
My config files

ups.conf

  On webserver1
    [ups-webserver2]
      driver = usbhid-ups
      port = auto
      desc = "APC Smart-UPS C 1000/1500va"
      vendorid = 051d

  On webserver2
    [ups-webserver1]
      driver = usbhid-ups
      port = auto
      desc = "APC Smart-UPS C 1000/1500va"
      vendorid = 051d

nut.conf

    MODE=netserver

upsd.conf

  Webserver1
    LISTEN 127.0.0.1 3493
    LISTEN ::1 3493
    LISTEN 192.168.1.21 3493

  Webserver2
    LISTEN 127.0.0.1 3493
    LISTEN ::1 3493
    LISTEN 192.168.1.22 3493

upsd.users defines users (special settings required for stonith to work)

  On webserver1
    [ups-webserver2-slave]
      password = mypassword
      actions = SET
      instcmds = ALL
      upsmon slave

    [ups-webserver2-master]
      password = mypassword
      actions = SET
      actions = FSD
      instcmds = ALL
      upsmon master

  On webserver2
    [ups-webserver1-slave]
      password = mypassword
      actions = SET
      instcmds = ALL
      upsmon slave

    [ups-webserver1-master]
      password = mypassword
      actions = SET
      actions = FSD
      instcmds = ALL
      upsmon master

upsmon.conf

  Webserver1
    MONITOR ups-webserver1@webserver2 1 ups-webserver1-master mypassword master
    MONITOR ups-webserver2@localhost 0 ups-webserver2-slave mypassword slave

  Webserver2
    MONITOR ups-webserver2@webserver1 1 ups-webserver2-master mypassword master
    MONITOR ups-webserver1@localhost 0 ups-webserver1-slave mypassword slave

It needs the following

upsmon.conf

    NOTIFYCMD /usr/sbin/upssched
    NOTIFYFLAG ONLINE SYSLOG+WALL+
    NOTIFYFLAG ONBATT SYSLOG+WALL+EXEC

Configure 'upssched' by editing upssched.conf

upssched.conf

  webserver1
    CMDSCRIPT /bin/upssched-cmd
    PIPEFN /var/lib/ups/upssched/upssched.pipe
    LOCKFN /var/lib/ups/upssched/upssched.lock
    AT ONBATT ups-webserver2@localhost START-TIMER onbatt-ups-webserver2 600
    AT ONLINE ups-webserver2@localhost CANCEL-TIMER onbatt-ups-webserver2

  webserver2
    CMDSCRIPT /bin/upssched-cmd
    PIPEFN /var/lib/ups/upssched/upssched.pipe
    LOCKFN /var/lib/ups/upssched/upssched.lock
    AT ONBATT ups-webserver1@localhost START-TIMER onbatt-ups-webserver1 600
    AT ONLINE ups-webserver1@localhost CANCEL-TIMER onbatt-ups-webserver1

Edit /bin/upssched-cmd

/bin/upssched-cmd

  webserver1
    case $1 in
        onbatt-ups-webserver1)
            logger -t upssched-cmd "UPS-Webserver1 has gone on battery."
            ;;
        onbatt-ups-webserver2)
            logger -t upssched-cmd "UPS-Webserver2 has gone on battery."
            /usr/bin/upscmd -u ups-webserver2-master -p mypassword ups-webserver2@webserver1 shutdown.reboot
            ;;
        *)
            logger -t upssched-cmd "Unrecognized command: $1"
            ;;
    esac

  Webserver2
    case $1 in
        onbatt-ups-webserver1)
            logger -t upssched-cmd "UPS-Webserver1 has been gone on battery."
            /usr/bin/upscmd -u ups-webserver1-master -p mypassword ups-webserver1@webserver2 shutdown.reboot
            ;;
        onbatt-ups-webserver2)
            logger -t upssched-cmd "UPS-Webserver2 has gone on battery."
            ;;
        *)
            logger -t upssched-cmd "Unrecognized command: $1"
            ;;
    esac
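One possible guard for the upssched-cmd script above, offered as a sketch and not as something proposed in the thread: before sending shutdown.reboot, ask upsd for the current ups.status and only proceed if the UPS still reports "OB" (on battery). This does not diagnose why both nodes go down; it only avoids power-cycling outlets when the timer fires although the UPS is back on line power, for example if the CANCEL-TIMER did not run for any reason. The function names on_battery and guarded_reboot are hypothetical; upsc, upscmd and logger are the same tools the thread already uses.

```shell
#!/bin/sh
# Sketch only: guard a shutdown.reboot behind a fresh ups.status check.
# ups.status is a space-separated list of flags such as "OL", "OB", "LB".

# Return 0 if the status string in $1 contains the OB (on battery) flag.
on_battery() {
    case " $1 " in
        *" OB "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Hypothetical wrapper around upscmd.
# $1: UPS in ups@host form, $2: upsd user, $3: upsd password.
# Returns 2 if the status cannot be read, 1 if the UPS is not on battery.
guarded_reboot() {
    ups="$1"; user="$2"; pass="$3"
    status=$(upsc "$ups" ups.status 2>/dev/null) || return 2
    if on_battery "$status"; then
        upscmd -u "$user" -p "$pass" "$ups" shutdown.reboot
    else
        logger -t upssched-cmd "$ups not on battery ($status); skipping shutdown.reboot"
        return 1
    fi
}
```

On webserver1 the onbatt-ups-webserver2 branch could then call `guarded_reboot ups-webserver2@webserver1 ups-webserver2-master mypassword` in place of the direct upscmd line, with the mirror-image call on webserver2.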
Roger Price
2017-Feb-10 12:16 UTC
[Nut-upsuser] NUT configuration complicated by Stonith/Fencing cabling
On Sun, 5 Feb 2017, Tim Richards wrote:

> Setup: Active/Passive Two Node Cluster. Two UPSes (APC Smart-UPS 1500 C)
> with USB communication cables cross connected (ie UPS-webserver1
> monitored by webserver2, and vice versa) to allow for stonith/fencing
>
> OS OpenSuse Leap 42.2
> NUT version 2.7.1-2.41-x86_64
> Fencing agent: external/nut
>
> Problem: When power fails to a single UPS, both nodes are shutdown. The
> node with the still powered UPS comes back up, but requires manual
> intervention to keep it providing services. I would like only the node
> with the "On Battery" UPS to shutdown.

I think your title hints at the solution. What is the advantage of the cross-connection of the UPS units? Wouldn't it be simpler to have each node connected to the UPS which supplies the power? This is easier to set up, extends easily to n servers, and is independent of the stonith/fencing which I assume you use for other purposes.

> The resupply of services problem seems to be that NUT on the node that
> comes back up will not restart until the other node restarts.

With a simpler setup, this problem should go away.

Roger
Tim Richards
2017-Feb-10 22:48 UTC
[Nut-upsuser] NUT configuration complicated by Stonith/Fencing cabling
Roger,

Thanks for your reply.

As I understand it, for reliable fencing a node cannot be responsible for fencing itself, as it may not be functioning properly. Hence my "cross over" setup. The direct USB connection from Webserver1 to UPS-Webserver2 means that Webserver1 can fence (cut the power to) Webserver2 if the cluster software decides that it is necessary. If my UPSes were able to connect to the network themselves, this would work, but they only have USB or serial inputs for control.

I am trying to kill two birds with one stone, that is UPS protection from power failure and cluster node fencing (Stonith) with the UPS ability to cut power to a node. Somebody has done this, as there exists a fencing agent using NUT in the Pacemaker/Corosync (Linux-HA cluster software), I just don't know the best way to go about it.

Regards, Tim.

-----Original Message-----
From: Nut-upsuser [mailto:nut-upsuser-bounces+tims_tank=hotmail.com at lists.alioth.debian.org] On Behalf Of Roger Price
Sent: Friday, 10 February 2017 11:17 PM
To: nut-upsuser Mailing List
Subject: Re: [Nut-upsuser] NUT configuration complicated by Stonith/Fencing cabling

On Sun, 5 Feb 2017, Tim Richards wrote:

> Setup: Active/Passive Two Node Cluster. Two UPSes (APC Smart-UPS 1500
> C) with USB communication cables cross connected (ie UPS-webserver1
> monitored by webserver2, and vice versa) to allow for stonith/fencing
>
> OS OpenSuse Leap 42.2
> NUT version 2.7.1-2.41-x86_64
> Fencing agent: external/nut
>
> Problem: When power fails to a single UPS, both nodes are shutdown.
> The node with the still powered UPS comes back up, but requires manual
> intervention to keep it providing services. I would like only the node
> with the "On Battery" UPS to shutdown.

I think your title hints at the solution. What is the advantage of the cross-connection of the UPS units? Wouldn't it be simpler to have each node connected to the UPS which supplies the power? This is easier to set up, extends easily to n servers, and is independent of the stonith/fencing which I assume you use for other purposes.

> The resupply of services problem seems to be that NUT on the node that
> comes back up will not restart until the other node restarts.

With a simpler setup, this problem should go away.

Roger
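When working out which node ought to shut down in the cross-connected layout, it may help to see both UPS states side by side from one node. The sketch below assumes the setup described in the thread: each upsd listens on its LAN address, and the names ups-webserver1@webserver2 and ups-webserver2@webserver1 are the ones used in the MONITOR lines. The function name report_status and the "unavailable" label are hypothetical.

```shell
#!/bin/sh
# Sketch only: print the ups.status of each UPS as seen over the network.
# The UPS names follow the thread; adjust the list for your own setup.

UPS_LIST="ups-webserver1@webserver2 ups-webserver2@webserver1"

# $@: UPS identifiers in upsc's ups@host form.
report_status() {
    for ups in "$@"; do
        if status=$(upsc "$ups" ups.status 2>/dev/null); then
            printf '%s: %s\n' "$ups" "$status"
        else
            # upsc failed: upsd unreachable, driver not running, or data stale.
            printf '%s: unavailable\n' "$ups"
        fi
    done
}
```

Running `report_status $UPS_LIST` on either node prints one line per UPS, so a single UPS on battery ("OB") is easy to tell apart from a node whose upsd is simply not answering.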