Tim Richards
2017-Feb-10 22:48 UTC
[Nut-upsuser] NUT configuration complicated by Stonith/Fencing cabling
Roger,

Thanks for your reply. As I understand it, for reliable fencing a node cannot be responsible for fencing itself, as it may not be functioning properly. Hence my "cross over" setup. The direct USB connection from Webserver1 to UPS-Webserver2 means that Webserver1 can fence (cut the power to) Webserver2 if the cluster software decides that it is necessary.

If my UPSes were able to connect to the network themselves, this would work, but they only have USB or serial inputs for control.

I am trying to kill two birds with one stone: UPS protection from power failure, and cluster node fencing (Stonith) using the UPS's ability to cut power to a node. Somebody has done this, as there exists a fencing agent using NUT in Pacemaker/Corosync (the Linux-HA cluster software). I just don't know the best way to go about it.

Regards,
Tim.

-----Original Message-----
From: Nut-upsuser [mailto:nut-upsuser-bounces+tims_tank=hotmail.com at lists.alioth.debian.org] On Behalf Of Roger Price
Sent: Friday, 10 February 2017 11:17 PM
To: nut-upsuser Mailing List
Subject: Re: [Nut-upsuser] NUT configuration complicated by Stonith/Fencing cabling

On Sun, 5 Feb 2017, Tim Richards wrote:

> Setup: Active/Passive Two Node Cluster. Two UPSes (APC Smart-UPS 1500
> C) with USB communication cables cross connected (ie UPS-webserver1
> monitored by webserver2, and vice versa) to allow for stonith/fencing
>
> OS OpenSuse Leap 42.2
> NUT version 2.7.1-2.41-x86_64
> Fencing agent: external/nut
>
> Problem: When power fails to a single UPS, both nodes are shut down.
> The node with the still powered UPS comes back up, but requires manual
> intervention to keep it providing services. I would like only the node
> with the "On Battery" UPS to shut down.

I think your title hints at the solution. What is the advantage of the cross-connection of the UPS units? Wouldn't it be simpler to have each node connected to the UPS which supplies the power? This is easier to set up, extends easily to n servers, and is independent of the stonith/fencing which I assume you use for other purposes.

> The resupply of services problem seems to be that NUT on the node that
> comes back up will not restart until the other node restarts.

With a simpler setup, this problem should go away.

Roger
Charles Lepple
2017-Feb-12 00:56 UTC
[Nut-upsuser] NUT configuration complicated by Stonith/Fencing cabling
On Feb 10, 2017, at 5:48 PM, Tim Richards <tims_tank at hotmail.com> wrote:

> I am trying to kill two birds with one stone, that is UPS protection from power failure and cluster node fencing (Stonith) with the UPS ability to cut power to a node. Somebody has done this, as there exists a fencing agent using NUT in the Pacemaker/Corosync (Linux-HA cluster software), I just don't know the best way to go about it.

Some UPS models have more than one serial port, or have a network adapter which can support multiple monitoring systems (via SNMP or HTTP/XML). Is it possible that the NUT fencing agent was written with that case in mind? That would mean that neither node would depend on the other for UPS status.

Can you elaborate on the "resupply of services problem"? With cross-connected UPSes (and only a single comm port per UPS), I am not sure if you can achieve both goals when only one UPS loses power.

(I don't think this sort of setup has been discussed much on the NUT lists, although it certainly sounds like an interesting way to use NUT. If you do find out more about how the NUT fencing agent was intended to be configured, perhaps from the fencing software lists or forums, feel free to post that here as well.)

--
- Charles Lepple
https://ghz.cc/charles/
Tim Richards
2017-Feb-13 13:08 UTC
[Nut-upsuser] NUT configuration complicated by Stonith/Fencing cabling
Charles,

Thanks for your reply. Indeed you may be right that the NUT fencing agent might be written with networked UPSes in mind, as healthy nodes could use the network to issue "fence" orders to remove unhealthy ones. I will post here if I find more info.

The problem with the resupply of services is that NUT doesn't restart on the node that comes back up. To recap: I pull the power on one UPS, and both nodes shut down. The remaining mains-connected UPS power cycles its outlets, which reboots its node. Because the node has just started, it wants all of its services to be healthy before providing them. This includes the fencing agent, which relies on NUT, which hasn't started. So the node doesn't start the rest of its services (Apache, MySQL, Samba).

Relevant log entries:

  Feb 13 23:11:42 xinetd[1647] Reading included configuration file: /etc/xinetd.d/cups-lpd [file=/etc/xinetd.d/cups-lpd] [line=7]
  Feb 13 23:11:42 systemd[1] Starting LSB: UPS monitoring software (deprecated, remote/local)...
  Feb 13 23:11:43 usbhid-ups[2093] Startup successful
  Feb 13 23:11:43 upsd[1932] Starting NUT UPS drivers ..done
  Feb 13 23:11:43 upsd[2104] not listening on 192.168.1.22 port 3493
  Feb 13 23:11:43 upsd[2104] listening on ::1 port 3493
  Feb 13 23:11:43 upsd[2104] listening on 127.0.0.1 port 3493
  Feb 13 23:11:43 upsd[2104] no listening interface available
  Feb 13 23:11:43 startproc[2095] startproc: exit status of parent of /usr/sbin/upsd: 1
  Feb 13 23:11:43 usbhid-ups[2093] Signal 15: exiting
  Feb 13 23:11:43 upsd[1932] Starting NUT UPS server ..failed
  Feb 13 23:11:43 systemd[1] upsd.service: Control process exited, code=exited status=7
  Feb 13 23:11:43 systemd[1] Failed to start LSB: UPS monitoring software (deprecated, remote/local).
  Feb 13 23:11:43 systemd[1] upsd.service: Unit entered failed state.
  Feb 13 23:11:43 systemd[1] upsd.service: Failed with result 'exit-code'.

I can manually bring the surviving node's services back up by removing the requirement that Stonith services are enabled.
I cannot get NUT to restart until I restart the 2nd node.

Regards,
Tim.
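
The log entries above show upsd reporting "not listening on 192.168.1.22 port 3493" shortly before it exits. The thread does not establish why; one possible reading, and it is only an assumption, is that the address was not yet available on the rebooted node when upsd started. A minimal Python sketch for checking whether a given address can be bound at all is shown below. The helper name `can_bind` is hypothetical and not part of NUT; the addresses and port are taken from the log.

```python
import socket

NUT_PORT = 3493  # port shown in the upsd log lines


def can_bind(host: str, port: int = NUT_PORT) -> bool:
    """Return True if a TCP socket can currently be bound to host:port.

    Hypothetical diagnostic helper, not part of NUT.
    """
    # Pick the address family from the literal: IPv6 literals contain ':'.
    family = socket.AF_INET6 if ":" in host else socket.AF_INET
    try:
        with socket.socket(family, socket.SOCK_STREAM) as sock:
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            sock.bind((host, port))
    except OSError:
        # Address not configured locally, port in use, or family unsupported.
        return False
    return True


if __name__ == "__main__":
    # Addresses copied from the log; adjust to the node being checked.
    for address in ("127.0.0.1", "::1", "192.168.1.22"):
        state = "can bind" if can_bind(address) else "cannot bind"
        print(f"{address} port {NUT_PORT}: {state}")
```

Run it on the surviving node while upsd is stopped, since a running upsd already holding the port would also make the bind fail. If 192.168.1.22 cannot be bound while the loopback addresses can, that would be consistent with the address being unavailable at that moment, but it does not by itself prove that this is what stopped upsd.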