Hello all,

I was asked about a proper way to ensure powerfail shutdown of a blade server fed off an APC SmartUPS, which is monitored by all hosts over SNMP. In particular, they'd like to avoid the "untimely" return of utility power - when the OSes are already shutting down, and would stay down because the UPS is well-fed now and thus won't power off and later power on to boot the servers.

On smaller systems directly connected to a UPS with a serial/USB cable, I remember sending an ups-poweroff signal at the end of the shutdown procedure, which effectively rebooted the UPS and its load (if the wall power was already back). This required picking a UPS model that did not stay down in such a case, of course.

Now, with the blade situation - each server might need unpredictably long to shut down. It would be inappropriate for the first one to yank power off those which are still going down.

Also, according to the manpage, the snmp-ups driver might not implement the shutdown commands at all because "it is run at the time when the networking might already be unconfigured" (strange reasoning, not all OSes even do that unconfiguration).

So, I wonder if the list members can suggest a way to signal the APC SmartUPS to go down and return if the power comes back or if it is already present at the time of UPS poweroff (thus acting as a reboot)?

Also, is it possible to request a delay in such poweroff, i.e. each host going down would ask the UPS to power off in, say, 5 minutes (so that other hosts have ample time to shut down properly) with each such signal "restarting" the timer, or perhaps to explicitly ask the UPS to cancel a pending shutdown and then ask it to do it again?

Can it be done with SNMP or rather some telnet/expect scripting? There are no serial ports to connect the UPS directly to one of the blade servers...
As an alternative, I know it is possible to hack the shutdown scripts of some OSes so that in case of a UPS-initiated shutdown the system would not "halt" but rather sleep for a while (10-20 minutes) and reboot, so that if the utility power is back - the servers just restart without the UPS ever going down. At least, this is what was suggested in the docs a decade ago and what I did on some systems. One downside is that if there is no wall power, the sleeping loop just depletes the UPS battery, which may not be a good thing (e.g. if the UPS is configured to come back up only after it has accumulated some threshold value of battery charge).

Are there any new ideas in this direction? :)

Thanks for any suggestions, ideas, code/config snippets, etc. :)

//Jim Klimov
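The sleep-and-reboot hack described above could look roughly like the sketch below, meant to be sourced near the end of a shutdown script. It is only an illustration under assumptions: it relies on `upsmon -K` exiting successfully when NUT's powerdown flag is set (check the upsmon manpage for your version), and the names `WAIT_SECONDS`, `POWERFAIL_CHECK`, `REBOOT_CMD` and `wait_and_reboot_if_powerfail` are made up for this example.

```shell
#!/bin/sh
# Sketch only: wait out a power failure and reboot instead of halting.
# All variable and function names here are hypothetical.

# How long to sleep before rebooting (15 minutes, within the 10-20 minute range).
WAIT_SECONDS="${WAIT_SECONDS:-900}"
# Assumption: "upsmon -K" exits 0 when the shutdown was initiated by the UPS.
POWERFAIL_CHECK="${POWERFAIL_CHECK:-upsmon -K}"
# Command used to restart the host; differs between OSes.
REBOOT_CMD="${REBOOT_CMD:-reboot}"

wait_and_reboot_if_powerfail() {
    # Only act on UPS-initiated shutdowns; otherwise return and let the
    # normal halt/poweroff path of the shutdown script continue.
    if $POWERFAIL_CHECK >/dev/null 2>&1; then
        sleep "$WAIT_SECONDS"
        $REBOOT_CMD
    fi
}
```

The shutdown script would call `wait_and_reboot_if_powerfail` just before its usual halt step. The battery-depletion downside mentioned above applies unchanged: the host keeps drawing power for the whole of `WAIT_SECONDS`.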
Hi Jim,

On Tue, 18 Feb 2014, Jim Klimov wrote:

> Hello all,
>
> I was asked about a proper way to ensure powerfail shutdown of a blade
> server fed off an APC SmartUPS, which is monitored by all hosts over
> SNMP. In particular, they'd like to avoid the "untimely" return of
> utility power - when the OSes are already shutting down, and would
> stay down because the UPS is well-fed now and thus won't power off
> and later power on to boot the servers.
>
> On smaller systems directly connected to an UPS with a ser/usb cable,
> I remember sending an ups-poweroff signal in the end of the shutdown
> procedure, which effectively rebooted the UPS and its load (if the
> wall power is already back). This required picking an ups model that
> did not stay down in such case, of course.
>
> Now, with the blade situation - each server might need unpredictably
> long to shut down. It would be inappropriate for the first one to yank
> power off those which are still going down.
>
> Also, according to the manpage, the snmp-ups driver might not implement
> the shutdown commands at all because "it is run at the time when the
> networking might already be unconfigured" (strange reasoning, not all
> OSes even do that unconfiguration).
>
> So, I wonder if the list members can suggest a way to signal the
> APC SmartUPS to go down and return if the power comes back or if
> it is already present at the time of UPS poweroff (thus acting
> as a reboot)?
>
> Also, is it possible to request a delay in such poweroff, i.e. each
> host going down would ask the UPS to poweroff in, say, 5 minutes
> (so that other hosts have ample time to shut down properly) with
> each such signal "restarting" the timer, or perhaps to explicitly
> ask the UPS to cancel pending shutdown and then ask it to do it
> again?
>
> Can it be done with SNMP or rather some telnet/expect scripting?
> There are no serial ports to connect the UPS directly to one of
> the blade servers...

Seems like a similar use case to ESXi.
If you've got an AP9630 card these notes may be useful.

# APC ap9630 notes:
# Use firmware 6.0.6 or later if you want SNMPv3
# Configuration->Shutdown
# In "Start of Shutdown" section, set "Low Battery Duration" and
# "Shutdown Delay" to get enough grace time for the ESXi shutdown
# Optionally, in "End of Shutdown" section set "Return Delay" to 60 seconds.
#

Now here are the OIDs (pick one) to actually turn the UPS off after the shutdown delay.

# reboot gracefully (APC) (stays off until utility power is restored)
UPS_OID=".1.3.6.1.4.1.318.1.1.1.6.2.2.0 integer 3"
# turn off UPS gracefully (APC) (stays off even if power is restored)
#UPS_OID=".1.3.6.1.4.1.318.1.1.1.6.2.1.0 integer 3"
#

--
Tim Rice  Multitalents  (707) 456-1146
tim at multitalents.net
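For illustration, one way such an OID/value pair might be sent is with Net-SNMP's `snmpset`, as sketched below. This is an assumption about usage, not the script these notes were taken from: the host name `ups.example.com`, the `private` community string, the `UPS_SEND` switch and the function names are all hypothetical, and `i 3` is simply the Net-SNMP way of writing "integer 3". The community (or SNMPv3 user) must have write access on the card.

```shell
#!/bin/sh
# Sketch only: send the "reboot gracefully" OID from the notes above.
# Host, community and function names are hypothetical placeholders.

UPS_HOST="${UPS_HOST:-ups.example.com}"
UPS_COMMUNITY="${UPS_COMMUNITY:-private}"
# OID and value taken from the notes above ("integer 3").
UPS_OID=".1.3.6.1.4.1.318.1.1.1.6.2.2.0"
UPS_VALUE=3

# Print the command line that would be run, for review or logging.
ups_reboot_command() {
    printf 'snmpset -v 1 -c %s %s %s i %s\n' \
        "$UPS_COMMUNITY" "$UPS_HOST" "$UPS_OID" "$UPS_VALUE"
}

# Dry run by default; the UPS is only contacted when UPS_SEND=yes.
send_ups_reboot() {
    if [ "${UPS_SEND:-no}" = "yes" ]; then
        snmpset -v 1 -c "$UPS_COMMUNITY" "$UPS_HOST" "$UPS_OID" i "$UPS_VALUE"
    else
        ups_reboot_command
    fi
}
```

Calling `send_ups_reboot` with `UPS_SEND` unset only prints the command, which makes it safe to try before wiring it into a shutdown sequence. Since the request has to leave the host while networking is still up, it would need to run before the network is unconfigured.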
On 2014-02-19 03:04, Tim Rice wrote:
> If you've got a AP9630 card these notes may be useful.
>
> # APC ap9630 notes:
> # Use firmware 6.0.6 or later if you want SNMPv3
> # Configuration->Shutdown
> # In "Start of Shutdown" section, set "Low Battery Duration" and
> # "Shutdown Delay" to get enough grace time for the ESXI shutdown
> # Optionally, in "End of Shutdown" section set "Return Delay" to 60 seconds.
> #

Thanks a lot for the suggestion, I'll give it a spin :)

> Now here are the OIDs (pick one) to actually turn the UPS off
> after the shutdown delay.
>
> # reboot gracefully (APC) (stays off until utility power is restored)
> UPS_OID=".1.3.6.1.4.1.318.1.1.1.6.2.2.0 integer 3"
> # turn off UP gracefully (APC) (stays off even if power is restored)
> #UPS_OID=".1.3.6.1.4.1.318.1.1.1.6.2.1.0 integer 3"

These seem like config file entries... where do they go? I don't see UPS_OID in /etc/nut/* (for that 2.6.3 installation).

Also, just to make sure, the "reboot gracefully" part does indeed reboot the UPS and its load "instantly" (after the timeout) if the wall power is already back by the time the UPS goes down due to this command, right?

Thanks,
//Jim Klimov