On Apr 30, 2012, at 9:11 PM, Moorcroft, Mark (ARC-TSM)[ERC, Inc.] wrote:
>
> Hello list,
>
> I am a pretty green Linux admin managing 3 ROCKS CentOS clusters total
> about 2,000 cores. My predecessor set up NUT for me on the primary head
> node. We have about 42 Tripplite UPS units in the room. The way our power
> systems work I intend to use the NUT connected head node to shut down the
> entire server room any time power is lost for over 30 seconds or so. It is
> about 95% likely that any time the head node loses power the whole room
> has lost power. And for the most part we have no more than 2 to 3 minutes
> of battery power anyway so even a 30 second wait will be pushing things.
> The head node will be the only thing with a data connection to a UPS. The
> rest of the shutdown process must happen over the network.
>
> Can anyone point me to any resources to help me set up such a scenario? Is
> there any reason to run NUT anyplace besides the head node?

I'm guessing that things look roughly like this for each UPS and the nodes
it powers:
http://www.networkupstools.org/docs/user-manual.chunked/ar01s06.html#DataRoom
(scroll down to diagram)

However, if I understand correctly, there is only one monitoring connection to
the UPS on the head node, so the other nodes would rely on a shutdown signal
from the head node.

Also, you mentioned that you would like a time-based shutdown, rather than using
the "on battery + low battery" signal from the UPS. If the UPS units
are sophisticated enough, you might want to consider the OB+LB signal, since the
UPS self test usually re-calibrates the runtime estimate, and many
"smart" UPSes have ways to send the low battery shutdown signal when
the runtime estimate falls below a given threshold. You can then simulate the
"power off after 30 seconds" scenario by setting the runtime low
battery threshold to the current runtime estimate minus 30 seconds. At some
point when the battery is old, the UPS may trigger a shutdown in less than 30
seconds, but it's safer than assuming 30 seconds will always be sufficient.
The UPS-based runtime estimate should also take into account any fluctuations in
the power draw due to system load, or adding more hardware down the road.
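
As a rough, untested sketch of the "runtime estimate minus 30 seconds" idea: the script below assumes your driver reports battery.runtime in whole seconds and exposes battery.runtime.low as a writable variable, which not every Tripp Lite model does (running "upsrw myups@localhost" with no -s option lists what is writable). The UPS name "myups", the user "admin" and the password "secret" are placeholders, not anything from your setup.

```shell
#!/bin/sh
# Placeholder UPS name; substitute the name from your ups.conf.
UPS="myups@localhost"

# Print the low-runtime threshold: current estimate minus a grace period.
# Usage: low_runtime_threshold <runtime_seconds> <grace_seconds>
low_runtime_threshold() {
    runtime=$1
    grace=$2
    threshold=$((runtime - grace))
    # Clamp at zero in case the battery is already nearly flat.
    if [ "$threshold" -lt 0 ]; then
        threshold=0
    fi
    echo "$threshold"
}

# Only talk to upsd when invoked as "<script> apply".
if [ "${1:-}" = "apply" ]; then
    # Assumes battery.runtime is an integer number of seconds.
    runtime=$(upsc "$UPS" battery.runtime) || exit 1
    low=$(low_runtime_threshold "$runtime" 30)
    # "admin"/"secret" are hypothetical upsd.users credentials.
    upsrw -s "battery.runtime.low=$low" -u admin -p secret "$UPS"
fi
```

You would want to run this while the UPS is on line power and fully charged, and probably again after each self test, since that is when the runtime estimate gets re-calibrated.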

What model of UPS are you using? Also, on the head node, what do you get from
"upsc <name-of-ups>@localhost"?

I'll cover the OB+LB method first, since the time-based shutdown relies on
the same configuration. The node(s) which communicate directly with a UPS are
considered NUT master instances, and they run a driver (tripplite*, usbhid-ups,
or possibly snmp-ups), upsd, and upsmon. If it is important that the UPS master
(in your current configuration, the head node) shuts down cleanly, I would
recommend minimizing the number of slave systems that it depends on. This
limits the impact as the batteries age in the UPS units connected to the
non-head nodes. If the power goes out, and several nodes fall over before
the head node signals a shutdown, there will be an extra delay as the head node
tries to contact them. It will give up eventually, but I'd recommend having
one NUT master per UPS. It's a little extra cabling, but I think it's
worth it.
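
For reference, the master side might look roughly like the fragments below. This is only a sketch: the UPS name "myups", the user names and passwords are placeholders, usbhid-ups is a guess at the driver (yours may be one of the tripplite* drivers instead), and the config directory varies by distribution.

```
# ups.conf (driver definition; the driver choice is an assumption)
[myups]
    driver = usbhid-ups
    port = auto

# upsd.conf (let slaves on the network connect; 3493 is the default port)
LISTEN 0.0.0.0 3493

# upsd.users (one account for the master, one for the slaves)
[upsmaster]
    password = secret-master
    upsmon master

[upsslave]
    password = secret-slave
    upsmon slave

# upsmon.conf on the master itself
MONITOR myups@localhost 1 upsmaster secret-master master
```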

The NUT slave systems run upsmon to talk to their respective NUT master. This is
where you configure the actual shutdown command.
http://www.networkupstools.org/docs/man/upsmon.html
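
A minimal slave-side upsmon.conf could look something like this. The host name "headnode", the UPS name "myups" and the credentials are placeholders, and the shutdown command should be whatever your distribution normally uses:

```
# upsmon.conf on a slave node
# MONITOR <ups>@<host> <power value> <user> <password> <master|slave>
MONITOR myups@headnode 1 upsslave secret-slave slave

# Number of power supplies that must stay up for this node to keep running
MINSUPPLIES 1

# Command upsmon runs when the master signals a shutdown
SHUTDOWNCMD "/sbin/shutdown -h +0"
```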

If your UPS model does not have an adjustment for the runtime, you will want to
configure upsmon to call upssched:
http://www.networkupstools.org/docs/man/upssched.html
I haven't done much with upssched (especially not in a data center setting)
but I think it requires a bit of tweaking to prevent race conditions if the
power flickers before going out completely.
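
Something along these lines is the usual starting point for a 30 second timer, though I have not tested it against your setup. The timer name "onbatt", the script path and the pipe/lock paths are placeholders (the directory for the pipe and lock files must exist and be writable by the user upsmon runs as):

```
# upsmon.conf on the master: hand ONBATT/ONLINE events to upssched
NOTIFYCMD /usr/sbin/upssched
NOTIFYFLAG ONBATT SYSLOG+EXEC
NOTIFYFLAG ONLINE SYSLOG+EXEC

# upssched.conf
CMDSCRIPT /usr/local/bin/upssched-cmd
PIPEFN /var/run/nut/upssched.pipe
LOCKFN /var/run/nut/upssched.lock

# Start a 30 second timer when any UPS goes on battery...
AT ONBATT * START-TIMER onbatt 30
# ...and cancel it if power comes back before the timer fires
AT ONLINE * CANCEL-TIMER onbatt
```

When the timer expires, upssched runs the CMDSCRIPT with the timer name ("onbatt") as its argument. That script would typically call "upsmon -c fsd" on the master, which sets the forced shutdown flag so the slaves shut down as well. The CANCEL-TIMER line is what keeps a brief flicker from triggering a shutdown.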
> What are the
> best practices for a scripted/staged shutdown. I have my concerns that we
> don't have the battery capacity for a wait time and orderly shutdown of
> the nodes and the RAID units. Our reseller GROSSLY underestimated the load
> requirements of the equipment they sold us.

Definitely check your shutdown scripts to see if there is any special handling
for RAID. Often, the normal "shutdown -h"/"poweroff" style
shutdown will do the trick, but might require a different BIOS setting to allow
the machine to turn on once the power returns, and the UPS has charged enough to
provide power again.

Some of the older documentation included with NUT suggests that you want to
quiesce the system and let the UPS cut power, but this is riskier with RAID (or
even large disk caches on non-RAID systems). It doesn't sound like turning
back on immediately is as important as shutting down cleanly, so a normal
poweroff should work - but it's good to review how the shutdown script works
(and I am not familiar enough with RHEL/CentOS to offer much advice).
--
Charles Lepple
clepple at gmail