>> Over the last couple months, we have had a couple incidents (luckily all
>> on weekends) where all 15 UPSs will mysteriously shut off, and we had to
>> manually power them back up. To top it off, there was nothing in syslog
>> about any UPS going onto battery power and staying there - just a couple
>> curious low-battery warnings from one of the UPSs.
You may not be aware of it, but this is where the root of the problem
lies. Normally, when a UPS is told to shut down, one of two things happens:
1) The input power is gone, so the UPS powers off and switches back on
when the power returns.
2) Input power is still available, so the UPS cycles the power so that the
systems that receive power from it can restart.
Apparently, #2 is not happening for you and from looking at the driver, I
can understand why. The shutdown sequence in this driver is not doing what
it is supposed to do. This needs fixing, but since I don't have an APC
UPS, I can't do that for you. We'll have to track down the developer who
wrote this driver to correct this.
>> This weekend, we tested the UPSs and found that - just our luck - the
>> one in particular that fed the Linux system would immediately report low
>> battery if you put it in test mode (which it does every two weeks on its
>> own).
There is definitely something broken here too. Under no circumstance should
a driver indicate both 'on battery' and 'low battery' if the power is not
actually out. Even if the test would indicate a 'low battery', the
presence of the AC mains should prevent the NUT server from initiating a
shutdown. This is a bug in the subdriver though, not in NUT. The interesting
question is why the UPS under test is indicating 'low battery' as soon as
you start a test. This could be justified (but in that case, I would
expect a 'replace battery' warning as well) or might be happening because
the subdriver is misinterpreting a value read from the UPS. Running the
driver in debug mode (to see what values it reads from the UPS) could help
fix this.
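
To make the rule concrete, here is a minimal sketch of the decision as a single predicate. It is not upsmon's actual code: the flag names and values (EX_ONLINE, EX_ONBATT, EX_LOWBATT) and the function is_critical() are hypothetical and only serve as illustration.

```c
#include <stdbool.h>

/* Hypothetical status bits for illustration; not taken from NUT's headers. */
enum {
    EX_ONLINE  = 1 << 0,  /* AC mains present */
    EX_ONBATT  = 1 << 1,  /* running from battery */
    EX_LOWBATT = 1 << 2   /* battery reported low */
};

/* A UPS warrants a shutdown only when it is on battery AND the battery is low. */
static bool is_critical(int status)
{
    if (status & EX_ONLINE)
        return false;  /* mains present: a low-battery flag alone is only a warning */
    return (status & EX_ONBATT) && (status & EX_LOWBATT);
}
```

With a predicate like this, a self-test that raises only the low-battery bit while the mains are present would be logged but would not start a shutdown.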
>> Nut would then immediately shut down the Linux system, and once
>> that was done, it would proceed to force a shutdown of all the other
>> UPSs.
This is by design. If the power is really out and the NUT server must go
down, the systems that are connected to the NUT server *must* be shut down
as well. Otherwise, nothing would be left to shut them down once the UPS
that feeds them is low on battery.
>> The smoking gun is in upsmon's forceshutdown():
>>
>> /* set FSD on any "master" UPS entries (forced shutdown in progress) */
>> for (ups = firstups; ups != NULL; ups = ups->next)
>>     if (flag_isset(ups->status, ST_MASTER)) {
>>         isamaster = 1;
>>         setfsd(ups);
>>     }
>>
>> This code does not attempt to determine whether the UPS in question
>> needs to be shut down or not. Shutting down a UPS that is online with a
>> full charge is a grievous offense.
No, it's not. In a setup with a single NUT server and multiple UPSes, it is
impossible to deal with situations where some of the monitored UPSes receive
power from the mains and the one powering the NUT server does not. So if the
UPS powering the NUT server is critical, all the UPSes we're monitoring
should be at the very least on battery as well. Not shutting down all
clients connected to the NUT server when that has to go down (i.e., the
power is out and the batteries of the UPS feeding it are running low) would
be a surefire way to crash them.
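
As a rough, self-contained sketch of that design (not the real upsmon source): the struct ex_ups type, the EX_MASTER and EX_FSD bits and the function ex_force_shutdown() below are hypothetical stand-ins for upsmon's own structures. The point is only that the charge level of each UPS is never consulted.

```c
#include <stddef.h>

/* Hypothetical, simplified types; the real upsmon structures differ. */
enum {
    EX_MASTER = 1 << 0,  /* this upsmon manages the UPS as master */
    EX_FSD    = 1 << 1   /* forced shutdown in progress */
};

struct ex_ups {
    const char *name;
    int status;
    struct ex_ups *next;
};

/* Set FSD on every UPS managed as master; returns how many were flagged. */
static int ex_force_shutdown(struct ex_ups *first)
{
    int flagged = 0;

    for (struct ex_ups *u = first; u != NULL; u = u->next) {
        if (u->status & EX_MASTER) {
            u->status |= EX_FSD;  /* battery state is deliberately not checked */
            flagged++;
        }
    }
    return flagged;
}
```

Entries that are not managed as master are left untouched, so only the slaves of this particular master are told to go down.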
[...]
>> Or, at the very least, document the hidden assumption that none of your
>> monitored UPS's runtimes will exceed that of your master server's UPS
>> runtime in particular.
This is documented in the 'upsmon' man page for instance:
> FORCED SHUTDOWNS
>
> When upsmon is forced to bring down the local system, it sets the
> "FSD" (forced shutdown) flag on any UPSes that it is running in master
> mode. This is used to synchronize slaves in the event that a master UPS
> that is otherwise OK needs to be brought down due to some pressing event
> on the master.
Best regards, Arjen
-- 
Eindhoven - The Netherlands
Key fingerprint - 66 4E 03 2C 9D B5 CB 9B 7A FE 7E C1 EE 88 BC 57