Aaron J. Grier
2008-Feb-20 04:55 UTC
[Nut-upsuser] MGE pulsar evolution 3000 discharges without nut noticing
I'm running nut 2.0.1 with an MGE pulsar evolution 3000 with the mge-utalk (serial) driver on NetBSD 3.1.

I had a very strange occurrence in my basement datacenter this early AM.

the following was logged by upslog:

20080219 023452 100.0 123.0 049.0 [OL CHRG] NA 60.00
20080219 023522 100.0 122.8 049.0 [OL CHRG] NA 60.00
20080219 023552 098.0 123.0 049.0 [OL DISCHRG] NA 60.00
20080219 023622 097.0 123.0 049.0 [OL DISCHRG] NA 60.00

...

20080219 063429 019.0 122.6 049.0 [OL DISCHRG] NA 60.00
20080219 063459 019.0 122.8 049.0 [OL DISCHRG] NA 60.00
20080219 183013 100.0 121.1 050.0 [OL CHRG] NA 60.00
20080219 183043 100.0 120.9 049.0 [OL CHRG] NA 60.00

the good news is the UPS held up for four hours.

the bad news is that nut completely ignored that the UPS was discharging, and none of my connected machines shut down properly. I know that the UPS driver was running, because if it was not, upsd would have complained loudly and frequently. I know upsd was running (on the connected server as well as all the clients), because if it was not, upsmon would have complained loudly and frequently.

there were no indications in my logs that nut was paying any attention to the situation.

why did my UPS decide to discharge its battery when there appears to be no power outage? (just looking for suggestions and guesses here from other MGE users...)

why did nut not see that my UPS was discharging and the battery percentage running down and take appropriate action? "OL DISCHRG" status in the logs seems nonsensical. I have only seen this condition a few times before in my logs:

20061002 002628 094.0 079.6 045.0 [OB DISCHRG] NA 60.00
20061002 002658 093.0 096.8 045.0 [OL DISCHRG BOOST] NA 60.00
20061002 002728 093.0 088.3 045.0 [OL DISCHRG] NA 60.00
20061002 002758 093.0 000.0 045.0 [OB DISCHRG] NA 00.00

Oct 2 00:22:59 aragorn upsmon[234]: UPS mge3000 at localhost on battery
Oct 2 00:26:45 aragorn upsmon[234]: UPS mge3000 at localhost on line power
Oct 2 00:27:35 aragorn upsmon[234]: UPS mge3000 at localhost on battery

nut saw this one, and shut everything down, as expected.

20061214 224604 100.0 122.4 054.0 [OL CHRG] NA 60.00
20061214 224634 099.0 123.7 056.0 [OL DISCHRG] NA 60.00
20061214 224704 097.0 000.0 054.0 [OB DISCHRG] NA 00.00

Dec 14 22:46:45 aragorn upsmon[234]: UPS mge3000 at localhost on battery
Dec 14 22:58:41 aragorn upsmon[234]: UPS mge3000 at localhost battery is low
Dec 14 22:58:41 aragorn upsd[226]: Client master at 127.0.0.1 set FSD on UPS [mge3000]
Dec 14 22:58:57 aragorn upsmon[234]: Host sync timer expired, forcing shutdown
Dec 14 22:58:57 aragorn upsmon[234]: Executing automatic power-fail shutdown
Dec 14 22:58:57 aragorn upsmon[234]: Auto logout and shutdown proceeding
Dec 14 22:59:02 aragorn upsd[226]: Host 127.0.0.1 disconnected (read failure)
Dec 14 22:59:08 aragorn upsd[226]: Signal 15: exiting

nut did not see this one, but it was transient:

20070227 083935 100.0 122.2 048.0 [OL CHRG] NA 60.00
20070227 084005 100.0 122.2 048.0 [OL DISCHRG] NA 60.00
20070227 084035 100.0 122.2 048.0 [OL CHRG] NA 60.00

here is the device (online now)

$ upsc mge3000 at localhost
battery.charge: 100.0
battery.charge.low: 30
battery.runtime: 00876
battery.voltage: 081.6
battery.voltage.nominal: 072.0
driver.name: mge-utalk
driver.parameter.port: /dev/tty00
driver.version: 2.0.1
driver.version.internal: 0.81.0
input.frequency: 60.00
input.transfer.boost.low: 102.0
input.transfer.high: 132.0
input.transfer.low: 102.0
input.transfer.trim.high: 132.0
input.voltage: 122.8
output.current: 010.3
output.voltage: 122.2
ups.delay.shutdown: 020
ups.delay.start: 001
ups.firmware: unknown
ups.id: Evolutio 3000 17
ups.load: 049.0
ups.mfr: MGE UPS SYSTEMS
ups.model: Evolution 3000
ups.status: OL CHRG
ups.test.interval: 10080

thanks in advance for any advice.

--
Aaron J. Grier | "Not your ordinary poofy goof." | agrier at poofygoof.com
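The upslog samples in the message above can be screened for the state Aaron describes, where the UPS reports line power yet is discharging. What follows is only a minimal Python sketch. It assumes the columns are date, time, battery charge, input voltage, load, bracketed status flags, and two trailing fields, which matches the lines quoted in the post; the names `Sample`, `parse_upslog_line`, and `discharging_on_line` are hypothetical helpers, not part of NUT.

```python
from typing import NamedTuple


class Sample(NamedTuple):
    """One upslog record (field meanings are assumed from the quoted lines)."""
    date: str
    time: str
    charge: float         # battery charge, percent
    input_voltage: float  # volts
    load: float           # percent
    status: frozenset     # status flags, e.g. {"OL", "DISCHRG"}


def parse_upslog_line(line: str) -> Sample:
    """Split a line such as
    '20080219 023552 098.0 123.0 049.0 [OL DISCHRG] NA 60.00'.
    The two fields after the closing bracket are ignored here."""
    head, rest = line.split("[", 1)
    status_text, _tail = rest.split("]", 1)
    date, time, charge, volts, load = head.split()
    return Sample(date, time, float(charge), float(volts), float(load),
                  frozenset(status_text.split()))


def discharging_on_line(sample: Sample) -> bool:
    """True when both the OL and DISCHRG flags are present."""
    return {"OL", "DISCHRG"} <= sample.status


if __name__ == "__main__":
    lines = [
        "20080219 023522 100.0 122.8 049.0 [OL CHRG] NA 60.00",
        "20080219 023552 098.0 123.0 049.0 [OL DISCHRG] NA 60.00",
    ]
    for raw in lines:
        s = parse_upslog_line(raw)
        if discharging_on_line(s):
            print(s.date, s.time, "discharging on line power, charge", s.charge)
```

Running such a filter over a whole log file would show how long each on-line discharge lasted, which is the information the thread goes on to discuss.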
Arjen de Korte
2008-Feb-20 08:28 UTC
[Nut-upsuser] MGE pulsar evolution 3000 discharges without nut noticing
> I had a very strange occurrence in my basement datacenter this early AM.
>
> the following was logged by upslog:
>
> 20080219 023452 100.0 123.0 049.0 [OL CHRG] NA 60.00
> 20080219 023522 100.0 122.8 049.0 [OL CHRG] NA 60.00
> 20080219 023552 098.0 123.0 049.0 [OL DISCHRG] NA 60.00
> 20080219 023622 097.0 123.0 049.0 [OL DISCHRG] NA 60.00
>
> ...
>
> 20080219 063429 019.0 122.6 049.0 [OL DISCHRG] NA 60.00
> 20080219 063459 019.0 122.8 049.0 [OL DISCHRG] NA 60.00
> 20080219 183013 100.0 121.1 050.0 [OL CHRG] NA 60.00
> 20080219 183043 100.0 120.9 049.0 [OL CHRG] NA 60.00
>
> the good news is the UPS held up for four hours.

Nothing wrong here. It was probably running a battery test, since the input power seems to have been present at all times (it is reporting OL).

> the bad news is that nut completely ignored that the UPS was
> discharging, and none of my connected machines shut down properly.

It didn't ignore it, the UPS just wasn't critical at any time during this test. By default, NUT will only start a shutdown sequence when the UPS becomes critical (i.e., no input power available and batteries low). From the above, I conclude that at no time was the input power lost, nor were the batteries low. You have a couple of things to configure to make NUT behave otherwise.

> I know that the UPS driver was running, because if it was not, upsd would
> have complained loudly and frequently. I know upsd was running (on the
> connected server as well as all the clients), because if it was not,
> upsmon would have complained loudly and frequently.

I don't see any problems here.

> there were no indications in my logs that nut was paying any attention
> to the situation.

Yes, it was. It was reporting [OL DISCHRG] all the time. There was just nothing NUT needed to do.

> why did my UPS decide to discharge its battery when there appears to be
> no power outage? (just looking for suggestions and guesses here from
> other MGE users...)

Most probably, because of an (automated) battery test. Or someone initiated one, but given your surprised reaction, I guess it wasn't you.

> why did nut not see that my UPS was discharging and the battery
> percentage running down and take appropriate action? "OL DISCHRG"
> status in the logs seems nonsensical.

This is all documented (in the FAQ for instance). This makes perfect sense.

> I have only seen this condition a
> few times before in my logs:
>
> 20061002 002628 094.0 079.6 045.0 [OB DISCHRG] NA 60.00
> 20061002 002658 093.0 096.8 045.0 [OL DISCHRG BOOST] NA 60.00
> 20061002 002728 093.0 088.3 045.0 [OL DISCHRG] NA 60.00
> 20061002 002758 093.0 000.0 045.0 [OB DISCHRG] NA 00.00
>
> Oct 2 00:22:59 aragorn upsmon[234]: UPS mge3000 at localhost on battery
> Oct 2 00:26:45 aragorn upsmon[234]: UPS mge3000 at localhost on line power
> Oct 2 00:27:35 aragorn upsmon[234]: UPS mge3000 at localhost on battery
>
> nut saw this one, and shut everything down, as expected.

Sure, here you see that the input power is actually lost (the line state flipping back and forth between OL and OB).

> 20061214 224604 100.0 122.4 054.0 [OL CHRG] NA 60.00
> 20061214 224634 099.0 123.7 056.0 [OL DISCHRG] NA 60.00
> 20061214 224704 097.0 000.0 054.0 [OB DISCHRG] NA 00.00
>
> Dec 14 22:46:45 aragorn upsmon[234]: UPS mge3000 at localhost on battery
> Dec 14 22:58:41 aragorn upsmon[234]: UPS mge3000 at localhost battery is low
> Dec 14 22:58:41 aragorn upsd[226]: Client master at 127.0.0.1 set FSD on UPS [mge3000]
> Dec 14 22:58:57 aragorn upsmon[234]: Host sync timer expired, forcing shutdown
> Dec 14 22:58:57 aragorn upsmon[234]: Executing automatic power-fail shutdown
> Dec 14 22:58:57 aragorn upsmon[234]: Auto logout and shutdown proceeding
> Dec 14 22:59:02 aragorn upsd[226]: Host 127.0.0.1 disconnected (read failure)
> Dec 14 22:59:08 aragorn upsd[226]: Signal 15: exiting

And here the batteries were probably already low and NUT decided the UPS was critical and it was time to shut down.

[...]

> ups.test.interval: 10080

There you go. This seems to be an awfully short interval (three hours).

Again, read up on the FAQ and come back if anything is not clear to you.

Best regards, Arjen
--
Eindhoven - The Netherlands
Key fingerprint - 66 4E 03 2C 9D B5 CB 9B 7A FE 7E C1 EE 88 BC 57
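The rule Arjen describes (shut down only when input power is lost and the batteries are low) can be illustrated with a short Python sketch. This is an illustration of the rule as stated in the reply, not NUT's actual code: `is_critical` and `seconds_to_hours` are hypothetical helpers, `LB` is NUT's low-battery status flag, and the test interval is assumed to be in seconds, which is how the reply reads it.

```python
def is_critical(ups_status: str) -> bool:
    """Return True when the status string contains both OB (on battery)
    and LB (low battery), the condition described in the reply."""
    flags = set(ups_status.split())
    return {"OB", "LB"} <= flags


def seconds_to_hours(seconds: float) -> float:
    """Convert a duration in seconds to hours."""
    return seconds / 3600.0


if __name__ == "__main__":
    # Status strings taken from the logs in this thread, plus one with LB.
    for status in ("OL CHRG", "OL DISCHRG", "OB DISCHRG", "OB DISCHRG LB"):
        print(f"{status!r}: critical={is_critical(status)}")

    # ups.test.interval from the upsc output, read as seconds (assumption).
    print(f"10080 s is about {seconds_to_hours(10080):.1f} hours")
```

Under this rule the long [OL DISCHRG] run of 2008-02-19 never qualifies, because the OB flag is absent throughout, even as the charge falls to 19%.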
Arnaud Quette
2008-Feb-20 08:40 UTC
[Nut-upsuser] MGE pulsar evolution 3000 discharges without nut noticing
Hi Aaron,

2008/2/20, Aaron J. Grier <agrier at poofygoof.com>:
> I'm running nut 2.0.1 with an MGE pulsar evolution 3000 with the
> mge-utalk (serial) driver on NetBSD 3.1.
>
> I had a very strange occurrence in my basement datacenter this early AM.
>
> the following was logged by upslog:
>
> 20080219 023452 100.0 123.0 049.0 [OL CHRG] NA 60.00
> 20080219 023522 100.0 122.8 049.0 [OL CHRG] NA 60.00
> 20080219 023552 098.0 123.0 049.0 [OL DISCHRG] NA 60.00
> 20080219 023622 097.0 123.0 049.0 [OL DISCHRG] NA 60.00
>
> ...
>
> 20080219 063429 019.0 122.6 049.0 [OL DISCHRG] NA 60.00
> 20080219 063459 019.0 122.8 049.0 [OL DISCHRG] NA 60.00
> 20080219 183013 100.0 121.1 050.0 [OL CHRG] NA 60.00
> 20080219 183043 100.0 120.9 049.0 [OL CHRG] NA 60.00
>
> the good news is the UPS held up for four hours.
>
> the bad news is that nut completely ignored that the UPS was
> discharging, and none of my connected machines shut down properly. I
> know that the UPS driver was running, because if it was not, upsd would
> have complained loudly and frequently. I know upsd was running (on the
> connected server as well as all the clients), because if it was not,
> upsmon would have complained loudly and frequently.
>
> there were no indications in my logs that nut was paying any attention
> to the situation.
>
> why did my UPS decide to discharge its battery when there appears to be
> no power outage? (just looking for suggestions and guesses here from
> other MGE users...)
>
> why did nut not see that my UPS was discharging and the battery
> percentage running down and take appropriate action? "OL DISCHRG"
> status in the logs seems nonsensical. I have only seen this condition a
> few times before in my logs:
>
> 20061002 002628 094.0 079.6 045.0 [OB DISCHRG] NA 60.00
> 20061002 002658 093.0 096.8 045.0 [OL DISCHRG BOOST] NA 60.00
> 20061002 002728 093.0 088.3 045.0 [OL DISCHRG] NA 60.00
> 20061002 002758 093.0 000.0 045.0 [OB DISCHRG] NA 00.00
>
> Oct 2 00:22:59 aragorn upsmon[234]: UPS mge3000 at localhost on battery
> Oct 2 00:26:45 aragorn upsmon[234]: UPS mge3000 at localhost on line power
> Oct 2 00:27:35 aragorn upsmon[234]: UPS mge3000 at localhost on battery
>
> nut saw this one, and shut everything down, as expected.
>
> 20061214 224604 100.0 122.4 054.0 [OL CHRG] NA 60.00
> 20061214 224634 099.0 123.7 056.0 [OL DISCHRG] NA 60.00
> 20061214 224704 097.0 000.0 054.0 [OB DISCHRG] NA 00.00
>
> Dec 14 22:46:45 aragorn upsmon[234]: UPS mge3000 at localhost on battery
> Dec 14 22:58:41 aragorn upsmon[234]: UPS mge3000 at localhost battery is low
> Dec 14 22:58:41 aragorn upsd[226]: Client master at 127.0.0.1 set FSD on UPS [mge3000]
> Dec 14 22:58:57 aragorn upsmon[234]: Host sync timer expired, forcing shutdown
> Dec 14 22:58:57 aragorn upsmon[234]: Executing automatic power-fail shutdown
> Dec 14 22:58:57 aragorn upsmon[234]: Auto logout and shutdown proceeding
> Dec 14 22:59:02 aragorn upsd[226]: Host 127.0.0.1 disconnected (read failure)
> Dec 14 22:59:08 aragorn upsd[226]: Signal 15: exiting
>
> nut did not see this one, but it was transient:
>
> 20070227 083935 100.0 122.2 048.0 [OL CHRG] NA 60.00
> 20070227 084005 100.0 122.2 048.0 [OL DISCHRG] NA 60.00
> 20070227 084035 100.0 122.2 048.0 [OL CHRG] NA 60.00
>
> here is the device (online now)
>
> $ upsc mge3000 at localhost
> battery.charge: 100.0
> battery.charge.low: 30
> battery.runtime: 00876
> battery.voltage: 081.6
> battery.voltage.nominal: 072.0
> driver.name: mge-utalk
> driver.parameter.port: /dev/tty00
> driver.version: 2.0.1
> driver.version.internal: 0.81.0
> input.frequency: 60.00
> input.transfer.boost.low: 102.0
> input.transfer.high: 132.0
> input.transfer.low: 102.0
> input.transfer.trim.high: 132.0
> input.voltage: 122.8
> output.current: 010.3
> output.voltage: 122.2
> ups.delay.shutdown: 020
> ups.delay.start: 001
> ups.firmware: unknown
> ups.id: Evolutio 3000 17
> ups.load: 049.0
> ups.mfr: MGE UPS SYSTEMS
> ups.model: Evolution 3000
> ups.status: OL CHRG
> ups.test.interval: 10080
>
> thanks in advance for any advice.

as always, Arjen already replied.

I would just add that you should prefer the mge-shut driver. It's more complete and better maintained than mge-utalk (which is for the legacy ASCII protocol).

Best regards, Arnaud
--
Linux / Unix Expert R&D - MGE Office Protection Systems - http://www.mgeops.com
Network UPS Tools (NUT) Project Leader - http://www.networkupstools.org/
Debian Developer - http://people.debian.org/~aquette/
Free Software Developer - http://arnaud.quette.free.fr/