Hi, recently I observed that during a power outage, my NUT setup doesn't shutdown properly. Indeed it never reaches the LowBattery state to notify and initiate shutdown on clients. Reviewing my setup and logs I think I'm facing multiple problems and I'm unable to point to real root cause. First, let me focus on my scenario and current setup: In normal status (ONLINE, fully charged) my UPS runtime is about 20minutes, this is some of the relevant info: Charge 100% Charge Low 20% Runtime 20m37s Type PbAc Manufacturer EATON Model Ellipse ECO 1200 Serial 000000000 Power 25VA Frequency Nominal 50Hz Voltage 230.0V Voltage Nominal 230V Delay Shutdown 25s Delay Start 40s Firmware 02 Load 23% Power Nominal 1200VA Status OL During a power outage it used to last for about 15 minutes and then initiate the shutdown. however reviewing the logs of my systems: Apr 19 12:04:23 - OnBattery Apr 19 12:10:21 - Power from UPS cut, all systems got suddenly powered off (no graceful shutdown) Apr 19 12:16:46 - Power restored, UPS feeds power again to the systems and they boot up. So this indicated the UPS only lasted about 6 minutes. Also I have 2 UPS: - UPSdevice, which is the real UPS - HeartBeat : virtual dummy device to monitor proper NUT communication (see http://rogerprice.org/NUT/NUT.html#HEARTBEAT). Looking at the logs when the event starts: Apr 19 12:04:23 laney upsmon[756]: UPS UPSdevice at power.srv.l3jane.net: On battery. Apr 19 12:04:23 laney upssched[797]: New timer: onbattwarn (30 seconds) Apr 19 12:04:53 laney upssched[797]: Event: onbattwarn Apr 19 12:08:24 laney upsmon[756]: UPS HeartBeat at power.srv.l3jane.net: On battery. Apr 19 12:08:24 laney upssched[797]: Cancelling timer: heartbeat-failure UPSdevice uses timer "onbattwarn", while HeartBeat is using a separated timer "heartbeat-failure". So I supose that when heartbeat cancels the 'heartbeat-failure' the onbattwarn timer is still running. The second concern is quickly spotted: Apr 19 12:09:19 laney usbhid-ups[703]: libusb_get_interrupt: error submitting URB: No such device Apr 19 12:09:24 laney upsd[753]: Data for UPS [UPSdevice] is stale - check driver Apr 19 12:09:26 laney upsd[753]: UPS [UPSdevice] data is no longer stale Seems that during the 'onbattwarn' event, NUT lost communcaiton with the UPS for about 6 secs? this is really quick to trigger any 'COMM_BAD' evebt so no notifcation regarding this is sent, just the logs. However as stated above on the logs, just 1 min later (12:10:21) power is totally lost even from UPS, no trace of LowBattery events, comm errors or so. A part from a email message that upssched sent me (although no logs of the event appears), this email reports: Power ups UPSdevice battery-low notification UPS: UPSdevice at power.srv.l3jane.net Notice type: LOWBATT Message: battery-low Sun, 19 Apr 2020 12:10:27 +0200 Charge 73% Charge Low 20% Runtime 15m52s Type PbAc Manufacturer EATON Model Ellipse ECO 1200 Serial 000000000 Type Ups Power 25VA Frequency Nominal 50Hz Voltage 230.0V Voltage Nominal 230V Delay Shutdown 20s Delay Start 30s Firmware 02 Load 21% Power Nominal 1200VA Status FSD ALARM OB LB So, indeed was in LowBattery and FSD shutdown, however note that the current charge was 73% >> 20% low battery, so why it set LB when there were enough battery to run? Is this a comm problem, device issue, driver ? So mainly my concerns questions are: A) Can upsched introduce some race-conditions making a timer cancel cancel other timers not related ? B) May my UPS be faulting or just some USB driver issue? Any other hint, information or idea is welcomed. Regards -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://alioth-lists.debian.net/pipermail/nut-upsuser/attachments/20200423/28acb83e/attachment.html>
Hello. Battery, battery, battery. I am only thinking at worn batteries. I recently had the same symptoms with an hp/eaton g3 1500va. One of the 3 batteries was reading a healthy 12.8v, but when put under load it would quickly drain. I would measure each battery and test them under load, like with a car headlight bulb switched on both fillaments. On Thu, 23 Apr 2020, 16:54 Marc Franquesa, <marc.franquesa at gmail.com> wrote:> Hi, recently I observed that during a power outage, my NUT setup doesn't > shutdown properly. Indeed it never reaches the LowBattery state to notify > and initiate shutdown on clients. Reviewing my setup and logs I think I'm > facing multiple problems and I'm unable to point to real root cause. > > First, let me focus on my scenario and current setup: > > In normal status (ONLINE, fully charged) my UPS runtime is about > 20minutes, this is some of the relevant info: > > Charge 100% > Charge Low 20% > Runtime 20m37s > Type PbAc > Manufacturer EATON > Model Ellipse > ECO 1200 > Serial 000000000 > Power 25VA > Frequency Nominal 50Hz > Voltage 230.0V > Voltage Nominal 230V > Delay Shutdown 25s > Delay Start 40s > Firmware 02 > Load 23% > Power Nominal 1200VA > Status OL > > During a power outage it used to last for about 15 minutes and then > initiate the shutdown. however reviewing the logs of my systems: > > Apr 19 12:04:23 - OnBattery > Apr 19 12:10:21 - Power from UPS cut, all systems got suddenly powered off > (no graceful shutdown) > Apr 19 12:16:46 - Power restored, UPS feeds power again to the systems and > they boot up. > > So this indicated the UPS only lasted about 6 minutes. > > Also I have 2 UPS: > - UPSdevice, which is the real UPS > - HeartBeat : virtual dummy device to monitor proper NUT communication > (see http://rogerprice.org/NUT/NUT.html#HEARTBEAT). > > Looking at the logs when the event starts: > > Apr 19 12:04:23 laney upsmon[756]: UPS UPSdevice at power.srv.l3jane.net: On > battery. > Apr 19 12:04:23 laney upssched[797]: New timer: onbattwarn (30 seconds) > Apr 19 12:04:53 laney upssched[797]: Event: onbattwarn > Apr 19 12:08:24 laney upsmon[756]: UPS HeartBeat at power.srv.l3jane.net: On > battery. > Apr 19 12:08:24 laney upssched[797]: Cancelling timer: heartbeat-failure > > UPSdevice uses timer "onbattwarn", while HeartBeat is using a separated > timer "heartbeat-failure". So I supose that when heartbeat cancels the > 'heartbeat-failure' the onbattwarn timer is still running. > > The second concern is quickly spotted: > > Apr 19 12:09:19 laney usbhid-ups[703]: libusb_get_interrupt: error > submitting URB: No such device > Apr 19 12:09:24 laney upsd[753]: Data for UPS [UPSdevice] is stale - check > driver > Apr 19 12:09:26 laney upsd[753]: UPS [UPSdevice] data is no longer stale > > Seems that during the 'onbattwarn' event, NUT lost communcaiton with the > UPS for about 6 secs? this is really quick to trigger any 'COMM_BAD' evebt > so no notifcation regarding this is sent, just the logs. > > However as stated above on the logs, just 1 min later (12:10:21) power is > totally lost even from UPS, no trace of LowBattery events, comm errors or > so. A part from a email message that upssched sent me (although no logs of > the event appears), this email reports: > > Power ups UPSdevice battery-low notification > UPS: UPSdevice at power.srv.l3jane.net > Notice type: LOWBATT > Message: battery-low > Sun, 19 Apr 2020 12:10:27 +0200 > > Charge 73% > Charge Low 20% > Runtime 15m52s > Type PbAc > Manufacturer EATON > Model Ellipse > ECO 1200 > Serial 000000000 > Type Ups > Power 25VA > Frequency Nominal 50Hz > Voltage 230.0V > Voltage Nominal 230V > Delay Shutdown 20s > Delay Start 30s > Firmware 02 > Load 21% > Power Nominal 1200VA > Status FSD > ALARM OB LB > > So, indeed was in LowBattery and FSD shutdown, however note that the > current charge was 73% >> 20% low battery, so why it set LB when there were > enough battery to run? Is this a comm problem, device issue, driver ? > > So mainly my concerns questions are: > A) Can upsched introduce some race-conditions making a timer cancel cancel > other timers not related ? > B) May my UPS be faulting or just some USB driver issue? > > Any other hint, information or idea is welcomed. > > Regards > _______________________________________________ > Nut-upsuser mailing list > Nut-upsuser at alioth-lists.debian.net > https://alioth-lists.debian.net/cgi-bin/mailman/listinfo/nut-upsuser-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://alioth-lists.debian.net/pipermail/nut-upsuser/attachments/20200423/af10d426/attachment-0001.html>
Have you done a load/battery test without NUT running to see how long the UPS will run? Not sure if yourngear will do it, but some UPS gear has an internal battery test that will automate that process, without ever dropping output. Also, how old are the batteries? - Tim On April 23, 2020 12:41:30 PM CDT, nicolae788 <nicolae788 at gmail.com> wrote:>Hello. Battery, battery, battery. I am only thinking at worn batteries. >I >recently had the same symptoms with an hp/eaton g3 1500va. One of the 3 >batteries was reading a healthy 12.8v, but when put under load it would >quickly drain. I would measure each battery and test them under load, >like >with a car headlight bulb switched on both fillaments. > >On Thu, 23 Apr 2020, 16:54 Marc Franquesa, <marc.franquesa at gmail.com> >wrote: > >> Hi, recently I observed that during a power outage, my NUT setup >doesn't >> shutdown properly. Indeed it never reaches the LowBattery state to >notify >> and initiate shutdown on clients. Reviewing my setup and logs I think >I'm >> facing multiple problems and I'm unable to point to real root cause. >> >> First, let me focus on my scenario and current setup: >> >> In normal status (ONLINE, fully charged) my UPS runtime is about >> 20minutes, this is some of the relevant info: >> >> Charge 100% >> Charge Low 20% >> Runtime 20m37s >> Type PbAc >> Manufacturer EATON >> Model Ellipse >> ECO 1200 >> Serial 000000000 >> Power 25VA >> Frequency Nominal 50Hz >> Voltage 230.0V >> Voltage Nominal 230V >> Delay Shutdown 25s >> Delay Start 40s >> Firmware 02 >> Load 23% >> Power Nominal 1200VA >> Status OL >> >> During a power outage it used to last for about 15 minutes and then >> initiate the shutdown. however reviewing the logs of my systems: >> >> Apr 19 12:04:23 - OnBattery >> Apr 19 12:10:21 - Power from UPS cut, all systems got suddenly >powered off >> (no graceful shutdown) >> Apr 19 12:16:46 - Power restored, UPS feeds power again to the >systems and >> they boot up. >> >> So this indicated the UPS only lasted about 6 minutes. >> >> Also I have 2 UPS: >> - UPSdevice, which is the real UPS >> - HeartBeat : virtual dummy device to monitor proper NUT >communication >> (see http://rogerprice.org/NUT/NUT.html#HEARTBEAT). >> >> Looking at the logs when the event starts: >> >> Apr 19 12:04:23 laney upsmon[756]: UPS >UPSdevice at power.srv.l3jane.net: On >> battery. >> Apr 19 12:04:23 laney upssched[797]: New timer: onbattwarn (30 >seconds) >> Apr 19 12:04:53 laney upssched[797]: Event: onbattwarn >> Apr 19 12:08:24 laney upsmon[756]: UPS >HeartBeat at power.srv.l3jane.net: On >> battery. >> Apr 19 12:08:24 laney upssched[797]: Cancelling timer: >heartbeat-failure >> >> UPSdevice uses timer "onbattwarn", while HeartBeat is using a >separated >> timer "heartbeat-failure". So I supose that when heartbeat cancels >the >> 'heartbeat-failure' the onbattwarn timer is still running. >> >> The second concern is quickly spotted: >> >> Apr 19 12:09:19 laney usbhid-ups[703]: libusb_get_interrupt: error >> submitting URB: No such device >> Apr 19 12:09:24 laney upsd[753]: Data for UPS [UPSdevice] is stale - >check >> driver >> Apr 19 12:09:26 laney upsd[753]: UPS [UPSdevice] data is no longer >stale >> >> Seems that during the 'onbattwarn' event, NUT lost communcaiton with >the >> UPS for about 6 secs? this is really quick to trigger any 'COMM_BAD' >evebt >> so no notifcation regarding this is sent, just the logs. >> >> However as stated above on the logs, just 1 min later (12:10:21) >power is >> totally lost even from UPS, no trace of LowBattery events, comm >errors or >> so. A part from a email message that upssched sent me (although no >logs of >> the event appears), this email reports: >> >> Power ups UPSdevice battery-low notification >> UPS: UPSdevice at power.srv.l3jane.net >> Notice type: LOWBATT >> Message: battery-low >> Sun, 19 Apr 2020 12:10:27 +0200 >> >> Charge 73% >> Charge Low 20% >> Runtime 15m52s >> Type PbAc >> Manufacturer EATON >> Model Ellipse >> ECO 1200 >> Serial 000000000 >> Type Ups >> Power 25VA >> Frequency Nominal 50Hz >> Voltage 230.0V >> Voltage Nominal 230V >> Delay Shutdown 20s >> Delay Start 30s >> Firmware 02 >> Load 21% >> Power Nominal 1200VA >> Status FSD >> ALARM OB LB >> >> So, indeed was in LowBattery and FSD shutdown, however note that the >> current charge was 73% >> 20% low battery, so why it set LB when >there were >> enough battery to run? Is this a comm problem, device issue, driver ? >> >> So mainly my concerns questions are: >> A) Can upsched introduce some race-conditions making a timer cancel >cancel >> other timers not related ? >> B) May my UPS be faulting or just some USB driver issue? >> >> Any other hint, information or idea is welcomed. >> >> Regards >> _______________________________________________ >> Nut-upsuser mailing list >> Nut-upsuser at alioth-lists.debian.net >> https://alioth-lists.debian.net/cgi-bin/mailman/listinfo/nut-upsuser-- Sent from my Android device with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://alioth-lists.debian.net/pipermail/nut-upsuser/attachments/20200423/3601a93e/attachment.html>
On Thu, 23 Apr 2020, Marc Franquesa wrote:> ...this email reports: > > Power ups UPSdevice battery-low notification > UPS: UPSdevice at power.srv.l3jane.net > Notice type: LOWBATT > Message: battery-low > Sun, 19 Apr 2020 12:10:27 +0200 > > Charge 73% > Charge Low 20% > Runtime 15m52s > ... > Delay Shutdown 20s > Delay Start 30s > Firmware 02 > Load 21% > Power Nominal 1200VA > Status FSD > ALARM OB LBAssuming the battery is ok, then how long does it take this system to shut down? Is a Delay Shutdown of only 20 sec enough?> So, indeed was in LowBattery and FSD shutdown, however note that the current > charge was 73% >> 20% low battery, so why it set LB when there were enough > battery to run? Is this a comm problem, device issue, driver ?System administrators sometimes use the command upsrw -s battery.charge.low=75 -u <upsuser> -p <sekret> <myups> to speed up NUT testing. Do you have such a command lurking somewhere?> A) Can upsched introduce some race-conditions making a timer cancel cancel > other timers not related ?The upssched timers operate independently of one another. Unless you have strange logic in upssched-cmd the heartbeat timers should not affect any other timer. If you are shutting down on LB, then why do you need timers in upssched (apart from the heartbeart)? Roger