Kari LempiƤinen
2023-Jun-11 18:32 UTC
[Nut-upsuser] Upssched 100% CPU after updating Debian 12
Hi, I have been running nut successfully for a long time with my Debian 11 server. I upgraded my server to Debian 12 today, which upgraded nut also from 2.7.4-13 to 2.8.0-7. I noticed that after upgrade there was a upssched process running and taking 100% cpu time. I checked if there were any changes to configuration file formats with nut upgrade and only differences I noticed were a terminology change from master/slave to primary/secondary in upsd.users and upsmon.conf, and dummy mode change for the dummy-ups. I didn?t expect these being the culprit, but I made the changes to my configuration. No change, after five minutes of starting nut, upssched will be running with 100% cpu usage. Any ideas what could be causing this? Here is my configuration and in the end are some log snippets: --------------------------------------------------- heartbeat.conf # heartbeat.conf -- 10 minute heartbeat ups.status: OL TIMER 300 ups.status: OB TIMER 300 --------------------------------------------------- hosts.conf MONITOR eaton5s at localhost "Eaton 5S 700" --------------------------------------------------- nut.conf MODE=netserver --------------------------------------------------- ups.conf maxretry = 3 [eaton5s] driver = usbhid-ups port = auto desc = "Eaton 5S 700i UPS" vendorid = "0463" productid = "FFFF" pollinterval = 15 [heartbeat] driver = dummy-ups mode=dummy-loop port = heartbeat.conf desc = "Watch over NUT" --------------------------------------------------- upsd.conf LISTEN 0.0.0.0 3493 --------------------------------------------------- upsmon.conf RUN_AS_USER root MONITOR eaton5s at localhost 1 upsmaster xxx primary MONITOR heartbeat at localhost 0 upsmaster xxx primary MINSUPPLIES 1 NOTIFYCMD /sbin/upssched SHUTDOWNCMD "/sbin/shutdown -h +0" POLLFREQ 5 POLLFREQALERT 5 HOSTSYNC 15 DEADTIME 15 POWERDOWNFLAG /etc/killpower RBWARNTIME 43200 NOCOMMWARNTIME 300 FINALDELAY 5 NOTIFYMSG ONLINE "UPS %s: On line power." NOTIFYMSG ONBATT "UPS %s: On battery." NOTIFYMSG LOWBATT "UPS %s: Battery is low." NOTIFYMSG REPLBATT "UPS %s: Battery needs to be replaced." NOTIFYMSG FSD "UPS %s: Forced shutdown in progress." NOTIFYMSG SHUTDOWN "Auto logout and shutdown proceeding." NOTIFYMSG COMMOK "UPS %s: Communications (re-)established." NOTIFYMSG COMMBAD "UPS %s: Communications lost." NOTIFYMSG NOCOMM "UPS %s: Not available." NOTIFYMSG NOPARENT "upsmon parent dead, shutdown impossible." NOTIFYFLAG ONLINE EXEC NOTIFYFLAG ONBATT EXEC NOTIFYFLAG LOWBATT SYSLOG+WALL NOTIFYFLAG REPLBATT SYSLOG+WALL NOTIFYFLAG FSD SYSLOG+WALL NOTIFYFLAG SHUTDOWN SYSLOG+WALL NOTIFYFLAG COMMOK SYSLOG+WALL NOTIFYFLAG COMMBAD SYSLOG+WALL NOTIFYFLAG NOCOMM SYSLOG+WALL NOTIFYFLAG NOPARENT SYSLOG+WALL --------------------------------------------------- upssched.conf CMDSCRIPT /etc/nut/upssched-cmd PIPEFN /var/run/nut/upssched.pipe LOCKFN /var/run/nut/upssched.lock AT ONBATT eaton5s at localhost EXECUTE onbatt AT ONBATT eaton5s at localhost START-TIMER beeperoff 15 AT ONLINE eaton5s at localhost EXECUTE online AT ONLINE eaton5s at localhost CANCEL-TIMER beeperoff AT ONBATT eaton5s at localhost START-TIMER shutdown-ws2019 300 AT ONLINE eaton5s at localhost CANCEL-TIMER shutdown-ws2019 AT ONBATT heartbeat at localhost CANCEL-TIMER heartbeat-failure-timer AT ONBATT heartbeat at localhost START-TIMER heartbeat-failure-timer 660 --------------------------------------------------- upsd.users [upsmaster] password = xxx upsmon primary actions = set actions = fsd instcmds = all [upsslave] password = xxx upsmon secondary --------------------------------------------------- upssched-cmd #! /bin/bash UPS="eaton5s" UPS_USERNAME="upsmaster" UPS_PASSWORD="xxx" EMAIL_TO=xxx at xxx.xxx<mailto:xxx at xxx.xxx> EMAIL_FROM=xxx at xxx.xxx<mailto:xxx at xxx.xxx> STATUS=`upsc $UPS ups.status 2>&1 | grep -v '^Init SSL'` CHARGE=`upsc $UPS battery.charge 2>&1 | grep -v '^Init SSL'` CHMSG="[$STATUS]:$CHARGE%" MSG="dummy message - $1" case $1 in online) MSG="$UPS, $CHMSG - On line power - power supply had been restored." ;; onbatt) MSG="$UPS, $CHMSG - On battery - power failure - save your work!" ;; lowbatt) MSG="$UPS, $CHMSG - Battery is low - shutdown now!" ;; beeperoff) logger -i -t upsshed-cmd "Turning off Beeper" STATUS=`upsc ${UPS} ups.beeper.status 2>&1 | grep -v '^Init SSL'` if [[ "${STATUS}" == "enabled" ]] ; then upscmd -u ${UPS_USERNAME} -p ${UPS_PASSWORD} ${UPS} beeper.toggle fi ;; shutdown-ws2019) MSG="$UPS, $CHMSG - Stopping WS2019" logger -i -t upsshed-cmd "Stopping WS2019" setsid /usr/local/bin/shutdown-ws2019 & ;; upsgone) logger -i -t upssched-cmd "The UPS has been gone for awhile" ;; heartbeat-failure-timer) MSG="NUT heart beat fails. $CHMSG" # Email MSG1="Hello, upssched-cmd reports NUT heartbeat has failed." MSG2="Current status: $CHMSG \n\n$0 $1" MSG3="\n\n$( ps -elf | grep -E 'ups[dms]|nut' )" echo -e "$MSG1 $MSG2 $MSG3" | /usr/bin/mail -r "$EMAIL_FROM" \ -s "NUT heart beat fails. Currently $CHMSG" "$EMAIL_TO" ;; *) logger -i -t upssched-cmd "Unrecognized command: $1" ;; esac logger -i -t upssched-cmd $MSG wall "$MSG" --------------------------------------------------- --------------------------------------------------- This is normal ups activity in daemon log from the previous version: Jun 10 23:10:50 fricka upssched[16007]: Cancelling timer: heartbeat-failure-timer Jun 10 23:10:50 fricka upssched[16007]: New timer: heartbeat-failure-timer (660 seconds) Jun 10 23:20:50 fricka upssched[16007]: Cancelling timer: heartbeat-failure-timer Jun 10 23:20:50 fricka upssched[16007]: New timer: heartbeat-failure-timer (660 seconds) Jun 10 23:30:55 fricka upssched[16007]: Cancelling timer: heartbeat-failure-timer Jun 10 23:30:55 fricka upssched[16007]: New timer: heartbeat-failure-timer (660 seconds) Jun 10 23:40:55 fricka upssched[16007]: Cancelling timer: heartbeat-failure-timer Jun 10 23:40:55 fricka upssched[16007]: New timer: heartbeat-failure-timer (660 seconds) Jun 10 23:50:55 fricka upssched[16007]: Cancelling timer: heartbeat-failure-timer Jun 10 23:50:55 fricka upssched[16007]: New timer: heartbeat-failure-timer (660 seconds) --------------------------------------------------- This is daemon log from the new version, first two lines are the last lines of the startup messages. One same upssched process (124835) running with 100% cpu from 21:07: Jun 11 21:02:40 fricka dummy-ups[121803]: sock_connect: enabling asynchronous mode (auto) Jun 11 21:02:41 fricka usbhid-ups[121820]: sock_connect: enabling asynchronous mode (auto) Jun 11 21:07:40 fricka upssched[124835]: Timer daemon started Jun 11 21:07:40 fricka upssched[124835]: New timer: heartbeat-failure-timer (660 seconds) Jun 11 21:07:40 fricka nut-monitor[124832]: Network UPS Tools upsmon 2.8.0 Jun 11 21:12:40 fricka nut-monitor[128005]: Network UPS Tools upsmon 2.8.0 Jun 11 21:17:40 fricka upssched[124835]: Cancelling timer: heartbeat-failure-timer Jun 11 21:17:40 fricka upssched[124835]: New timer: heartbeat-failure-timer (660 seconds) Jun 11 21:17:40 fricka nut-monitor[131233]: Network UPS Tools upsmon 2.8.0 Jun 11 21:22:45 fricka nut-monitor[134330]: Network UPS Tools upsmon 2.8.0 Jun 11 21:27:45 fricka upssched[124835]: Cancelling timer: heartbeat-failure-timer Jun 11 21:27:45 fricka upssched[124835]: New timer: heartbeat-failure-timer (660 seconds) Jun 11 21:27:45 fricka nut-monitor[137838]: Network UPS Tools upsmon 2.8.0 --------------------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://alioth-lists.debian.net/pipermail/nut-upsuser/attachments/20230611/b66860d3/attachment-0001.htm>
Hello, Thank you for the report and I don't recall any similar issues. Unfortunately, upssched did not have any "native" enablement of debug verbosity in 2.8.0 and older releases (some added with https://github.com/networkupstools/nut/pull/1864 recently), so maybe the best way to get more info at the moment would be to watch syscalls made when it begins to hog CPU and make educated guesses from there, e.g.: # strace -f -ttt -T -p "`ps -ef | grep -w upssched | grep -v grep | awk '{print $2}'`" I suppose when hogging the CPU it has some reason to quickly interrupt its processing loop - perhaps a lack of some file (socket?) it waits on, etc. Hope this helps, Jim Klimov On Mon, Jun 12, 2023, 14:59 Kari Lempi?inen <kari.lempiainen at summerday.net> wrote:> Hi, > > > > I have been running nut successfully for a long time with my Debian 11 > server. I upgraded my server to Debian 12 today, which upgraded nut also > from 2.7.4-13 to 2.8.0-7. I noticed that after upgrade there was a upssched > process running and taking 100% cpu time. > > > > I checked if there were any changes to configuration file formats with nut > upgrade and only differences I noticed were a terminology change from > master/slave to primary/secondary in upsd.users and upsmon.conf, and dummy > mode change for the dummy-ups. I didn?t expect these being the culprit, but > I made the changes to my configuration. No change, after five minutes of > starting nut, upssched will be running with 100% cpu usage. > > > > Any ideas what could be causing this? Here is my configuration and in the > end are some log snippets: > > --------------------------------------------------- > > heartbeat.conf > > > > # heartbeat.conf -- 10 minute heartbeat > > ups.status: OL > > TIMER 300 > > ups.status: OB > > TIMER 300 > > --------------------------------------------------- > > hosts.conf > > > > MONITOR eaton5s at localhost "Eaton 5S 700" > > --------------------------------------------------- > > nut.conf > > > > MODE=netserver > > --------------------------------------------------- > > ups.conf > > > > maxretry = 3 > > [eaton5s] > > driver = usbhid-ups > > port = auto > > desc = "Eaton 5S 700i UPS" > > vendorid = "0463" > > productid = "FFFF" > > pollinterval = 15 > > [heartbeat] > > driver = dummy-ups > > mode=dummy-loop > > port = heartbeat.conf > > desc = "Watch over NUT" > > --------------------------------------------------- > > upsd.conf > > > > LISTEN 0.0.0.0 3493 > > --------------------------------------------------- > > upsmon.conf > > > > RUN_AS_USER root > > MONITOR eaton5s at localhost 1 upsmaster xxx primary > > MONITOR heartbeat at localhost 0 upsmaster xxx primary > > MINSUPPLIES 1 > > NOTIFYCMD /sbin/upssched > > SHUTDOWNCMD "/sbin/shutdown -h +0" > > POLLFREQ 5 > > POLLFREQALERT 5 > > HOSTSYNC 15 > > DEADTIME 15 > > POWERDOWNFLAG /etc/killpower > > RBWARNTIME 43200 > > NOCOMMWARNTIME 300 > > FINALDELAY 5 > > NOTIFYMSG ONLINE "UPS %s: On line power." > > NOTIFYMSG ONBATT "UPS %s: On battery." > > NOTIFYMSG LOWBATT "UPS %s: Battery is low." > > NOTIFYMSG REPLBATT "UPS %s: Battery needs to be replaced." > > NOTIFYMSG FSD "UPS %s: Forced shutdown in progress." > > NOTIFYMSG SHUTDOWN "Auto logout and shutdown proceeding." > > NOTIFYMSG COMMOK "UPS %s: Communications (re-)established." > > NOTIFYMSG COMMBAD "UPS %s: Communications lost." > > NOTIFYMSG NOCOMM "UPS %s: Not available." > > NOTIFYMSG NOPARENT "upsmon parent dead, shutdown impossible." > > NOTIFYFLAG ONLINE EXEC > > NOTIFYFLAG ONBATT EXEC > > NOTIFYFLAG LOWBATT SYSLOG+WALL > > NOTIFYFLAG REPLBATT SYSLOG+WALL > > NOTIFYFLAG FSD SYSLOG+WALL > > NOTIFYFLAG SHUTDOWN SYSLOG+WALL > > NOTIFYFLAG COMMOK SYSLOG+WALL > > NOTIFYFLAG COMMBAD SYSLOG+WALL > > NOTIFYFLAG NOCOMM SYSLOG+WALL > > NOTIFYFLAG NOPARENT SYSLOG+WALL > > --------------------------------------------------- > > upssched.conf > > > > CMDSCRIPT /etc/nut/upssched-cmd > > PIPEFN /var/run/nut/upssched.pipe > > LOCKFN /var/run/nut/upssched.lock > > > > AT ONBATT eaton5s at localhost EXECUTE onbatt > > AT ONBATT eaton5s at localhost START-TIMER beeperoff 15 > > AT ONLINE eaton5s at localhost EXECUTE online > > AT ONLINE eaton5s at localhost CANCEL-TIMER beeperoff > > AT ONBATT eaton5s at localhost START-TIMER shutdown-ws2019 300 > > AT ONLINE eaton5s at localhost CANCEL-TIMER shutdown-ws2019 > > AT ONBATT heartbeat at localhost CANCEL-TIMER heartbeat-failure-timer > > AT ONBATT heartbeat at localhost START-TIMER heartbeat-failure-timer 660 > > --------------------------------------------------- > > upsd.users > > > > [upsmaster] > > password = xxx > > upsmon primary > > actions = set > > actions = fsd > > instcmds = all > > > > [upsslave] > > password = xxx > > upsmon secondary > > --------------------------------------------------- > > upssched-cmd > > > > #! /bin/bash > > > > UPS="eaton5s" > > UPS_USERNAME="upsmaster" > > UPS_PASSWORD="xxx" > > EMAIL_TO=xxx at xxx.xxx > > EMAIL_FROM=xxx at xxx.xxx > > STATUS=`upsc $UPS ups.status 2>&1 | grep -v '^Init SSL'` > > CHARGE=`upsc $UPS battery.charge 2>&1 | grep -v '^Init SSL'` > > CHMSG="[$STATUS]:$CHARGE%" > > MSG="dummy message - $1" > > > > case $1 in > > online) MSG="$UPS, $CHMSG - On line power - power supply had been > restored." ;; > > onbatt) MSG="$UPS, $CHMSG - On battery - power failure - save your > work!" ;; > > lowbatt) MSG="$UPS, $CHMSG - Battery is low - shutdown now!" ;; > > beeperoff) > > logger -i -t upsshed-cmd "Turning off Beeper" > > STATUS=`upsc ${UPS} ups.beeper.status 2>&1 | grep -v > '^Init SSL'` > > if [[ "${STATUS}" == "enabled" ]] ; then > > upscmd -u ${UPS_USERNAME} -p ${UPS_PASSWORD} > ${UPS} beeper.toggle > > fi > > ;; > > shutdown-ws2019) > > MSG="$UPS, $CHMSG - Stopping WS2019" > > logger -i -t upsshed-cmd "Stopping WS2019" > > setsid /usr/local/bin/shutdown-ws2019 & > > ;; > > upsgone) > > logger -i -t upssched-cmd "The UPS has been gone for > awhile" > > ;; > > heartbeat-failure-timer) > > MSG="NUT heart beat fails. $CHMSG" > > # Email > > MSG1="Hello, upssched-cmd reports NUT heartbeat has > failed." > > MSG2="Current status: $CHMSG \n\n$0 $1" > > MSG3="\n\n$( ps -elf | grep -E 'ups[dms]|nut' )" > > echo -e "$MSG1 $MSG2 $MSG3" | /usr/bin/mail -r > "$EMAIL_FROM" \ > > -s "NUT heart beat fails. Currently $CHMSG" "$EMAIL_TO" ;; > > > > *) > > logger -i -t upssched-cmd "Unrecognized command: $1" > > ;; > > esac > > > > logger -i -t upssched-cmd $MSG > > wall "$MSG" > > --------------------------------------------------- > > > > > > --------------------------------------------------- > > This is normal ups activity in daemon log from the previous version: > > > > Jun 10 23:10:50 fricka upssched[16007]: Cancelling timer: > heartbeat-failure-timer > > Jun 10 23:10:50 fricka upssched[16007]: New timer: heartbeat-failure-timer > (660 seconds) > > Jun 10 23:20:50 fricka upssched[16007]: Cancelling timer: > heartbeat-failure-timer > > Jun 10 23:20:50 fricka upssched[16007]: New timer: heartbeat-failure-timer > (660 seconds) > > Jun 10 23:30:55 fricka upssched[16007]: Cancelling timer: > heartbeat-failure-timer > > Jun 10 23:30:55 fricka upssched[16007]: New timer: heartbeat-failure-timer > (660 seconds) > > Jun 10 23:40:55 fricka upssched[16007]: Cancelling timer: > heartbeat-failure-timer > > Jun 10 23:40:55 fricka upssched[16007]: New timer: heartbeat-failure-timer > (660 seconds) > > Jun 10 23:50:55 fricka upssched[16007]: Cancelling timer: > heartbeat-failure-timer > > Jun 10 23:50:55 fricka upssched[16007]: New timer: heartbeat-failure-timer > (660 seconds) > > --------------------------------------------------- > > This is daemon log from the new version, first two lines are the last > lines of the startup messages. One same upssched process (124835) running > with 100% cpu from 21:07: > > > > Jun 11 21:02:40 fricka dummy-ups[121803]: sock_connect: enabling > asynchronous mode (auto) > > Jun 11 21:02:41 fricka usbhid-ups[121820]: sock_connect: enabling > asynchronous mode (auto) > > Jun 11 21:07:40 fricka upssched[124835]: Timer daemon started > > Jun 11 21:07:40 fricka upssched[124835]: New timer: > heartbeat-failure-timer (660 seconds) > > Jun 11 21:07:40 fricka nut-monitor[124832]: Network UPS Tools upsmon 2.8.0 > > Jun 11 21:12:40 fricka nut-monitor[128005]: Network UPS Tools upsmon 2.8.0 > > Jun 11 21:17:40 fricka upssched[124835]: Cancelling timer: > heartbeat-failure-timer > > Jun 11 21:17:40 fricka upssched[124835]: New timer: > heartbeat-failure-timer (660 seconds) > > Jun 11 21:17:40 fricka nut-monitor[131233]: Network UPS Tools upsmon 2.8.0 > > Jun 11 21:22:45 fricka nut-monitor[134330]: Network UPS Tools upsmon 2.8.0 > > Jun 11 21:27:45 fricka upssched[124835]: Cancelling timer: > heartbeat-failure-timer > > Jun 11 21:27:45 fricka upssched[124835]: New timer: > heartbeat-failure-timer (660 seconds) > > Jun 11 21:27:45 fricka nut-monitor[137838]: Network UPS Tools upsmon 2.8.0 > > > > > > --------------------------------------------------- > > > _______________________________________________ > Nut-upsuser mailing list > Nut-upsuser at alioth-lists.debian.net > https://alioth-lists.debian.net/cgi-bin/mailman/listinfo/nut-upsuser >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://alioth-lists.debian.net/pipermail/nut-upsuser/attachments/20230612/ff16d3f9/attachment-0001.htm>