Jim Klimov
2023-Jun-13  13:41 UTC
[Nut-upsdev] [Nut-upsuser] Upssched 100% CPU after updating Debian 12
So, got some good news: I hear(*) I managed to reproduce the problem with current NUT master and an adapted copy of your posted configs and script :D Experimental debugging now sounds possible. (*) PC under the desk wails with all its cooling fans as soon as I started the client which spawned a daemon and itself had exited: $ UPSNAME=heartbeat at localhost NOTIFYTYPE=ONBATT NUT_STATEPATH=/dev/shm NUT_CONFPATH=`pwd`/tests/NIT/tmp/etc ./clients/upssched -DDDDDDDD 0.000000 [D2] parse_at: is 'heartbeat at localhost' in AT command the 'heartbeat at localhost' we were launched to process? 0.000053 [D1] parse_at: 'heartbeat at localhost' in AT command matched the 'heartbeat at localhost' UPSNAME we were launched to process 0.000066 [D1] parse_at: processing CANCEL-TIMER 0.000089 [D2] parse_at: is 'heartbeat at localhost' in AT command the 'heartbeat at localhost' we were launched to process? 0.000096 [D1] parse_at: 'heartbeat at localhost' in AT command matched the 'heartbeat at localhost' UPSNAME we were launched to process 0.000104 [D1] parse_at: processing START-TIMER $ Jim On Tue, Jun 13, 2023 at 1:58?PM Kari Lempi?inen < kari.lempiainen at summerday.net> wrote:> Hi, > > Thanks Jim! Here is more system information from the commands you > mentioned. > > Kari > > root at fricka:~# lsof -p 1716171 > lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/1002/gvfs > Output information may be incomplete. > COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME > upssched 1716171 root cwd DIR 8,2 4096 2 / > upssched 1716171 root rtd DIR 8,2 4096 2 / > upssched 1716171 root txt REG 8,2 39240 21762148 > /usr/sbin/upssched > upssched 1716171 root mem REG 8,2 1922136 21764230 > /usr/lib/x86_64-linux-gnu/libc.so.6 > upssched 1716171 root mem REG 8,2 210968 21764476 > /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 > upssched 1716171 root 0u CHR 1,3 0t0 4 > /dev/null > upssched 1716171 root 1u CHR 1,3 0t0 4 > /dev/null > upssched 1716171 root 2u CHR 1,3 0t0 4 > /dev/null > upssched 1716171 root 3u unix 0x00000000eeb52825 0t0 16427010 > type=DGRAM (CONNECTED) > upssched 1716171 root 4r REG 8,2 545 39847858 > /etc/nut/upssched.conf > upssched 1716171 root 5u unix 0x00000000a035b466 0t0 16426351 > type=STREAM (UNCONNECTED) > upssched 1716171 root 6u unix 0x0000000050820b2b 0t0 16426352 > type=STREAM (UNCONNECTED) > upssched 1716171 root 7u unix 0x00000000447aac0e 0t0 16424724 > /var/run/nut/upssched.pipe type=STREAM (CONNECTED) > upssched 1716171 root 10u unix 0x00000000d7bd8088 0t0 16424722 > /var/run/nut/upssched.pipe type=STREAM (LISTEN) > > > root at fricka:~# ps -ef|grep ups > root 1425 1 0 Jun11 ? 00:00:00 sshd: /usr/sbin/sshd > -D [listener] 0 of 10-100 startups > root 1156722 1 0 00:00 ? 00:00:00 /usr/sbin/cupsd -l > root 1156723 1 0 00:00 ? 00:00:00 /usr/sbin/cups-browsed > lp 1156748 1156722 0 00:00 ? 00:00:00 > /usr/lib/cups/notifier/dbus dbus:// > nut 1713117 1 0 14:30 ? 00:00:00 /lib/nut/dummy-ups -a > heartbeat > nut 1713154 1 0 14:30 ? 00:00:00 /lib/nut/usbhid-ups -a > eaton5s > nut 1713155 1 0 14:30 ? 00:00:00 /lib/nut/upsd -F > root 1713156 1 0 14:30 ? 00:00:00 /lib/nut/upsmon -F > root 1713157 1713156 0 14:30 ? 00:00:00 /lib/nut/upsmon -F > root 1716171 1 99 14:35 ? 00:05:10 /sbin/upssched UPS > heartbeat at localhost: On battery. > root 1719510 1550505 0 14:40 pts/2 00:00:00 grep ups > > > > > root at fricka:~# fuser /var/run/nut/upssched.pipe > /run/nut/upssched.pipe: 1716171 > > > root at fricka:/proc/1716171/fd# ls -la > total 0 > dr-x------ 2 root root 0 Jun 13 14:36 . > dr-xr-xr-x 9 root root 0 Jun 13 14:35 .. > lrwx------ 1 root root 64 Jun 13 14:36 0 -> /dev/null > lrwx------ 1 root root 64 Jun 13 14:36 1 -> /dev/null > lrwx------ 1 root root 64 Jun 13 14:36 10 -> 'socket:[16424722]' > lrwx------ 1 root root 64 Jun 13 14:36 2 -> /dev/null > lrwx------ 1 root root 64 Jun 13 14:36 3 -> 'socket:[16427010]' > lr-x------ 1 root root 64 Jun 13 14:36 4 -> /etc/nut/upssched.conf > lrwx------ 1 root root 64 Jun 13 14:36 5 -> 'socket:[16426351]' > lrwx------ 1 root root 64 Jun 13 14:36 6 -> 'socket:[16426352]' > lrwx------ 1 root root 64 Jun 13 14:36 7 -> 'socket:[16424724]' > > > > ------------------------------ > *From:* Jim Klimov <jimklimov+nut at gmail.com> > *Sent:* Tuesday, June 13, 2023 14:17 > *To:* Kari Lempi?inen <kari.lempiainen at summerday.net> > *Cc:* Arnaud Quette via Nut-upsuser <nut-upsuser at alioth-lists.debian.net>; > nut-upsdev <nut-upsdev at alioth-lists.debian.net>; Dimitris Economou < > dimitris.s.economou at gmail.com> > *Subject:* Re: [Nut-upsuser] Upssched 100% CPU after updating Debian 12 > > FWIW, cross-posted the issue to NUT GitHub tracker: > https://github.com/networkupstools/nut/issues/1964 > > Jim >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://alioth-lists.debian.net/pipermail/nut-upsdev/attachments/20230613/f942a68a/attachment.htm>
Jim Klimov
2023-Jun-13  14:06 UTC
[Nut-upsdev] [Nut-upsuser] Upssched 100% CPU after updating Debian 12
After launching the command several times, with debug (posted by new code
in a new branch for the investigation) confirming that the same daemon
handles operations from the new client instances, its strace now has
numerous FDs to report after select() - so I guess it is a problem of
detecting an exit of the counterpart.
   0.000000     [D2] parse_at: is 'heartbeat at localhost' in AT command
the
'heartbeat at localhost' we were launched to process?
   0.000034     [D1] parse_at: 'heartbeat at localhost' in AT command
matched
the 'heartbeat at localhost' UPSNAME we were launched to process
   0.000041     [D1] parse_at: processing CANCEL-TIMER
   0.000054     [D2] parse_at: is 'heartbeat at localhost' in AT command
the
'heartbeat at localhost' we were launched to process?
   0.000057     [D1] parse_at: 'heartbeat at localhost' in AT command
matched
the 'heartbeat at localhost' UPSNAME we were launched to process
   0.000060     [D1] parse_at: processing START-TIMER
   0.000151     [D1] Keeping stderr open due to debug verbosity 8
   0.000195     Timer daemon started
   0.000204     [D2] Timer daemon waiting for connections on pipefd 10
   0.250325     [D3] new connection on fd 7
   0.250377     New timer: heartbeat-failure-timer (660 seconds)
   0.250423     [D1] Exiting upssched (CLI process)
   0.000000     [D2] parse_at: is 'heartbeat at localhost' in AT command
the
'heartbeat at localhost' we were launched to process?
   0.000039     [D1] parse_at: 'heartbeat at localhost' in AT command
matched
the 'heartbeat at localhost' UPSNAME we were launched to process
   0.000047     [D1] parse_at: processing CANCEL-TIMER
  14.745773     [D3] new connection on fd 8
  14.745829     Cancelling timer: heartbeat-failure-timer
   0.000153     [D2] parse_at: is 'heartbeat at localhost' in AT command
the
'heartbeat at localhost' we were launched to process?
   0.000159     [D1] parse_at: 'heartbeat at localhost' in AT command
matched
the 'heartbeat at localhost' UPSNAME we were launched to process
   0.000163     [D1] parse_at: processing START-TIMER
  14.745890     [D3] new connection on fd 9
  14.745969     New timer: heartbeat-failure-timer (660 seconds)
   0.000293     [D1] Exiting upssched (CLI process)
   0.000001     [D2] parse_at: is 'heartbeat at localhost' in AT command
the
'heartbeat at localhost' we were launched to process?
   0.000042     [D1] parse_at: 'heartbeat at localhost' in AT command
matched
the 'heartbeat at localhost' UPSNAME we were launched to process
   0.000050     [D1] parse_at: processing CANCEL-TIMER
  47.674032     [D3] new connection on fd 11
  47.674160     Cancelling timer: heartbeat-failure-timer
   0.000266     [D2] parse_at: is 'heartbeat at localhost' in AT command
the
'heartbeat at localhost' we were launched to process?
   0.000277     [D1] parse_at: 'heartbeat at localhost' in AT command
matched
the 'heartbeat at localhost' UPSNAME we were launched to process
   0.000284     [D1] parse_at: processing START-TIMER
  47.674235     [D3] new connection on fd 12
  47.674388     New timer: heartbeat-failure-timer (660 seconds)
   0.000487     [D1] Exiting upssched (CLI process)
Note that `select` only lists remnants of "new connection" but not the
FD#10 (pipefd):
    select(13, [7 8 9 10 11 12], NULL, NULL, {tv_sec=1, tv_usec=0}) = 5 (in
[7 8 9 11 12], left {tv_sec=0, tv_usec=999997})
Jim
On Tue, Jun 13, 2023 at 3:41?PM Jim Klimov <jimklimov+nut at gmail.com>
wrote:
> So, got some good news: I hear(*) I managed to reproduce the problem with
> current NUT master and an adapted copy of your posted configs and script :D
> Experimental debugging now sounds possible.
>
> (*) PC under the desk wails with all its cooling fans as soon as I started
> the client which spawned a daemon and itself had exited:
>
> $ UPSNAME=heartbeat at localhost NOTIFYTYPE=ONBATT NUT_STATEPATH=/dev/shm
> NUT_CONFPATH=`pwd`/tests/NIT/tmp/etc ./clients/upssched -DDDDDDDD
>    0.000000     [D2] parse_at: is 'heartbeat at localhost' in AT
command the
> 'heartbeat at localhost' we were launched to process?
>    0.000053     [D1] parse_at: 'heartbeat at localhost' in AT
command
> matched the 'heartbeat at localhost' UPSNAME we were launched to
process
>    0.000066     [D1] parse_at: processing CANCEL-TIMER
>    0.000089     [D2] parse_at: is 'heartbeat at localhost' in AT
command the
> 'heartbeat at localhost' we were launched to process?
>    0.000096     [D1] parse_at: 'heartbeat at localhost' in AT
command
> matched the 'heartbeat at localhost' UPSNAME we were launched to
process
>    0.000104     [D1] parse_at: processing START-TIMER
>
> $
>
> Jim
>
>
> On Tue, Jun 13, 2023 at 1:58?PM Kari Lempi?inen <
> kari.lempiainen at summerday.net> wrote:
>
>> Hi,
>>
>> Thanks Jim! Here is more system information from the commands you
>> mentioned.
>>
>> Kari
>>
>> root at fricka:~# lsof -p 1716171
>> lsof: WARNING: can't stat() fuse.gvfsd-fuse file system
>> /run/user/1002/gvfs
>>       Output information may be incomplete.
>> COMMAND      PID USER   FD   TYPE             DEVICE SIZE/OFF     NODE
>> NAME
>> upssched 1716171 root  cwd    DIR                8,2     4096        2
/
>> upssched 1716171 root  rtd    DIR                8,2     4096        2
/
>> upssched 1716171 root  txt    REG                8,2    39240 21762148
>> /usr/sbin/upssched
>> upssched 1716171 root  mem    REG                8,2  1922136 21764230
>> /usr/lib/x86_64-linux-gnu/libc.so.6
>> upssched 1716171 root  mem    REG                8,2   210968 21764476
>> /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
>> upssched 1716171 root    0u   CHR                1,3      0t0        4
>> /dev/null
>> upssched 1716171 root    1u   CHR                1,3      0t0        4
>> /dev/null
>> upssched 1716171 root    2u   CHR                1,3      0t0        4
>> /dev/null
>> upssched 1716171 root    3u  unix 0x00000000eeb52825      0t0 16427010
>> type=DGRAM (CONNECTED)
>> upssched 1716171 root    4r   REG                8,2      545 39847858
>> /etc/nut/upssched.conf
>> upssched 1716171 root    5u  unix 0x00000000a035b466      0t0 16426351
>> type=STREAM (UNCONNECTED)
>> upssched 1716171 root    6u  unix 0x0000000050820b2b      0t0 16426352
>> type=STREAM (UNCONNECTED)
>> upssched 1716171 root    7u  unix 0x00000000447aac0e      0t0 16424724
>> /var/run/nut/upssched.pipe type=STREAM (CONNECTED)
>> upssched 1716171 root   10u  unix 0x00000000d7bd8088      0t0 16424722
>> /var/run/nut/upssched.pipe type=STREAM (LISTEN)
>>
>>
>> root at fricka:~# ps -ef|grep ups
>> root        1425       1  0 Jun11 ?        00:00:00 sshd:
/usr/sbin/sshd
>> -D [listener] 0 of 10-100 startups
>> root     1156722       1  0 00:00 ?        00:00:00 /usr/sbin/cupsd -l
>> root     1156723       1  0 00:00 ?        00:00:00
/usr/sbin/cups-browsed
>> lp       1156748 1156722  0 00:00 ?        00:00:00
>> /usr/lib/cups/notifier/dbus dbus://
>> nut      1713117       1  0 14:30 ?        00:00:00 /lib/nut/dummy-ups
-a
>> heartbeat
>> nut      1713154       1  0 14:30 ?        00:00:00 /lib/nut/usbhid-ups
>> -a eaton5s
>> nut      1713155       1  0 14:30 ?        00:00:00 /lib/nut/upsd -F
>> root     1713156       1  0 14:30 ?        00:00:00 /lib/nut/upsmon -F
>> root     1713157 1713156  0 14:30 ?        00:00:00 /lib/nut/upsmon -F
>> root     1716171       1 99 14:35 ?        00:05:10 /sbin/upssched UPS
>> heartbeat at localhost: On battery.
>> root     1719510 1550505  0 14:40 pts/2    00:00:00 grep ups
>>
>>
>>
>>
>> root at fricka:~# fuser /var/run/nut/upssched.pipe
>> /run/nut/upssched.pipe: 1716171
>>
>>
>> root at fricka:/proc/1716171/fd# ls -la
>> total 0
>> dr-x------ 2 root root  0 Jun 13 14:36 .
>> dr-xr-xr-x 9 root root  0 Jun 13 14:35 ..
>> lrwx------ 1 root root 64 Jun 13 14:36 0 -> /dev/null
>> lrwx------ 1 root root 64 Jun 13 14:36 1 -> /dev/null
>> lrwx------ 1 root root 64 Jun 13 14:36 10 ->
'socket:[16424722]'
>> lrwx------ 1 root root 64 Jun 13 14:36 2 -> /dev/null
>> lrwx------ 1 root root 64 Jun 13 14:36 3 ->
'socket:[16427010]'
>> lr-x------ 1 root root 64 Jun 13 14:36 4 -> /etc/nut/upssched.conf
>> lrwx------ 1 root root 64 Jun 13 14:36 5 ->
'socket:[16426351]'
>> lrwx------ 1 root root 64 Jun 13 14:36 6 ->
'socket:[16426352]'
>> lrwx------ 1 root root 64 Jun 13 14:36 7 ->
'socket:[16424724]'
>>
>>
>>
>> ------------------------------
>> *From:* Jim Klimov <jimklimov+nut at gmail.com>
>> *Sent:* Tuesday, June 13, 2023 14:17
>> *To:* Kari Lempi?inen <kari.lempiainen at summerday.net>
>> *Cc:* Arnaud Quette via Nut-upsuser <nut-upsuser at
alioth-lists.debian.net>;
>> nut-upsdev <nut-upsdev at alioth-lists.debian.net>; Dimitris
Economou <
>> dimitris.s.economou at gmail.com>
>> *Subject:* Re: [Nut-upsuser] Upssched 100% CPU after updating Debian 12
>>
>> FWIW, cross-posted the issue to NUT GitHub tracker:
>> https://github.com/networkupstools/nut/issues/1964
>>
>> Jim
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://alioth-lists.debian.net/pipermail/nut-upsdev/attachments/20230613/1ed3a16b/attachment-0001.htm>