Hi,
I may've observed a memory leak condition affecting the rpcd_spoolss
processes, leading them to cause
OOM-Events on one of the systems I maintain.
The used Samba version is 4.16.1 (4.16.1+dfsg-8~bpo11+1) and the Kernel version
is 4.19.0-17-amd64.
~ # awk '/oom_reaper/' /var/log/syslog | grep -o -P
'(?<=\().*(?=\))' | sort | uniq -c
28 rpcd_spoolss
So far there have been 28 OOM-Events in total where the rpcd_spoolss process got
terminated by the oom_reaper.
As processes being terminated by the oom_reaper doesn't necessarily have to
be the processes causing the
OOM situation, I cross-checked this observation by evaluation the top 3
processes consuming most resident
memory using atopsar.
I choose a timeframe between the last two OOM-Events (which were about 2h apart)
as we only store atop
metrics in a 10m interval, which would've been too narrow (mostly less than
10-20m between each
OOM-Event for the other Events) to observe the steadily increase in memory
allocation throughout
the day:
~ # atopsar -b 1500 -e 1700 -G -r /var/log/atop/atop_20220920
iserv 4.19.0-17-amd64 #1 SMP Debian 4.19.194-3 (2021-07-18) x86_64
2022/09/20
-------------------------- analysis date: 2022/09/20 --------------------------
15:00:02 pid command mem% | pid command mem% | pid command mem%_top3_
15:10:02 9479 avast 4% | 19187 rpcd_spo 3% | 31418 squid 3%
15:20:02 9479 avast 4% | 19187 rpcd_spo 4% | 19184 rpcd_spo 4%
15:30:02 19187 rpcd_spo 5% | 19184 rpcd_spo 4% | 19185 rpcd_spo 4%
15:40:02 18671 rpcd_spo 5% | 19185 rpcd_spo 5% | 19187 rpcd_spo 5%
15:50:02 19185 rpcd_spo 7% | 18671 rpcd_spo 6% | 19187 rpcd_spo 6%
16:00:02 19185 rpcd_spo 8% | 19184 rpcd_spo 6% | 19186 rpcd_spo 6%
16:10:02 19185 rpcd_spo 8% | 19184 rpcd_spo 6% | 19186 rpcd_spo 6%
16:20:02 19185 rpcd_spo 8% | 19184 rpcd_spo 7% | 19186 rpcd_spo 6%
16:30:02 19185 rpcd_spo 9% | 19184 rpcd_spo 8% | 19186 rpcd_spo 7%
16:40:02 19185 rpcd_spo 10% | 19184 rpcd_spo 9% | 19186 rpcd_spo 8%
16:50:02 19185 rpcd_spo 11% | 19184 rpcd_spo 10% | 19186 rpcd_spo 9%
It can be observed that the rpcd_spoolss processes are steadily allocating
memory without seemingly freeing it again.
Unfortunately the debug level of rpcd_spoolss is set to the default level 1:
~ # ps aux | grep spoolss
root 15133 29.4 1.4 531056 245720 ? S 20:06 3:23
/usr/libexec/samba/rpcd_spoolss --configfile=/etc/samba/smb.conf
--worker-group=1 --worker-index=2 --debuglevel=1
root 18671 7.2 9.4 1838952 1545112 ? S 14:35 24:49
/usr/libexec/samba/rpcd_spoolss --configfile=/etc/samba/smb.conf
--worker-group=1 --worker-index=0 --debuglevel=1
root 19184 4.8 10.0 1947312 1652344 ? S 14:35 16:39
/usr/libexec/samba/rpcd_spoolss --configfile=/etc/samba/smb.conf
--worker-group=1 --worker-index=1 --debuglevel=1
root 19186 5.1 9.6 1878384 1587856 ? S 14:35 17:38
/usr/libexec/samba/rpcd_spoolss --configfile=/etc/samba/smb.conf
--worker-group=1 --worker-index=3 --debuglevel=1
root 19187 4.9 8.4 1684464 1390200 ? S 14:35 16:57
/usr/libexec/samba/rpcd_spoolss --configfile=/etc/samba/smb.conf
--worker-group=1 --worker-index=4 --debuglevel=1
which means I do only have access to log messages in log.rpcd_spoolss with
limited verbosity.
A quick parse of the rpcd_spoolss debug log seems to imply, that the
delete_and_reload_printers function
as well as printer_list_get_last_refresh are logging most frequently to the
debug log (-n+6304308 as a offset
in tail to only consider todays log entries as l6304308 is the first log entry
of this day):
~ # tail -n+6304308 /var/log/samba/log.rpcd_spoolss | grep -o -P
'(?<=\.\./).*' | sort | uniq -c | sort -n
1 ../librpc/ndr/ndr.c:1150
1 ../librpc/ndr/ndr.c:630(_ndr_pull_error)
2 ../source3/lib/smbrun.c:102(smbrun_internal)
3 ../source3/rpc_client/cli_winreg_spoolss.c:762(winreg_create_printer)
7 ../source3/rpc_client/cli_winreg_spoolss.c:227(winreg_printer_openkey)
34
../source3/rpc_server/spoolss/srv_spoolss_util.c:57(winreg_printer_binding_handle)
68 ../source3/printing/print_cups.c:1100(cups_job_submit)
530 ../source3/rpc_server/rpc_worker.c:1105(rpc_worker_main)
7568 ../librpc/rpc/dcesrv_handles.c:130(dcesrv_handle_lookup)
25800 ../source3/printing/printer_list.c:255(printer_list_get_last_refresh)
25800 ../source3/smbd/server_reload.c:73(delete_and_reload_printers)
The exact log messages are:
~ # tail -n+6304308 /var/log/samba/log.rpcd_spoolss | grep -A1
"printer_list_get_last_refresh" | grep -v
'printer_list_get_last_refresh\|--' | sort | uniq -c
25800 Failed to fetch record!
~ # tail -n+6304308 /var/log/samba/log.rpcd_spoolss | grep -A1
"delete_and_reload_printers" | grep -v
'delete_and_reload_printers\|--' | sort | uniq -c
25800 pcap cache not loaded
However I am unsure wether the log messages point us into the right direction,
as a malfunctioning query as well as a debug
information that a cache hasn't be loaded is rather unlikely a cause of a
Memory Leak. Especially when checking this against
the log.rpcd_spoolss of a server without the memory leak behaviour:
~ # grep -o -P '(?<=\.\./).*' /var/log/samba/log.rpcd_spoolss | sort
| uniq -c | sort -n
1 ../librpc/ndr/ndr.c:1150
1 ../librpc/ndr/ndr.c:630(_ndr_pull_error)
1 ../librpc/rpc/dcesrv_auth.c:371(dcesrv_auth_complete)
1260 ../source3/rpc_server/rpc_worker.c:1105(rpc_worker_main)
4158 ../source3/printing/printer_list.c:255(printer_list_get_last_refresh)
4158 ../source3/smbd/server_reload.c:73(delete_and_reload_printers)
6930 ../librpc/rpc/dcesrv_handles.c:130(dcesrv_handle_lookup)
I was able to observe in samba-ad-dc.services journal, that
rpc_host_distribute_clients seems to allocate new clients to
the rpcd_spoolss as well as rpcd_winreg processes quite frequently within an
interval of 5 minutes:
~ # journalctl -u samba-ad-dc.service --since "5 minutes ago"
--g="rpc_host_distribute_clients" -o json | jq '.MESSAGE' |
sort | uniq -c | sort -n
1 "rpc_host_distribute_clients: Sending new client
/usr/libexec/samba/rpcd_classic to 2524 with 0 clients"
1 "rpc_host_distribute_clients: Sending new client
/usr/libexec/samba/rpcd_spoolss to 15133 with 26 clients"
1 "rpc_host_distribute_clients: Sending new client
/usr/libexec/samba/rpcd_spoolss to 19184 with 26 clients"
1 "rpc_host_distribute_clients: Sending new client
/usr/libexec/samba/rpcd_spoolss to 19184 with 27 clients"
1 "rpc_host_distribute_clients: Sending new client
/usr/libexec/samba/rpcd_spoolss to 19186 with 25 clients"
1 "rpc_host_distribute_clients: Sending new client
/usr/libexec/samba/rpcd_spoolss to 19187 with 25 clients"
1 "rpc_host_distribute_clients: Sending new client
/usr/libexec/samba/rpcd_spoolss to 19187 with 26 clients"
2 "rpc_host_distribute_clients: Sending new client
/usr/libexec/samba/rpcd_spoolss to 15133 with 25 clients"
2 "rpc_host_distribute_clients: Sending new client
/usr/libexec/samba/rpcd_spoolss to 18671 with 26 clients"
2 "rpc_host_distribute_clients: Sending new client
/usr/libexec/samba/rpcd_spoolss to 18671 with 27 clients"
2 "rpc_host_distribute_clients: Sending new client
/usr/libexec/samba/rpcd_spoolss to 19184 with 25 clients"
2 "rpc_host_distribute_clients: Sending new client
/usr/libexec/samba/rpcd_spoolss to 19186 with 23 clients"
2 "rpc_host_distribute_clients: Sending new client
/usr/libexec/samba/rpcd_spoolss to 19186 with 26 clients"
3 "rpc_host_distribute_clients: Sending new client
/usr/libexec/samba/rpcd_spoolss to 18671 with 25 clients"
3 "rpc_host_distribute_clients: Sending new client
/usr/libexec/samba/rpcd_spoolss to 19187 with 24 clients"
5 "rpc_host_distribute_clients: Sending new client
/usr/libexec/samba/rpcd_winreg to 1234 with 0 clients"
6 "rpc_host_distribute_clients: Sending new client
/usr/libexec/samba/rpcd_spoolss to 19186 with 24 clients"
10 "rpc_host_distribute_clients: Sending new client
/usr/libexec/samba/rpcd_spoolss to 15133 with 24 clients"
13 "rpc_host_distribute_clients: Sending new client
/usr/libexec/samba/rpcd_spoolss to 19186 with 22 clients"
22 "rpc_host_distribute_clients: Sending new client
/usr/libexec/samba/rpcd_spoolss to 19184 with 24 clients"
39 "rpc_host_distribute_clients: Sending new client
/usr/libexec/samba/rpcd_spoolss to 18671 with 24 clients"
45 "rpc_host_distribute_clients: Sending new client
/usr/libexec/samba/rpcd_spoolss to 19184 with 22 clients"
53 "rpc_host_distribute_clients: Sending new client
/usr/libexec/samba/rpcd_spoolss to 19187 with 23 clients"
77 "rpc_host_distribute_clients: Sending new client
/usr/libexec/samba/rpcd_spoolss to 18671 with 22 clients"
77 "rpc_host_distribute_clients: Sending new client
/usr/libexec/samba/rpcd_winreg to 20226 with 0 clients"
98 "rpc_host_distribute_clients: Sending new client
/usr/libexec/samba/rpcd_spoolss to 15133 with 23 clients"
109 "rpc_host_distribute_clients: Sending new client
/usr/libexec/samba/rpcd_spoolss to 19184 with 23 clients"
113 "rpc_host_distribute_clients: Sending new client
/usr/libexec/samba/rpcd_spoolss to 18671 with 23 clients"
150 "rpc_host_distribute_clients: Sending new client
/usr/libexec/samba/rpcd_spoolss to 15133 with 14 clients"
155 "rpc_host_distribute_clients: Sending new client
/usr/libexec/samba/rpcd_spoolss to 15133 with 13 clients"
162 "rpc_host_distribute_clients: Sending new client
/usr/libexec/samba/rpcd_spoolss to 18671 with 21 clients"
170 "rpc_host_distribute_clients: Sending new client
/usr/libexec/samba/rpcd_spoolss to 15133 with 22 clients"
179 "rpc_host_distribute_clients: Sending new client
/usr/libexec/samba/rpcd_spoolss to 15133 with 15 clients"
256 "rpc_host_distribute_clients: Sending new client
/usr/libexec/samba/rpcd_spoolss to 15133 with 16 clients"
276 "rpc_host_distribute_clients: Sending new client
/usr/libexec/samba/rpcd_spoolss to 15133 with 21 clients"
323 "rpc_host_distribute_clients: Sending new client
/usr/libexec/samba/rpcd_spoolss to 15133 with 19 clients"
346 "rpc_host_distribute_clients: Sending new client
/usr/libexec/samba/rpcd_spoolss to 15133 with 20 clients"
360 "rpc_host_distribute_clients: Sending new client
/usr/libexec/samba/rpcd_winreg to 13068 with 0 clients"
366 "rpc_host_distribute_clients: Sending new client
/usr/libexec/samba/rpcd_spoolss to 15133 with 18 clients"
392 "rpc_host_distribute_clients: Sending new client
/usr/libexec/samba/rpcd_spoolss to 15133 with 17 clients"
1371 "rpc_host_distribute_clients: Sending new client
/usr/libexec/samba/rpcd_winreg to 11438 with 0 clients"
10316 "rpc_host_distribute_clients: Sending new client
/usr/libexec/samba/rpcd_winreg to 18710 with 0 clients"
Whereas on the, without a known OOM-Event, comparsion server with a similiar
number of users
and printers there are hardly as many calls of rpc_host_distribute_clients:
~ # journalctl -u samba-ad-dc.service --since "1 day ago"
--g="rpc_host_distribute_clients" -o json | jq '.MESSAGE' |
sort | uniq -c
1 "rpc_host_distribute_clients: Sending new client
/usr/libexec/samba/rpcd_classic to 593335 with 0 clients"
2 "rpc_host_distribute_clients: Sending new client
/usr/libexec/samba/rpcd_winreg to 593350 with 0 clients"
Could this be caused by a flood of DCERPC bind packets as
rpc_host_endpoint_accept_got_bind would call
rpc_host_distribute_clients upon receiving those? The seemingly odd "with 0
clients" message could be caused
by malformed DCERPC packets then? I definitely should observe the network
traffic regarding that over time.
I still got reading into the source of rpcd_spoolss as well as rpcd_winreg on my
agenda, as well as analyzing what exactly triggers
rpc_host_distribute_clients to get called without actually sending clients to
the rpcd_winreg/rpcd_spoolss (the `with 0 clients`
log messages seem odd to me), so I'd hopefully be able to provide more
context on what may've caused the memory leak condition.
For now, I'd appreciate pointers into the right direction in terms of
debugging the root cause of this, if this behaviour/the
memory leak cause of the rpcd_spoolss service has been observed before.
Regards,
Wilko Meyer
IServ GmbH