Valentijn Sessink
2021-Dec-24 11:09 UTC
[Samba] smbd linux freeze, not responding to (TERM) signals
Hi,

For a couple of years now, my smbd has been hanging a couple of times per year: the smbd daemons do not respond to the TERM signal, and I have to use SIGKILL.

This is in a small network with mostly Apple and a few Linux clients. The server runs Ubuntu Linux; it used to be 18.04 and is now 20.04.

The users complain "I cannot connect to the server" and the only way to resolve this is to restart smbd; however, the smbd daemons do not respond to TERM signals, so I have to KILL them. ("systemctl restart smbd.service" will wait for 90s, then kill all smbd processes.)

I'll try to give more information below, but I'm sure there is more to add - log level or anything. Suggestions welcome.

Whenever the problem occurs, smbstatus shows several "(auth in progress)" lines, and these smbd processes specifically do not respond to any signals:

Samba version 4.13.14-Ubuntu
PID      Username            Group   Machine                                       Protocol Version  Encryption  Signing
------------------------------------------------------------------------------------------------------------------------------------
1696515  (auth in progress)          192.168.103.42 (ipv4:192.168.103.42:56390)    SMB3_11           -           -
1293711  userie              userie  192.168.102.119 (ipv4:192.168.102.119:51048)  SMB3_11           -           partial(AES-128-CMAC)
4165094  userne              userne  192.168.102.153 (ipv4:192.168.102.153:39456)  SMB3_11           -           partial(AES-128-CMAC)
259670   userne              userne  192.168.102.153 (ipv4:192.168.102.153:39936)  SMB3_11           -           partial(AES-128-CMAC)
1700382  (auth in progress)          192.168.103.42 (ipv4:192.168.103.42:56400)    SMB3_11           -           -
1711963  (auth in progress)          192.168.103.42 (ipv4:192.168.103.42:53136)    SMB3_11           -           -
1708107  (auth in progress)          192.168.103.42 (ipv4:192.168.103.42:53134)    SMB3_11           -           -
1700371  (auth in progress)          192.168.103.42 (ipv4:192.168.103.42:56396)    SMB3_11           -           -
1657745  userlo              userlo  192.168.103.18 (ipv4:192.168.103.18:53924)    SMB3_11           -           partial(AES-128-CMAC)
1696496  (auth in progress)          192.168.103.42 (ipv4:192.168.103.42:56384)    SMB3_11           -           -
1696495  (auth in progress)          192.168.103.42 (ipv4:192.168.103.42:56386)    SMB3_11           -           -
1696516  (auth in progress)          192.168.103.42 (ipv4:192.168.103.42:56392)    SMB3_11           -           -

Service           pid      Machine          Connected at                  Encryption  Signing
---------------------------------------------------------------------------------------------
IPC$              1293711  192.168.102.119  vr dec 24 09:07:11 2021 CET   -           -
shar              1293711  192.168.102.119  vr dec 24 09:07:11 2021 CET   -           -
IPC$              1657745  192.168.103.18   vr dec 24 10:41:34 2021 CET   -           -
IPC$              1293711  192.168.102.119  vr dec 24 09:07:22 2021 CET   -           -
shar              1657745  192.168.103.18   vr dec 24 10:41:33 2021 CET   -           -
userie            1293711  192.168.102.119  vr dec 24 09:07:11 2021 CET   -           -
shar-shararaties  259670   192.168.102.153  do dec 23 10:37:13 2021 CET   -           -
shar              4165094  192.168.102.153  do dec 23 09:23:24 2021 CET   -           -

No locked files

In the above example, "kill 1696516" doesn't seem to do anything; 1696516 stays where it is. However, if I "kill -KILL" all PIDs that have "auth in progress" as their status, smbd behaves correctly again (users: "yes, I can connect now").

The situation was the same under Ubuntu 18.04, but as that was a rather old smbd, I had hoped to fix things with an upgrade. (Yes, I am aware of the fact that 4.13.14-Ubuntu is older, too.)

The only difference from a more straightforward setup is probably that we run a separate LDAP server for authentication, with passdb backend = ldapsam:ldap://127.0.0.1/

Also, since this is an existing setup that went from upgrade to upgrade, I suspect that there will be a few outdated options in smb.conf:

[global]
    log level = 1
    workgroup = shar
    passdb backend = ldapsam:ldap://127.0.0.1/
    ldap admin dn = cn=admin,dc=kantoor,dc=shar,dc=nl
    ldap ssl = off
    ldap suffix = dc=kantoor,dc=shar,dc=nl
    ldap user suffix = ou=Users
    ldap group suffix = ou=Groups
    ldap machine suffix = ou=Computers
    unix extensions = yes
    delete readonly = yes
    ea support = yes
    ldap password sync = yes
    interfaces = 127.0.0.0/8 ens3
    bind interfaces only = true
    load printers = no
    printing = bsd
    printcap name = /dev/null
    disable spoolss = Yes
    disable netbios = yes
    smb ports = 445
    dns proxy = no
    vfs objects = fruit streams_xattr
    security = user

Shares are pretty simple:

[name]
    force group = users
    force directory mode = 2770
    force create mode = 0660
    directory mask = 2770
    create mode = 0660
    comment = Comment
    writable = yes
    path = /home/somewhere
    mangled names = no
    mangling char = _
    valid users = @users

Oh, trying to find out what the daemon is doing with strace -p 1700382 (but maybe I'm totally mistaken here and "strace" isn't the right tool):

strace: Process 1700382 attached
restart_syscall(<... resuming interrupted read ...>

netstat shows:

tcp   1  0  192.168.102.3:445  192.168.103.42:56400  CLOSE_WAIT   1700382/smbd
tcp   0  0  127.0.0.1:33010    127.0.0.1:389         ESTABLISHED  1700382/smbd
unix  2  [ ]  DGRAM  72953128  1700382/smbd  /var/lib/samba/private/msg.sock/1700382

What could cause these hangs?

Best regards,

Valentijn
--
http://www.openoffice.nl/ Open Office - Linux Office Solutions
Valentijn Sessink v.sessink at openoffice.nl +31(0)20-4214059
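The manual step described above (reading smbstatus and picking out every PID marked "(auth in progress)") could be scripted. What follows is only a minimal sketch in Python, not anything shipped with Samba: the function name stuck_sessions and the regular expression are made up for illustration, and the row layout is assumed to match the smbstatus output pasted in the message. It only lists the PIDs; it does not send any signal.

```python
import re

# Sample rows copied from the smbstatus output in the message above.
SAMPLE = """\
1696515 (auth in progress) 192.168.103.42 (ipv4:192.168.103.42:56390) SMB3_11 - -
1293711 userie userie 192.168.102.119 (ipv4:192.168.102.119:51048) SMB3_11 - partial(AES-128-CMAC)
1700382 (auth in progress) 192.168.103.42 (ipv4:192.168.103.42:56400) SMB3_11 - -
"""

# Assumed row shape: PID, the literal "(auth in progress)" marker, client address.
STUCK_ROW = re.compile(r"^\s*(\d+)\s+\(auth in progress\)\s+(\S+)")


def stuck_sessions(smbstatus_text):
    """Return (pid, machine) pairs for rows marked "(auth in progress)"."""
    found = []
    for line in smbstatus_text.splitlines():
        match = STUCK_ROW.match(line)
        if match:
            found.append((int(match.group(1)), match.group(2)))
    return found


if __name__ == "__main__":
    for pid, machine in stuck_sessions(SAMPLE):
        print(pid, machine)
```

On a live server the text would come from running smbstatus itself; the sample string is used here so the sketch stays self-contained.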
Hello list,

I'm having a hard freeze of Samba once in a while, and I don't know how best to debug it. I hope this list can be of help - or should I ask my question on the devel list?

See below for the issue I'm experiencing. I'm aware that December 24 probably isn't the best day to send a message - I still had hoped for more than 0 replies ;-) (Sorry to repost - if that is considered impolite, please feel free to tell me, but please do so off-list.)

I know I could "strace" smbd and try to find which function stalls the thing, and I could also use Wireshark; but I am afraid that this will get me tons and tons of traffic and logs - the problem manifests itself a couple of times per year and I don't have unlimited storage. Plus I don't know how to reproduce the problem properly, which is probably the biggest problem.

Also, *if* I'm going to packetdump/log/strace everything, I'd rather know the best way to proceed, i.e. instead of just maxing out the log level, know whether there are better options.

Any clues? Pointing in the right direction is also appreciated. I looked in the Samba wiki but did not find anything, and using the info from "Troubleshooting Samba", ch. 9 of a 22-year-old O'Reilly book, seems rather futile.

Best regards,

Valentijn

-------- Forwarded Message --------
Subject: smbd linux freeze, not responding to (TERM) signals
Date: Fri, 24 Dec 2021 12:09:26 +0100
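One low-volume alternative to logging everything might be to take a small snapshot only at the moment the hang is visible. The netstat output in the original report shows the pattern to look for: a TCP socket owned by smbd in CLOSE_WAIT, meaning the client has already closed its side while smbd still holds the socket open. The sketch below, in Python, filters such lines; the function name close_wait_smbd is hypothetical, and the column order is assumed to match the netstat lines pasted in the report (protocol, Recv-Q, Send-Q, local address, remote address, state, PID/program).

```python
# Sample lines copied from the netstat output in the original report.
SAMPLE = """\
tcp 1 0 192.168.102.3:445 192.168.103.42:56400 CLOSE_WAIT 1700382/smbd
tcp 0 0 127.0.0.1:33010 127.0.0.1:389 ESTABLISHED 1700382/smbd
unix 2 [ ] DGRAM 72953128 1700382/smbd /var/lib/samba/private/msg.sock/1700382
"""


def close_wait_smbd(netstat_text):
    """Return (pid, local, remote) for TCP sockets in CLOSE_WAIT owned by smbd."""
    results = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Only TCP rows carry the seven columns assumed here.
        if len(fields) < 7 or not fields[0].startswith("tcp"):
            continue
        local, remote, state, owner = fields[3], fields[4], fields[5], fields[6]
        pid, _, program = owner.partition("/")
        if state == "CLOSE_WAIT" and program == "smbd" and pid.isdigit():
            results.append((int(pid), local, remote))
    return results


if __name__ == "__main__":
    for pid, local, remote in close_wait_smbd(SAMPLE):
        print(pid, local, remote)
```

A periodic job could run such a check and save strace or packet data only when the list is non-empty, which keeps storage use small for a problem that shows up a couple of times per year.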
Hi Valentijn,

Yeah, on the 24th not many people are behind their PC.

I had a quick look, and I don't see much wrong here. But this device: 192.168.103.42. What is it, and what does it use to connect? Because that looks to be the problem.

Now, you're currently using an OpenLDAP backend. Ubuntu notes in its documentation that you should instead integrate Samba with its own LDAP server in AD mode. So my advice: set up Samba AD and use its integrated LDAP.

Greetz,

Louis

> -----Original message-----
> From: samba [mailto:samba-bounces at lists.samba.org] On behalf of Valentijn Sessink via samba
> Sent: Wednesday, 29 December 2021 13:50
> To: samba at lists.samba.org
> Subject: [Samba] How to debug a hard freeze?
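The observation about 192.168.103.42 can be checked against the smbstatus rows from the original report by counting sessions per client address and state. This is only an illustrative sketch in Python: sessions_per_client and the "stuck"/"ok" labels are invented names, and the pattern assumes user and group names without spaces, as in the pasted output.

```python
import re
from collections import Counter

# The twelve session rows from the smbstatus output in the original report.
SAMPLE = """\
1696515 (auth in progress) 192.168.103.42 (ipv4:192.168.103.42:56390) SMB3_11 - -
1293711 userie userie 192.168.102.119 (ipv4:192.168.102.119:51048) SMB3_11 - partial(AES-128-CMAC)
4165094 userne userne 192.168.102.153 (ipv4:192.168.102.153:39456) SMB3_11 - partial(AES-128-CMAC)
259670 userne userne 192.168.102.153 (ipv4:192.168.102.153:39936) SMB3_11 - partial(AES-128-CMAC)
1700382 (auth in progress) 192.168.103.42 (ipv4:192.168.103.42:56400) SMB3_11 - -
1711963 (auth in progress) 192.168.103.42 (ipv4:192.168.103.42:53136) SMB3_11 - -
1708107 (auth in progress) 192.168.103.42 (ipv4:192.168.103.42:53134) SMB3_11 - -
1700371 (auth in progress) 192.168.103.42 (ipv4:192.168.103.42:56396) SMB3_11 - -
1657745 userlo userlo 192.168.103.18 (ipv4:192.168.103.18:53924) SMB3_11 - partial(AES-128-CMAC)
1696496 (auth in progress) 192.168.103.42 (ipv4:192.168.103.42:56384) SMB3_11 - -
1696495 (auth in progress) 192.168.103.42 (ipv4:192.168.103.42:56386) SMB3_11 - -
1696516 (auth in progress) 192.168.103.42 (ipv4:192.168.103.42:56392) SMB3_11 - -
"""

# PID, then either the "(auth in progress)" marker or "user group", then the client address.
ROW = re.compile(r"^\s*\d+\s+(\(auth in progress\)|\S+\s+\S+)\s+(\S+)\s+\(ipv")


def sessions_per_client(smbstatus_text):
    """Count sessions per (client address, state), state being "stuck" or "ok"."""
    counts = Counter()
    for line in smbstatus_text.splitlines():
        match = ROW.match(line)
        if not match:
            continue
        state = "stuck" if match.group(1) == "(auth in progress)" else "ok"
        counts[(match.group(2), state)] += 1
    return counts


if __name__ == "__main__":
    for (machine, state), number in sorted(sessions_per_client(SAMPLE).items()):
        print(machine, state, number)
```

In the pasted sample every session still in authentication comes from the single address 192.168.103.42, while the other three clients have only authenticated sessions.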