Alexander Spannagel
2019-Feb-22 12:59 UTC
[Samba] winbind causing huge timeouts/delays since 4.8
Hello!

I want to share some findings with the community about huge timeouts/delays we have seen since upgrading to Samba 4.8 at the end of last year, together with a patch that fixes them in our setup. It would be great if someone from the Samba dev team could take a look and, if acceptable, apply the patch to the common code base. The problem may also affect the current stable release and the release candidates.

The patch expects the patch from BUG 13503 "getpwnam resolves local system accounts to AD" to be applied already.

Within the company I'm working for, we frequently see system hangs/slowness for a couple of seconds on servers using winbind passwd/group resolution via nsswitch.conf since we updated our OS from CentOS 7.5 to CentOS 7.6, which includes a Samba update from 4.7 to 4.8.

We could track it down to winbind being asked for an unknown local user account. This means that the user account in question is not in the local passwd and that the name doesn't contain any domain, like SOMEDOMAIN\account or account@SOMEDOMAIN. The expected behavior is an immediate return with an error like "no such user" or "unknown user", but instead a call like "id unknown" takes 60+ seconds.

Increasing "winbind max domain connections" could reduce this to 10+ seconds, and setting "winbind use default domain" to yes brought it back to the expected immediate response. A log of the different setups can be found at the bottom.

As none of these config changes makes sense to me as a requirement, and setting "winbind use default domain" to yes isn't usable on some of our servers, I dug deeper, using wbinfo to talk to winbind more directly and so avoid other services affecting the tests.
The finding was pretty clear:

[root@centos7dev64 ~]# testparm -v 2>&1 < /dev/null|grep "winbind use default domain" ; time wbinfo -i unknown
        winbind use default domain = No
failed to call wbcGetpwnam: WBC_ERR_WINBIND_NOT_AVAILABLE
Could not get info for user unknown

real    1m2.522s
user    0m0.005s
sys     0m0.009s

[root@centos7dev64 ~]# vi /etc/samba/smb.conf ; systemctl restart winbind
[root@centos7dev64 ~]# testparm -v 2>&1 < /dev/null|grep "winbind use default domain" ; time wbinfo -i unknown
        winbind use default domain = Yes
failed to call wbcGetpwnam: WBC_ERR_DOMAIN_NOT_FOUND
Could not get info for user unknown

real    0m0.015s
user    0m0.005s
sys     0m0.005s

Doing some code research, I could track it down to a logic change affecting the return value of the function parse_domain_user in source3/winbindd/winbindd_util.c.

Calling the function under these conditions:
- no domain (i.e. empty)
- user without a domain part (i.e. not DOM\user or user@DOM)
- "winbind use default domain" set to No/false (which is the default)

causes different return values:
- up to version 4.7: false
- since version 4.8: '\0' (i.e. an empty string)

Applying the attached patch, which re-introduces the return value of false instead of '\0', fixed the described issues. With our patched version of Samba we could revert to the former config, leaving "winbind use default domain" and "winbind max domain connections" at their default values.

Hopefully this helps others. I would appreciate it if the fix made it into the common code base of Samba, so that it reaches the usual update channels of the distributions out there. For CentOS I have already reported a bug (15795) for further processing.
Best regards
Alex

#######

Here is a log of a walk through the different config settings on one of our servers. It is reproducible on the other servers using winbind and samba-4.8:

[root@centos7dev64 ~]# rpm -q samba-4*
samba-4.8.3-4.el7.x86_64

[root@centos7dev64 ~]# testparm -v 2>&1 < /dev/null|egrep "(Server role|winbind use default domain|max domain connections)" ; time id unknown
Server role: ROLE_DOMAIN_MEMBER
        winbind max domain connections = 1
        winbind use default domain = No
id: unknown: no such user

real    1m8.630s
user    0m0.000s
sys     0m0.009s

[root@centos7dev64 ~]# vi /etc/samba/smb.conf ; systemctl restart winbind ; sss_cache -E
[root@centos7dev64 ~]# testparm -v 2>&1 < /dev/null|egrep "(Server role|winbind use default domain|max domain connections)" ; time id unknown
Server role: ROLE_DOMAIN_MEMBER
        winbind max domain connections = 10
        winbind use default domain = No
id: unknown: no such user

real    0m10.914s
user    0m0.000s
sys     0m0.005s

[root@centos7dev64 ~]# vi /etc/samba/smb.conf ; systemctl restart winbind ; sss_cache -E
[root@ecentos7dev64 ~]# testparm -v 2>&1 < /dev/null|egrep "(Server role|winbind use default domain|max domain connections)" ; time id unknown
Server role: ROLE_DOMAIN_MEMBER
        winbind max domain connections = 10
        winbind use default domain = Yes
id: unknown: no such user

real    0m0.020s
user    0m0.002s
sys     0m0.003s

-------------- next part --------------
A non-text attachment was scrubbed...
Name: samba-4.8.9-fix_winbind_empty_domain.patch
Type: text/x-patch
Size: 459 bytes
Desc: not available
URL: <http://lists.samba.org/pipermail/samba/attachments/20190222/3eb9e8e7/samba-4.8.9-fix_winbind_empty_domain.bin>
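For anyone who wants to repeat the "time id unknown" measurement from the log above without the shell, here is a minimal Python sketch. It is not part of the original report or of Samba. The helper name time_user_lookup is hypothetical, and the sketch assumes that Python's pwd.getpwnam goes through the same nsswitch.conf passwd lookup path (and therefore through winbind, where configured) as the id command does.

```python
import pwd
import time


def time_user_lookup(name):
    """Return (found, seconds) for a single passwd lookup of `name`.

    Hypothetical helper for illustration only; it just wraps
    pwd.getpwnam and measures wall-clock time around the call.
    """
    start = time.monotonic()
    try:
        pwd.getpwnam(name)
        found = True
    except KeyError:
        # pwd.getpwnam raises KeyError when no entry exists.
        found = False
    return found, time.monotonic() - start


if __name__ == "__main__":
    # "unknown" is the account name used in the report above.
    found, seconds = time_user_lookup("unknown")
    print(f"found={found} elapsed={seconds:.3f}s")
```

On an affected host, the elapsed time for a name without a domain part should be in the same range as the "real" values in the log above. On a healthy host the lookup should return almost immediately.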
Hi,

On Fri, Feb 22, 2019 at 01:59:15PM +0100, Alexander Spannagel via samba wrote:
>I want to share some findings with the community about hugh
>timeouts/delays since upgraded to samba 4.8 end of last year and a
>patch fixing this in our setup. It would be great if someone from
>samba dev team could take a look and if acceptable apply the patch to
>the common code base. It may also affect current stable and release
>candidates.
>The patch expects the patch from BUG 13503 "getpwnam resolves local
>system accounts to AD" being already applied.
>
>Within the company i'm working for, we see frequently system
>hangs/slowness for a couple of seconds on servers using winbind
>passwd/group resolution via nsswitch.conf since we updated our OS from
>CentOS7.5 to CentOS7.6 which includes a samba update from 4.7 to 4.8.
>
>We could track it down to winbind and when it is asked for an unknown
>local user account. This means that the users account in question is
>not in local passwd and doesn't contain any domain like
>SOMEDOMAIN\account or account@SOMEDOMAIN. The expected behavior is an
>immediately return with an error like "no such user" or "unknown
>user", but instead a call like "id unknown" takes 60+ seconds.

hm, can't reproduce:

slow@titan:~/git/samba/scratch$ git describe
samba-4.8.3

slow@titan:~/git/samba/scratch$ sudo bin/net cache flush

slow@titan:~/git/samba/scratch$ time bin/wbinfo -i foo
failed to call wbcGetpwnam: WBC_ERR_DOMAIN_NOT_FOUND
Could not get info for user foo

real    0m0.025s
user    0m0.004s
sys     0m0.004s

Can you share your full smb.conf?

-slow

-- 
Ralph Boehme, Samba Team                https://samba.org/
Samba Developer, SerNet GmbH   https://sernet.de/en/samba/
GPG-Fingerprint   FAE2C6088A24252051C559E4AA1E9B7126399E46
Rowland Penny
2019-Feb-22 14:42 UTC
[Samba] winbind causing huge timeouts/delays since 4.8
On Fri, 22 Feb 2019 15:35:53 +0100 Ralph Böhme via samba <samba at lists.samba.org> wrote:

> Hi,
>
> On Fri, Feb 22, 2019 at 01:59:15PM +0100, Alexander Spannagel via
> samba wrote:
> >I want to share some findings with the community about hugh
> >timeouts/delays since upgraded to samba 4.8 end of last year and a
> >patch fixing this in our setup. It would be great if someone from
> >samba dev team could take a look and if acceptable apply the patch
> >to the common code base. It may also affect current stable and
> >release candidates.
> >The patch expects the patch from BUG 13503 "getpwnam resolves local
> >system accounts to AD" being already applied.
> >
> >Within the company i'm working for, we see frequently system
> >hangs/slowness for a couple of seconds on servers using winbind
> >passwd/group resolution via nsswitch.conf since we updated our OS
> >from CentOS7.5 to CentOS7.6 which includes a samba update from 4.7
> >to 4.8.
> >
> >We could track it down to winbind and when it is asked for an
> >unknown local user account. This means that the users account in
> >question is not in local passwd and doesn't contain any domain like
> >SOMEDOMAIN\account or account@SOMEDOMAIN. The expected behavior is
> >an immediately return with an error like "no such user" or "unknown
> >user", but instead a call like "id unknown" takes 60+ seconds.
>
> hm, can't reproduce:
>
> slow@titan:~/git/samba/scratch$ git describe
> samba-4.8.3
>
> slow@titan:~/git/samba/scratch$ sudo bin/net cache flush
>
> slow@titan:~/git/samba/scratch$ time bin/wbinfo -i foo
> failed to call wbcGetpwnam: WBC_ERR_DOMAIN_NOT_FOUND
> Could not get info for user foo
>
> real    0m0.025s
> user    0m0.004s
> sys     0m0.004s
>
> Can you share your full smb.conf?
>
> -slow
>

You might also want to explain why you are using sssd's cache with winbind.

Rowland