> We just had an interesting experience here. One of our AD servers was down
> for 90 minutes due to the server being physically moved to another location.
> This shouldn't be a problem since there are 5 other AD servers in that "group"
> that can take over the load. However it seems Samba (when used as a fileserver)
> for some reason is taking quite a long time to "give up" on the first one and
> switch to one of the alternative ones.
>
> Don't know if it's the Kerberos bits or if it's the LDAP connection (or
> both) that is slow to "switch".
>
> Am I the only one seeing this?
No, we're experiencing the same behaviour (FreeBSD 12.1 p8, Samba 4.10.15).
We also have the impression that it occurs when an AD server is not down but
merely responds a bit (too) slowly.
> Is there something that can be done to speed that process up?
>
> I guess I could force Samba to talk to a special virtual "AD" address we
> have that is behind a load balancer (it's mainly used for equipment that needs
> to talk to the AD servers but can only talk to one specific server) but I've
> tried to keep the configuration as normal as possible so...
There is a post on the FreeBSD forum about this:
https://forums.freebsd.org/threads/winbind-ad-dropping-every-10-hours.70752/.
Especially this part intrigues me:
---
But the refreshing of the GSSAPI ticket for the openldap-sasl-client (with
GSSAPI=on) that is used for the idmapper (process name: "winbindd: idmap
child (winbindd)") seems to be the problem: when this ticket is expired, a
connection to the DC (LDAP port) is established and stays open for 2 hours (i.e.
7200000 msecs, which is exactly the value of net.inet.tcp.keepidle).
---
Would this be a problem when AD servers disappear as well? I dug into the Samba
code a while ago and found that the code in question is blocking; however, it
might be a FreeBSD-specific problem.
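
To illustrate why the keepidle value matters: a blocked TCP connection to a
peer that has vanished is typically only noticed once keepalive probes start,
i.e. after the idle time has passed. Below is a minimal Python sketch of how a
client can shorten that idle time per socket. It is only an illustration of
the mechanism; the function name and the 60/10/3 values are my own
assumptions, and I am not claiming that Samba or the OpenLDAP client sets
these options.

```python
import socket

def enable_fast_keepalive(sock, idle_s=60, interval_s=10, probes=3):
    """Enable TCP keepalive on sock and, where the platform exposes the
    options, shorten the idle time before the first probe is sent.
    The default values are illustrative assumptions."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # These constants only exist in the socket module on platforms
    # that support them, so look them up defensively.
    for name, value in (("TCP_KEEPIDLE", idle_s),
                        ("TCP_KEEPINTVL", interval_s),
                        ("TCP_KEEPCNT", probes)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
    return sock
```

With these settings a dead peer would be detected after roughly
idle_s + interval_s * probes seconds instead of after two hours.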
> We have a "samba-watchdog" script that regularly attempts to connect to
> the file service (using smbclient) and during this time period this script was
> triggered a number of times: If a connection attempt takes more than 15 seconds
> then it sleeps 5 seconds and tries again. If that one fails too then it kills
> winbindd and restarts it (which is pretty quick so most users don't notice
> it).
>
> The main reason for this script is to make smbd recover when new
> connections are "hung" when/if it hangs at the "10 hour lockup after winbindd
> start" (which probably is due to the service principal expiring and needing
> renewal). This doesn't seem to happen on small servers with few users, but for
> us with 500-1600 users per "samba" it happens regularly: every day at 17:00 and
> 03:00 (we restart smbd & winbindd at 07:00). Without this watchdog smbd would
> refuse new connections for 1-15 minutes (or more) which isn't good :-)
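
The retry logic described in the quoted text (15-second timeout, 5-second
pause, second attempt, then restart winbindd) can be sketched roughly as
follows. This is a Python approximation under my own assumptions, not the
actual samba-watchdog script: the function names are made up, and the check
and restart commands are left to the caller because the quoted text does not
give them.

```python
import subprocess
import time

def connection_ok(cmd, timeout_s=15):
    """Return True if cmd exits with status 0 within timeout_s seconds."""
    try:
        result = subprocess.run(cmd, timeout=timeout_s,
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
    except (subprocess.TimeoutExpired, OSError):
        # Hung, or the command could not be started at all.
        return False
    return result.returncode == 0

def watchdog_pass(check, restart, pause_s=5, sleep=time.sleep):
    """One watchdog pass: check, pause, check again, then restart.
    Returns True if restart() was called."""
    if check():
        return False
    sleep(pause_s)
    if check():
        return False
    restart()
    return True

# Hypothetical wiring (the exact commands are assumptions):
#   check = lambda: connection_ok(["smbclient", "-N", "-L", "localhost"])
#   restart = lambda: ...  # kill and restart winbindd, site-specific
#   watchdog_pass(check, restart)
```

Passing check and restart in as callables keeps the decision logic separate
from the site-specific commands, which also makes it easy to dry-run.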
Our workaround is also a watchdog script ('guard-winbindd-idmap'). It
kills the idmap child of winbindd if it has been running for 8 hours or when
'wbinfo -i administrator' fails. This script runs on the fileserver (domain
member server). Since these servers also act as NFS servers, the script also
restarts gssd if it is still running; otherwise it starts gssd. This is
needed since gssd stops working as well.
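
The decision such a guard makes can be sketched as below. This is a
simplified Python illustration and not the guard-winbindd-idmap script
itself: the function names and the 30-second timeout are assumptions, and
finding the age of the idmap child process is platform-specific and left out.

```python
import subprocess

MAX_AGE_S = 8 * 60 * 60  # 8 hours, as described above

def wbinfo_ok(user="administrator", timeout_s=30):
    """True if `wbinfo -i <user>` succeeds; timeout_s is an assumed value."""
    try:
        result = subprocess.run(["wbinfo", "-i", user], timeout=timeout_s,
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0

def should_kill_idmap_child(age_s, lookup_ok):
    """Kill the idmap child if it is 8 hours old or the lookup failed."""
    return age_s >= MAX_AGE_S or not lookup_ok
```

A caller would obtain age_s from the process table, call wbinfo_ok(), and
act on should_kill_idmap_child(age_s, lookup_ok).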
> Samba 4.12.5, FreeBSD 11.3 & 12.1
>
> From krb5.conf:
>
> [realms]
>     OURREALM = {
>         kdc = server1
>         kdc = server2
>         kdc = server3
>         kdc = server4
>     }
>
> It was "server1" that was being moved.
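
To see how long a client would wait on a dead first KDC, one could probe the
listed servers in order with a short timeout. A minimal Python sketch,
assuming the KDCs answer on the standard Kerberos TCP port 88; the function
names and the 2-second timeout are my own choices, and this does not
reproduce how the Kerberos library itself picks a KDC.

```python
import socket

KDCS = ["server1", "server2", "server3", "server4"]  # from the krb5.conf above

def kdc_reachable(host, port=88, timeout_s=2.0):
    """True if a TCP connection to host:port opens within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False

def first_reachable(hosts, probe=kdc_reachable):
    """Return the first host for which probe(host) is true, else None."""
    for host in hosts:
        if probe(host):
            return host
    return None
```

Calling first_reachable(KDCS) while server1 is offline shows whether the
delay comes from plain TCP connection setup or from something higher up.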
-Remy