From what I could gather from the code, these are the conditions under which a
DC is added to NEG_CONN_CACHE:
* Winbind was connected to DC, but during SAMlogon DC did not respond after
three attempts
* Winbind failed to establish SMB connection to DC (port 445, marked as
microsoft-ds in 'ss -roestu' output)
* Winbind failed to find out the DC server name (from the looks of it, by
using an LDAP query to map the IP to a name)
* Winbind did not get a response to CLDAP NetLogon ping during DC location
* Winbind failed to resolve DC IP with DNS
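
Two of the conditions above (DNS resolution and the SMB connection on port
445) can be checked from outside Winbind. Below is a minimal Python sketch for
that. The function name `probe_dc` and the result keys are hypothetical, and
this does not reproduce Winbind's own checks (no CLDAP ping, no SamLogon); it
only helps narrow down where a failure might be.

```python
import socket


def probe_dc(host, port=445, timeout=3.0):
    """Check DNS resolution and TCP reachability for one DC (illustrative only)."""
    result = {"resolved": [], "tcp_ok": False, "error": None}
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        result["error"] = f"DNS resolution failed: {exc}"
        return result
    # Keep unique addresses in the order the resolver returned them.
    for info in infos:
        addr = info[4][0]
        if addr not in result["resolved"]:
            result["resolved"].append(addr)
    # Try each address until one accepts a TCP connection.
    for addr in result["resolved"]:
        try:
            with socket.create_connection((addr, port), timeout=timeout):
                result["tcp_ok"] = True
                result["error"] = None
                return result
        except OSError as exc:
            result["error"] = f"connect to {addr}:{port} failed: {exc}"
    return result
```

Calling `probe_dc("dc1.example.com")` (a placeholder host name) returns a dict
showing whether the name resolved and whether port 445 accepted a connection.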

On Thursday, 28 May 2020 13:50:23 PDT Jeremy Allison wrote:
> 'winbind max domain connections' causes the
> parent winbindd to choose the winbindd child
> with the shortest queue to talk to.
>
> Then it sends an async tevent_req request
> to that child. I don't think it's opening
> a new connection to the DC.
>
So what I can typically see is that there are only two active connections to
the DC, one on port 445 (microsoft-ds) and one on port, say, 49159 (one of the
ports used for RPC calls for LSA/SAMR/NetLogon, from what I could find in the
documentation). When I do 'wbinfo --ping-dc' it succeeds, so my understanding
is that this is the CLDAP NetLogon ping succeeding, meaning the DC is alive and
well. But then Winbind doesn't reuse the existing connection. Based on the
logs, it seems to be trying to perform DC location for a new connection, and
that fails. If I understand correctly, Winbind would be doing that only if,
say, there is some request to the DC on the first connection that's blocking
the queue. Is that correct?
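
To check that I am picturing the dispatch Jeremy describes correctly, here is
a toy Python model of it: the parent keeps one queue per child and hands each
request to the child with the shortest queue. The names (`Parent`, `dispatch`,
`wait_before_start`) are made up for illustration and this is not Samba's
code; it only shows why a slow request at the head of a single child's queue
delays everything behind it.

```python
from collections import deque


class Parent:
    """Toy model: one FIFO queue per domain child (hypothetical names)."""

    def __init__(self, num_children):
        self.queues = [deque() for _ in range(num_children)]

    def dispatch(self, request):
        # Pick the child with the fewest pending requests;
        # ties go to the lowest index.
        idx = min(range(len(self.queues)), key=lambda i: len(self.queues[i]))
        self.queues[idx].append(request)
        return idx


def wait_before_start(queue, durations, request):
    """Seconds `request` waits for everything queued ahead of it to finish."""
    wait = 0.0
    for item in queue:
        if item == request:
            return wait
        wait += durations[item]
    raise ValueError(f"{request!r} is not queued")
```

With `Parent(1)`, an "auth" request dispatched after a 30-second "slow-lookup"
waits the full 30 seconds. With `Parent(2)`, the same "auth" request lands on
the idle second child and does not wait at all.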
If the above is correct, then my understanding is that, in theory, forcibly
removing NEG_CONN_CACHE entries may help. But if the issue is not an
unresponsive DC but a failure to get the DC name or to resolve the DC IP, then
the DC will be re-added to NEG_CONN_CACHE right away during DC location anyway,
if I allow multiple connections to the DC. If I have only one connection to the
DC, then the auth request will be blocked by whatever other queries are in the
queue of that single connection, but if those queries are fast enough, the auth
request may go through quickly enough as well. Is that correct?
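
The concern about an entry being re-added right away can be illustrated with a
toy negative cache in Python. This is not how Samba implements NEG_CONN_CACHE;
the class `NegCache`, the function `locate_dc`, and the 30-second TTL are all
assumptions chosen for the illustration.

```python
import time


class NegCache:
    """Toy negative cache: DC name -> expiry time (not Samba's implementation)."""

    def __init__(self, ttl=30.0, clock=time.monotonic):
        self.ttl = ttl        # assumed lifetime of an entry, in seconds
        self.clock = clock    # injectable clock, so expiry can be simulated
        self.entries = {}

    def add(self, dc):
        self.entries[dc] = self.clock() + self.ttl

    def remove(self, dc):
        self.entries.pop(dc, None)

    def is_blocked(self, dc):
        expiry = self.entries.get(dc)
        if expiry is None:
            return False
        if self.clock() >= expiry:
            del self.entries[dc]  # entry has expired
            return False
        return True


def locate_dc(dc, cache, resolve):
    """Return True if `dc` is usable; a failed lookup puts it back in the cache."""
    if cache.is_blocked(dc):
        return False
    if not resolve(dc):
        cache.add(dc)
        return False
    return True
```

In this model, calling `cache.remove("dc1")` while `resolve` still fails only
helps until the next `locate_dc` call, which puts the entry straight back.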
Now if I have 'winbind offline logon' enabled, then while I'm
limited to a single connection, Winbind should not attempt to go through DC
location or anything else like that, so while the single connection may get
blocked by some query in the queue, eventually the auth request should go
through, even if DC is not available and Winbind cannot locate any, so long as
Winbind cache has the creds to authenticate the user. Is that correct?
Now, in another email I saw Andrew Bartlett saying that "offline login is
only for local plaintext logins, we don't do NTLM challenge-response against
the offline store." I have the following in my pam_winbind.conf:
cached_login = yes
krb5_ccache_type = FILE
krb5_auth = yes
Does using krb5_auth mean 'winbind offline logon' would be useless to me
as a way to mitigate various temporary hiccups in DNS/networking?
If 'winbind offline logon' would not be useless with krb5_auth, then how
long is the credentials cache supposed to live, and is there any way to control
that TTL? I have not been able to find any information on that so far.
I've also run into an issue: if I attempt to log in with 'winbind offline
logon' enabled while nothing is cached (after 'net cache flush') and the
domain-specific DNS servers are temporarily unavailable, then Winbind (v4.9.1)
keeps failing auth afterwards, easily for more than ten to fifteen minutes,
even after DNS is fixed and even after a machine restart. So far I have been
unable to figure out how to get Winbind out of that state, other than letting
it sit there undisturbed for some time.
> Add DBG_ERR() (log level 0) statements into the places
> you are suspicious of and try and follow the
> control flow.
>
Unfortunately, running a custom build of Samba is not an option for me :-(