I've been working on a Samba AD setup with a bunch of test machines: the one DC and a bunch of clients. Last night, I ended up temporarily switching the names of the test machines (except the DC) and re-joining the domain (that's for another e-mail later). When things didn't work the way I had planned, I switched the hostnames back and re-joined the domain today on all the test machines. I was shocked to find that I am only able to log in to the domain on one of my hosts. It fails on all the other ones. I made sure that I deleted the machine entries from AD. I haven't changed my Samba config in months, and Rowland had last verified it was fine. I haven't changed my /etc/krb5.conf Kerberos config in months. I even did a complete rebuild of one of the machines, since I automated the installation process. That rebuild had worked perfectly many, many times, but now it fails. In the winbind log, every time I try to log in I'm mostly seeing:

[2020/10/11 21:33:45.498701,  1, pid=3637, effective(1004, 0), real(1004, 0)] ../../source3/libads/authdata.c:177(kerberos_return_pac)
  kinit failed for 'jas@AD.EECS.YORKU.CA' with: Preauthentication failed (-1765328360)

... which clearly doesn't make sense given that the net ads join completed successfully and the computer entry is there, just like before. In fact, I can log in to the system console as root, then do a "kinit jas", and it gets a ticket just fine, so the system is able to talk to the DC. Winbind is unhappy about something, but I just can't figure out what that is. On the DC, I can still query all the users, groups, etc.

I enabled log level 3 and get:

[2020/10/11 21:33:45.426469,  3, pid=3637, effective(0, 0), real(0, 0)] ../../source3/winbindd/winbindd_pam.c:2089(winbindd_dual_pam_auth)
  [ 3635]: dual pam auth EECSYORKUCA\jas
[2020/10/11 21:33:45.498701,  1, pid=3637, effective(1004, 0), real(1004, 0)] ../../source3/libads/authdata.c:177(kerberos_return_pac)
  kinit failed for 'jas@AD.EECS.YORKU.CA' with: Preauthentication failed (-1765328360)
[2020/10/11 21:33:45.498763,  2, pid=3637, effective(0, 0), real(0, 0)] ../../source3/winbindd/winbindd_pam.c:2410(winbindd_dual_pam_auth)
  Plain-text authentication for user EECSYORKUCA\jas returned NT_STATUS_LOGON_FAILURE (PAM: 7)
[2020/10/11 21:33:45.498779,  3, pid=3637, effective(0, 0), real(0, 0)] ../../libcli/security/dom_sid.c:215(dom_sid_parse_endp)
  string_to_sid: SID  is not in a valid format
[2020/10/11 21:33:45.498807,  2, pid=3637, effective(0, 0), real(0, 0)] ../../auth/auth_log.c:653(log_authentication_event_human_readable)
  Auth: [winbind,PAM_AUTH, nss_winbind, 3635] user [EECSYORKUCA]\[jas] at [Sun, 11 Oct 2020 21:33:45.498795 EDT] with [Plaintext] status [NT_STATUS_LOGON_FAILURE] workstation [(null)] remote host [unix:] mapped to [(null)]\[(null)]. local host [unix:]
  {"timestamp": "2020-10-11T21:33:45.498912-0400", "type": "Authentication", "Authentication": {"version": {"major": 1, "minor": 2}, "eventId": 4625, "logonId": "c6dad50c7ecbb3a4", "logonType": 8, "status": "NT_STATUS_LOGON_FAILURE", "localAddress": "unix:", "remoteAddress": "unix:", "serviceDescription": "winbind", "authDescription": "PAM_AUTH, nss_winbind, 3635", "clientDomain": "EECSYORKUCA", "clientAccount": "jas", "workstation": null, "becameAccount": "", "becameDomain": "", "becameSid": null, "mappedAccount": null, "mappedDomain": null, "netlogonComputer": null, "netlogonTrustAccount": null, "netlogonNegotiateFlags": "0x00000000", "netlogonSecureChannelType": 0, "netlogonTrustAccountSid": null, "passwordType": "Plaintext", "duration": 72496}}
[2020/10/11 21:33:48.636206,  3, pid=3637, effective(0, 0), real(0, 0)] ../../source3/winbindd/winbindd_pam.c:2089(winbindd_dual_pam_auth)
  [ 3635]: dual pam auth EECSYORKUCA\jas
[2020/10/11 21:33:48.726636,  1, pid=3637, effective(1004, 0), real(1004, 0)] ../../source3/libads/authdata.c:177(kerberos_return_pac)
  kinit failed for 'jas@AD.EECS.YORKU.CA' with: Preauthentication failed (-1765328360)
[2020/10/11 21:33:48.726690,  2, pid=3637, effective(0, 0), real(0, 0)] ../../source3/winbindd/winbindd_pam.c:2410(winbindd_dual_pam_auth)
  Plain-text authentication for user EECSYORKUCA\jas returned NT_STATUS_LOGON_FAILURE (PAM: 7)
[2020/10/11 21:33:48.726705,  3, pid=3637, effective(0, 0), real(0, 0)] ../../libcli/security/dom_sid.c:215(dom_sid_parse_endp)
  string_to_sid: SID  is not in a valid format

I don't know if that SID error is the problem, but I've seen it in other debug logs before, so I think it's probably not.

On the one system that works, I'm seeing the following error in the log:

  ../../source3/librpc/crypto/gse_krb5.c:417: krb5_kt_start_seq_get failed (Permission denied)
[2020/10/11 20:54:46.663685,  3, pid=26219, effective(4481, 0), real(4481, 0)] ../../source3/librpc/crypto/gse_krb5.c:577(gse_krb5_get_server_keytab)
  ../../source3/librpc/crypto/gse_krb5.c:577: Warning! Unable to set mem keytab from system keytab!

Any thoughts? I've just spent the last 9 hours looking at this on a Sunday of a holiday weekend and have unfortunately not got anywhere.

Jason.

On 12/10/2020 02:54, Jason Keltz via samba wrote:
> I've been working on a Samba AD setup with a bunch of test machines -
> the one DC, and a bunch of clients. Last night, I ended up switching
> the name of the test machines temporarily (except the DC), and
> re-joining the domain (that's for another e-mail later). When things
> didn't work the way I had planned, I switched the hostnames back, and
> re-joined the domain today on all the test machines. I was shocked to
> find that I am only able to login to the domain on one of my hosts.
> It fails on all the other ones. I ensured that I deleted the machine
> entries from AD. I haven't changed my Samba config in months which
> Rowland had last verified was fine. I haven't changed my
> /etc/krb5.conf Kerberos config in months. I even did a complete
> rebuild of one of the machines since I automated the installation
> process, and that rebuild was working perfectly many many times, but
> now it is failed. In winbind log every time I try to login I'm mostly
> seeing:

Did you leave the domain before you changed the hostname?

Why did you change the hostnames? In a case like this, I would have set up a new computer, joined this to the domain and then removed the old computer from the domain.

Rowland

On 10/12/2020 4:06 AM, Rowland penny via samba wrote:
> On 12/10/2020 02:54, Jason Keltz via samba wrote:
>> I've been working on a Samba AD setup with a bunch of test machines -
>> the one DC, and a bunch of clients. Last night, I ended up switching
>> the name of the test machines temporarily (except the DC), and
>> re-joining the domain (that's for another e-mail later). When things
>> didn't work the way I had planned, I switched the hostnames back,
>> and re-joined the domain today on all the test machines. I was
>> shocked to find that I am only able to login to the domain on one of
>> my hosts. It fails on all the other ones. I ensured that I deleted
>> the machine entries from AD. I haven't changed my Samba config in
>> months which Rowland had last verified was fine. I haven't changed
>> my /etc/krb5.conf Kerberos config in months. I even did a complete
>> rebuild of one of the machines since I automated the installation
>> process, and that rebuild was working perfectly many many times, but
>> now it is failed. In winbind log every time I try to login I'm
>> mostly seeing:
>
> Did you leave the domain before you changed the hostname ?
>
> Why did you change the hostnames ? In a case like this, I would have
> set up a new computer, joined this to the domain and then removed the
> old computer from the domain.

Hi Rowland,

I did not leave the domain, but I did delete the entry using either the Windows AD tool or the "samba-tool computer delete" option; I can't remember which one at this point. I think that clears up all the bits. Is that correct? On the local host, I also deleted /etc/krb5.keytab and all the Samba bits so that the join was fresh.

Things are better today. I discovered one issue which, although it seemed unrelated (to me) to the errors, appears to have been the cause of a lot of the trouble.
I was chasing errors in the winbind log, but several of the test servers are NFS servers, and when I rejoined them to the domain, I didn't replace the nfs/X entries in their keytabs. As a result, the clients couldn't mount, and that definitely caused some trouble whose signs I didn't recognize. I'm still watching, though. However, I can log in to all the hosts now.

By the way, at one point I rebooted the DC, and I noticed that all the AD clients showed something like this:

[2020/10/12 09:25:19.183616,  1, pid=36145, effective(0, 0), real(0, 0)] ../../source3/rpc_client/cli_pipe.c:422(cli_pipe_validate_current_pdu)
  ../../source3/rpc_client/cli_pipe.c:422: Bind NACK received from host dc1.ad.eecs.yorku.ca!
[2020/10/12 09:44:11.598150,  1, pid=36145, effective(0, 0), real(0, 0)] ../../source3/libads/ldap_utils.c:93(ads_do_search_retry_internal)
  Reducing LDAP page size from 1000 to 500 due to IO_TIMEOUT

(Which is strange, because this means that if you reboot the DC, the clients start talking slower to it when it comes back up? I don't think the number ever increases unless you restart winbind everywhere?)

Since that reboot, I've seen a few of them do this:

[2020/10/12 10:00:19.814381,  1, pid=36145, effective(0, 0), real(0, 0)] ../../source3/libads/ldap_utils.c:93(ads_do_search_retry_internal)
  Reducing LDAP page size from 500 to 250 due to IO_TIMEOUT
[2020/10/12 10:16:19.557261,  1, pid=36145, effective(0, 0), real(0, 0)] ../../source3/libads/ldap_utils.c:93(ads_do_search_retry_internal)
  Reducing LDAP page size from 250 to 125 due to IO_TIMEOUT

Two of them are VirtualBox VMs, so I figured maybe it's some kind of VirtualBox thing, but one of them is an actual machine and still has the same error. The DC is very lightly loaded. How would I debug what is causing these timeouts and the resulting page-size reduction? I know that various errors in the Samba logs are not "issues", but this one seems to be an issue. I don't like seeing IO_TIMEOUTs.

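
As a rough aid for watching this, the page-size messages quoted above can be pulled out of a winbind log with a few lines of Python. This is only a sketch: it assumes the log keeps the two-line layout shown above (a bracketed header line followed by an indented message line), the function name page_size_reductions is made up for illustration, and the sample text is copied from the excerpts in this message rather than read from a real log file.

```python
import re

# Matches the message text quoted above from ads_do_search_retry_internal.
PAGE_SIZE_RE = re.compile(
    r"Reducing LDAP page size from (\d+) to (\d+) due to (\w+)"
)
# Matches the date and time at the start of a Samba log header line.
HEADER_RE = re.compile(r"^\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})")


def page_size_reductions(lines):
    """Return (timestamp, old_size, new_size, reason) for each reduction."""
    events = []
    last_timestamp = None
    for line in lines:
        header = HEADER_RE.match(line.strip())
        if header:
            # Remember the most recent header so the message can be dated.
            last_timestamp = header.group(1)
        match = PAGE_SIZE_RE.search(line)
        if match:
            old, new, reason = match.groups()
            events.append((last_timestamp, int(old), int(new), reason))
    return events


# Sample input copied from the log excerpts above.
sample_log = """\
[2020/10/12 09:44:11.598150,  1, pid=36145, effective(0, 0), real(0, 0)] ../../source3/libads/ldap_utils.c:93(ads_do_search_retry_internal)
  Reducing LDAP page size from 1000 to 500 due to IO_TIMEOUT
[2020/10/12 10:00:19.814381,  1, pid=36145, effective(0, 0), real(0, 0)] ../../source3/libads/ldap_utils.c:93(ads_do_search_retry_internal)
  Reducing LDAP page size from 500 to 250 due to IO_TIMEOUT
""".splitlines()

events = page_size_reductions(sample_log)
for when, old, new, reason in events:
    print(f"{when}: {old} -> {new} ({reason})")
```

Running the same function over the log from each client would show whether the reductions cluster around the DC reboot or keep happening afterwards.
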
Another distracting error in the log was:

[2020/10/11 22:43:29.843630,  1, pid=969, effective(0, 0), real(0, 0)] ../../source3/libads/ldap.c:565(ads_find_dc)
  ads_find_dc: name resolution for realm 'AD.EECS.YORKU.CA' (domain 'EECSYORKUCA') failed: NT_STATUS_NO_LOGON_SERVERS

... after boot, which sounds serious, but it turns out that if I try to authenticate before everything is up and running, that's what I get. The error makes sense, but there's no "follow up" to say: "Ok ok - I found it now - Sorry to give you a heart attack." It's all a learning experience.

The real reason I was trying to change the hostnames was to deal with a scenario particular to our environment. We have many dual-boot machines running Windows and Linux. I know that I can't join the domain with the same name on both the Linux and Windows systems, because joining one would change the machine password, and then the other would no longer be joined, etc. I understand that it's possible to generate a machine password manually and use that from both sides, but as I understand it, this interferes with the system's ability to change the machine password regularly, which seems more secure. I don't know if Samba does that. I also don't want to have a different IP address for each side because that would be wasteful. I would prefer that the hostname be the same on both sides as well.

I was trying to explore how closely the name in the AD computer database is tied to the "real" DNS name of the host. What I was trying to do was to add to /etc/samba/smb.conf:

  netbios name=<system hostname>-linux

so that when I joined the hosts under Linux, they would take on a "-linux" name, but only in the AD computer database. When the host was booted into Linux, it would have an AD name of "<system hostname>-linux", but a real name of just "<system hostname>". On Windows, both the AD name and hostname would be "<system hostname>".
This would mean that on Windows you could have a computer called "test", and under Linux "test-linux", but both would really be the same physical PC, and both would be host "test" with one IP.

It wasn't working. I am pretty sure I forgot the nfs/X entries on the NFS servers after rejoining the domain, so that may be the issue. However, thinking back, I also think that "net ads keytab" would not let me add an entry for "host/test...." because it wanted "host/test-linux....", but I could be wrong. If the host *had* to take on its real identity "test-linux", then test-linux could just be an alias for test, I guess, but then the machine build would be a headache... and when the Linux machines boot they use DHCP (just like Windows), so the machine wouldn't know if it's "test" or "test-linux".

Lots of "fun".

Jason.
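
Since the missing nfs/X keytab entries turned out to matter, here is a minimal sketch of how a rejoined host could be checked for them. It assumes the plain "klist -k" listing format (a KVNO column followed by the principal); the function names, the host test.example.com and the realm EXAMPLE.COM are illustrative placeholders, and the sample text stands in for real command output.

```python
def keytab_services(klist_output):
    """Return the set of service names (the part before '/') in a keytab listing."""
    services = set()
    for line in klist_output.splitlines():
        fields = line.split()
        # Entry lines look like "<kvno> <principal>"; headers and rules are skipped.
        if len(fields) >= 2 and fields[0].isdigit() and "/" in fields[-1]:
            services.add(fields[-1].split("/", 1)[0])
    return services


def missing_services(klist_output, required):
    """Return the required service names that have no entry in the keytab listing."""
    return sorted(set(required) - keytab_services(klist_output))


# Hypothetical listing for a freshly rejoined host with no nfs/ entry.
sample = """\
Keytab name: FILE:/etc/krb5.keytab
KVNO Principal
---- --------------------------------------------------------------------------
   2 host/test.example.com@EXAMPLE.COM
   2 host/TEST@EXAMPLE.COM
   2 TEST$@EXAMPLE.COM
"""

print(missing_services(sample, ["host", "nfs"]))
```

On a real NFS server the listing would come from running "klist -k" as root instead of the sample string, and a non-empty result for "nfs" would be the cue to re-add the entry before clients try to mount.
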