Tris Mabbs
2013-Aug-12 21:46 UTC
[Samba] Odd Samba 4 ("4.2.0pre1-GIT-b505111"; actually only using client) behaviour #1 - "Could not fetch trust account password for domain ...".
Good day, oh technical ones... I was running Samba 4 (client only, not as a DC, so effectively running the Samba 3 code from the Samba 4 tree) and, other than a little "Gotcha!" regarding decoding Kerberos PACs, it was all working perfectly. Then I recently had to upgrade to "4.2.0pre1-GIT-b505111" (I had to upgrade the OS on the server running Samba: 'twas "OpenSolaris" and is now "Solaris 11.1"), so I recompiled it all and installed afresh (so no ".tdb"s from the previous installation or anything). It's all working (well, except for the PAC issue, which is still being worked on). I set the LDAP admin password using "smbpasswd -W". Kerberos is set up fine. I'm joined to the domain, and "net ads testjoin", "net rpc testjoin" and "wbinfo -t" all agree that the join is good. "wbinfo -u" reports my AD users; "wbinfo -g" reports my AD groups (with the domain prefix removed); "wbinfo -U UUUU" gives me the correct SID for UID UUUU.

But here's a funny thing (#1): "wbinfo -S SSSS" gives me a UID for SID SSSS. However, it's not the same UID that, when given to "wbinfo -U UUUU", would return that SID. Duh? So the mapping currently only works one way: UID->SID = OK; SID->UID = not OK (no error, but the allocated value is not the one stored in the LDAP schema).

This kinda-almost-sorta works. The most annoying symptom is that any UNC path a workstation accesses winds up with an irritating "$RECYCLE.BIN" folder being created on it, and every time that UNC path is accessed the result is a "The recycle bin for \\server\path\to\unc\folder has become corrupted. Would you like to delete it?" prompt.

I *suspect* it may have something to do with the following messages, which get logged together, over and over (and over and over...), in the system log file:

Aug 12 20:38:31 Gateway smbd[22736]: [ID 702911 daemon.error] [2013/08/12 20:38:31.381776, 0] ../source3/auth/auth_domain.c:266(domain_client_validate)
Aug 12 20:38:31 Gateway smbd[22736]: [ID 702911 daemon.error] domain_client_validate: unable to validate password for user  in domain  to Domain controller PDC.MYDOMAIN. Error was NT_STATUS_NO_SUCH_USER.
Aug 12 20:38:31 Gateway smbd[22736]: [ID 702911 daemon.error] [2013/08/12 20:38:31.382811, 0] ../source3/auth/auth_domain.c:419(check_trustdomain_security)
Aug 12 20:38:31 Gateway smbd[22736]: [ID 702911 daemon.error] check_trustdomain_security: could not fetch trust account password for domain MYDOMAIN

And no, that's not me editing out the username and domain in the second message; it really is an empty username and an empty domain name.

It's probably that I've been stupid and missed a configuration step. However, I can't think what, and I've had a quick dig around in "auth_domain.c" and can't see which user (and domain) it might be failing to get, or from where. Plus, of course, it's pure speculation that this is causing the lack of a coherent bidirectional mapping between UIDs and SIDs...

Anyway, if anyone has any helpful suggestions either to resolve, or to get to the bottom of, this little hiccup, I'd much appreciate hearing them :)

Cheers folks!
Tris.
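To measure how often these messages fire, and to confirm that the user and domain fields really are empty rather than merely trimmed, a small log filter can help. The following is a minimal Python sketch, not part of Samba: `classify` and `per_second` are hypothetical helper names, and the regular expressions are assumptions built only from the log lines quoted in this thread. Following the follow-up message, the pattern expects two consecutive spaces where the user and domain names should be; if a syslog relay collapses those spaces, the pattern would need loosening.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Built from the smbd message quoted in this thread. The user and domain
# groups use \S* so that an empty name (two consecutive spaces) captures "".
VALIDATE_RE = re.compile(
    r"domain_client_validate: unable to validate password for user (\S*) "
    r"in domain (\S*) to Domain controller (\S+)\. Error was (\S+)\."
)
TRUST_RE = re.compile(
    r"check_trustdomain_security: could not fetch trust account password "
    r"for domain (\S+)"
)
# Syslog prefix such as "Aug 12 20:38:31 Gateway smbd[22736]: ...";
# " +" also tolerates the space-padded single-digit day syslog uses.
STAMP_RE = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d) ")


def classify(line: str) -> Optional[dict]:
    """Describe a matching smbd log line, or return None if it is not one."""
    m = VALIDATE_RE.search(line)
    if m:
        user, domain, dc, status = m.groups()
        return {
            "kind": "validate",
            "user": user,
            "domain": domain,
            "dc": dc,
            "status": status,
            "empty_names": user == "" and domain == "",
        }
    m = TRUST_RE.search(line)
    if m:
        return {"kind": "trust", "domain": m.group(1)}
    return None


def per_second(lines: Iterable[str]) -> Counter:
    """Count matching messages per syslog timestamp (one-second buckets)."""
    counts: Counter = Counter()
    for line in lines:
        stamp = STAMP_RE.match(line)
        if stamp and classify(line):
            counts[stamp.group(1)] += 1
    return counts
```

Feeding it the lines of whichever file syslog writes smbd's daemon.error messages to gives a rough rate per second. The source-location header lines (the ones ending in "(domain_client_validate)" and "(check_trustdomain_security)") are deliberately not counted, so each occurrence of the problem is counted once per message type.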
Tris Mabbs
2013-Aug-25 18:37 UTC
[Samba] Odd Samba 4 ("4.2.0pre1-GIT-b505111"; actually only using client) behaviour #1 - "Could not fetch trust account password for domain ...".
So after much playing around, leaving and re-joining, etc., I am now at the stage where I can successfully use "wbinfo" to map UID to SID and back again. However, I am still getting log files filled (sometimes with many, many entries per second) with lines such as:

Aug 25 18:46:29 Gateway smbd[18959]: [ID 702911 daemon.error] domain_client_validate: unable to validate password for user  in domain  to Domain controller HYDROCARBON.FIRSTGRADE.CO.UK. Error was NT_STATUS_NO_SUCH_USER.

(so still apparently trying to validate no user name in no domain against the correct DC for the actual domain; note there are two spaces between "user" and "in", and between "domain" and "to") and...

Aug 25 18:54:07 Gateway smbd[19022]: [ID 702911 daemon.error] check_trustdomain_security: could not fetch trust account password for domain FIRSTGRADE

(so still apparently being completely unable to fetch the trust account password for the domain). Duh?

The Samba box *is* correctly joined to the domain. For example:

# kinit administrator
Password for administrator@FIRSTGRADE.CO.UK:
# net ads -k testjoin
Join is OK
# net rpc -k testjoin
saf_store: refusing to store 0 length domain or servername!
Join to 'FIRSTGRADE' is OK
# wbinfo -t
checking the trust secret for domain FIRSTGRADE via RPC calls succeeded
#

Also "wbinfo -u", "wbinfo -g", etc. work fine, so "winbind" is happy. The "saf_store: refusing to store 0 length domain or servername!" is interesting, and is presumably caused by the same issue that is responsible for the first batch of log messages above: something cannot work out some information about the domain.
Other things can, though. For example:

# net ads status | more
objectClass: top
objectClass: person
objectClass: organizationalPerson
objectClass: user
objectClass: computer
cn: gateway
distinguishedName: CN=gateway,CN=Computers,DC=Firstgrade,DC=Co,DC=UK
instanceType: 4
whenCreated: 20130823190919.0Z
whenChanged: 20130823190920.0Z
uSNCreated: 7088763
uSNChanged: 7088769
name: gateway
objectGUID: 2e0e0366-8b59-426c-9f7c-7b87d137d975
...
sAMAccountName: gateway$
sAMAccountType: 805306369
dNSHostName: gateway.firstgrade.co.uk
servicePrincipalName: HOST/gateway.firstgrade.co.uk
servicePrincipalName: HOST/GATEWAY
objectCategory: CN=Computer,CN=Schema,CN=Configuration,DC=Firstgrade,DC=Co,DC=UK
isCriticalSystemObject: FALSE
dSCorePropagationData: 16010101000000.0Z
lastLogonTimestamp: 130217585604506398
--------------
Security Descriptor (revision: 1, type: 0x8c14)
owner SID: S-1-5-21-1362148477-1610942424-3041352000-512
group SID: S-1-5-21-1362148477-1610942424-3041352000-512
...
# wbinfo --own-domain
FIRSTGRADE
# wbinfo -D FIRSTGRADE
Name : FIRSTGRADE
Alt_Name : Firstgrade.Co.UK
SID : S-1-5-21-1362148477-1610942424-3041352000
Active Directory : Yes
Native : Yes
Primary : Yes

All looks perfectly reasonable.

The procedure used to join the domain is straightforward:

1. Configure everything.
2. Make sure there are no ".tdb" files left lying around from any previous join; likewise, make sure there's no computer account hanging around in AD.
3. "kinit administrator" to get a Kerberos ticket (and verify that the Kerberos configuration is correct).
4. "net ads join" to join the domain (works; no errors or warnings).
5. Use "smbpasswd -W" to set the LDAP bind user password.
6. Start Samba.
7. Test with "net ads -k testjoin", "net rpc -k testjoin", "wbinfo -t", etc., then "wbinfo -u", "wbinfo -g", "wbinfo -U 1000", and then check that "wbinfo -S SSSS" on that SID ("SSSS") maps back to UID 1000, etc.
8. Test access to shares from a workstation (I may have to perform a "klist purge" on the workstation first to flush any existing Kerberos tickets); make sure access is OK, that creating a file or directory creates something with the correct owner and group, etc.
9. Scratch my head at the ridiculous number of entries, like the ones shown above, which start appearing in the log files.

Note: behaviour is the same whether I use "-k" to use the "administrator" Kerberos ticket, or use "-U administrator" and provide the password.

So unless I'm doing something really stupid (entirely possible) and missing something out somewhere, the machine should be correctly joined to the domain and there should be no issues. Yet I'm still getting these erroneous log messages by the thousand (literally...). Also, as I understand it (?!), the "... could not fetch trust account password for domain ..." message is typically logged if the Samba server isn't joined to the domain, yet here it evidently is.

The *only* thing I can see which might be different from a typical installation (other than running on Solaris...) is that the domain name is in mixed case. This is a hangover from the original setup of the domain on our DCs (and hence cannot be changed). However, that should at most cause hiccups with Kerberos, with its (REALLY ANNOYING AND ANACHRONISTIC in this day and age) insistence on case sensitivity in principal names, and it certainly isn't causing issues with any other use of Kerberos (there's a realm mapping for both ".firstgrade.co.uk" and ".Firstgrade.Co.UK" to "FIRSTGRADE.CO.UK" in the "/etc/krb5/krb5.conf" file). So it probably isn't that, but it's the only thing I can see which might in any way differ from any other similar installation. Certainly the domain name on the Samba server is forced to lower case, as can be seen from the principals listed in the "net ads status" output.

So, does anyone have any thoughts about these log entries, please?
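Step 7 of the join procedure is essentially a checklist, so it can be scripted and rerun after every leave/re-join. The following is a minimal Python sketch, not anything shipped with Samba: `run`, `check_join` and `uid_round_trips` are hypothetical helper names, it assumes `net` and `wbinfo` are on the PATH and that a Kerberos ticket is already held for the `-k` variants, and the success strings it looks for are simply those printed in the transcripts in this thread (other Samba versions may word them differently).

```python
import subprocess
from typing import Callable, List, Tuple

# A runner takes an argv list and returns (exit status, combined output).
Runner = Callable[[List[str]], Tuple[int, str]]


def run(argv: List[str]) -> Tuple[int, str]:
    """Run a command and return (exit status, stdout+stderr as text)."""
    proc = subprocess.run(argv, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, universal_newlines=True)
    return proc.returncode, proc.stdout


# Join tests from step 7, paired with text each one printed in this thread
# (assumed success markers; adjust if your Samba build words them differently).
JOIN_CHECKS = [
    (["net", "ads", "-k", "testjoin"], "Join is OK"),
    (["net", "rpc", "-k", "testjoin"], "is OK"),
    (["wbinfo", "-t"], "succeeded"),
]


def check_join(runner: Runner = run) -> List[str]:
    """Return the commands that failed (an empty list means all passed)."""
    failures = []
    for argv, expected in JOIN_CHECKS:
        status, out = runner(argv)
        if status != 0 or expected not in out:
            failures.append(" ".join(argv))
    return failures


def uid_round_trips(uid: int, runner: Runner = run) -> bool:
    """Map UID -> SID with "wbinfo -U", then SID -> UID with "wbinfo -S"."""
    status, out = runner(["wbinfo", "-U", str(uid)])
    if status != 0:
        return False
    sid = out.strip()
    status, out = runner(["wbinfo", "-S", sid])
    # True only if the SID maps back to the UID we started from.
    return status == 0 and out.strip() == str(uid)
```

On the Samba host, `check_join()` should return an empty list when all three tests pass, and `uid_round_trips(1000)` returns False in exactly the situation described in the first message, where "wbinfo -S" hands back a UID different from the one "wbinfo -U" started with. The runner is a parameter so the logic can be exercised without a live domain. Note that the "saf_store" warning does not make the "net rpc" check fail, since only the final "is OK" line is looked for.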
Many thanks, and regards,
Tris.

Ps. Copying this to the technical list as well, as it might be more appropriately addressed there; if the list moderator feels otherwise, please feel free to block or remove this post from the technical group and leave it in the normal Samba discussion list only.