Hi Marc,

>> The cause is that the password change didn't reach both AD DCs, but only
>> one. The other one still had the old value, as could be seen with
>> samba-tool ldapcmp. Restarting the DCs and waiting for a couple of
>> seconds brings them back into sync, and Windows logons work as they used
>> to.
>> Any idea what I should do next time to obtain valuable output for
>> debugging?
>
> * What Samba version are you running?

The DCs are 4.1.17-Debian.

> * How many DCs?

Just two.

> * Can you force this problem to appear?

Need some more investigation here; I have not found a way to reproduce
it reliably.

> Just an idea: AD problems are often caused by DNS problems, and the
> keyword "DNS islanding" came up in another thread at the moment: Which
> DNS server do your DCs use as primary? Their own or a different one? See
> http://retrohack.com/a-word-or-two-about-dns-islanding/

As I understood Linux resolving, there is no static primary/secondary
concept for DNS. So I'll try to remove the self-dependence altogether
and see if it improves the situation.

Regards,
 - lars.
I am not sure whether this is another symptom of the same disease: while configuring a member CUPS print server and checking the syslog for an entirely different reason, I was surprised to see the following log entries (and many more like them):

Mar 13 11:36:10 snorri nslcd[11752]: [4a481a] <passwd="mgr"> ldap_result() failed: Can't contact LDAP server
Mar 13 11:36:10 snorri nslcd[11752]: [4a481a] <passwd="mgr"> ldap_abandon() failed to abandon search: Can't contact LDAP server: Transport endpoint is not connected
Mar 13 11:36:10 snorri nslcd[11752]: [9abb43] <passwd=1001> ldap_result() failed: Can't contact LDAP server
Mar 13 11:36:10 snorri nslcd[11752]: [9abb43] <passwd=1001> ldap_abandon() failed to abandon search: Can't contact LDAP server: Transport endpoint is not connected

Okay, doing:

ldapsearch -LLL -D "CN=Administrator,CN=Users,DC=ad,DC=microsult,DC=de" -H ldap://ad.microsult.de -x -W '(uid=mgr)' uid uidNumber gidNumber sAMAccountName name gecos

works nicely. I can also specify each DC separately as the LDAP URI. Logging in to the machine, id, getent: everything works, but sometimes produces the said log entries and then takes a considerable time. nscd is stopped on the machine.

Currently everything is running smoothly. In the period when I saw the most entries, I also had several brief pauses in my music, which is served via Kerberized NFSv4 with AD providing NSS and Kerberos. Some time before that, I had applied today's Debian security updates to both DCs and changed /etc/resolv.conf on the primary DC so that it no longer points to itself.

However, silences of a second are not uncommon in my setup. When they become more frequent, this is usually a dire indication that something is about to break. And it generally does not coincide with any work on the DCs.

>>> Any idea what I should do next time to obtain valuable output for
>>> debugging?

Which is still the challenging question! ;)

>>
>> * What Samba version are you running?
>
> The DCs are 4.1.17-Debian.
>
>> * How many DCs?
>
> Just two.

Regards,
 - lars.
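One way to collect more useful data for the next incident is to count the nslcd failures per minute and line the bursts up against the music dropouts. The following is a minimal Python sketch, not part of nslcd or Samba. It assumes the syslog line format quoted above (timestamp, host, nslcd[pid]:), and failures_per_minute is a hypothetical helper name:

```python
import re
from collections import Counter

# Syslog prefix "Mon DD HH:MM:SS host nslcd[pid]:" followed by the LDAP error
# text seen in this thread; the minute (without seconds) is the bucket key.
NSLCD_FAILURE_RE = re.compile(
    r"^(\w{3} +\d{1,2} \d\d:\d\d):\d\d \S+ nslcd\[\d+\]: .*Can't contact LDAP server"
)

# The four log lines quoted in this thread.
SAMPLE_LOG = """\
Mar 13 11:36:10 snorri nslcd[11752]: [4a481a] <passwd="mgr"> ldap_result() failed: Can't contact LDAP server
Mar 13 11:36:10 snorri nslcd[11752]: [4a481a] <passwd="mgr"> ldap_abandon() failed to abandon search: Can't contact LDAP server: Transport endpoint is not connected
Mar 13 11:36:10 snorri nslcd[11752]: [9abb43] <passwd=1001> ldap_result() failed: Can't contact LDAP server
Mar 13 11:36:10 snorri nslcd[11752]: [9abb43] <passwd=1001> ldap_abandon() failed to abandon search: Can't contact LDAP server: Transport endpoint is not connected
"""


def failures_per_minute(lines) -> Counter:
    """Count nslcd 'Can't contact LDAP server' messages per syslog minute."""
    counts = Counter()
    for line in lines:
        match = NSLCD_FAILURE_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Counter keeps insertion order, so a chronological log prints chronologically.
    for minute, count in failures_per_minute(SAMPLE_LOG.splitlines()).items():
        print(minute, count)
```

On a Debian member server these messages would normally end up in /var/log/syslog (an assumption about the local syslog setup), so passing open("/var/log/syslog") instead of the sample gives the same per-minute breakdown for the real log.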
It did happen again, and this time I was a little less panicked and took some time to figure out what happened.

On my primary DC (SAMBA) I did not notice anything extraordinary. However, my secondary (VERDANDI) reported issues:

root@verdandi:~# samba-tool drs showrepl
Default-First-Site-Name\VERDANDI
DSA Options: 0x00000001
DSA object GUID: a03bbb51-1dca-44ae-a4d9-7aa8cb4a1ace
DSA invocationId: 8bdb4f85-1da2-4f5a-b9a9-e8369d202745

==== INBOUND NEIGHBORS ===

CN=Schema,CN=Configuration,DC=ad,DC=microsult,DC=de
        Default-First-Site-Name\SAMBA via RPC
                DSA object GUID: b19509be-c3ee-4a58-9fc9-afd61759a23f
                Last attempt @ Wed Apr 22 00:12:36 2015 CEST failed, result 5 (WERR_ACCESS_DENIED)
                1265 consecutive failure(s).
                Last success @ Fri Apr 17 14:47:18 2015 CEST
[...]
==== OUTBOUND NEIGHBORS ===
[... everything OK, since no attempts were ever made, but ...]
DC=ad,DC=microsult,DC=de
        Default-First-Site-Name\SAMBA via RPC
                DSA object GUID: b19509be-c3ee-4a58-9fc9-afd61759a23f
                Last attempt @ Wed Apr 22 00:14:00 2015 CEST failed, result 5 (WERR_ACCESS_DENIED)
                31 consecutive failure(s).
                Last success @ NTTIME(0)

Consequently, the password update that happened the previous day was out of sync:

samba-tool ldapcmp ldap://samba ldap://verdandi -Uadministrator
Password for [AD\administrator]:

* Comparing [DOMAIN] context...

* Objects to be compared: 289

Comparing:
'CN=Builtin,DC=ad,DC=microsult,DC=de' [ldap://samba]
'CN=Builtin,DC=ad,DC=microsult,DC=de' [ldap://verdandi]
    Attributes found only in ldap://samba:
        serverState
    FAILED

Comparing:
'CN=Lars LH. Hanke,CN=Users,DC=ad,DC=microsult,DC=de' [ldap://samba]
'CN=Lars LH. Hanke,CN=Users,DC=ad,DC=microsult,DC=de' [ldap://verdandi]
    Difference in attribute values:
        pwdLastSet =>
['130740170160000000']
['130703672860000000']
    FAILED

[...]

Since I restarted the secondary DC some 34 hours ago, it has synchronized immediately and still does, i.e. drs showrepl reports its last success 5 minutes ago and no failures.
It looks a little like an expired ticket that fails to renew after several weeks. But this is pure speculation.

Any ideas for troubleshooting?

Regards,
 - lars.

On 13.03.2015 at 00:43, Lars Hanke wrote:
> Hi Marc,
>
>>> The cause is that the password change didn't reach both AD DCs, but only
>>> one. The other one still had the old value, as could be seen with
>>> samba-tool ldapcmp. Restarting the DCs and waiting for a couple of
>>> seconds brings them back into sync, and Windows logons work as they used
>>> to.
>>> Any idea what I should do next time to obtain valuable output for
>>> debugging?
>>
>> * What Samba version are you running?
>
> The DCs are 4.1.17-Debian.
>
>> * How many DCs?
>
> Just two.
>
>> * Can you force this problem to appear?
>
> Need some more investigation here; I have not found a way to reproduce
> it reliably.
>
>> Just an idea: AD problems are often caused by DNS problems, and the
>> keyword "DNS islanding" came up in another thread at the moment: Which
>> DNS server do your DCs use as primary? Their own or a different one? See
>> http://retrohack.com/a-word-or-two-about-dns-islanding/
>
> As I understood Linux resolving, there is no static primary/secondary
> concept for DNS. So I'll try to remove the self-dependence altogether
> and see if it improves the situation.
>
> Regards,
>  - lars.
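Since the replication failures on VERDANDI went unnoticed for days (1265 consecutive inbound failures), a periodic check of the samba-tool drs showrepl output might catch the next occurrence early. This is a minimal Python sketch that assumes the plain-text layout quoted in this thread; that layout is not a stable interface and may differ between Samba versions. failing_neighbors is a hypothetical helper name:

```python
import re

# "<Site>\<DC> via RPC" starts a neighbor entry in the showrepl text output.
NEIGHBOR_RE = re.compile(r"^(\S+\\\S+) via RPC")
# "<n> consecutive failure(s)." carries the failure count for that entry.
FAILURES_RE = re.compile(r"^(\d+) consecutive failure\(s\)")

# Excerpt of the VERDANDI output quoted in this thread (indentation assumed).
SAMPLE = r"""
Default-First-Site-Name\VERDANDI
==== INBOUND NEIGHBORS ===
CN=Schema,CN=Configuration,DC=ad,DC=microsult,DC=de
    Default-First-Site-Name\SAMBA via RPC
        Last attempt @ Wed Apr 22 00:12:36 2015 CEST failed, result 5 (WERR_ACCESS_DENIED)
        1265 consecutive failure(s).
==== OUTBOUND NEIGHBORS ===
DC=ad,DC=microsult,DC=de
    Default-First-Site-Name\SAMBA via RPC
        Last attempt @ Wed Apr 22 00:14:00 2015 CEST failed, result 5 (WERR_ACCESS_DENIED)
        31 consecutive failure(s).
"""


def failing_neighbors(text: str) -> list[tuple[str, str, str, int]]:
    """Return (direction, naming context, neighbor, failures) for entries with failures."""
    results = []
    direction = context = neighbor = ""
    for raw in text.splitlines():
        line = raw.strip()
        if "INBOUND NEIGHBORS" in line:
            direction = "inbound"
        elif "OUTBOUND NEIGHBORS" in line:
            direction = "outbound"
        elif line.startswith(("CN=", "DC=")):
            context = line  # naming context heading, e.g. DC=ad,DC=microsult,DC=de
        elif (match := NEIGHBOR_RE.match(line)):
            neighbor = match.group(1)
        elif (match := FAILURES_RE.match(line)):
            failures = int(match.group(1))
            if failures > 0:
                results.append((direction, context, neighbor, failures))
    return results


if __name__ == "__main__":
    for direction, context, neighbor, failures in failing_neighbors(SAMPLE):
        print(f"{direction}: {context} via {neighbor}: {failures} consecutive failures")
```

To check a live DC, the text could come from subprocess.run(["samba-tool", "drs", "showrepl"], capture_output=True, text=True).stdout, for example in a cron job that mails any non-empty result.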
Greetings, Dr. Lars Hanke!

> It did happen again and this time I was a little less panicked and took
> some time to figure out what happened.

> On my primary DC (SAMBA) I did not notice anything extraordinary.
> However, my secondary (VERDANDI) reported issues:

> root@verdandi:~# samba-tool drs showrepl
> Default-First-Site-Name\VERDANDI
> DSA Options: 0x00000001
> DSA object GUID: a03bbb51-1dca-44ae-a4d9-7aa8cb4a1ace
> DSA invocationId: 8bdb4f85-1da2-4f5a-b9a9-e8369d202745

> ==== INBOUND NEIGHBORS ===
> CN=Schema,CN=Configuration,DC=ad,DC=microsult,DC=de
> Default-First-Site-Name\SAMBA via RPC
> DSA object GUID: b19509be-c3ee-4a58-9fc9-afd61759a23f
> Last attempt @ Wed Apr 22 00:12:36 2015 CEST failed,
> result 5 (WERR_ACCESS_DENIED)
> 1265 consecutive failure(s).
> Last success @ Fri Apr 17 14:47:18 2015 CEST

> [...]
> ==== OUTBOUND NEIGHBORS ===

> [... everything OK for no attempts were ever made, but ...]

> DC=ad,DC=microsult,DC=de
> Default-First-Site-Name\SAMBA via RPC
> DSA object GUID: b19509be-c3ee-4a58-9fc9-afd61759a23f
> Last attempt @ Wed Apr 22 00:14:00 2015 CEST failed,
> result 5 (WERR_ACCESS_DENIED)
> 31 consecutive failure(s).
> Last success @ NTTIME(0)

> And consequently the password update that happened the previous day was
> out of sync:

> samba-tool ldapcmp ldap://samba ldap://verdandi -Uadministrator
> Password for [AD\administrator]:

> * Comparing [DOMAIN] context...

> * Objects to be compared: 289

> Comparing:
> 'CN=Builtin,DC=ad,DC=microsult,DC=de' [ldap://samba]
> 'CN=Builtin,DC=ad,DC=microsult,DC=de' [ldap://verdandi]
> Attributes found only in ldap://samba:
> serverState
> FAILED

> Comparing:
> 'CN=Lars LH. Hanke,CN=Users,DC=ad,DC=microsult,DC=de' [ldap://samba]
> 'CN=Lars LH. Hanke,CN=Users,DC=ad,DC=microsult,DC=de' [ldap://verdandi]
> Difference in attribute values:
> pwdLastSet =>
> ['130740170160000000']
> ['130703672860000000']

Looks very much like an hour off.
I suggest checking the tzdata configuration.

> FAILED

> [...]

> Having restarted the secondary DC some 34h ago, it synchronized
> immediately and still does, i.e. drs showrepl has its last success 5
> minutes ago, no failures.

> It looks a little like an expired ticket, which fails to renew after
> several weeks. But this is pure speculation.

> Any ideas for troubleshooting?

--
With best regards,
Andrey Repin
Friday, April 24, 2015 00:04:34

Sorry for my terrible English...
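The pwdLastSet values in the ldapcmp output are Windows FILETIME stamps: the number of 100-nanosecond intervals since 1601-01-01 00:00 UTC. Decoding both values is a quick way to see how far apart the two DCs really are before chasing an offset theory. A short Python sketch; filetime_to_datetime is a hypothetical helper name:

```python
from datetime import datetime, timedelta, timezone

# Windows FILETIME epoch: 1601-01-01 00:00:00 UTC.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(value: int) -> datetime:
    """Convert an AD FILETIME value (100 ns ticks since 1601) to an aware UTC datetime."""
    # Integer-divide by 10 to turn 100 ns ticks into microseconds.
    return FILETIME_EPOCH + timedelta(microseconds=value // 10)


if __name__ == "__main__":
    # pwdLastSet values as printed by samba-tool ldapcmp in this thread.
    on_samba = filetime_to_datetime(130740170160000000)
    on_verdandi = filetime_to_datetime(130703672860000000)
    print("samba:   ", on_samba.isoformat())
    print("verdandi:", on_verdandi.isoformat())
    print("delta:   ", on_samba - on_verdandi)
```

For the two values above, this yields 2015-04-20T15:23:36+00:00 for SAMBA and 2015-03-09T09:34:46+00:00 for VERDANDI, a difference of 42 days, 5:48:50. Because FILETIME counts from a UTC epoch, the stored value itself does not depend on the local time zone.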