Jason Haar
2007-Nov-25 20:58 UTC
[Samba] NT_STATUS_NO_LOGON_SERVERS errors sporadically occurring
Hi there

I have samba-3.0.27a rolled out over a large number of servers, and every once in a while one of them will start refusing connections, with winbind reporting NT_STATUS_NO_LOGON_SERVERS and ntlm_auth failing with "NT_STATUS_NO_LOGON_SERVERS: No logon servers". The same problem occurred with earlier versions too.

I think I've tracked down the cause of the problem as being "our fault", but Samba really isn't handling it well. We have a 10.* network and servers with dual Ethernet cards, and sometimes/somehow the IP address of the unused 2nd card (a 192.168.* address) starts getting published in our Active Directory as a domain controller IP. Then if winbind decides to choose that address, it all starts failing, as that address space isn't reachable.

If I do a "nslookup domain.AD" I get a listing of all our valid DC 10.* addresses back - plus the unwanted 192.168 address - but it appears that sometimes winbind decides that is the valid address, and won't try any of the other addresses? And then you get the NT_STATUS_NO_LOGON_SERVERS - as it isn't reachable.

Here are some excerpts from /var/log/samba/log.wb-DOMAIN

    ads_find_dc: looking for realm 'domain.AD'
    get_sorted_dc_list: attempting lookup for name domain.AD (sitename NULL) using [ads]
    sitename_fetch: Returning sitename for domain.AD: "correct-sitename"
    name domain.AD#20 found
    get_dc_list: negative entry domain.AD removed from DC list
    get_dc_list: returning 1 ip addresses in an ordered list
    get_dc_list: 192.168.234.235:389

Those last two lines show why this problem occurs, but the problem isn't being noticed within AD itself - I think Microsoft actually uses ICMP pings to test that DCs are reachable? Does Samba? Also, I have no idea why it returns only one, invalid IP - nslookup shows this particular domain has 13 domain controller IPs listed, including the one 192.168 one.
Obviously to fix it I just have to whine at our AD people until they clean out this bogus DC IP - but shouldn't Samba work its way around this? As an added advantage, ping tests could even ensure Samba connects to the closest DC by measuring the latency...?

Thanks!

--
Cheers

Jason Haar
Information Security Manager, Trimble Navigation Ltd.
Phone: +64 3 9635 377 Fax: +64 3 9635 417
PGP Fingerprint: 7A2E 0407 C9A6 CAF6 2B9F 8422 C063 5EBB FE1D 66D1
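
The reachability and latency test suggested in the message above could be sketched roughly as follows. This is an illustration only and not how winbind behaves: it swaps the ICMP ping for a TCP connect to port 389 (the port shown in the log excerpt), because a TCP connect needs no raw-socket privileges. The names `tcp_probe` and `rank_reachable` and the 1-second timeout are assumptions made for the example.

```python
import socket
import time
from typing import Callable, Iterable, List, Optional, Tuple


def tcp_probe(ip: str, port: int = 389, timeout: float = 1.0) -> Optional[float]:
    """Return the TCP connect time in seconds, or None if the connect fails."""
    start = time.monotonic()
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, unreachable networks and timeouts.
        return None


def rank_reachable(
    candidates: Iterable[str],
    probe: Callable[[str], Optional[float]] = tcp_probe,
) -> List[Tuple[str, float]]:
    """Drop candidates that cannot be reached and sort the rest by latency."""
    reachable = []
    for ip in candidates:
        latency = probe(ip)
        if latency is not None:
            reachable.append((ip, latency))
    # Lowest latency first, i.e. the "closest" DC leads the list.
    return sorted(reachable, key=lambda item: item[1])
```

Fed with the addresses that "nslookup domain.AD" returns, an unreachable entry such as 192.168.234.235 would simply drop out of the result, and the remaining DCs would come back ordered by connect time.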
Jeremy Allison
2007-Nov-25 21:35 UTC
[Samba] NT_STATUS_NO_LOGON_SERVERS errors sporadically occurring
On Mon, Nov 26, 2007 at 09:51:18AM +1300, Jason Haar wrote:
>
> If I do a "nslookup domain.AD" I get a listing of all our valid DC 10.*
> addresses back - plus the unwanted 192.168 address - but it appears that
> sometimes winbind decides that is the valid address, and won't try any
> of the other addresses? And then you get the NT_STATUS_NO_LOGON_SERVERS
> - as it isn't reachable.
>
> Here's some excerpts from /var/log/samba/log.wb-DOMAIN
>
>
> ads_find_dc: looking for realm 'domain.AD'
> get_sorted_dc_list: attempting lookup for name domain.AD (sitename
> NULL) using [ads]
> sitename_fetch: Returning sitename for domain.AD: "correct-sitename"
> name domain.AD#20 found
> get_dc_list: negative entry domain.AD removed from DC list
> get_dc_list: returning 1 ip addresses in an ordered list
> get_dc_list: 192.168.234.235:389
>
>
> those last two lines imply why this problem occurs, but this problem
> isn't being noticed within AD itself - I think Microsoft actually uses
> ICMP pings to test DCs are reachable? Does Samba? Also, I have no idea
> why it returns only one, invalid IP - nslookup shows this particular
> domain has 13 domain controller IPs listed - including the one 192.168 one.
>
> Obviously to fix it I just have to whine at our AD people until they
> clean out this bogus DC IP - but shouldn't Samba work its way around
> this? As an added advantage, ping tests could even ensure Samba connects
> to the closest DC by measuring the latency...?

We should notice this address is bad and add it to the negative connection cache once we fail to connect - we actually use a lot of techniques to ensure we don't get stuck on a bad DC (server affinity cache, negative connection cache etc.).

Is there a chance you can get me a debug level 10 log from when you're running into this problem, so I can see what is going on?

Jeremy.
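
For readers unfamiliar with the idea, a negative connection cache of the kind mentioned above can be sketched in a few lines. This is a simplified illustration, not Samba's actual implementation: the class `NegativeConnCache`, the function `pick_dc`, the `connect` callback and the 60-second expiry are all assumptions made for the example.

```python
import time
from typing import Callable, Dict, Iterable, Optional


class NegativeConnCache:
    """Remember addresses that recently failed so they are skipped for a while."""

    def __init__(self, ttl: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl          # seconds a failure is remembered (assumed value)
        self.clock = clock      # injectable so the expiry can be tested
        self._failed_at: Dict[str, float] = {}

    def add(self, ip: str) -> None:
        """Record that connecting to ip just failed."""
        self._failed_at[ip] = self.clock()

    def is_bad(self, ip: str) -> bool:
        """True while a recorded failure for ip has not yet expired."""
        failed = self._failed_at.get(ip)
        if failed is None:
            return False
        if self.clock() - failed >= self.ttl:
            del self._failed_at[ip]  # entry expired, allow a retry
            return False
        return True


def pick_dc(candidates: Iterable[str], cache: NegativeConnCache,
            connect: Callable[[str], bool]) -> Optional[str]:
    """Return the first candidate that connects, caching every failure."""
    for ip in candidates:
        if cache.is_bad(ip):
            continue            # known-bad address, do not waste a timeout on it
        if connect(ip):
            return ip
        cache.add(ip)           # remember the failure and try the next address
    return None
```

With a scheme like this, a single unreachable address costs one failed connect and is then skipped until its entry expires. It only helps when the candidate list still contains other addresses, which is why the one-entry list in the log above is the interesting part of the report.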