Nacho del Rey
2015-May-20 17:00 UTC
[Samba] Strange problem samba+winbind+AD - transport endpoint is not connected
Hi there,

Yesterday at work we had a poltergeist with RADIUS + winbind and Samba on a Linux box that authenticates against an Active Directory (Windows 2012).

The AD is formed by 3 Windows servers. The Linux box has a connection established to one of them through port 445. When that server went down, a bunch of messages like the following were logged in /var/log/messages:

    May 19 16:40:59 pv4il0168 winbindd[18357]: cm_prepare_connection: getpeername failed with: Transport endpoint is not connected

Running the following command gave:

    [PRO] [root@pv4il0168 samba]# wbinfo -t
    checking the trust secret for domain XXXXXX via RPC calls failed
    error code was NT_STATUS_DOMAIN_CONTROLLER_NOT_FOUND (0xc0000233)
    failed to call wbcCheckTrustCredentials: WBC_ERR_AUTH_ERROR
    Could not check secret

Restarting the samba and winbind services did not reestablish the connection to the AD. No way.

After 3 hours of investigating with 'debug level = 10' in Samba, I tried a

    net cache flush

and it worked.

10 minutes later we were able to reproduce the problem (each time the server Samba was connected to went down, the problem appeared), and the solution was the same.

Finally we decided to do a net ads leave and a net ads join, and so far the system is working like a charm.

The question is: does anyone know of any problem/bug related to this strange behaviour? Connections, DNS, permissions, etc. were fine. The Linux box has been working properly for months.

Thanks in advance,
Nacho.
------------------------------------------------------------------------------------------
Data:

[PRO] [root@pv4il0168 samba]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.4 (Santiago)

samba4-libs-4.0.0-55.el6.rc4.x86_64
samba-common-3.6.9-151.el6_4.1.x86_64
samba-3.6.9-151.el6_4.1.x86_64
samba-winbind-clients-3.6.9-151.el6_4.1.x86_64
samba-winbind-3.6.9-151.el6_4.1.x86_64

smb.conf

[global]
    workgroup = XXXXXXX
    server string = RADIUS
    log file = /var/log/samba/log.%m
    max log size = 50
    security = ads
    realm = XXXXXXX.NET
    password server = ldapofi.yyyyyyy.net
    idmap uid = 10000-20000
    idmap gid = 10000-20000
    load printers = no
    cups options = raw
    printcap name = /etc/printcap
    printing = lprng

/etc/krb5.conf

[realms]
XXXXXX.NET = {
    kdc = pv4iw0001.xxxxxx.net
    kdc = pv4iw0002.xxxxxx.net
    admin_server = pv4iw0001.xxxxxx.net
    admin_server = pv4iw0002.xxxxxx.net
    default_domain = xxxxxx.net

debug

[2015/05/19 12:59:25.021104, 10] libsmb/namequery.c:89(saf_store)
  saf_store: domain = [XXXXX], server = [PV4IW0001.XXXXX.net], expire [1432034065]
[2015/05/19 12:59:25.021127, 10] lib/gencache.c:183(gencache_set_data_blob)
  Adding cache entry with key SAF/DOMAIN/XXXXX and timeout = Tue May 19 13:14:25 2015 (900 seconds ahead)
[2015/05/19 12:59:25.021164, 10] libsmb/namequery.c:89(saf_store)
  saf_store: domain = [XXXXX.NET], server = [PV4IW0001.XXXXX.net], expire [1432034065]
[2015/05/19 12:59:25.021186, 10] lib/gencache.c:183(gencache_set_data_blob)
  Adding cache entry with key SAF/DOMAIN/XXXXX.NET and timeout = Tue May 19 13:14:25 2015 (900 seconds ahead)
[2015/05/19 12:59:25.021237, 10] winbindd/winbindd_cm.c:802(cm_prepare_connection)
  cm_prepare_connection: connecting to DC PV4IW0001.XXXXX.net for domain XXXXX
[2015/05/19 12:59:25.021289, 0] winbindd/winbindd_cm.c:835(cm_prepare_connection)
  cm_prepare_connection: getpeername failed with: Transport endpoint is not connected
[2015/05/19 12:59:25.021549, 10] lib/gencache.c:183(gencache_set_data_blob)
  Adding cache entry with key NEG_CONN_CACHE/XXXXX,PV4IW0001.XXXXX.net and timeout = Tue May 19 12:59:55 2015 (30 seconds ahead)
[2015/05/19 12:59:25.021591, 9] libsmb/conncache.c:189(add_failed_connection_entry)
  add_failed_connection_entry: added domain XXXXX (PV4IW0001.XXXXX.net) to failed conn cache
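
To make the debug excerpt easier to read, here is a minimal Python sketch that pulls out which server each saf_store line remembers for a domain and which cache entries get written with which lifetime. The function names (pinned_servers, cache_entries) and the SAMPLE text are illustrative only and not part of Samba; the regular expressions assume the level-10 log layout shown in this post. It may help to see that the failed DC is stored under SAF/DOMAIN/... for 900 seconds while the negative connection entry only lasts 30 seconds. Whether that is the actual cause of the problem is an assumption, not something this script proves.

```python
import re

# Layout assumed from the level-10 winbindd log lines quoted in this post.
SAF_STORE = re.compile(
    r"saf_store: domain = \[(?P<domain>[^\]]+)\], server = \[(?P<server>[^\]]+)\]"
)
CACHE_ENTRY = re.compile(
    r"Adding cache entry\s+with key (?P<key>\S+) and timeout = "
    r".+? \((?P<secs>-?\d+) seconds ahead\)"
)


def pinned_servers(log_text):
    """Map each domain to the last server recorded for it by saf_store."""
    return {m.group("domain"): m.group("server")
            for m in SAF_STORE.finditer(log_text)}


def cache_entries(log_text):
    """List (key, lifetime in seconds) for every cache write in the log."""
    return [(m.group("key"), int(m.group("secs")))
            for m in CACHE_ENTRY.finditer(log_text)]


# Three entries copied from the debug output above (hypothetical test input).
SAMPLE = """\
[2015/05/19 12:59:25.021104, 10] libsmb/namequery.c:89(saf_store)
  saf_store: domain = [XXXXX], server = [PV4IW0001.XXXXX.net], expire [1432034065]
[2015/05/19 12:59:25.021127, 10] lib/gencache.c:183(gencache_set_data_blob)
  Adding cache entry with key SAF/DOMAIN/XXXXX and timeout = Tue May 19 13:14:25 2015 (900 seconds ahead)
[2015/05/19 12:59:25.021549, 10] lib/gencache.c:183(gencache_set_data_blob)
  Adding cache entry with key NEG_CONN_CACHE/XXXXX,PV4IW0001.XXXXX.net and timeout = Tue May 19 12:59:55 2015 (30 seconds ahead)
"""

if __name__ == "__main__":
    for domain, server in pinned_servers(SAMPLE).items():
        print(f"domain {domain} is pinned to {server}")
    for key, secs in cache_entries(SAMPLE):
        print(f"{key}: expires in {secs}s")
```

Running it against a full log.winbindd file (read with open(...).read()) should give the same kind of summary for every domain in the log.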