Hello,

I have configured a Nagios server to be part of a Windows 2003 domain. The Linux server is Red Hat 5.3 with winbind version 3.0.22. The configuration uses Kerberos and PAM with winbind to support both Windows users and local accounts.

Everything works fine until we test Active Directory failover. The system is still accessible through a domain account, but it is very slow, and the Nagios pages are also extremely slow: 10 to 15 seconds to display a page. When I use net ads info, I see that the failover to the secondary AD happens rather quickly. All my Windows servers fail over without any problem.

The user that Nagios runs as is local (not in AD), but when I look at the winbind log I see the following all the time:

-----
[2011/07/11 18:08:54, 3] nsswitch/winbindd_group.c:winbindd_getgroups(1273)
  [21838]: getgroups nagios
-----

From my point of view, it's not supposed to do that. As nagios is a local user, winbind should not check the nagios groups, right?

When we restore the first AD, it immediately runs fine. When we stop winbind completely, after a few minutes the system also runs normally with little latency. We test the AD failover by switching off the network interface on the primary AD.

The krb5.conf looks like this:

-----
[realms]
 DOMAIN.COM = {
  kdc = IP_AD_1
  kdc = IP_AD_2
  default_domain = DOMAIN.COM
 }
-----

The rest is default.

In the smb.conf, we have defined the following for the AD:

-----
 security = ads
 workgroup = DOMAIN
 realm = DOMAIN.COM
 password server = IP_AD_1, IP_AD_2
-----

The rest is also pretty much default.
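A minimal sketch for timing the kind of group lookup that appears to produce the "getgroups nagios" line in the winbind log. It assumes (the post does not show it) that /etc/nsswitch.conf lists winbind as a group source, e.g. "group: files winbind". With glibc, a supplementary-group lookup for any user, local or not, is then typically passed to each listed source, so winbindd can be asked about nagios even though the account lives in /etc/passwd. The script needs Python 3.6 or later (RHEL 5's system Python is 2.4), and the file name used below is hypothetical.

```python
import os
import pwd
import sys
import time


def time_group_lookup(username):
    """Return (supplementary group IDs, seconds taken) for `username`.

    os.getgrouplist() goes through the C library's NSS layer, so every
    group source listed in /etc/nsswitch.conf (e.g. winbind) may be queried.
    """
    entry = pwd.getpwnam(username)  # raises KeyError if the user is unknown
    start = time.monotonic()
    groups = os.getgrouplist(username, entry.pw_gid)
    elapsed = time.monotonic() - start
    return groups, elapsed


def main(argv):
    try:
        # Default to the invoking user when no user name is given.
        username = argv[1] if len(argv) > 1 else pwd.getpwuid(os.getuid()).pw_name
        groups, elapsed = time_group_lookup(username)
    except KeyError as exc:
        print(f"lookup failed: {exc}")
        return
    print(f"{username}: {len(groups)} groups in {elapsed:.3f} s")


if __name__ == "__main__":
    main(sys.argv)
```

Running it as, for example, "python3 time_groups.py nagios" once with the primary AD up and once during the failover test would show whether this lookup is where the delay comes from.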
The system-auth in pam.d looks like this:

-------------------
auth        required      pam_env.so
auth        sufficient    pam_unix.so nullok
auth        sufficient    pam_winbind.so try_first_pass
auth        required      pam_deny.so
auth        required      pam_tally2.so deny=3 onerr=fail unlock_time=60
account     required      pam_unix.so
account     sufficient    pam_succeed_if.so uid < 500 quiet
account     sufficient    pam_winbind.so
account     required      pam_permit.so
account     required      pam_tally2.so
password    sufficient    pam_unix.so md5 shadow nullok use_authtok remember=10
password    sufficient    pam_winbind.so try_first_pass
password    required      pam_deny.so
session     required      pam_mkhomedir.so skel=/etc/skel umask=0077
session     required      pam_unix.so
session     sufficient    pam_winbind.so
session     optional      pam_keyinit.so revoke
session     required      pam_limits.so
-------------------

We don't use any shares on this server, only winbind for authentication. I'm wondering if I made a mistake somewhere.

Thanks a lot for your help.

Gilles.
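A rough Python model of how the auth section of the system-auth stack above is evaluated, using simplified Linux-PAM control-flag rules: a "sufficient" success ends the stack if no earlier "required" module has failed, while a "required" failure is remembered and the stack continues. The user names and the local/domain checks are hypothetical stand-ins; the real modules do far more. Under these assumptions, a local account such as nagios succeeds at pam_unix.so and never reaches pam_winbind.so.

```python
from typing import Callable, List, NamedTuple, Tuple

# Hypothetical account sets standing in for /etc/passwd and Active Directory.
LOCAL_USERS = {"nagios"}
DOMAIN_USERS = {"alice"}


class Module(NamedTuple):
    control: str                  # "required", "requisite", "sufficient" or "optional"
    name: str
    check: Callable[[str], bool]  # stand-in for the module's real work


def run_stack(stack: List[Module], user: str) -> Tuple[bool, List[str]]:
    """Evaluate a PAM stack with simplified control-flag rules.

    Returns the overall result and the names of the modules that were called.
    """
    failed = False
    called: List[str] = []
    for module in stack:
        called.append(module.name)
        ok = module.check(user)
        if module.control == "required" and not ok:
            failed = True          # remember the failure, keep going
        elif module.control == "requisite" and not ok:
            return False, called   # fail immediately
        elif module.control == "sufficient" and ok and not failed:
            return True, called    # success short-circuits the stack
        # "optional" results are ignored in this simplified model
    return not failed, called


# The auth section of system-auth as posted.
AUTH_STACK = [
    Module("required", "pam_env.so", lambda u: True),
    Module("sufficient", "pam_unix.so", lambda u: u in LOCAL_USERS),
    Module("sufficient", "pam_winbind.so", lambda u: u in DOMAIN_USERS),
    Module("required", "pam_deny.so", lambda u: False),
    Module("required", "pam_tally2.so", lambda u: True),
]

if __name__ == "__main__":
    for name in ("nagios", "alice", "mallory"):
        print(name, run_stack(AUTH_STACK, name))
```

This only models PAM; group lookups like the one in the winbind log typically go through NSS rather than through this stack.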
Gilles Paquet
2011-Jul-12 07:06 UTC
[Samba] Active Directory failover problem with winbind