I'm running Samba 4.11.16 on CentOS 7 and not having much luck with failover to a second domain controller. I could *really* use some help.

I know my Samba config is fine. I know that adding the second domain controller was fine. Replication is working perfectly. No errors. If I stop the DC processes on either server, Windows clients appear to fail over perfectly fine.

The problem seems to affect my Linux clients (CentOS 7) running winbind.

Let's say a CentOS 7 client X is connected to dc2, and I stop the DC processes on dc2. The odd time, the client will connect to dc1 almost right away, and everything just works the way it should always work.

However, most of the time, when I stop the DC processes on dc2, the client will connect to dc1, and I can even do a "wbinfo -u" or "wbinfo -g", but "whoami" reveals "user doesn't exist". Somewhere between 20 and 50 minutes later, it just "magically" works. The timing doesn't seem consistent. Even a reboot doesn't fix things when it's in this state.

I've tried to follow the Samba logs, but I really can't figure out what's up. Andrew? Jeremy? Anyone?

I don't think this can be just my system. I suspect there are a lot of users out there running multiple DCs with a setup similar to mine, believing that it's all working, and maybe, because there hasn't been a failure, everything works great, but who knows what will happen when there's actually a failure.

Jason.

On 08/12/2020 21:09, Jason Keltz via samba wrote:
> I'm running Samba 4.11.16 on CentOS 7 and not having much luck with
> failover to a second domain controller. I could *really* use some help.
>
> I know my Samba config is fine. I know that adding the second domain
> controller was fine. Replication is working perfectly. No errors.
> If I stop the DC processes on either server, Windows clients appear to
> failover perfectly fine.
>
> The problem seems to affect my Linux clients (CentOS 7) running winbind.
>
> Let's say a CentOS 7 client X is connected to dc2, and I stop the DC
> processes on dc2.... The odd time, the client will connect to dc1
> almost right away, and everything just works the way it should always
> work.
>
> However, most of the time, I stop the DC processes on dc2, the client
> will connect to dc1, I can even do a "wbinfo -u" or "wbinfo -g", but
> "whoami" reveals "user doesn't exist". Somewhere between 20-50
> minutes later, it just "magically" works. The timing doesn't seem
> consistent. Even a reboot doesn't fix things when it's in this state.
>
> I've tried to follow the Samba logs, but I really can't figure out
> what's up. Andrew? Jeremy? Anyone?
>
> I don't think this can be just my system. I suspect there's a lot of
> users out there running multiple DCs with a similar setup to me,
> believing that it's all working, and maybe, because there hasn't been
> a failure, everything works great, but who knows what will happen when
> there's actually a failure.
>
> Jason.

Try adding these lines to the /etc/resolv.conf on the Linux clients:

    options rotate
    options timeout:1

Rowland
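
To confirm that a client actually carries the two resolver options suggested above, something along these lines could be run on each Linux client. This is only a minimal sketch in Python: the function names `resolver_options` and `check_file` are made up for illustration, and the parser assumes that tokens from several "options" lines accumulate and that comment lines start with "#" or ";".

```python
def resolver_options(text):
    """Return the set of tokens found on 'options' lines of resolv.conf text."""
    found = set()
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines (assumed to start with '#' or ';').
        if not line or line.startswith(("#", ";")):
            continue
        fields = line.split()
        if fields[0] == "options":
            found.update(fields[1:])
    return found


def check_file(path="/etc/resolv.conf", wanted=("rotate", "timeout:1")):
    """Return the wanted options that are absent from the given file."""
    with open(path) as fh:
        present = resolver_options(fh.read())
    return [opt for opt in wanted if opt not in present]
```

Calling `check_file()` on a client returns an empty list when both options are present, and the names of the missing ones otherwise.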

Hi Jason,

I was following this thread with interest, but it seems to have died a silent death.

We might be seeing something similar on our Samba domain member servers.

We run (automatic) nightly reboots of our DCs: one reboots at 02:00, one at 03:00 and the third at 04:00.

On our main (winbind) fileserver we often (but not always) see that at one of the above times, for a few minutes, the AD groups are gone. (I run a script on the member server that verifies the existence of our AD groups using "getent group".)

We know that at DC-reboot time, the two other DCs are up and running, so the reboots should really have little (or even no) impact on the member servers.

I was hoping for continued dialogue in this thread. Curious whether everybody here can actually reboot their DCs (or stop Samba on them) without any consequence for their domain member servers?

We have three DCs, no problems between them; they have recently been examined by SerNet with basically no remarks.

The DCs run 4.12.8 (SerNet), and the domain member server is still on 4.10.18. (Yes, we will upgrade that soon.)

And, Jason:

On 12/8/20 10:09 PM, Jason Keltz via samba wrote:
> I don't think this can be just my system. I suspect there's a lot of
> users out there running multiple DCs with a similar setup to me,
> believing that it's all working, and maybe, because there hasn't been a
> failure, everything works great, but who knows what will happen when
> there's actually a failure.

I think we agree with you there. :-)

Curious about the experience of others...

MJ
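
The kind of check described above (verifying that the AD groups resolve on the member server) could look roughly like the following. This is a sketch in Python, not the script actually in use: the function names are hypothetical, the group names must be supplied by the caller, and `grp.getgrnam` is used in place of running "getent group" because it performs the same NSS lookup.

```python
import grp
import time


def missing_groups(names, lookup=grp.getgrnam):
    """Return the group names that the NSS lookup cannot resolve."""
    missing = []
    for name in names:
        try:
            lookup(name)
        except KeyError:
            # grp.getgrnam raises KeyError when the group is not found.
            missing.append(name)
    return missing


def watch(names, interval=60, rounds=None):
    """Print a timestamped line each time one or more groups are missing.

    Runs forever when rounds is None, otherwise for the given number of checks.
    """
    done = 0
    while rounds is None or done < rounds:
        gone = missing_groups(names)
        if gone:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            print(stamp, "missing:", ", ".join(gone))
        time.sleep(interval)
        done += 1
```

Running `watch` with a short interval across the 02:00 to 04:00 reboot window would show when the groups disappear and for how long, which can then be lined up against the winbind logs.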