On 12/8/2020 4:35 PM, Rowland penny via samba wrote:
> On 08/12/2020 21:09, Jason Keltz via samba wrote:
>> I'm running Samba 4.11.16 on CentOS 7 and not having much luck with
>> failover to a second domain controller. I could *really* use some
>> help.
>>
>> I know my Samba config is fine. I know that adding the second domain
>> controller was fine. Replication is working perfectly. No errors.
>> If I stop the DC processes on either server, Windows clients appear
>> to fail over perfectly fine.
>>
>> The problem seems to affect my Linux clients (CentOS 7) running
>> winbind.
>>
>> Let's say a CentOS 7 client X is connected to dc2, and I stop the DC
>> processes on dc2... The odd time, the client will connect to dc1
>> almost right away, and everything just works the way it should always
>> work.
>>
>> However, most of the time, I stop the DC processes on dc2, the client
>> will connect to dc1, I can even do a "wbinfo -u" or "wbinfo -g", but
>> "whoami" reveals "user doesn't exist". Somewhere between 20-50
>> minutes later, it just "magically" works. The timing doesn't seem
>> consistent. Even a reboot doesn't fix things when it's in this state.
>>
>> I've tried to follow the Samba logs, but I really can't figure out
>> what's up. Andrew? Jeremy? Anyone?
>>
>> I don't think this can be just my system. I suspect there's a lot of
>> users out there running multiple DCs with a similar setup to me,
>> believing that it's all working, and maybe, because there hasn't been
>> a failure, everything works great, but who knows what will happen
>> when there's actually a failure.
>>
>> Jason.
>>
>>
> Try adding these lines to the /etc/resolv.conf on the Linux clients:
>
> options rotate
>
> options timeout:1
>
> Rowland
Hi Rowland,
Thanks for your message! Unfortunately, this didn't work.
Here's something that may help jog your memory if you've heard of this
happening before...
So my machine was connected to dc2... I stopped DC services on dc2, and
sure enough, I see the connection host->dc1:microsoft-ds, and
host->dc2:ldap ... perfect! buuuttt I still get "user jas does not
exist". wbinfo -u is giving me nothing now, yet wbinfo -g is working
fine. I checked back in a few mins, and now "wbinfo -u" is giving me
the full user list. I'm still an "unknown user" though, and calls to
"getent passwd jas" or "getent passwd <any user>" fail even though calls
to "getent group <any group>" all work. There *is* a connection.
Eventually, it will realize that I exist without anything changing. I
highly suspect it's some kind of cache that needs to time out... some
kind of cache that doesn't get reset if winbind is just restarted. You
know I've got the right nsswitch.conf, but here it is ...
passwd:     files winbind
shadow:     files
group:      files winbind
... and I know I've got all the proper links as well (or things wouldn't
magically start working some time later).
This sure has me puzzled.
Jason.
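
For anyone trying to pin down how long the stale state described above
lasts, a small polling helper can timestamp when each lookup starts
working again. This is only a sketch, not a fix: "probe" and "snapshot"
are hypothetical function names, and the user and group passed to
"snapshot" are whatever names exist in your own domain ("jas" is the user
from the message above). It uses only the wbinfo and getent calls already
mentioned in this thread.

```shell
#!/bin/sh
# Hypothetical helper: run a lookup and report whether it currently works.
# Usage: probe LABEL COMMAND [ARGS...]
probe() {
    label=$1
    shift
    # Count a lookup as working only if it exits 0 AND prints something,
    # because "wbinfo -u" was seen returning an empty list after failover.
    if out=$("$@" 2>/dev/null) && [ -n "$out" ]; then
        echo "$label: ok"
    else
        echo "$label: FAIL"
    fi
}

# Hypothetical helper: print a timestamp followed by one line per lookup.
# Usage: snapshot USER GROUP
snapshot() {
    date '+%Y-%m-%d %H:%M:%S'
    probe "wbinfo -u" wbinfo -u
    probe "wbinfo -g" wbinfo -g
    probe "getent passwd $1" getent passwd "$1"
    probe "getent group $2" getent group "$2"
}
```

Calling "snapshot jas <any group>" once a minute from a loop, with the
output appended to a log file, would show when "getent passwd" recovers
relative to "wbinfo -u", which may help narrow down which cache is
involved.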