thr3ads.net - samba - [Samba] CTDB RecLockLatencyMs vs RecoverInterval [Jun 2020]


Robert Buck

2020-Jun-30 21:00 UTC

[Samba] CTDB RecLockLatencyMs vs RecoverInterval

Hi

I have a question regarding the CTDB RecLockLatencyMs tunable parameter. Is
there any relationship between the RecLockLatencyMs property and
the RecoverInterval property? Does one need to be larger than the other? Or
if RecLockLatencyMs were increased to 5000ms, should some other setting be
changed in proportion?

We're using a geo-distributed etcd cluster for the CTDB recovery lock and I
noticed a *"High RECLOCK latency"* (of 4s) message in syslog, and just
wanted to see if we could safely squelch the warning, and if so, how?

Thank you,

-- 

BOB BUCK
SENIOR PLATFORM SOFTWARE ENGINEER

SKIDMORE, OWINGS & MERRILL
7 WORLD TRADE CENTER
250 GREENWICH STREET
NEW YORK, NY 10007
T  (212) 298-9624
ROBERT.BUCK at SOM.COM

Martin Schwenke

2020-Jun-30 22:20 UTC

Hi Bob,

On Tue, 30 Jun 2020 17:00:11 -0400, Robert Buck via samba
<samba at lists.samba.org> wrote:
> I have a question regarding CTDB RecLockLatencyMs tunable parameter. Is
> there any relationship between the RecLockLatencyMs property and
> the RecoverInterval property? Does one need to be larger than the other? Or
> if RecLockLatencyMs were increased to 5000ms, should some other setting be
> changed in proportion?
> 
> We're using a geo-distributed etcd cluster for the CTDB recovery lock and I
> noticed a *"High RECLOCK latency"* (of 4s) message in syslog, and just
> wanted to see if we could safely squelch the warning, and if so, how?

RecoverInterval indicates how often nodes should monitor conditions
that indicate that a database recovery is needed.  I would suggest
leaving this at the default of 1 second.  In future we might change
this to be hard coded anyway.

Many years ago CTDB used to release the recovery lock after each
recovery.  This meant that the recovery lock had to be taken before
each recovery, so the recovery lock latency mattered more.

We changed that so the recovery lock is taken before the first recovery
after a node is elected leader (currently called recovery master), so
it is now more of a cluster lock.  We also made some changes so that
the leader is more likely to be stable across elections.  Both of these
changes make the recovery lock latency matter a lot less.
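
To put rough numbers on the change described above, here is a minimal
sketch. It assumes a simplified model in which every lock acquisition
costs the same latency and all recoveries happen within one leadership
term. The function name and parameters are hypothetical and do not come
from CTDB.

```python
def total_lock_wait_s(latency_s: float, recoveries: int,
                      lock_per_recovery: bool) -> float:
    """Estimate time spent waiting on the recovery lock.

    Simplified model (an assumption, not CTDB's code): each acquisition
    costs latency_s seconds and the leader does not change.
    """
    if recoveries <= 0:
        return 0.0
    # Old behaviour: the lock is taken before every recovery.
    # Current behaviour: the lock is taken once, before the first
    # recovery after the node is elected leader.
    acquisitions = recoveries if lock_per_recovery else 1
    return latency_s * acquisitions


if __name__ == "__main__":
    # With the 4s latency from the question and five recoveries:
    print(total_lock_wait_s(4.0, 5, lock_per_recovery=True))
    print(total_lock_wait_s(4.0, 5, lock_per_recovery=False))
```

Under this model the lock latency is paid once per leadership term
rather than once per recovery, which is why it matters less now.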

So, I don't think that warnings about recovery lock latency are as
important as they used to be.  You could safely increase
RecLockLatencyMs to 5000.
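
As a sketch of how that change might be applied: CTDB tunables can be
changed at runtime with "ctdb setvar NAME VALUE", and recent versions
read NAME=VALUE lines from a ctdb.tunables file at startup. The helper
names below are hypothetical, and the file location (commonly
/etc/ctdb/ctdb.tunables) is an assumption that may differ between
distributions.

```python
from typing import List


def tunables_line(name: str, value: int) -> str:
    """Format one tunable in the NAME=VALUE form used by ctdb.tunables."""
    if value < 0:
        raise ValueError("tunable values are expected to be non-negative")
    return f"{name}={value}"


def setvar_command(name: str, value: int) -> List[str]:
    """Build the argv for changing a tunable at runtime on the local node."""
    return ["ctdb", "setvar", name, str(value)]


if __name__ == "__main__":
    # Line to add to ctdb.tunables (path is an assumption, see above).
    print(tunables_line("RecLockLatencyMs", 5000))
    # Command to run on each node; a runtime change is lost on restart.
    print(" ".join(setvar_command("RecLockLatencyMs", 5000)))
```

A runtime change with setvar applies only to the node it is run on, so
it would need to be repeated on every node, and the file entry is what
keeps the value across restarts.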

However... (and there is always a "however" ;-)

The presence of recovery lock latency warnings made one of the race
conditions in the following bug pretty obvious to me:

  https://bugzilla.samba.org/show_bug.cgi?id=14294

so, while they matter less, they still have value.
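
For readers unfamiliar with what the warning measures, the idea can be
sketched as timing a lock operation and comparing the result against a
threshold in milliseconds. This is an illustration only, not CTDB's
implementation: the function name, the callback, and the message text
are all hypothetical.

```python
import time
from typing import Callable, Optional


def timed_lock(take_lock: Callable[[], None],
               threshold_ms: int) -> Optional[str]:
    """Run take_lock() and return a warning if it exceeded threshold_ms.

    A threshold of 0 disables the check in this sketch.
    """
    start = time.monotonic()
    take_lock()
    elapsed_ms = (time.monotonic() - start) * 1000.0
    if threshold_ms > 0 and elapsed_ms > threshold_ms:
        # Hypothetical message text, not the string CTDB logs.
        return f"lock latency {elapsed_ms / 1000.0:.3f}s exceeded {threshold_ms}ms"
    return None


if __name__ == "__main__":
    # A slow lock (simulated with a sleep) trips a 10ms threshold.
    print(timed_lock(lambda: time.sleep(0.05), threshold_ms=10))
    # The same lock stays quiet with a 5000ms threshold.
    print(timed_lock(lambda: time.sleep(0.05), threshold_ms=5000))
```

Raising the threshold only hides the message. The lock operation takes
just as long, which is why the warning can still be a useful signal.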

If you're using a CTDB recovery lock with high latency then you should
make sure you are using a version that contains a fix for the above bug.

Please let us know if you have more questions...

peace & happiness,
martin

Robert Buck

2020-Jul-01 02:20 UTC

Thank you, Martin.

Yes, we are running Samba and CTDB v4.10.7 on Ubuntu. *Does this version
contain the defect?*  *In your opinion, will 4s be an issue?* We are
running this on top of a geo-distributed etcd cluster, and in
this particular case there were about 4200 miles between the two data
centers. We're running a distributed NFS file system across a total of three
data centers, spanning 7000+ miles. During failover testing we're seeing
failover times of less than 7 seconds, which seems pretty nice to me.  *In
your experience, is there anything we should be tuning for?*

The file system performs great, we're just trying to tune/understand
winbind and trying to get that to work flawlessly.

Bob

