thr3ads.net - samba - [Samba] CTDB Question: external locking tool [Oct 2020]

If this information is useful, please help other people find it:
Share via:

Robert Buck

2020-Oct-26 00:44 UTC

[Samba] CTDB Question: external locking tool

Folks,

We use a Golang-based lock tool that we wrote for CTDB. That tool interacts
with our 3.4 etcd cluster, and follows the requirements specified in the
project.

Question, does the external command line tool get called when LMASTER and
RECMASTER are false? Given a scenario where we have a set of processes that
have it set to false, then others that have it set to true, does the
locking tool get called when they're set to false?

IF you say the lock tool still gets called in both cases, then the docs
need to be updated, and we on our end need to add a special config file
option to reject lock acquisitions from those nodes that have the CTDB
options set to false, permitting only those nodes set to true to acquire
etcd locks.

Please advise?, Martin?

-- 

BOB BUCK
SENIOR PLATFORM SOFTWARE ENGINEER

SKIDMORE, OWINGS & MERRILL
7 WORLD TRADE CENTER
250 GREENWICH STREET
NEW YORK, NY 10007
T  (212) 298-9624
ROBERT.BUCK at SOM.COM

Martin Schwenke

2020-Oct-27 04:09 UTC

head link

[Samba] CTDB Question: external locking tool

Hi Bob,

On Sun, 25 Oct 2020 20:44:07 -0400, Robert Buck <robert.buck at som.com>
wrote:
> We use a Golang-based lock tool that we wrote for CTDB. That tool interacts
> with our 3.4 etcd cluster, and follows the requirements specified in the
> project.
> 
> Question, does the external command line tool get called when LMASTER and
> RECMASTER are false? Given a scenario where we have a set of processes that
> have it set to false, then others that have it set to true, does the
> locking tool get called when they're set to false?
Indeed it does.  There are 2 current things conspiring against you:

* At the start of each recovery a recovery lock consistency check is
  done. Unfortunately, this means the recovery lock can't be left unset
  on nodes that do not have the recmaster capability because then the
  consistency check would fail.

* At the end of recovery, if the recovery lock is set, all nodes will
  attempt to take the recovery lock and will expect to fail (on the
  leader/master too, since it is being taken from a different process on
  the leader/master).

  This is meant to be a sanity check but, to be honest, I'm not sure
  whether it really adds any value.  A better option might be to only
  accept recovery-related controls from the current leader/master node,
  banning any other node that is stupid enough to send such a control.

I need to think about his more...

One of the problems is that the ideas of recovery master and recovery
lock are historical and they are somewhat dated compared to current
clustering concepts. Recovery master should really be "cluster leader"
and the lock should be "cluster lock".  If we clearly change our
approach in that direction then it makes no sense to check a cluster
lock at recovery time.

I have a branch that does the technical (but not documentation) parts
of switching to cluster leader and lock... but more work is needed
before this is ready to merge.
> IF you say the lock tool still gets called in both cases, then the docs
> need to be updated, and we on our end need to add a special config file
> option to reject lock acquisitions from those nodes that have the CTDB
> options set to false, permitting only those nodes set to true to acquire
> etcd locks.
Well, the documentation (ctdb(7) manual page) does say:

  CTDB does sanity checks to ensure that the recovery lock is held as
  expected.

;-)

OK, that's pretty weak!

I'll try to get some of Amitay's time to discuss what we should do
here...

peace & happiness,
martin

Martin Schwenke

2020-Oct-29 09:31 UTC

head link

[Samba] CTDB Question: external locking tool

Hi Bob,

On Tue, 27 Oct 2020 15:09:34 +1100, Martin Schwenke via samba
<samba at lists.samba.org> wrote:
> On Sun, 25 Oct 2020 20:44:07 -0400, Robert Buck <robert.buck at
som.com>
> wrote:
> 
> > We use a Golang-based lock tool that we wrote for CTDB. That tool
interacts
> > with our 3.4 etcd cluster, and follows the requirements specified in
the
> > project.
> > 
> > Question, does the external command line tool get called when LMASTER
and
> > RECMASTER are false? Given a scenario where we have a set of processes
that
> > have it set to false, then others that have it set to true, does the
> > locking tool get called when they're set to false?  
> 
> Indeed it does.  There are 2 current things conspiring against you:
> 
> * At the start of each recovery a recovery lock consistency check is
>   done. Unfortunately, this means the recovery lock can't be left unset
>   on nodes that do not have the recmaster capability because then the
>   consistency check would fail.
> 
> * At the end of recovery, if the recovery lock is set, all nodes will
>   attempt to take the recovery lock and will expect to fail (on the
>   leader/master too, since it is being taken from a different process on
>   the leader/master).
> 
>   This is meant to be a sanity check but, to be honest, I'm not sure
>   whether it really adds any value.  A better option might be to only
>   accept recovery-related controls from the current leader/master node,
>   banning any other node that is stupid enough to send such a control.
> 
> I need to think about his more...
> 
> One of the problems is that the ideas of recovery master and recovery
> lock are historical and they are somewhat dated compared to current
> clustering concepts. Recovery master should really be "cluster
leader"
> and the lock should be "cluster lock".  If we clearly change our
> approach in that direction then it makes no sense to check a cluster
> lock at recovery time.
> 
> I have a branch that does the technical (but not documentation) parts
> of switching to cluster leader and lock... but more work is needed
> before this is ready to merge.
> 
> > IF you say the lock tool still gets called in both cases, then the
docs
> > need to be updated, and we on our end need to add a special config
file
> > option to reject lock acquisitions from those nodes that have the CTDB
> > options set to false, permitting only those nodes set to true to
acquire
> > etcd locks.  
> 
> Well, the documentation (ctdb(7) manual page) does say:
> 
>   CTDB does sanity checks to ensure that the recovery lock is held as
>   expected.
> 
> ;-)
> 
> OK, that's pretty weak!
> 
> I'll try to get some of Amitay's time to discuss what we should do
> here...
There are a few possible changes but none of them would really fix
things properly.  We have situation where the recovery lock (which used
to be released at the end of each recovery but is now released on
election loss) is almost a cluster lock, so we really shouldn't be
sanity checking it at the end of recovery.  However, there's currently
no other sane place to check it.

So, as you say, the docs need to be updated.  Do you think it would be
enough to add to the single sentence above from ctdb(7)?  Something
like the following:

  CTDB does some recovery lock sanity checks. At the beginning of
  each recovery each node checks that its recovery lock setting is
  consistent with that of the recovery master.  At the end of each
  recovery each node attempts to take the recovery lock and expects
  this to fail (because the lock is already held by another process).
  These checks are done unconditionally when the recovery lock is
  set, without regard to node capabilities (see CAPABILITIES below).

How's that?

Thanks...

peace & happiness,
martin

Maybe Matching Threads

Search for more maybe matching threads

samba - Oct 2020 - CTDB Question: external locking tool

[Samba] CTDB Question: external locking tool

[Samba] CTDB Question: external locking tool

[Samba] CTDB Question: external locking tool

Maybe Matching Threads