Very helpful. Thank you, Martin.
I'd like to share the information below and solicit your feedback :-) I'm
including extra detail in case there is something else you feel strongly we
should consider. We made some changes last night; let me walk you through
them.
The error that kept repeating and causing these failures was:
Takeover run starting
RELEASE_IP 10.200.1.230 failed on node 0, ret=-1
Assigning banning credits to node 0
takeover run failed, ret=-1
ctdb_takeover_run() failed
Takeover run unsuccessful
Node 0 reached 4 banning credits - banning it for 300 seconds
Banning node 0 for 300 seconds
Unassigned IP 10.206.2.124 can be served by this node
Unassigned IP 10.200.1.230 can be served by this node
IP 10.206.2.124 incorrectly on an interface
Last night we truncated the public_addresses file, and then everything
started working.
Since then we've been rereading the documentation on the public addresses
file. It may be that we gravely misunderstood the *public_addresses* file;
we never read that part of the documentation carefully. The *nodes* file
made perfect sense. The point we missed is that CTDB takes floating
(unreserved/unused) addresses and assigns them to a SECOND, public
interface as aliases. We did not plan a private subnet for the node
traffic and a separate public subnet for the client traffic.
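To check that we now have it right, here is a minimal sketch of how we
understand the two files should relate (the floating addresses and the
interface name below are made up for illustration):

# /etc/ctdb/nodes -- the fixed, private address of each node, one per line
10.200.1.230
10.206.2.124

# /etc/ctdb/public_addresses -- floating, client-facing addresses that
# CTDB adds as interface aliases and fails over between nodes
10.200.1.240/24 eth0
10.200.1.241/24 eth0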
More on the changes we made last night in a sec... First, let me explain
our architecture for context. We would love feedback, any concerns, etc.
We have built a geo-distributed SMB file system deployed in AWS across
four regions globally. Internally it uses ObjectiveFS as the backend
shared file system and cache, plus a custom etcd locking helper written
in Go (see the config sketch just below). The instances have only one
network interface, a private one; they do not have a second interface
(possibly our mistake). The existing private interface is assigned by
AWS, is static, and cannot be reassigned (obviously).
Initial testing of this is promising. Leader election is not
instantaneous, as you'd expect; it takes upwards of 5 seconds, because
etcd is operating as a geo-distributed, fully meshed cluster and the
current leader could be a continent away. But not bad.
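For reference, this is roughly how the helper is wired in. The helper
path here is just ours, and if we read ctdb.conf(5) correctly, the
leading '!' tells CTDB to run the value as a mutex-helper command rather
than treating it as a lock file on a shared filesystem:

# /etc/ctdb/ctdb.conf
[cluster]
    recovery lock = !/usr/local/bin/ctdb-etcd-lock-helper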
Here is our mistake... The initial *public_addresses* file contained the
same addresses as the *nodes* file: the private IP addresses assigned by
AWS. Not good, right? The error messages shown above were the result.
However, once we truncated the file:
echo '' > /etc/ctdb/public_addresses
ctdb status
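(An aside: echo '' actually writes a single newline rather than emptying
the file, though that did not seem to bother CTDB. For a true truncate,
or to stop CTDB managing public addresses altogether, we believe the
options are:

: > /etc/ctdb/public_addresses    # zero-length file
rm /etc/ctdb/public_addresses     # no file => no public IPs managed

but we have only tested the echo variant.)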
Then the CTDB status showed all nodes as healthy:
Number of nodes:2
pnn:0 10.200.1.230 OK
pnn:1 10.206.2.124 OK (THIS NODE)
Generation:1547616286
Size:2
hash:0 lmaster:0
hash:1 lmaster:1
Recovery mode:NORMAL (0)
Recovery master:0
After these changes, the logs simply show these messages periodically:
Disabling takeover runs for 60 seconds
Reenabling takeover runs
*Is this normal?*
(This is a modest test rig, mind you, with only one Samba process per
region. In prod there will be several regions, multiple processes, etc.)
Really appreciate your help, Martin. Thank you!
On Wed, Aug 5, 2020 at 6:53 PM Martin Schwenke <martin at meltin.net>
wrote:
> Hi Bob,
>
> On Wed, 5 Aug 2020 17:10:11 -0400, Robert Buck via samba
> <samba at lists.samba.org> wrote:
>
> > Could I impose upon someone to provide some guidance? Some hint?
> > Thank you
>
> Any time! :-)
>
> > Is a shared file system actually required? If etcd is used to manage
> > the global recovery lock, is there any need at that point for a shared
> > file system?
> >
> > In other words, are there samba or CTDB files (state) that must be on
> > a shared file system, or can each clustered host simply have these
> > files locally?
> >
> > What must be shared? What can be optionally shared?
>
> The only thing that CTDB uses the shared filesystem for is the recovery
> lock, so if you're using etcd for the recovery lock then CTDB will not
> be using the shared filesystem.
>
> Clustered Samba (smbd in this case) expects to serve files to clients
> from a shared filesystem. Although some of the metadata is stored in
> CTDB, smbd makes some assumptions about the underlying filesystem
> (e.g. I/O coherence is required when using POSIX locking).
>
> > The doc is not clear on this.
>
> I have updated the wiki to mention this:
>
>
> https://wiki.samba.org/index.php/Setting_up_a_cluster_filesystem#Checking_lock_coherence
>
> The page about ping_pong was already there but it doesn't look like
> there was a link to it.
>
> I also need to update the ctdb(7) manual page to point to the wiki.
>
> > In our scenario, when we attempt to start up a second node, it always
> > goes into a banned state. If we shut down the healthy node and restart
> > CTDB on the "failed node" it now works. We're trying to understand
> > this.
>
> One reason I can think of for this is the recovery lock check during
> recovery. When recovery completes and CTDB is setting the recovery
> mode back to "normal" on each node, it does a sanity check where it
> attempts to take the recovery lock. It should never be able to do this
> because the lock should already be held by another process on the
> master/leader node.
>
> I've documented a couple of reasons, unrelated to the recovery lock,
> why CTDB can behave badly:
>
>
> https://wiki.samba.org/index.php/Basic_CTDB_configuration#Troubleshooting
>
> So, 2 questions:
>
> * Does the 2nd node still get banned if you disable the recovery lock?
>
> If not then the problem is clearly with the recovery lock.
>
> * What do the logs say about the reason for banning the node?
>
> peace & happiness,
> martin
>
>
--
BOB BUCK
SENIOR PLATFORM SOFTWARE ENGINEER
SKIDMORE, OWINGS & MERRILL
7 WORLD TRADE CENTER
250 GREENWICH STREET
NEW YORK, NY 10007
T (212) 298-9624
ROBERT.BUCK at SOM.COM