Very helpful. Thank you, Martin. I'd like to share the information below with you and solicit your feedback :-) I provide additional detail in case there is something else you feel strongly we should consider. We made some changes last night; let me share those with you.

The error that was repeating itself and causing these failures is:

Takeover run starting
RELEASE_IP 10.200.1.230 failed on node 0, ret=-1
Assigning banning credits to node 0
takeover run failed, ret=-1
ctdb_takeover_run() failed
Takeover run unsuccessful
Node 0 reached 4 banning credits - banning it for 300 seconds
Banning node 0 for 300 seconds
Unassigned IP 10.206.2.124 can be served by this node
Unassigned IP 10.200.1.230 can be served by this node
IP 10.206.2.124 incorrectly on an interface

Last night we truncated the public_addresses file, and then everything started working. And so we've been rereading the doc on the public addresses file. So it may be we have gravely misunderstood the *public_addresses* file; we never read that part of the documentation carefully. The *nodes* file made perfect sense, and the point we missed is that CTDB uses floating (unreserved/unused) addresses and assigns them to a SECOND, public interface (as aliases). We did not plan a private subnet for the node traffic and a separate public subnet for the client traffic. More on the changes we made last night in a sec...

Let me explain our architecture, for context. Would love some feedback, expressed concerns, etc.

We have built a geo-distributed SMB file system. It is deployed in AWS over four regions globally, internally uses ObjectiveFS as a backend shared file system and cache, and uses a custom etcd locking helper (written in Go). The instances have only one network interface, a private one; they do not have a second interface (possibly our mistake). The existing private interface is AWS-assigned, static, and cannot be reassigned (obviously). Initial testing of this is promising; leadership election is not instantaneous, as you'd expect, taking upwards of 5 seconds because etcd is operating as a geo-distributed, fully meshed cluster and the current leader could be a continent away, but not bad.

Here is our mistake... The initial *public_addresses* file had the same addresses as the *nodes* file, containing the private IP addresses assigned by AWS. Not good, right? The error messages shown above were the result. However, once we truncated the file,

echo '' > /etc/ctdb/public_addresses
ctdb status

the CTDB status showed all nodes as healthy:

Number of nodes:2
pnn:0 10.200.1.230     OK
pnn:1 10.206.2.124     OK (THIS NODE)
Generation:1547616286
Size:2
hash:0 lmaster:0
hash:1 lmaster:1
Recovery mode:NORMAL (0)
Recovery master:0

And after these changes the logs simply have these messages periodically:

Disabling takeover runs for 60 seconds
Reenabling takeover runs

*Is this normal?*

(This is a modest test rig, mind you, with only one Samba process per region. In prod it will be several regions, multiple processes, etc.)

Really appreciate your help, Martin. Thank you!

On Wed, Aug 5, 2020 at 6:53 PM Martin Schwenke <martin at meltin.net> wrote:

> Hi Bob,
>
> On Wed, 5 Aug 2020 17:10:11 -0400, Robert Buck via samba
> <samba at lists.samba.org> wrote:
>
> > Could I impose upon someone to provide some guidance? Some hint? Thank you
>
> Any time! :-)
>
> > Is a shared file system actually required? If etcd is used to manage the
> > global recovery lock, is there any need at that point for a shared file
> > system?
> >
> > In other words, are there Samba or CTDB files (state) that must be on a
> > shared file system, or can each clustered host simply have these files
> > locally?
> >
> > What must be shared? What can be optionally shared?
>
> The only thing that CTDB uses the shared filesystem for is the recovery
> lock, so if you're using etcd for the recovery lock then CTDB will not
> be using the shared filesystem.
>
> Clustered Samba (smbd in this case) expects to serve files to clients
> from a shared filesystem. Although some of the metadata is stored in
> CTDB, smbd makes some assumptions about the underlying filesystem
> (e.g. I/O coherence is required when using POSIX locking).
>
> > The doc is not clear on this.
>
> I have updated the wiki to mention this:
>
>   https://wiki.samba.org/index.php/Setting_up_a_cluster_filesystem#Checking_lock_coherence
>
> The page about ping_pong was already there but it doesn't look like
> there was a link to it.
>
> I also need to update the ctdb(7) manual page to point to the wiki.
>
> > In our scenario, when we attempt to start up a second node, it always goes
> > into a banned state. If we shut down the healthy node and restart CTDB on
> > the "failed node" it now works. We're trying to understand this.
>
> One reason I can think of for this is the recovery lock check during
> recovery. When recovery completes and CTDB is setting the recovery
> mode back to "normal" on each node, it does a sanity check where it
> attempts to take the recovery lock. It should never be able to do this
> because the lock should already be held by another process on the
> master/leader node.
>
> I've documented a couple of reasons, unrelated to the recovery lock,
> why CTDB can behave badly:
>
>   https://wiki.samba.org/index.php/Basic_CTDB_configuration#Troubleshooting
>
> So, 2 questions:
>
> * Does the 2nd node still get banned if you disable the recovery lock?
>   If not then the problem is clearly with the recovery lock.
>
> * What do the logs say about the reason for banning the node?
>
> peace & happiness,
> martin

--
BOB BUCK
SENIOR PLATFORM SOFTWARE ENGINEER
SKIDMORE, OWINGS & MERRILL
7 WORLD TRADE CENTER, 250 GREENWICH STREET, NEW YORK, NY 10007
T (212) 298-9624
ROBERT.BUCK at SOM.COM
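For reference, a minimal sketch of how the two files are normally laid out in a conventional single-site cluster follows. The node addresses are the real private ones from the status output above; the client-facing subnet, floating addresses, and interface name are purely hypothetical, since the right values depend on the network that clients actually use:

    # /etc/ctdb/nodes -- the fixed, private address of each node, one per line
    10.200.1.230
    10.206.2.124

    # /etc/ctdb/public_addresses -- spare floating addresses on the client-facing
    # subnet, each with a netmask and the interface CTDB should add it to
    # (subnet 10.210.3.0/24 and interface eth1 are made-up examples)
    10.210.3.10/24 eth1
    10.210.3.11/24 eth1

The key point is that the two lists must not overlap: nodes holds addresses that are permanently bound to specific hosts, while public_addresses holds spare addresses that CTDB is free to move between nodes on failover.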
Hi Bob,

On Thu, 6 Aug 2020 06:55:31 -0400, Robert Buck <robert.buck at som.com> wrote:

> And so we've been rereading the doc on the public addresses file. So it may
> be we have gravely misunderstood the *public_addresses* file; we never read
> that part of the documentation carefully. The *nodes* file made perfect
> sense, and the point we missed is that CTDB uses floating
> (unreserved/unused) addresses and assigns them to a SECOND, public
> interface (as aliases). We did not plan a private subnet for the node
> traffic and a separate public subnet for the client traffic.
>
> [...]
>
> Here is our mistake... The initial *public_addresses* file had the same
> addresses as the *nodes* file, containing the private IP addresses assigned
> by AWS. Not good, right? The error messages shown above were the result.

Yep, that would definitely cause chaos. ;-)

CTDB is really designed to have the node traffic go over a private
network. There is no authentication between nodes (other than checking
that a connecting node is listed in the nodes file) and there is no
encryption between nodes. Contents of files will not be transferred
between nodes by CTDB, but if filenames are sensitive then they could be
exposed if they are not on a private network.

In the future we plan to have some authentication between nodes when
they connect, most likely a shared secret used to generate something
from the nodes file.

> [...]
>
> And after these changes the logs simply have these messages periodically:
>
> Disabling takeover runs for 60 seconds
> Reenabling takeover runs
>
> *Is this normal?*

How frequently are these messages logged? They should occur as nodes
join but should stop after that. If they continue, are there any clues
indicating why takeover runs occur? A takeover run is just what CTDB
currently calls a recalculation of the floating IP addresses for
fail-over.

peace & happiness,
martin
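A rough way to answer the frequency question is to count the message over a fixed window. This is only a sketch; the systemd unit name and log file path depend on how CTDB was installed, so treat both as assumptions and adjust to your setup:

    # If CTDB logs to the journal (unit name may differ on your distribution):
    journalctl -u ctdb --since "1 hour ago" | grep -c 'Disabling takeover runs'

    # If CTDB logs to a file instead (path is an assumption):
    grep -c 'Disabling takeover runs' /var/log/ctdb/ctdb.log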
On Sat, Aug 8, 2020 at 2:52 AM Martin Schwenke <martin at meltin.net> wrote:

> [...]
>
> How frequently are these messages logged? They should occur as nodes
> join but should stop after that. If they continue, are there any clues
> indicating why takeover runs occur? A takeover run is just what CTDB
> currently calls a recalculation of the floating IP addresses for
> fail-over.

Hi Martin, thank you for your helpful feedback, this is great.

Yes, those log messages were occurring once per second (precisely). Then after several hours they stopped, after these messages appeared in the log:

ctdbd[1220]: 10.206.2.124:4379: node 10.200.1.230:4379 is dead: 0 connected
ctdbd[1220]: Tearing down connection to dead node :0
ctdb-recoverd[1236]: Current recmaster node 0 does not have CAP_RECMASTER, but we (node 1) have - force an election
ctdbd[1220]: Recovery mode set to ACTIVE
ctdbd[1220]: This node (1) is now the recovery master
ctdb-recoverd[1236]: Election period ended
ctdb-recoverd[1236]: Node:1 was in recovery mode. Start recovery process
ctdb-recoverd[1236]: ../../ctdb/server/ctdb_recoverd.c:1347 Starting do_recovery
ctdb-recoverd[1236]: Attempting to take recovery lock (!/usr/local/bin/lockctl elect --endpoints REDACTED:2379 SM
ctdbd[1220]: High RECLOCK latency 4.268180s for operation recd reclock
ctdb-recoverd[1236]: Recovery lock taken successfully
ctdb-recoverd[1236]: ../../ctdb/server/ctdb_recoverd.c:1422 Recovery initiated due to problem with node 0
ctdb-recoverd[1236]: ../../ctdb/server/ctdb_recoverd.c:1447 Recovery - created remote databases
ctdb-recoverd[1236]: ../../ctdb/server/ctdb_recoverd.c:1476 Recovery - updated flags
ctdb-recoverd[1236]: Set recovery_helper to "/usr/libexec/ctdb/ctdb_recovery_helper"
...
recover database 0x2ca251cf
...
Thaw db: smbXsrv_client_global.tdb generation 999520140
Release freeze handle for db smbXsrv_client_global.tdb
19 of 19 databases recovered
Recovery mode set to NORMAL
...
No nodes available to host public IPs yet
...
Reenabling recoveries after timeout
...

Then it's a clean syslog after that.

Thank you!

> peace & happiness,
> martin

--
BOB BUCK
SENIOR PLATFORM SOFTWARE ENGINEER
SKIDMORE, OWINGS & MERRILL
7 WORLD TRADE CENTER, 250 GREENWICH STREET, NEW YORK, NY 10007
T (212) 298-9624
ROBERT.BUCK at SOM.COM
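For anyone wiring up a similar etcd-based lock: the helper visible in the "Attempting to take recovery lock" line above is plugged into CTDB through the recovery lock setting, where a value beginning with "!" tells CTDB to run the rest as a mutex helper command instead of locking a file on a shared filesystem. A minimal sketch, assuming the modern ctdb.conf format; the helper path and arguments are illustrative placeholders (the real endpoint and lock name are redacted/truncated in the log above):

    # /etc/ctdb/ctdb.conf
    [cluster]
        # "!" = run a mutex helper command rather than use a lock file;
        # endpoint and lock name below are placeholders, not the real values
        recovery lock = !/usr/local/bin/lockctl elect --endpoints <etcd-endpoint>:2379 <lock-name>

The "High RECLOCK latency 4.268180s" message in the log is CTDB reporting how long that helper took to acquire the lock, which is worth watching in a geo-distributed cluster.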