thr3ads.net - samba - [Samba] samba failover with ctdb and client-visible errors [May 2024]

Sage Weil

2024-May-03 21:17 UTC

[Samba] samba failover with ctdb and client-visible errors

Hi everyone,

I'm setting up a clustered Samba+CTDB in front of CephFS and am
running into an issue during failover.  For the most part everything
seems to work: the IP moves quickly, smbd is started on the right
node, etc, but if there is an IO load from a client during failover
(e.g., copying a big directory full of files in File Explorer), it
pauses for a couple of seconds and then pops up an error dialog box.
If I hit 'Try Again' everything continues without problems.
However... I assume that a client-visible error like this will cause
problems with most applications (that may not be persistent enough to
retry everything).  I did a google search and the only thing I found
was something suggesting passing a flag to xcopy that forces a retry
on error.

Here's what the dialog looks like when I reboot one of the gateway nodes:
  https://i.ibb.co/kh4fFPW/tryagain.png
If I click 'Try Again' everything proceeds.

Here's my smb.conf:

root@smbgw2:/etc/samba# cat smb.conf
[global]
  clustering = yes
  include = registry
root@smbgw2:/etc/samba# net conf list
[global]
netbios name = smbgw
clustering = yes
idmap config * : backend = tdb2
passdb backend = tdbsam
load printers = no
smbd: backgroundqueue = no

[Audio]
path = /mnt/audio
read only = no
oplocks = no
kernel share modes = no



CTDB config looks like so:

# See ctdb.conf(5) for documentation
#
# See ctdb-script.options(5) for documentation about event script
# options

[logging]
# Enable logging to syslog
location = syslog

# Default log level
log level = NOTICE

[cluster]
# Shared recovery lock file to avoid split brain.  Daemon
# default is no recovery lock.  Do NOT run CTDB without a
# recovery lock file unless you know exactly what you are
# doing.
#
# Please see the RECOVERY LOCK section in ctdb(7) for more
# details.
#
# recovery lock = !/bin/false RECOVERY LOCK NOT CONFIGURED
recovery lock = /mnt/audio/.ctdb/recovery_lock

^ /mnt/audio is the CephFS mount I am reexporting.

CTDB has a single IP in public_addresses that is moving around between
the gateway nodes as expected--from what I can tell that is all
working well.

The only other issue I've identified is that I seem to have to create
the user (and set the password with smbpasswd) on each of the
gateways... even though I expected that the 'passdb backend = tdbsam'
line would keep user and password info in ctdb somewhere.  Am I
missing something there?

Thanks!
sage

Martin Schwenke

2024-May-04 02:05 UTC

Hi Sage,

On Fri, 3 May 2024 16:17:45 -0500, Sage Weil via samba
<samba at lists.samba.org> wrote:
> I'm setting up a clustered Samba+CTDB in front of CephFS and am
> running into an issue during failover.  For the most part everything
> seems to work: the IP moves quickly, smbd is started on the right
> node, etc, but if there is an IO load from a client during failover
> (e.g., copying a big directory full of files in File Explorer), it
> pauses for a couple of seconds and then pops up an error dialog box.
> If I hit 'Try Again' everything continues without problems.
> However... I assume that a client-visible error like this will cause
> problems with most applications (that may not be persistent enough to
> retry everything).  I did a google search and the only thing I found
> was something suggesting passing a flag to xcopy that forces a retry
> on error.
> 
> Here's what the dialog looks like when I reboot one of the gateway nodes:
>   https://i.ibb.co/kh4fFPW/tryagain.png
> If I click 'Try Again' everything proceeds.
Error handling seems to be application-dependent on Windows.  If you're
doing lots of copying then the hint you found for xcopy is probably a
good idea.  Many applications will silently reconnect.
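
For applications you control, the same retry idea can be built in at the application level. The following is only a minimal Python sketch, not something from this thread: the function name copy_with_retry, the attempt count and the delay are all assumptions chosen for illustration. It retries a single file copy when the operating system reports an I/O error, which is how a connection dropped during failover would typically surface.

```python
import shutil
import time


def copy_with_retry(src, dst, attempts=5, delay=2.0, copy=shutil.copy2):
    """Copy src to dst, retrying when an OSError is raised.

    Hypothetical helper: attempts and delay are illustrative defaults,
    not values recommended by Samba or CTDB.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            # A dropped SMB connection normally shows up as an OSError.
            return copy(src, dst)
        except OSError as exc:
            last_error = exc
            if attempt < attempts:
                # Give the public IP time to move to another node.
                time.sleep(delay)
    # Every attempt failed: re-raise the most recent error.
    raise last_error
```

A call might look like copy_with_retry(r"\\smbgw\Audio\a.flac", r"D:\backup\a.flac"), where both paths are made up. Retrying the whole file is the simplest choice; it does not resume a partial copy.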

One issue is that CTDB's failover is done at the TCP networking level,
so it is impossible to hide errors from applications.

The dream is to get transparent failover with Microsoft's Witness
Protocol (available in Samba ≥ 4.20) and persistent file handles (not
yet in Samba).
> Here's my smb.conf:
> 
> root@smbgw2:/etc/samba# cat smb.conf
> [global]
>   clustering = yes
>   include = registry
> root@smbgw2:/etc/samba# net conf list
> [global]
> netbios name = smbgw
> clustering = yes
> idmap config * : backend = tdb2
For default domain ID mapping, you probably want autorid these days:

  https://www.samba.org/samba/docs/current/man-html/idmap_autorid.8.html
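
As a rough illustration of that suggestion, the default-domain mapping in the registry configuration might end up looking like the fragment below. The range is the example value from the idmap_autorid man page, not something tuned for this cluster. Switching backends on a system that already has files owned by mapped IDs changes those mappings, so treat this as a sketch for a fresh setup.

```ini
[global]
# Illustrative only: range taken from the idmap_autorid(8) example
idmap config * : backend = autorid
idmap config * : range = 1000000-1999999
```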
> [...]
> CTDB config looks like so:
> CTDB has a single IP in public_addresses that is moving around between
> the gateway nodes as expected--from what I can tell that is all
> working well.
If CephFS is sane (i.e. has proper locking coherency - others will be
able to make better comments about this) then clustered Samba can
happily be active-active: you can have multiple IPs in public_addresses,
so multiple clients can access via different gateway nodes in parallel.
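
A public_addresses file with more than one address could look roughly like this. The addresses (from the 192.0.2.0/24 documentation range) and the interface name eth0 are placeholders, not values from this thread. The format is one address/mask followed by the interface, per line, as described in ctdb(7).

```text
# /etc/ctdb/public_addresses (placeholder addresses and interface)
192.0.2.11/24 eth0
192.0.2.12/24 eth0
```

CTDB distributes these addresses across the healthy nodes. Clients then need some way to spread themselves across the addresses, commonly round-robin DNS.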
> The only other issue I've identified is that I seem to have to create
> the user (and set the password with smbpasswd) on each of the
> gateways... even though I expected that the 'passdb backend = tdbsam'
> line would keep user and password info in ctdb somewhere.  Am I
> missing something there?
There currently isn't a way of exposing local users at the OS level,
and an OS user is needed for file permissions.  We have thought of
faking this via winbind, but it keeps sliding down the priority queue.

Setting up a Samba Active Directory server isn't especially difficult,
so that tends to be a good option.

I hope some of that is useful...  :-)

peace & happiness,
martin
