Hey Martin,
Thanks for reaching out! Sure thing, here is the ctdb.conf in question:
[legacy]
    # realtime scheduling = true will cause ctdb to fail when docker containers are running
    realtime scheduling = false
[cluster]
    node address = 192.168.45.230
    recovery lock = !/usr/libexec/ctdb/ctdb_mutex_ceph_rados_helper ceph client.samba cephfs.cephfs.meta ctdblock
I did do some testing and gave it an invalid IP (192.168.45.2322), and
CTDB did error out and stop as the code intends (due to the invalid
IP). So the option is being parsed; it just appears that supplying a
valid IP address doesn't change the behaviour.
But perhaps it's my understanding of it that is incorrect.
To give a bit more detail, we are using the ingress service from
cephadm, and CTDB on the same nodes. This ingress service utilizes the
sysctl value mentioned, net.ipv4.ip_nonlocal_bind=1.
What eventually happens is that CTDB crashes because it is unable to
assign the VIP to the interface on the host.
Once the value is turned back to 0, CTDB functions correctly again.
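For clarity, this is the sysctl toggling I mean (the `-w` change is runtime-only and is lost on reboot):

```shell
# Check the current value; the cephadm ingress (keepalived) service
# sets this to 1 so it can bind the VIP before it is assigned locally
sysctl net.ipv4.ip_nonlocal_bind

# Revert to 0 at runtime while testing CTDB (not persisted)
sysctl -w net.ipv4.ip_nonlocal_bind=0
```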
It may be that we are running into a completely separate issue, but
since the docs mention that specific sysctl value I was hopeful it was
the cause.
Here are some logs from right before the crash too, if that helps:
2025-10-13T18:41:42.058594-03:00 adm-gw1 ctdbd[479406]: startup event OK
- enabling monitoring
2025-10-13T18:41:42.058654-03:00 adm-gw1 ctdbd[479406]: Set runstate to
RUNNING (5)
2025-10-13T18:41:42.097671-03:00 adm-gw1 ctdb-recoverd[479490]: Takeover
run completed successfully
2025-10-13T18:41:42.730948-03:00 adm-gw1 ctdb-recoverd[479490]: IP
192.168.45.235 incorrectly on an interface
2025-10-13T18:41:42.731016-03:00 adm-gw1 ctdb-recoverd[479490]: Trigger
takeoverrun
2025-10-13T18:41:42.731208-03:00 adm-gw1 ctdb-recoverd[479490]: Takeover
run starting
2025-10-13T18:41:42.742354-03:00 adm-gw1 ctdb-takeover[479783]: No nodes
available to host public IPs yet
2025-10-13T18:41:42.787980-03:00 adm-gw1 ctdb-recoverd[479490]: Takeover
run completed successfully
2025-10-13T18:41:43.732009-03:00 adm-gw1 ctdb-recoverd[479490]: IP
192.168.45.235 incorrectly on an interface
2025-10-13T18:41:43.732077-03:00 adm-gw1 ctdb-recoverd[479490]: Trigger
takeoverrun
2025-10-13T18:41:43.732294-03:00 adm-gw1 ctdb-recoverd[479490]: Takeover
run starting
2025-10-13T18:41:43.745921-03:00 adm-gw1 ctdb-takeover[479794]: No nodes
available to host public IPs yet
2025-10-13T18:41:43.796166-03:00 adm-gw1 ctdb-recoverd[479490]: Takeover
run completed successfully
2025-10-13T18:41:44.210461-03:00 adm-gw1 ctdbd[479406]: monitor event OK
- node re-enabled
2025-10-13T18:41:44.211218-03:00 adm-gw1 ctdbd[479406]: Node became
HEALTHY. Ask recovery master to reallocate IPs
2025-10-13T18:41:44.732792-03:00 adm-gw1 ctdb-recoverd[479490]:
Unassigned IP 192.168.45.235 can be served by this node
2025-10-13T18:41:44.732964-03:00 adm-gw1 ctdb-recoverd[479490]: IP
192.168.45.235 incorrectly on an interface
2025-10-13T18:41:44.732987-03:00 adm-gw1 ctdb-recoverd[479490]: Trigger
takeoverrun
2025-10-13T18:41:44.733160-03:00 adm-gw1 ctdb-recoverd[479490]: Takeover
run starting
2025-10-13T18:41:44.769369-03:00 adm-gw1 ctdbd[479406]:
../../ctdb/server/ctdb_takeover.c:797 Doing updateip for IP
192.168.45.235 already on an interface
2025-10-13T18:41:44.769448-03:00 adm-gw1 ctdbd[479406]: Update of IP
192.168.45.235/16 from interface __none__ to ens18
2025-10-13T18:41:44.788619-03:00 adm-gw1 ctdb-eventd[479407]:
10.interface: ERROR: Unable to determine interface for IP 192.168.45.235
2025-10-13T18:41:44.788689-03:00 adm-gw1 ctdb-eventd[479407]: updateip
event failed
2025-10-13T18:41:44.788847-03:00 adm-gw1 ctdbd[479406]: Failed update of
IP 192.168.45.235 from interface __none__ to ens18
2025-10-13T18:41:44.788945-03:00 adm-gw1 ctdbd[479406]: ==============================================================
2025-10-13T18:41:44.788966-03:00 adm-gw1 ctdbd[479406]: INTERNAL ERROR: Signal 11: Segmentation fault in pid 479406 (4.21.3)
2025-10-13T18:41:44.788985-03:00 adm-gw1 ctdbd[479406]: If you are
running a recent Samba version, and if you think this problem is not yet
fixed in the latest versions, please consider reporting this bug, see
https://wiki.samba.org/index.php/Bug_Reporting
2025-10-13T18:41:44.789003-03:00 adm-gw1 ctdbd[479406]: ==============================================================
2025-10-13T18:41:44.789016-03:00 adm-gw1 ctdbd[479406]: PANIC (pid 479406): Signal 11: Segmentation fault in 4.21.3
2025-10-13T18:41:44.789489-03:00 adm-gw1 ctdbd[479406]: BACKTRACE: 21
stack frames:
 #0 /usr/lib64/samba/libgenrand-private-samba.so(log_stack_trace+0x34) [0x7fee28193624]
 #1 /usr/lib64/samba/libgenrand-private-samba.so(smb_panic+0xd) [0x7fee28193e0d]
 #2 /usr/lib64/samba/libgenrand-private-samba.so(+0x2fd8) [0x7fee28193fd8]
 #3 /lib64/libc.so.6(+0x3ebf0) [0x7fee27e3ebf0]
 #4 /usr/sbin/ctdbd(+0x563c7) [0x55e82cb8a3c7]
 #5 /usr/sbin/ctdbd(+0x516a0) [0x55e82cb856a0]
 #6 /usr/sbin/ctdbd(+0x51632) [0x55e82cb85632]
 #7 /usr/sbin/ctdbd(+0x54aef) [0x55e82cb88aef]
 #8 /usr/sbin/ctdbd(+0x21a25) [0x55e82cb55a25]
 #9 /usr/sbin/ctdbd(+0x227c2) [0x55e82cb567c2]
 #10 /lib64/libtevent.so.0(tevent_common_invoke_fd_handler+0x95) [0x7fee2813c4a5]
 #11 /lib64/libtevent.so.0(+0x1055e) [0x7fee2814055e]
 #12 /lib64/libtevent.so.0(+0x782b) [0x7fee2813782b]
 #13 /lib64/libtevent.so.0(_tevent_loop_once+0x98) [0x7fee28139368]
 #14 /lib64/libtevent.so.0(tevent_common_loop_wait+0x1b) [0x7fee2813948b]
 #15 /lib64/libtevent.so.0(+0x789b) [0x7fee2813789b]
 #16 /usr/sbin/ctdbd(ctdb_start_daemon+0x68a) [0x55e82cb6b2ba]
 #17 /usr/sbin/ctdbd(main+0x4fb) [0x55e82cb4a92b]
 #18 /lib64/libc.so.6(+0x295d0) [0x7fee27e295d0]
 #19 /lib64/libc.so.6(__libc_start_main+0x80) [0x7fee27e29680]
 #20 /usr/sbin/ctdbd(_start+0x25) [0x55e82cb4afe5]
2025-10-13T18:41:44.969411-03:00 adm-gw1 ctdb-recoverd[479490]: recovery
daemon parent died - exiting
2025-10-13T18:41:44.971113-03:00 adm-gw1 ctdb-eventd[479407]: Received
signal 15
2025-10-13T18:41:44.971154-03:00 adm-gw1 ctdb-eventd[479407]: Shutting down
Regards,
Bailey Allison
Service Team Lead
45Drives, Ltd.
866-594-7199 x868
On 2025-10-13 21:11, Martin Schwenke wrote:
> Hi Bailey,
>
> On Mon, 13 Oct 2025 17:58:07 -0300, Bailey Allison via samba
> <samba at lists.samba.org> wrote:
>
>> Anyone have experience using the node address = value in ctdb.conf?
>> Running into the exact issue specified in the docs:
>>
>> node address = IPADDR
>>
>>      IPADDR is the private IP address that ctdbd will bind to.
>>
>>      This option is only required when automatic address detection can
>> not be used. This can be the case when running multiple ctdbd
>> daemons/nodes on the same physical host (usually for testing) or using
>> InfiniBand for the private network. Another unlikely possibility would
>> be running on a platform with a feature like Linux's
>> net.ipv4.ip_nonlocal_bind=1 enabled and no usable getifaddrs(3)
>> implementation (or replacement) available.
>>
>>      Default: CTDB selects the first address from the nodes list that it
>> can bind to. See also the PRIVATE ADDRESS section in ctdb(7).
>>
>> Specifically the section about net.ipv4.ip_nonlocal_bind=1.
>>
>> When trying to use the node address = IPADDR conf though, it appears
>> nothing is changing. It seems from logs that it isn't even using the
>> value, and for testing I tried renaming to a garbage value (node garbage
>> = IPADDR) instead of the proper one, and no difference in the logs.
>>
>> Is it possible the parameter has a different value than specified in the
>> docs? Also checked man page on system it's installed on and seeing the
>> same value for it.
>>
>> I know the cause of this issue is resolved in 4.22.x samba, but looking
>> to see if it can also be solved without an upgrade.
> This feature is regularly used in CTDB's "local daemons" test
> environment, where we run multiple daemons on a single machine.
>
> One very basic question: Are you setting "node address" in the [cluster]
> section of ctdb.conf? For historical reasons, the configuration
> handling doesn't warn about misplaced (or unknown) options.
>
> If this can't be explained by being in an incorrect section, can you
> please share an example of a ctdb.conf file that isn't working as
> expected?
>
> Thanks...
>
> peace & happiness,
> martin