On Fri, 17 Feb 2023 18:07:17 +0000, Ulrich Sibiller
<ulrich.sibiller at atos.net> wrote:

> Feb 13 12:27:36 serverpnn0 ctdb-recoverd[24951]: Disabling takeover runs for 60 seconds
> Feb 13 12:27:36 serverpnn1 ctdb-recoverd[85688]: Takeover run starting
> Feb 13 12:27:36 serverpnn2 ctdb-recoverd[29692]: Disabling takeover runs for 60 seconds
> Feb 13 12:27:36 serverpnn3 ctdb-recoverd[3482705]: Disabling takeover runs for 60 seconds
> Feb 13 12:27:46 serverpnn0 ctdb-recoverd[24951]: Reenabling takeover runs
> Feb 13 12:27:46 serverpnn1 ctdb-recoverd[85688]: Takeover run completed successfully
>
> -> 10s
>
> Feb 13 12:27:46 serverpnn2 ctdb-recoverd[29692]: Reenabling takeover runs
> Feb 13 12:27:46 serverpnn3 ctdb-recoverd[3482705]: Reenabling takeover runs
> Feb 13 12:27:50 serverpnn0 ctdb-recoverd[24951]: Disabling takeover runs for 60 seconds
> Feb 13 12:27:50 serverpnn1 ctdb-recoverd[85688]: Takeover run starting
> Feb 13 12:27:50 serverpnn2 ctdb-recoverd[29692]: Disabling takeover runs for 60 seconds
> Feb 13 12:27:50 serverpnn3 ctdb-recoverd[3482705]: Disabling takeover runs for 60 seconds
> Feb 13 12:27:56 serverpnn0 ctdb-recoverd[24951]: Reenabling takeover runs
> Feb 13 12:27:56 serverpnn1 ctdb-recoverd[85688]: Takeover run completed successfully
>
> -> 6s

These times look very reasonable.

> > support capture on InfiniBand networks. It might be worth building a
> > static (to avoid shared library issues) ctdb_killtcp binary from 4.18.x
> > source (after configuring with --enable-pcap) and installing that in
> > place of the existing ctdb_killtcp binary. The version in 4.18 also has
> > slightly better debugging for unknown packets.
>
> Sounds promising! I will try that! I have not found enough time today
> to do the compile, will do that next week.

Any progress with this?

If this doesn't work then it might be possible to increase the number
of attempts from 50 to 100 here:

    state->max_attempts = 50;

This means each ctdb_killtcp could take up to 10s. This increases the
chances of the releaseip event timing out.
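For illustration, the change itself would just be the constant; the 10s
figure implies each retry round costs roughly 100ms (10s / 100 attempts).
A minimal sketch, not a tested patch:

    /* Retry rounds in ctdb_killtcp; at roughly 100ms per round,
     * 100 attempts allows up to about 10 seconds per released IP. */
    state->max_attempts = 100;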
The concern is that packets are being dropped because there is simply
too much traffic. If this turns out to be the case then I wonder if it
can be improved in the pcap implementation by adding a capture filter.

> > > We are using two teaming-Interfaces (one for each LAN). Each
> > > team-Interface has 2x10Gbit LACP.
> >
> > I'm not sure how this would affect the packet capture. I would guess
> > that the frame format would be one of the ones that is now supported.
>
> It should be transparent to the consumers.

Hopefully.

> > Fair enough... but a whole minute is a long time to be running
> > ctdb_killtcp during failover...
>
> $ grep "Feb 13 17:36:08" /var/log/messages | grep -i ignor | wc -l
> 7293
>
> I should have written "one run" or similar; all these lines have been
> logged (with additional debug lines added by me) with the same
> timestamp and show up between:
>
> Feb 13 17:36:08 serverpnn2 ctdb-eventd[29607]: 10.interface.debug: team0: blocking [z.z.139.15] on interface [team0]
> Feb 13 17:36:08 serverpnn2 ctdb-eventd[29607]: 10.interface.debug: ip_block z.z.139.15 team0
> Feb 13 17:36:08 serverpnn2 ctdb-eventd[29607]: 10.interface.debug: team0: killing tcp connections for ip [z.z.139.15] on interface [team0]
> Feb 13 17:36:08 serverpnn2 ctdb-eventd[29607]: 10.interface.debug: killcount [22] twoway: [22]
> Feb 13 17:36:08 serverpnn2 ctdb-eventd[29607]: 10.interface.debug: reset_connections_send: Adding 44 connections to hash
>
> and
>
> Feb 13 17:36:08 serverpnn2 ctdb-eventd[29607]: 10.interface.debug: Killed 22/22 TCP connections to released IP z.z.139.15
> Feb 13 17:36:08 serverpnn2 ctdb-eventd[29607]: 10.interface.debug: team0: deleting ip [z.z.139.15] from interface [team0] maskbits [16]
> Feb 13 17:36:08 serverpnn2 ctdb-eventd[29607]: 10.interface.debug: team0: unblocking [z.z.139.15] on interface [team0]
> Feb 13 17:36:08 serverpnn2 ctdb-eventd[29607]: 10.interface.debug: ip_unblock z.z.139.15 team0
> Feb 13 17:36:08 serverpnn2 ctdb-eventd[29607]: 10.interface.debug: team0: flushing route cache
> Feb 13 17:36:08 serverpnn2 ctdbd[29605]: Sending RELEASE_IP message for z.z.139.15
>
> In that very same second we see these message counts on the node running ctdb_killtcp:
>
> $ grep "Feb 13 17:36:08" /var/log/messages | cut -d: -f4- | sed 's/192.*$//' | sort | uniq -c
>    1 10.interface.debug: ctdb_sys_open_capture_socket: Created RAW SOCKET FD:5 for tcp tickle
> 1808 10.interface.debug: Ignoring packet:
>    1 10.interface.debug: ip_block                                  <<< DEBUG output added by me
>    1 10.interface.debug: ip_unblock                                <<< DEBUG output added by me
>    1 10.interface.debug: killcount [22] twoway: [22]               <<< DEBUG output added by me
>    1 10.interface.debug: Killed 22/22 TCP connections to released IP
> 5485 10.interface.debug: reset_connections_capture_tcp_handler: Ignoring packet for unknown connection:
>    1 10.interface.debug: reset_connections_send: Adding 44 connections to hash
>   44 10.interface.debug: reset_connections_send: Adding connection to hash:
>  905 10.interface.debug: reset_connections_tickle_connection: Sending tickle ACK for connection '
>    1 10.interface.debug: running /etc/ctdb/events/legacy/10.interface.debug.script with arguments: releaseip team0   <<< DEBUG output added by me
>   28 10.interface.debug: Sending a TCP RST to for connection
>    1 10.interface.debug: team0: blocking [                         <<< DEBUG output added by me
>    1 10.interface.debug: team0: deleting ip [                      <<< DEBUG output added by me
>    1 10.interface.debug: team0: flushing route cache               <<< DEBUG output added by me
>    1 10.interface.debug: team0: killing tcp connections for ip [   <<< DEBUG output added by me
>    1 10.interface.debug: team0: unblocking [                       <<< DEBUG output added by me
>    1 ../../ctdb/server/ctdb_takeover.c:295 public address '
>    1 Sending RELEASE_IP message for
>
> So in this case, killing only 22 connections (two-way kill) caused
> more than 7000 lines of debugging output...

Yes, this is one problem with attempting to capture every packet on
all interfaces. The pcap code restricts by interface, so this is
another reason it could be better.
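To make the capture-filter idea concrete: a kernel-side BPF filter
limited to TCP traffic involving the released IP would discard most of
those 1808 + 5485 "Ignoring packet" packets before they ever reach
ctdb_killtcp. A minimal sketch against the libpcap API (the helper and
its integration point are hypothetical, not ctdb's actual code):

    #include <stdio.h>
    #include <pcap/pcap.h>

    /* Hypothetical helper: restrict an already-open pcap handle so the
     * kernel only delivers TCP packets to/from the IP being released. */
    static int set_release_ip_filter(pcap_t *handle, const char *ip)
    {
            struct bpf_program prog;
            char expr[64];

            snprintf(expr, sizeof(expr), "tcp and host %s", ip);

            /* PCAP_NETMASK_UNKNOWN is fine here because the expression
             * does not test broadcast addresses. */
            if (pcap_compile(handle, &prog, expr, 1,
                             PCAP_NETMASK_UNKNOWN) == -1) {
                    fprintf(stderr, "pcap_compile: %s\n",
                            pcap_geterr(handle));
                    return -1;
            }

            if (pcap_setfilter(handle, &prog) == -1) {
                    fprintf(stderr, "pcap_setfilter: %s\n",
                            pcap_geterr(handle));
                    pcap_freecode(&prog);
                    return -1;
            }

            pcap_freecode(&prog);
            return 0;
    }

That would complement the per-interface restriction the pcap code
already has.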
Another improvement might be to change this message:

    DBG_DEBUG("Sending tickle ACK for connection '%s'\n",

to INFO level and then (in the first instance) debug this at INFO
level instead. That might be a good change. Queued.
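Concretely, that is a one-line level change (a sketch; the trailing
argument name here is made up, the real call site passes the
connection string from ctdb's code):

    /* Before: only visible at log level DEBUG, where it drowns in the
     * per-packet "Ignoring packet" noise. */
    DBG_DEBUG("Sending tickle ACK for connection '%s'\n", conn_str);

    /* After: visible at log level INFO, so tickle ACKs can be watched
     * without enabling full DEBUG output. */
    DBG_INFO("Sending tickle ACK for connection '%s'\n", conn_str);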
> (BTW, there's a typo in ctdb_killtcp: "Sending a TCP RST to for
> connection" - one of "to" and "for" is redundant.)

Thanks. There's now a patch in my queue for this. OK to add this tag
to the commit?

Reported-by: Ulrich Sibiller <ulrich.sibiller at atos.net>

> > In preparation for takeover, for NFS we only remember connections on
> > 2049. The other ports, such as lockd, would be handled on reboot
> > (rather than crash) by the releasing side... but I doubt anyone has
> > thought about this! Perhaps we've seen that part handled by UDP in
> > the past? Please feel free to open a bug to remind me to look at this.
>
> Ok, I will.

> Your comment is interesting because it indicates ctdb might be
> designed with the idea of always rebooting after a failover in mind.
> We are also using ctdb moveip or a systemctl stop ctdb to perform
> updates without reboot (e.g. for gpfs). So are we "holding it wrong"?

We're confused. :-) No, I meant that nobody has thought about
resetting lockd connections. Luckily, they will be handled on a clean
reboot because of the ctdb_killtcp 2-way thing. We really should update
the connection tracking code to track all connections to public
addresses (VIPs). That's difficult with the current structure, but in
a branch I have a shiny new but bit-rotted connection tracking daemon
that can do just this for both releasing and takeover nodes. We'll get
there...

> > I suspect this could be improved to say "Never mount the same NFS
> > directory ...". I think you are OK with mounting the subdirectories
> > for each user.
>
> Good!

Updated. Thanks for trying to help us make this better... :-)

peace & happiness,
martin

Martin Schwenke wrote on 28.02.2023 23:03:

> On Fri, 17 Feb 2023 18:07:17 +0000, Ulrich Sibiller
> <ulrich.sibiller at atos.net> wrote:
>> [static, current ctdb_killtcp]
>
> Any progress with this?

Yes, we have that version running for some days now. It works
significantly better than the previous version from ctdb 4.15. Of all
the 139 takeovers in the last 2 days (unfortunately there were so many
due to some other problem) I have seen only 4 that did not kill all
expected connections, and the number of missing connections was
significantly smaller than before:

Feb 27 14:49:18 serverpnn1 ctdb-eventd[85619]: 10.interface.debug: Killed 0/4 TCP connections to released IP x.x.253.252
Feb 27 15:14:14 serverpnn1 ctdb-eventd[85619]: 10.interface.debug: Killed 450/463 TCP connections to released IP x.x.253.252
Feb 27 15:27:09 serverpnn1 ctdb-eventd[85619]: 10.interface.debug: Killed 62/63 TCP connections to released IP y.y.139.15
2023-02-28T18:42:15.065129+01:00 serverpnn2 ctdb-eventd[29607]: 10.interface.debug: Killed 157/158 TCP connections to released IP x.x.253.95

> If this doesn't work then it might be possible to increase the number
> of attempts from 50 to 100 here:
>
>     state->max_attempts = 50;
>
> This means each ctdb_killtcp could take up to 10s. This increases the
> chances of the releaseip event timing out.

What's the timeout for releaseip? I have not seen that value yet.
max_attempts and the release/takeover timeouts should be configurable
anyway, I'd say.

BTW: I have seen this just today, which seems to be hardcoded, too:

2023-03-01T02:50:14.938438+01:00 serverpnn0 ctdbd[24778]: Maximum monitor timeout count 20 reached. Making node unhealthy
2023-03-01T02:50:14.938593+01:00 serverpnn0 ctdbd[24778]: monitor event failed - disabling node
2023-03-01T02:50:14.938812+01:00 serverpnn0 ctdbd[24778]: Node became UNHEALTHY. Ask recovery master to reallocate IPs

> The concern is that packets are being dropped because there is simply
> too much traffic. If this turns out to be the case then I wonder if it
> can be improved in the pcap implementation by adding a capture filter.

Don't know. However, I could improve my current debugging versions of
all the scripts to record the number of dropped packets on the
interfaces, too.

> Another improvement might be to change this message:
>
>     DBG_DEBUG("Sending tickle ACK for connection '%s'\n",
>
> to INFO level and then (in the first instance) debug this at INFO
> level instead. That might be a good change. Queued.

Hmm, for debugging it could come in handy to record those messages
(that tend to flood logs) separately for reference.

> Thanks. There's now a patch in my queue for this. OK to add this tag
> to the commit?
>
> Reported-by: Ulrich Sibiller <ulrich.sibiller at atos.net>

Yes ;-)

>>> past? Please feel free to open a bug to remind me to look at this.

Done: https://bugzilla.samba.org/show_bug.cgi?id=15320

>> Your comment is interesting because it indicates ctdb might be
>> designed with the idea of always rebooting after a failover in mind.
>> We are also using ctdb moveip or a systemctl stop ctdb to perform
>> updates without reboot (e.g. for gpfs). So are we "holding it wrong"?
>
> We're confused. :-) No, I meant that nobody has thought about
> resetting lockd connections. Luckily, they will be handled on a clean
> reboot because of the ctdb_killtcp 2-way thing. We really should update
> the connection tracking code to track all connections to public
> addresses (VIPs). That's difficult with the current structure, but in a
> branch I have a shiny new but bit-rotted connection tracking daemon that
> can do just this for both releasing and takeover nodes. We'll get
> there...

I am not really sure if that is necessary. The code in 10.interface
that kills the connections uses this from /etc/ctdb/functions:

    get_tcp_connections_for_ip ()
    {
        _ip="$1"

        ss -tn state established "src [$_ip]" |
            awk 'NR > 1 {print $3, $4}'
    }

which ignores the port and thus matches all connections for the IP
anyway. On the other hand, there is

    update_tickles 2049

in /etc/ctdb/events/legacy/60.nfs without a corresponding tickle
handling for lockd connections. I am thinking about adding an
update_tickles 599 for lockd connections (what's the best way to
determine that port?). Any objections?
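(On determining the port: lockd registers with the portmapper as
"nlockmgr", so from a script something like `rpcinfo -p | grep nlockmgr`
should work. For illustration only, the same lookup in C — program
number 100021 is nlockmgr, NLM v4 assumed; an untested sketch using the
legacy Sun RPC API, link with -ltirpc on current distros:)

    #include <stdio.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <rpc/rpc.h>
    #include <rpc/pmap_clnt.h>

    #define NLM_PROGRAM 100021      /* "nlockmgr", i.e. lockd */
    #define NLM_VERSION 4           /* NLM v4 is used with NFSv3 */

    int main(void)
    {
            struct sockaddr_in addr = {
                    .sin_family = AF_INET,
                    .sin_addr.s_addr = htonl(INADDR_LOOPBACK),
            };

            /* Ask the local portmapper which TCP port lockd bound. */
            u_short port = pmap_getport(&addr, NLM_PROGRAM,
                                        NLM_VERSION, IPPROTO_TCP);
            if (port == 0) {
                    fprintf(stderr, "nlockmgr not registered\n");
                    return 1;
            }
            printf("lockd TCP port: %hu\n", port);
            return 0;
    }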
> Thanks for trying to help us make this better... :-)

You're welcome. At least we both benefit from this. :-)

One further observation I had: for tests I had temporarily switched
from IPAllocAlgorithm=2 to 1. While it worked, it also resulted in a
much worse distribution of the IPs. Some nodes had two IPs in the same
network while others had none in that network. In that setup I found
identical tickles being sent up to three times on TAKEIP. Is this on
purpose? See the attached logfile for details.

Kind regards,

Ulrich Sibiller
--
Dipl.-Inf. Ulrich Sibiller          science + computing ag
System Administration                   Hagellocher Weg 73
Hotline +49 7071 9457 681         72070 Tuebingen, Germany
                        https://atos.net/de/deutschland/sc
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: tickles_sent_multiple_times.txt
URL: <http://lists.samba.org/pipermail/samba/attachments/20230301/5109bb2c/tickles_sent_multiple_times.txt>