On Wed, 1 Mar 2023 16:18:58 +0000, Ulrich Sibiller
<ulrich.sibiller at atos.net> wrote:
> Martin Schwenke wrote on 28.02.2023 23:03:
> Yes, we have that version running for some days now. It works
> significantly better than the previous version of ctdb 4.15. Of all
> the 139 takeovers in the last 2 days (unfortunately there were so
> many due to some other problem) I have seen only 4 that did not kill
> all expected connections and the number of missing connections was
> significantly smaller than before:
> Feb 27 14:49:18 serverpnn1 ctdb-eventd[85619]: 10.interface.debug: Killed 0/4 TCP connections to released IP x.x.253.252
> Feb 27 15:14:14 serverpnn1 ctdb-eventd[85619]: 10.interface.debug: Killed 450/463 TCP connections to released IP x.x.253.252
> Feb 27 15:27:09 serverpnn1 ctdb-eventd[85619]: 10.interface.debug: Killed 62/63 TCP connections to released IP y.y.139.15
> 2023-02-28T18:42:15.065129+01:00 serverpnn2 ctdb-eventd[29607]: 10.interface.debug: Killed 157/158 TCP connections to released IP x.x.253.95
Not perfect, but better...
> > If this doesn't work then it might be possible to increase the number
> > of attempts from 50 to 100 here:
> >
> >     state->max_attempts = 50;
> >
> > This means each ctdb_killtcp could take up to 10s. This increases the
> > chances of the releaseip event timing out.
> What's the timeout for releaseip? I have not seen that value yet.
> max_attempts and release/takeover timeouts should be configurable
> anyway, I'd say.
The timeout for releaseip is EventScriptTimeout, which defaults to 30
(seconds). This is the total time for all scripts running an event.
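In case it helps, a quick (untested) way to see which timeout knobs your
build actually exposes as runtime tunables, since the names have moved
around a bit between CTDB releases:

    # list all runtime tunables and pick out the timeout-related ones
    ctdb listvars | grep -i timeout
    # if EventScriptTimeout shows up there, it can be read and raised:
    ctdb getvar EventScriptTimeout
    ctdb setvar EventScriptTimeout 60   # runtime only, not persistent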
For ctdb_killtcp, when it was basically rewritten, we considered adding
options for max_attempts, but decided to see if it was foolproof. We
could now add those options. Patches welcome too...
> BTW: I have seen this just today, which seems to be hardcoded, too:
> 2023-03-01T02:50:14.938438+01:00 serverpnn0 ctdbd[24778]: Maximum monitor timeout count 20 reached. Making node unhealthy
> 2023-03-01T02:50:14.938593+01:00 serverpnn0 ctdbd[24778]: monitor event failed - disabling node
> 2023-03-01T02:50:14.938812+01:00 serverpnn0 ctdbd[24778]: Node became UNHEALTHY. Ask recovery master to reallocate IPs
MonitorTimeoutCount defaults to 20 but can also be changed.
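Roughly like this, if you want to try it (sketch only; double-check the
tunable name with "ctdb listvars" on your version first):

    ctdb getvar MonitorTimeoutCount       # read the current value
    ctdb setvar MonitorTimeoutCount 40    # runtime change, not persistent
    # to persist it, add "MonitorTimeoutCount=40" to the ctdb.tunables
    # file (usually under /etc/ctdb on most installs)

But raising it mostly hides the symptom, so it's better to find out why
monitoring is slow in the first place.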
For this, you should check the output of:
ctdb event status legacy monitor
All the scripts should be taking less than about 1s.
If 50.samba is taking many seconds, then this may indicate a DNS issue.
I have some work-in-progress changes to address this, and also to
monitor DNS (and other hosts).
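If you want a quick data point in the meantime, something like this from
the affected node is usually telling (just an illustration; use whatever
name your setup actually resolves during monitoring):

    # rough check of resolver latency for the node's own FQDN
    time getent hosts "$(hostname -f)"

If that takes seconds rather than milliseconds, the event script timings
will suffer accordingly.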
> > Another improvement might be to change this message:
> >
> >     DBG_DEBUG("Sending tickle ACK for connection '%s'\n",
> >
> > to INFO level and then (in the first instance) debug this at INFO level
> > instead. That might be a good change. Queued.
>
> Hmm, for debugging it could come in handy to record those messages
> (that tend to flood logs) separately for reference.
True, but for a basic understanding of what happened, it could be good
to just use INFO level. Right now, I think you're doing developer
level debugging. ;-)
> > Thanks. There's now a patch in my queue for this. OK to add this tag
> > to the commit?
> >
> > Reported-by: Ulrich Sibiller <ulrich.sibiller at atos.net>
>
> Yes ;-)
OK.
> >>> past? Please feel free to open a bug to remind me to look at this.
>
> Done: https://bugzilla.samba.org/show_bug.cgi?id=15320
Thanks.
> I am not really sure if that is necessary. The code in 10.interface
> that kills the connections uses this from /etc/ctdb/functions:
>
> get_tcp_connections_for_ip ()
> {
>     _ip="$1"
>
>     ss -tn state established "src [$_ip]" | awk 'NR > 1 {print $3, $4}'
> }
>
> which ignores the port and thus matches all connections for the ip anyway.
> On the other hand there's
>
>     update_tickles 2049
>
> in /etc/ctdb/events/legacy/60.nfs without a corresponding tickle
> handling for lockd connections. I am thinking about adding an
> update_tickles 599 for lockd connections (what's the best way to
> determine that port?). Any objections?
We are in violent agreement! :-)
The main problem is that the code in 10.interface releaseip is only run
when the "releasing" node does not crash hard. So, we need another method.
The port number is tricky because we need to support both kernel NFS
and NFS Ganesha. We could add it to the call-out... but I'm not sure
it would be easy. Probably better via a configuration variable.
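For the kernel NFS case, the sort of thing such a helper (or call-out)
could do is ask the local portmapper, roughly along these lines (sketch
only; assumes the usual rpcinfo -p column order of
program/vers/proto/port/service, and Ganesha would need its own path):

    # first TCP port registered for nlockmgr (lockd), empty if not running
    _lockd_port=$(rpcinfo -p localhost |
        awk '$3 == "tcp" && $5 == "nlockmgr" { print $4; exit }')
    [ -n "$_lockd_port" ] && update_tickles "$_lockd_port"

A configuration variable would still be simpler and more predictable,
especially for Ganesha.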
I just rebased this old 2015 branch for the first time in ~6 years:
https://git.samba.org/?p=martins/samba.git;a=shortlog;h=refs/heads/ctdb-connections
I think it would help.
I can't remember exactly why I abandoned it. I think it might have
been because I was going to work on the dedicated connection tracking
code, but that seems to have started a couple of years later. Or
perhaps Amitay pointed out a valid flaw that I no longer remember. I
will have to take a more careful look.
> One further observation I had: For tests I had temporarily switched
> from IPAllocAlgorithm=2 to 1. While it worked, it also resulted in a
> much worse distribution of the IPs. Some nodes had two IPs in the
> same network while others had none in that network. In that setup I
> found identical tickles being sent up to three times on TAKEIP. Is
> this on purpose? See attached logfile for details.
Yep, that's why the algorithm known as LCP2 (or #2) exists.
While it doesn't know about different networks as such, it contains a
very clever heuristic (thought of by Ronnie Sahlberg) that tries to
spread addresses from the same network across different nodes. With
the speed optimisations it has
received over the years, it doesn't seem sane to use anything else.
Perhaps some time it can be improved to encourage certain addresses to
be more likely on certain nodes - not sure.
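For completeness, switching back is just the same tunable again (sketch
only, assuming the usual tunables file location):

    ctdb setvar IPAllocAlgorithm 2      # runtime change; 2 = LCP2 (default)
    echo "IPAllocAlgorithm=2" >> /etc/ctdb/ctdb.tunables   # survives restarts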
The 3 identical tickle ACKs are standard and should not differ between
algorithms.
I think we're getting somewhere... :-)
peace & happiness,
martin