thr3ads.net - tinc devel - "Invalid KEX record length" during SPTPS key regeneration and related issues [May 2015]

If this information is useful, please help other people find it:
Share via:

Etienne Dechamps

2015-May-16 15:53 UTC

"Invalid KEX record length" during SPTPS key regeneration and related issues

Hi,

I'm currently trying to troubleshoot what appears to be a very subtle
bug (most likely a race condition) in SPTPS that causes state to
become corrupted during SPTPS key regeneration.

The tinc version currently deployed to my production nodes is git
7ac5263, which is somewhat old (2014-09-06), but I think this is still
relevant because the affected code paths haven't really changed since.
The only difference is that since 2e7f68a this will not make the
entire tinc process crash, but it will most likely still make nodes
unreachable. The reason why I'm not testing with a newer version is
because it happens quite rarely (say, once per week and not on all
nodes) and is very difficult to reproduce, therefore I don't have a
lot of flexibility here. I have to jump in with what I got.

Anyway, the main symptom is that tinc *sometimes* (rarely) crashes
with the error message "Invalid KEX record length" during its hourly
SPTPS key regeneration. I managed to catch it in a long-running debug
tincd running under gdb. Here's the complete log, with some gdb
investigation at the end: http://pastebin.com/H3qCCAxy

Here's where I got so far in my investigation. Apparently, the reason
why tinc is complaining about the invalid length seems to be that tinc
is expecting to receive a KEX message (65 bytes) but is actually
receiving a SIG message (64 bytes). Looking at the log, I managed to
reconstruct the following sequence of events:

* -> KEY_CHANGED
ake_kobol -> KEX
sharuruzure_izure -> KEX
sharuruzure_sandy -> KEX
ake_kobol <- KEX
ake_kobol -> SIG
ake_kobol <- SIG
ake_kobol -> ACK
sharuruzure_sandy <- KEX
sharuruzure_sandy -> SIG
sharuruzure_sandy <- SIG
sharuruzure_sandy -> ACK
sharuruzure_izure <- SIG = CRASH (expected KEX)

Unfortunately, due to unrelated issues, I don't have any logs from
sharuruzure_izure that I can study.

I believe there is a design flaw in the way SPTPS key regeneration
works, because upon reception of the KEX message the other nodes will
send both KEX and SIG messages at the same time. However, the node
expects SIG to arrive after KEX. Therefore, there is an implicit
assumption that messages won't arrive out of order. tinc makes no such
guarantee, even over TCP metaconnections, because there is no
guarantee the two messages will travel along the same path (consider
the case where there is a change in the graph while the KEX and SIG
messages are traveling). In fact, messages can even be lost if a node
responsible for forwarding them crashed before being able to do so.

This is not so much of an issue for initial SPTPS negotiation because
the handshake is restarted after a 10-second timeout, but there is no
such timeout for key regeneration, which means that if there is some
kind of problem (lost or out-of-order messages), things will get stuck
and the node will become unreachable until the next key regeneration
(1 hour by default!). The legacy protocol doesn't have that problem
because KEY_CHANGED is a broadcast message - meaning it can't really
get lost.

It is not clear to me whether the issue shown in the log above is
actually caused by a KEX message from sharuruzure_izure being lost or
reordered. I don't see why that would have happened, because there was
no change in the graph while that was happening.

Another possibility is that maybe sharuruzure_izure's state was
corrupted all along and it was actually stuck in SPTPS_KEX state
(instead of SPTPS_SECONDARY_KEX which is the normal state for an
established tunnel). If it was indeed stuck in that state, then
instead of sending back KEX and SIG messages, it would only have sent
SIG, resulting in the crash. However it is not clear to me which
sequence of events could result in a node getting stuck in SPTPS_KEX
for a long period of time.

I believe there is yet another, more benign issue with key
regeneration as well: during the short window of time where it
happens, the tunnel is unusable, and packets get lost. Key
regeneration takes 2 round trips over the network, which can easily
result in 300+ ms outages on high latency links.

With these issues in mind, I wonder if it's really worth trying to
patch the current key regeneration protocol - maybe we should simply
come up with a new one. How about simply terminating the current SPTPS
channel and creating a new one? That would remove the need for a key
regeneration protocol altogether, since it's just creating and
terminating SPTPS channels. In fact, req_key_ext_h(REQ_KEY) is already
smart enough to restart SPTPS if there's already a channel.
Furthermore, if we allow the old and new channels to overlap for a
short period of time, we can prevent packet loss during regeneration.

Guus Sliepen

2015-May-16 18:36 UTC

head link

"Invalid KEX record length" during SPTPS key regeneration and related issues

On Sat, May 16, 2015 at 04:53:33PM +0100, Etienne Dechamps wrote:
> I believe there is a design flaw in the way SPTPS key regeneration
> works, because upon reception of the KEX message the other nodes will
> send both KEX and SIG messages at the same time. However, the node
> expects SIG to arrive after KEX. Therefore, there is an implicit
> assumption that messages won't arrive out of order. tinc makes no such
> guarantee, even over TCP metaconnections, because there is no
> guarantee the two messages will travel along the same path (consider
> the case where there is a change in the graph while the KEX and SIG
> messages are traveling). In fact, messages can even be lost if a node
> responsible for forwarding them crashed before being able to do so.
You are right. The main issue with the SPTPS datagram protocol is that
it actually doesn't handle any packet loss or reordering during
authentication and key regeneration. I will add this, so it will be able
to run completely over UDP.

One reason it is currently using TCP is that for UDP packet reception,
tinc requires an established session so it can very the source of the
packets, so the initial handshake has to be done via the metaprotocol.
And then I got lazy.
> This is not so much of an issue for initial SPTPS negotiation because
> the handshake is restarted after a 10-second timeout, but there is no
> such timeout for key regeneration,
Indeed, such a timeout should be added.
> The legacy protocol doesn't have that problem because KEY_CHANGED is a
> broadcast message - meaning it can't really get lost.
Actually, it can just as well, although it is very unlikely to happen
that a broadcast message can get lost, and even less likely that this
happens right when a KEY_CHANGED message gets sent.
> I believe there is yet another, more benign issue with key
> regeneration as well: during the short window of time where it
> happens, the tunnel is unusable, and packets get lost. Key
> regeneration takes 2 round trips over the network, which can easily
> result in 300+ ms outages on high latency links.
Yes, I will fix that. If the secondary KEX is done properly over UDP,
then it is quite trivial to have it switch over to a new key without
losing packets if there is no reordering going on.
> With these issues in mind, I wonder if it's really worth trying to
> patch the current key regeneration protocol - maybe we should simply
> come up with a new one. How about simply terminating the current SPTPS
> channel and creating a new one? That would remove the need for a key
> regeneration protocol altogether, since it's just creating and
> terminating SPTPS channels. In fact, req_key_ext_h(REQ_KEY) is already
> smart enough to restart SPTPS if there's already a channel.
> Furthermore, if we allow the old and new channels to overlap for a
> short period of time, we can prevent packet loss during regeneration.
I think this just shifts the complexity from sptps.c to net_packet.c. So
I'd rather fix SPTPS.

-- 
Met vriendelijke groet / with kind regards,
     Guus Sliepen <guus at tinc-vpn.org>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: Digital signature
URL:
<http://www.tinc-vpn.org/pipermail/tinc-devel/attachments/20150516/3cb2011b/attachment.sig>

Etienne Dechamps

2015-May-17 18:46 UTC

head link

"Invalid KEX record length" during SPTPS key regeneration and related issues

I sent you a pull request that addresses the general issue, at least
for the short term: https://github.com/gsliepen/tinc/pull/83

On 16 May 2015 at 19:36, Guus Sliepen <guus at tinc-vpn.org>
wrote:> On Sat, May 16, 2015 at 04:53:33PM +0100, Etienne Dechamps wrote:
>
>> I believe there is a design flaw in the way SPTPS key regeneration
>> works, because upon reception of the KEX message the other nodes will
>> send both KEX and SIG messages at the same time. However, the node
>> expects SIG to arrive after KEX. Therefore, there is an implicit
>> assumption that messages won't arrive out of order. tinc makes no
such
>> guarantee, even over TCP metaconnections, because there is no
>> guarantee the two messages will travel along the same path (consider
>> the case where there is a change in the graph while the KEX and SIG
>> messages are traveling). In fact, messages can even be lost if a node
>> responsible for forwarding them crashed before being able to do so.
>
> You are right. The main issue with the SPTPS datagram protocol is that
> it actually doesn't handle any packet loss or reordering during
> authentication and key regeneration. I will add this, so it will be able
> to run completely over UDP.
Well, actually... in the pull request above I "solved" this problem
simply by doing "if anything weird comes in, just restart the whole
thing". I wonder if this solution could simply be used as the final
long term solution. Indeed, SPTPS already knows how to handle
lost/reordered *data* packets, and handshake packets are only
exchanged rarely (hourly be default), therefore it seems like
restarting the whole thing on a bad handshake is not that expensive.

In fact, I suspect that with this patch, we might be able to let
handshake packets go over UDP and it would run just fine, although I
don't really see why that would be useful in practice.

It's also more reliable since it can potentially address *any* issue
with the SPTPS tunnel, possibly including ones we don't even know of
yet. And the code is much simpler than trying to implement full-blown
packet loss/reorder detection in SPTPS code, I think.
>> This is not so much of an issue for initial SPTPS negotiation because
>> the handshake is restarted after a 10-second timeout, but there is no
>> such timeout for key regeneration,
>
> Indeed, such a timeout should be added.
I don't think the timeout is needed with my patch applied, because if
key regeneration "silently" fails (i.e. the initial KEX packet gets
dropped) then the other node will continue sending packets with the
wrong key, and therefore trigger a SPTPS failure followed by a clean
restart. It would actually recover faster than using a timeout :)
>> The legacy protocol doesn't have that problem because KEY_CHANGED
is a
>> broadcast message - meaning it can't really get lost.
>
> Actually, it can just as well, although it is very unlikely to happen
> that a broadcast message can get lost, and even less likely that this
> happens right when a KEY_CHANGED message gets sent.
That's interesting, can you explain how you see a broadcast message
getting lost? The only way I can see that happening is if a complete
network split happens, but then the nodes will notice that and mark
the "other side" as unreachable, and at that point it doesn't
really
matter that the message got lost.

Possibly Parallel Threads

Search for more seemingly similar threads

tinc devel - May 2015 - "Invalid KEX record length" during SPTPS key regeneration and related issues

"Invalid KEX record length" during SPTPS key regeneration and related issues

"Invalid KEX record length" during SPTPS key regeneration and related issues

"Invalid KEX record length" during SPTPS key regeneration and related issues

Possibly Parallel Threads