Etienne Dechamps
2015-May-16 15:53 UTC
"Invalid KEX record length" during SPTPS key regeneration and related issues
Hi, I'm currently trying to troubleshoot what appears to be a very subtle bug (most likely a race condition) in SPTPS that causes state to become corrupted during SPTPS key regeneration. The tinc version currently deployed to my production nodes is git 7ac5263, which is somewhat old (2014-09-06), but I think this is still relevant because the affected code paths haven't really changed since. The only difference is that since 2e7f68a this will not make the entire tinc process crash, but it will most likely still make nodes unreachable. The reason why I'm not testing with a newer version is because it happens quite rarely (say, once per week and not on all nodes) and is very difficult to reproduce, therefore I don't have a lot of flexibility here. I have to jump in with what I got. Anyway, the main symptom is that tinc *sometimes* (rarely) crashes with the error message "Invalid KEX record length" during its hourly SPTPS key regeneration. I managed to catch it in a long-running debug tincd running under gdb. Here's the complete log, with some gdb investigation at the end: http://pastebin.com/H3qCCAxy Here's where I got so far in my investigation. Apparently, the reason why tinc is complaining about the invalid length seems to be that tinc is expecting to receive a KEX message (65 bytes) but is actually receiving a SIG message (64 bytes). Looking at the log, I managed to reconstruct the following sequence of events: * -> KEY_CHANGED ake_kobol -> KEX sharuruzure_izure -> KEX sharuruzure_sandy -> KEX ake_kobol <- KEX ake_kobol -> SIG ake_kobol <- SIG ake_kobol -> ACK sharuruzure_sandy <- KEX sharuruzure_sandy -> SIG sharuruzure_sandy <- SIG sharuruzure_sandy -> ACK sharuruzure_izure <- SIG = CRASH (expected KEX) Unfortunately, due to unrelated issues, I don't have any logs from sharuruzure_izure that I can study. I believe there is a design flaw in the way SPTPS key regeneration works, because upon reception of the KEX message the other nodes will send both KEX and SIG messages at the same time. However, the node expects SIG to arrive after KEX. Therefore, there is an implicit assumption that messages won't arrive out of order. tinc makes no such guarantee, even over TCP metaconnections, because there is no guarantee the two messages will travel along the same path (consider the case where there is a change in the graph while the KEX and SIG messages are traveling). In fact, messages can even be lost if a node responsible for forwarding them crashed before being able to do so. This is not so much of an issue for initial SPTPS negotiation because the handshake is restarted after a 10-second timeout, but there is no such timeout for key regeneration, which means that if there is some kind of problem (lost or out-of-order messages), things will get stuck and the node will become unreachable until the next key regeneration (1 hour by default!). The legacy protocol doesn't have that problem because KEY_CHANGED is a broadcast message - meaning it can't really get lost. It is not clear to me whether the issue shown in the log above is actually caused by a KEX message from sharuruzure_izure being lost or reordered. I don't see why that would have happened, because there was no change in the graph while that was happening. Another possibility is that maybe sharuruzure_izure's state was corrupted all along and it was actually stuck in SPTPS_KEX state (instead of SPTPS_SECONDARY_KEX which is the normal state for an established tunnel). If it was indeed stuck in that state, then instead of sending back KEX and SIG messages, it would only have sent SIG, resulting in the crash. However it is not clear to me which sequence of events could result in a node getting stuck in SPTPS_KEX for a long period of time. I believe there is yet another, more benign issue with key regeneration as well: during the short window of time where it happens, the tunnel is unusable, and packets get lost. Key regeneration takes 2 round trips over the network, which can easily result in 300+ ms outages on high latency links. With these issues in mind, I wonder if it's really worth trying to patch the current key regeneration protocol - maybe we should simply come up with a new one. How about simply terminating the current SPTPS channel and creating a new one? That would remove the need for a key regeneration protocol altogether, since it's just creating and terminating SPTPS channels. In fact, req_key_ext_h(REQ_KEY) is already smart enough to restart SPTPS if there's already a channel. Furthermore, if we allow the old and new channels to overlap for a short period of time, we can prevent packet loss during regeneration.
Guus Sliepen
2015-May-16 18:36 UTC
"Invalid KEX record length" during SPTPS key regeneration and related issues
On Sat, May 16, 2015 at 04:53:33PM +0100, Etienne Dechamps wrote:> I believe there is a design flaw in the way SPTPS key regeneration > works, because upon reception of the KEX message the other nodes will > send both KEX and SIG messages at the same time. However, the node > expects SIG to arrive after KEX. Therefore, there is an implicit > assumption that messages won't arrive out of order. tinc makes no such > guarantee, even over TCP metaconnections, because there is no > guarantee the two messages will travel along the same path (consider > the case where there is a change in the graph while the KEX and SIG > messages are traveling). In fact, messages can even be lost if a node > responsible for forwarding them crashed before being able to do so.You are right. The main issue with the SPTPS datagram protocol is that it actually doesn't handle any packet loss or reordering during authentication and key regeneration. I will add this, so it will be able to run completely over UDP. One reason it is currently using TCP is that for UDP packet reception, tinc requires an established session so it can very the source of the packets, so the initial handshake has to be done via the metaprotocol. And then I got lazy.> This is not so much of an issue for initial SPTPS negotiation because > the handshake is restarted after a 10-second timeout, but there is no > such timeout for key regeneration,Indeed, such a timeout should be added.> The legacy protocol doesn't have that problem because KEY_CHANGED is a > broadcast message - meaning it can't really get lost.Actually, it can just as well, although it is very unlikely to happen that a broadcast message can get lost, and even less likely that this happens right when a KEY_CHANGED message gets sent.> I believe there is yet another, more benign issue with key > regeneration as well: during the short window of time where it > happens, the tunnel is unusable, and packets get lost. Key > regeneration takes 2 round trips over the network, which can easily > result in 300+ ms outages on high latency links.Yes, I will fix that. If the secondary KEX is done properly over UDP, then it is quite trivial to have it switch over to a new key without losing packets if there is no reordering going on.> With these issues in mind, I wonder if it's really worth trying to > patch the current key regeneration protocol - maybe we should simply > come up with a new one. How about simply terminating the current SPTPS > channel and creating a new one? That would remove the need for a key > regeneration protocol altogether, since it's just creating and > terminating SPTPS channels. In fact, req_key_ext_h(REQ_KEY) is already > smart enough to restart SPTPS if there's already a channel. > Furthermore, if we allow the old and new channels to overlap for a > short period of time, we can prevent packet loss during regeneration.I think this just shifts the complexity from sptps.c to net_packet.c. So I'd rather fix SPTPS. -- Met vriendelijke groet / with kind regards, Guus Sliepen <guus at tinc-vpn.org> -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: Digital signature URL: <http://www.tinc-vpn.org/pipermail/tinc-devel/attachments/20150516/3cb2011b/attachment.sig>
Etienne Dechamps
2015-May-17 18:46 UTC
"Invalid KEX record length" during SPTPS key regeneration and related issues
I sent you a pull request that addresses the general issue, at least for the short term: https://github.com/gsliepen/tinc/pull/83 On 16 May 2015 at 19:36, Guus Sliepen <guus at tinc-vpn.org> wrote:> On Sat, May 16, 2015 at 04:53:33PM +0100, Etienne Dechamps wrote: > >> I believe there is a design flaw in the way SPTPS key regeneration >> works, because upon reception of the KEX message the other nodes will >> send both KEX and SIG messages at the same time. However, the node >> expects SIG to arrive after KEX. Therefore, there is an implicit >> assumption that messages won't arrive out of order. tinc makes no such >> guarantee, even over TCP metaconnections, because there is no >> guarantee the two messages will travel along the same path (consider >> the case where there is a change in the graph while the KEX and SIG >> messages are traveling). In fact, messages can even be lost if a node >> responsible for forwarding them crashed before being able to do so. > > You are right. The main issue with the SPTPS datagram protocol is that > it actually doesn't handle any packet loss or reordering during > authentication and key regeneration. I will add this, so it will be able > to run completely over UDP.Well, actually... in the pull request above I "solved" this problem simply by doing "if anything weird comes in, just restart the whole thing". I wonder if this solution could simply be used as the final long term solution. Indeed, SPTPS already knows how to handle lost/reordered *data* packets, and handshake packets are only exchanged rarely (hourly be default), therefore it seems like restarting the whole thing on a bad handshake is not that expensive. In fact, I suspect that with this patch, we might be able to let handshake packets go over UDP and it would run just fine, although I don't really see why that would be useful in practice. It's also more reliable since it can potentially address *any* issue with the SPTPS tunnel, possibly including ones we don't even know of yet. And the code is much simpler than trying to implement full-blown packet loss/reorder detection in SPTPS code, I think.>> This is not so much of an issue for initial SPTPS negotiation because >> the handshake is restarted after a 10-second timeout, but there is no >> such timeout for key regeneration, > > Indeed, such a timeout should be added.I don't think the timeout is needed with my patch applied, because if key regeneration "silently" fails (i.e. the initial KEX packet gets dropped) then the other node will continue sending packets with the wrong key, and therefore trigger a SPTPS failure followed by a clean restart. It would actually recover faster than using a timeout :)>> The legacy protocol doesn't have that problem because KEY_CHANGED is a >> broadcast message - meaning it can't really get lost. > > Actually, it can just as well, although it is very unlikely to happen > that a broadcast message can get lost, and even less likely that this > happens right when a KEY_CHANGED message gets sent.That's interesting, can you explain how you see a broadcast message getting lost? The only way I can see that happening is if a complete network split happens, but then the nodes will notice that and mark the "other side" as unreachable, and at that point it doesn't really matter that the message got lost.
Reasonably Related Threads
- "Invalid KEX record length" during SPTPS key regeneration and related issues
- "Invalid KEX record length" during SPTPS key regeneration and related issues
- "Invalid KEX record length" during SPTPS key regeneration and related issues
- Seeing: "Got REQ_KEY from XXX while we already started a SPTPS session!"
- Some questions about SPTPS