Etienne Dechamps
2015-May-17 18:46 UTC
"Invalid KEX record length" during SPTPS key regeneration and related issues
I sent you a pull request that addresses the general issue, at least for the short term: https://github.com/gsliepen/tinc/pull/83 On 16 May 2015 at 19:36, Guus Sliepen <guus at tinc-vpn.org> wrote:> On Sat, May 16, 2015 at 04:53:33PM +0100, Etienne Dechamps wrote: > >> I believe there is a design flaw in the way SPTPS key regeneration >> works, because upon reception of the KEX message the other nodes will >> send both KEX and SIG messages at the same time. However, the node >> expects SIG to arrive after KEX. Therefore, there is an implicit >> assumption that messages won't arrive out of order. tinc makes no such >> guarantee, even over TCP metaconnections, because there is no >> guarantee the two messages will travel along the same path (consider >> the case where there is a change in the graph while the KEX and SIG >> messages are traveling). In fact, messages can even be lost if a node >> responsible for forwarding them crashed before being able to do so. > > You are right. The main issue with the SPTPS datagram protocol is that > it actually doesn't handle any packet loss or reordering during > authentication and key regeneration. I will add this, so it will be able > to run completely over UDP.Well, actually... in the pull request above I "solved" this problem simply by doing "if anything weird comes in, just restart the whole thing". I wonder if this solution could simply be used as the final long term solution. Indeed, SPTPS already knows how to handle lost/reordered *data* packets, and handshake packets are only exchanged rarely (hourly be default), therefore it seems like restarting the whole thing on a bad handshake is not that expensive. In fact, I suspect that with this patch, we might be able to let handshake packets go over UDP and it would run just fine, although I don't really see why that would be useful in practice. It's also more reliable since it can potentially address *any* issue with the SPTPS tunnel, possibly including ones we don't even know of yet. And the code is much simpler than trying to implement full-blown packet loss/reorder detection in SPTPS code, I think.>> This is not so much of an issue for initial SPTPS negotiation because >> the handshake is restarted after a 10-second timeout, but there is no >> such timeout for key regeneration, > > Indeed, such a timeout should be added.I don't think the timeout is needed with my patch applied, because if key regeneration "silently" fails (i.e. the initial KEX packet gets dropped) then the other node will continue sending packets with the wrong key, and therefore trigger a SPTPS failure followed by a clean restart. It would actually recover faster than using a timeout :)>> The legacy protocol doesn't have that problem because KEY_CHANGED is a >> broadcast message - meaning it can't really get lost. > > Actually, it can just as well, although it is very unlikely to happen > that a broadcast message can get lost, and even less likely that this > happens right when a KEY_CHANGED message gets sent.That's interesting, can you explain how you see a broadcast message getting lost? The only way I can see that happening is if a complete network split happens, but then the nodes will notice that and mark the "other side" as unreachable, and at that point it doesn't really matter that the message got lost.
Guus Sliepen
2015-May-17 20:16 UTC
"Invalid KEX record length" during SPTPS key regeneration and related issues
On Sun, May 17, 2015 at 07:46:45PM +0100, Etienne Dechamps wrote:> I sent you a pull request that addresses the general issue, at least > for the short term: https://github.com/gsliepen/tinc/pull/83Merged.> > You are right. The main issue with the SPTPS datagram protocol is that > > it actually doesn't handle any packet loss or reordering during > > authentication and key regeneration. I will add this, so it will be able > > to run completely over UDP. > > Well, actually... in the pull request above I "solved" this problem > simply by doing "if anything weird comes in, just restart the whole > thing". I wonder if this solution could simply be used as the final > long term solution. Indeed, SPTPS already knows how to handle > lost/reordered *data* packets, and handshake packets are only > exchanged rarely (hourly be default), therefore it seems like > restarting the whole thing on a bad handshake is not that expensive.That's true. But I'll try to fix SPTPS anyway.> In fact, I suspect that with this patch, we might be able to let > handshake packets go over UDP and it would run just fine, although I > don't really see why that would be useful in practice.Currently tinc still relies on UDP packets being properly authenticated, so it would need some extra work to allow the initial handshake to be over UDP. It would be nice to have it at some point (after 1.1.0), to prepare for the possibility of running tinc completely over UDP.> It's also more reliable since it can potentially address *any* issue > with the SPTPS tunnel, possibly including ones we don't even know of > yet. And the code is much simpler than trying to implement full-blown > packet loss/reorder detection in SPTPS code, I think.Yes, having the ability to restart after getting stuck is certainly desirable.> >> The legacy protocol doesn't have that problem because KEY_CHANGED is a > >> broadcast message - meaning it can't really get lost. > > > > Actually, it can just as well, although it is very unlikely to happen > > that a broadcast message can get lost, and even less likely that this > > happens right when a KEY_CHANGED message gets sent. > > That's interesting, can you explain how you see a broadcast message > getting lost?When a node has a single metaconnection that has stopped working, but it hasn't detected that yet, and when before it detects that it has failed, has made a second metaconnection. Then, as far as the VPN is concerned, that node has never been unreachable, but a KEY_CHANGED message sent during that small window will be lost. -- Met vriendelijke groet / with kind regards, Guus Sliepen <guus at tinc-vpn.org> -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: Digital signature URL: <http://www.tinc-vpn.org/pipermail/tinc-devel/attachments/20150517/59cc8a23/attachment.sig>
Etienne Dechamps
2015-May-17 20:26 UTC
"Invalid KEX record length" during SPTPS key regeneration and related issues
On 17 May 2015 at 21:16, Guus Sliepen <guus at tinc-vpn.org> wrote:>> >> The legacy protocol doesn't have that problem because KEY_CHANGED is a >> >> broadcast message - meaning it can't really get lost. >> > >> > Actually, it can just as well, although it is very unlikely to happen >> > that a broadcast message can get lost, and even less likely that this >> > happens right when a KEY_CHANGED message gets sent. >> >> That's interesting, can you explain how you see a broadcast message >> getting lost? > > When a node has a single metaconnection that has stopped working, but it > hasn't detected that yet, and when before it detects that it has failed, > has made a second metaconnection. Then, as far as the VPN is concerned, > that node has never been unreachable, but a KEY_CHANGED message sent > during that small window will be lost.Ah, indeed. Thanks for that clarification.
Apparently Analagous Threads
- "Invalid KEX record length" during SPTPS key regeneration and related issues
- "Invalid KEX record length" during SPTPS key regeneration and related issues
- "Invalid KEX record length" during SPTPS key regeneration and related issues
- Some questions about SPTPS
- Seeing: "Got REQ_KEY from XXX while we already started a SPTPS session!"