thr3ads.net - tinc devel - "Invalid KEX record length" during SPTPS key regeneration and related issues [May 2015]

If this information is useful, please help other people find it:
Share via:

Etienne Dechamps

2015-May-17 18:46 UTC

"Invalid KEX record length" during SPTPS key regeneration and related issues

I sent you a pull request that addresses the general issue, at least
for the short term: https://github.com/gsliepen/tinc/pull/83

On 16 May 2015 at 19:36, Guus Sliepen <guus at tinc-vpn.org>
wrote:> On Sat, May 16, 2015 at 04:53:33PM +0100, Etienne Dechamps wrote:
>
>> I believe there is a design flaw in the way SPTPS key regeneration
>> works, because upon reception of the KEX message the other nodes will
>> send both KEX and SIG messages at the same time. However, the node
>> expects SIG to arrive after KEX. Therefore, there is an implicit
>> assumption that messages won't arrive out of order. tinc makes no
such
>> guarantee, even over TCP metaconnections, because there is no
>> guarantee the two messages will travel along the same path (consider
>> the case where there is a change in the graph while the KEX and SIG
>> messages are traveling). In fact, messages can even be lost if a node
>> responsible for forwarding them crashed before being able to do so.
>
> You are right. The main issue with the SPTPS datagram protocol is that
> it actually doesn't handle any packet loss or reordering during
> authentication and key regeneration. I will add this, so it will be able
> to run completely over UDP.
Well, actually... in the pull request above I "solved" this problem
simply by doing "if anything weird comes in, just restart the whole
thing". I wonder if this solution could simply be used as the final
long term solution. Indeed, SPTPS already knows how to handle
lost/reordered *data* packets, and handshake packets are only
exchanged rarely (hourly be default), therefore it seems like
restarting the whole thing on a bad handshake is not that expensive.

In fact, I suspect that with this patch, we might be able to let
handshake packets go over UDP and it would run just fine, although I
don't really see why that would be useful in practice.

It's also more reliable since it can potentially address *any* issue
with the SPTPS tunnel, possibly including ones we don't even know of
yet. And the code is much simpler than trying to implement full-blown
packet loss/reorder detection in SPTPS code, I think.
>> This is not so much of an issue for initial SPTPS negotiation because
>> the handshake is restarted after a 10-second timeout, but there is no
>> such timeout for key regeneration,
>
> Indeed, such a timeout should be added.
I don't think the timeout is needed with my patch applied, because if
key regeneration "silently" fails (i.e. the initial KEX packet gets
dropped) then the other node will continue sending packets with the
wrong key, and therefore trigger a SPTPS failure followed by a clean
restart. It would actually recover faster than using a timeout :)
>> The legacy protocol doesn't have that problem because KEY_CHANGED
is a
>> broadcast message - meaning it can't really get lost.
>
> Actually, it can just as well, although it is very unlikely to happen
> that a broadcast message can get lost, and even less likely that this
> happens right when a KEY_CHANGED message gets sent.
That's interesting, can you explain how you see a broadcast message
getting lost? The only way I can see that happening is if a complete
network split happens, but then the nodes will notice that and mark
the "other side" as unreachable, and at that point it doesn't
really
matter that the message got lost.

Guus Sliepen

2015-May-17 20:16 UTC

head link

"Invalid KEX record length" during SPTPS key regeneration and related issues

On Sun, May 17, 2015 at 07:46:45PM +0100, Etienne Dechamps wrote:
> I sent you a pull request that addresses the general issue, at least
> for the short term: https://github.com/gsliepen/tinc/pull/83
Merged.
> > You are right. The main issue with the SPTPS datagram protocol is that
> > it actually doesn't handle any packet loss or reordering during
> > authentication and key regeneration. I will add this, so it will be
able
> > to run completely over UDP.
> 
> Well, actually... in the pull request above I "solved" this
problem
> simply by doing "if anything weird comes in, just restart the whole
> thing". I wonder if this solution could simply be used as the final
> long term solution. Indeed, SPTPS already knows how to handle
> lost/reordered *data* packets, and handshake packets are only
> exchanged rarely (hourly be default), therefore it seems like
> restarting the whole thing on a bad handshake is not that expensive.
That's true. But I'll try to fix SPTPS anyway.
> In fact, I suspect that with this patch, we might be able to let
> handshake packets go over UDP and it would run just fine, although I
> don't really see why that would be useful in practice.
Currently tinc still relies on UDP packets being properly authenticated,
so it would need some extra work to allow the initial handshake to be
over UDP. It would be nice to have it at some point (after 1.1.0), to
prepare for the possibility of running tinc completely over UDP.
> It's also more reliable since it can potentially address *any* issue
> with the SPTPS tunnel, possibly including ones we don't even know of
> yet. And the code is much simpler than trying to implement full-blown
> packet loss/reorder detection in SPTPS code, I think.
Yes, having the ability to restart after getting stuck is certainly
desirable.
> >> The legacy protocol doesn't have that problem because
KEY_CHANGED is a
> >> broadcast message - meaning it can't really get lost.
> >
> > Actually, it can just as well, although it is very unlikely to happen
> > that a broadcast message can get lost, and even less likely that this
> > happens right when a KEY_CHANGED message gets sent.
> 
> That's interesting, can you explain how you see a broadcast message
> getting lost?
When a node has a single metaconnection that has stopped working, but it
hasn't detected that yet, and when before it detects that it has failed,
has made a second metaconnection. Then, as far as the VPN is concerned,
that node has never been unreachable, but a KEY_CHANGED message sent
during that small window will be lost.

-- 
Met vriendelijke groet / with kind regards,
     Guus Sliepen <guus at tinc-vpn.org>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: Digital signature
URL:
<http://www.tinc-vpn.org/pipermail/tinc-devel/attachments/20150517/59cc8a23/attachment.sig>

Etienne Dechamps

2015-May-17 20:26 UTC

head link

"Invalid KEX record length" during SPTPS key regeneration and related issues

On 17 May 2015 at 21:16, Guus Sliepen <guus at tinc-vpn.org>
wrote:>> >> The legacy protocol doesn't have that problem because
KEY_CHANGED is a
>> >> broadcast message - meaning it can't really get lost.
>> >
>> > Actually, it can just as well, although it is very unlikely to
happen
>> > that a broadcast message can get lost, and even less likely that
this
>> > happens right when a KEY_CHANGED message gets sent.
>>
>> That's interesting, can you explain how you see a broadcast message
>> getting lost?
>
> When a node has a single metaconnection that has stopped working, but it
> hasn't detected that yet, and when before it detects that it has
failed,
> has made a second metaconnection. Then, as far as the VPN is concerned,
> that node has never been unreachable, but a KEY_CHANGED message sent
> during that small window will be lost.
Ah, indeed. Thanks for that clarification.

Seemingly Similar Threads

Search for more possibly parallel threads

tinc devel - May 2015 - "Invalid KEX record length" during SPTPS key regeneration and related issues

"Invalid KEX record length" during SPTPS key regeneration and related issues

"Invalid KEX record length" during SPTPS key regeneration and related issues

"Invalid KEX record length" during SPTPS key regeneration and related issues

Seemingly Similar Threads