Etienne Dechamps
2014-Sep-28 17:29 UTC
Proposals for UDP information transport over the metagraph
While working on SPTPS UDP relaying I realized that there is one issue I didn't account for, which is that the sending node only knows the PMTU to the first relay node. It doesn't know the PMTU of the entire relay path beyond the first hop, because the relay nodes don't provide their own PMTU information over the metaprotocol.

Now, in the legacy protocol this is not really an issue, because TCP MSS clamping (which is really what matters in practice) is applied on a hop-per-hop basis, i.e. each hop clamps the MSS according to the PMTU information it has. Unfortunately, that doesn't work for SPTPS because relay nodes can't decrypt the contents of the packets that pass through them. This can result in relays falling back to TCP in the middle of a relay path if part of the relay path has a smaller PMTU than the first edge.

Therefore it appears we need a way to figure out, for a specific node, what the "indirect PMTU" (i.e. PMTU when relaying) is. This information would presumably be stored in an "indirect_mtu" field (or something like that) in node_t, and maintained separately from the "mtu" field which is set by the normal discovery process.

Interestingly, this piece of information could perhaps also be used to kickstart PMTU discovery, using the indirect PMTU as a "hint" to constrain the initial range and thereby make it converge faster on the actual PMTU.

Note that the dynamic UDP address information (currently transmitted via ANS_KEY messages) poses similar issues, and therefore these proposals can be applied to UDP address data as well, with the same pros and cons. In the end I expect us to adopt the same proposal for both PMTU and address information. Arguably, having a robust dynamic way to transmit fresh UDP address information could make edge addresses obsolete (except of course in the last proposal below), since the address information can be obtained on-the-fly and TCP used in the meantime.

I came up with a few proposals that are described below.
I'm currently leaning towards the "periodic unicast info messages" proposal. Comments welcome.

PROPOSAL: ADD PMTU INFORMATION TO ANS_KEY

ANS_KEY messages are already used for opportunistically transmitting UDP address information. We can use the same mechanism to transmit PMTU information as well.

Pros: Extremely simple. High-quality information that takes into account perspective from intermediate nodes to provide the best possible bet for UDP address and PMTU data.

Cons: PMTU is a dynamic piece of data that is subject to real-time changes. ANS_KEY messages, on the other hand, are sent quite rarely. This can result in PMTU information quickly becoming stale, and the mechanism is dangerously dependent on the precise timing of the very first ANS_KEY message (i.e. whether it is sent before or after PMTU is discovered).

PROPOSAL: PUSH-STYLE BROADCAST MESSAGES

This proposal consists of adding a new type of broadcast message whose sole purpose would be to provide the PMTU between two nodes. This message would be sent to all nodes every time PMTU information changes. So, for example, if node A discovers its PMTU to node B is X, then it would broadcast PMTU(A, B) = X to all nodes. Each node would then keep track of these messages using a dedicated data structure containing all known PMTU pairs. Every time a metaconnection is established, nodes would dump the contents of this data structure to each other.

Pros: High-freshness, immediately available UDP information via low-latency push notifications. High-quality information that takes into account perspective from intermediate nodes to provide the best possible bet for UDP address and PMTU data.

Cons: Questionable scalability due to the use of broadcast messages and the need for each node to keep global state about not just nodes (which it already does), but about node *pairs* as well.
An extreme example would be a 1000-node graph on which someone decides to run BitTorrent, thereby making all nodes discover PMTU to each other: this would result in O(1000000) messages getting sent, and the in-memory data structure would grow to at least 17 megabytes. A back-of-the-envelope calculation suggests transmitting the contents of the data structure to a joining node would use ~30MB of network traffic.

PROPOSAL: PERIODIC UNICAST INFO MESSAGES

In this proposal, PMTU information would be exchanged only between nodes that are currently sending packets to each other, and would be refreshed every few seconds or so. So for example, if node B is currently receiving packets from node A, it would send a small "UDP information message" directed at node A over the metaprotocol every few seconds. Contrary to the previous proposal, this message would be unicast, not broadcast. When B stops receiving packets from A, it stops sending UDP information messages to A.

The original UDP information message would be empty, because B doesn't really know anything about its *own* PMTU or UDP address. Instead, what happens is that the nodes sitting between A and B would alter the message as it is transmitted, replacing the information in it with the node's own UDP information about node B.

This is somewhat similar to how UDP address information is currently populated in ANS_KEY messages, with a small difference: if a node has confirmed working UDP information about node B, it will *overwrite* the information in the message as it retransmits it, even if the message already contains UDP information. The rationale is that UDP information is more relevant the closer it is to the packet sender (A). ANS_KEY currently doesn't behave that way, in that it will not replace UDP address information if it's already there (that might be an oversight; IMHO this behavior doesn't make much sense).
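The overwrite rule described above can be sketched roughly as follows. This is only an illustration, not tinc code: the udp_info_t and relay_view_t types, their fields, and udp_info_forward() are hypothetical names made up for this example, and an IPv4 address is assumed for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical payload of a "UDP information message" about node B. */
typedef struct udp_info {
    bool has_info;    /* false in the empty message originally sent by B */
    uint32_t address; /* UDP address of B as seen by some forwarding node */
    uint16_t port;
    uint16_t mtu;     /* PMTU towards B as seen by that forwarding node */
} udp_info_t;

/* Hypothetical view that a forwarding node has of node B. */
typedef struct relay_view {
    bool udp_confirmed; /* true if this node has confirmed working UDP to B */
    uint32_t address;
    uint16_t port;
    uint16_t mtu;
} relay_view_t;

/* Called by each node that forwards the message from B towards A.
 * A node with confirmed UDP information always overwrites the message,
 * even if it already carries information, so that the node closest to A
 * has the final say. A node without confirmed information leaves the
 * message unchanged. */
static void udp_info_forward(udp_info_t *msg, const relay_view_t *view) {
    assert(msg != NULL && view != NULL);

    if (!view->udp_confirmed) {
        return;
    }

    msg->has_info = true;
    msg->address = view->address;
    msg->port = view->port;
    msg->mtu = view->mtu;
}
```

With this rule, the information A ends up with comes from the last node on the B-to-A path that has confirmed UDP connectivity to B, which matches the rationale that information closer to the sender is more relevant.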
Of course, this is mainly useful when B is receiving packets over TCP, because that means A might be missing information about how to reach B via UDP. However, it is also useful to send this message even while receiving UDP packets, to inform A of any PMTU increases and also to optimize intermediate relaying steps (the alternative would be to make the relays initiate information messages as opposed to the final recipient node, but that would result in more traffic for no apparent benefit).

Pros: High-quality information that takes into account perspective from intermediate nodes to provide the best possible bet for UDP address and PMTU data. Good freshness.

Cons: Not as fresh as "push" notifications (probably not that big a deal). More metaconnection traffic, though I think this is acceptable because the messages are unicast, small, rate-limited and are only exchanged between nodes that are actively exchanging packets. Presumably the traffic from the packets themselves dwarfs the traffic from the information messages.

PROPOSAL: DYNAMIC EDGE MUTATIONS

This proposal goes like this:

- Add PMTU information to edges. Assuming node A has a direct edge to node B, then edge(A, B).mtu = node(B).mtu (from the perspective of node A).
- Broadcast edge PMTU information in ADD_EDGE messages.
- If the PMTU changes, then the edge would be updated and a new ADD_EDGE message would be broadcast, thereby dynamically *mutating* the edge as the situation changes. (As a matter of fact, it turns out tinc already supports updating edges in that way, though it currently prints a warning when that happens.)
- sssp_bfs() would take PMTU into account while traversing the graph, and update each node with the indirect MTU. Indirect MTU would be defined as the minimum of all PMTUs across the edges that form the path to that node.
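As a rough illustration of the last step, the sketch below runs a plain breadth-first search and records, for each node, the minimum edge PMTU along the path by which it was first reached. It is not tinc's sssp_bfs(): the adjacency-matrix representation, MAX_NODES, and the function name indirect_mtu_bfs() are assumptions made for this example only.

```c
#include <assert.h>
#include <limits.h>

#define MAX_NODES 8

/* edge_mtu[i][j] > 0 means there is an edge i -> j with that PMTU;
 * 0 means there is no edge. On return, indirect_mtu[v] holds the minimum
 * PMTU over the edges of the BFS path from src to v, or 0 if v is
 * unreachable. indirect_mtu[src] is set to INT_MAX (no constraint). */
static void indirect_mtu_bfs(int n, int edge_mtu[MAX_NODES][MAX_NODES],
                             int src, int indirect_mtu[MAX_NODES]) {
    int queue[MAX_NODES];
    int head = 0, tail = 0;

    assert(n > 0 && n <= MAX_NODES && src >= 0 && src < n);

    for (int i = 0; i < n; i++) {
        indirect_mtu[i] = 0; /* 0 = not reached yet */
    }

    indirect_mtu[src] = INT_MAX;
    queue[tail++] = src;

    while (head < tail) {
        int u = queue[head++];

        for (int v = 0; v < n; v++) {
            if (edge_mtu[u][v] <= 0 || indirect_mtu[v] != 0) {
                continue; /* no edge, or v was already reached */
            }

            /* The path to v is the path to u plus the edge u -> v. */
            indirect_mtu[v] = edge_mtu[u][v] < indirect_mtu[u]
                              ? edge_mtu[u][v] : indirect_mtu[u];
            queue[tail++] = v;
        }
    }
}
```

Note that this follows the first path found by the BFS, in line with "the path to that node" above; it does not search for the path with the largest bottleneck PMTU.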
Interestingly, this proposal is not perfectly optimal, because of the following situation with graph A <-> B <-> C <-> D: if PMTU(B, C) = X and PMTU(C, D) = X but PMTU(B, D) < X, then from the point of view of A (and assuming PMTU(A, B) is at least X), D.indirect_mtu will be X, which means that if A relays a packet to D through B, B won't be able to send it directly to D because the packet would be too large. That said, B won't have to fall back to TCP: instead it will still be able to send it via UDP, but it will have to relay it through C instead of sending it directly to D. Still suboptimal though, and it could become painful when dealing with weird network topologies.

Pros: High-freshness UDP information via low-latency push notifications.

Cons: Provides lower quality information compared to other proposals, because it is impossible to find out what information node A knows about node B if the two don't have an edge between each other (see example above). More metaconnection traffic (edge updates).
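To make the A <-> B <-> C <-> D situation above concrete, here is a small sketch of the decision B would face when relaying a packet towards D. It is purely illustrative: relay_action_t and relay_decide() are hypothetical names, and the actual relaying logic in tinc may be structured differently.

```c
#include <assert.h>

typedef enum {
    SEND_DIRECT_UDP,      /* packet fits the PMTU straight to the destination */
    SEND_VIA_NEXTHOP_UDP, /* too big for direct UDP, but fits towards the next hop */
    SEND_TCP              /* fits neither: fall back to the metaconnection */
} relay_action_t;

/* direct_mtu:  PMTU from this relay straight to the destination (0 if unknown).
 * nexthop_mtu: PMTU from this relay to the next hop on the graph path (0 if unknown). */
static relay_action_t relay_decide(int packet_len, int direct_mtu, int nexthop_mtu) {
    assert(packet_len > 0);

    if (direct_mtu > 0 && packet_len <= direct_mtu) {
        return SEND_DIRECT_UDP;
    }

    if (nexthop_mtu > 0 && packet_len <= nexthop_mtu) {
        return SEND_VIA_NEXTHOP_UDP;
    }

    return SEND_TCP;
}
```

For instance, with X = 1400 and PMTU(B, D) = 1300, a 1400-byte packet arriving at B would be relayed through C over UDP rather than sent directly to D, which is the suboptimal but TCP-free outcome described above.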
Guus Sliepen
2014-Oct-02 18:10 UTC
Proposals for UDP information transport over the metagraph
On Sun, Sep 28, 2014 at 07:29:49PM +0200, Etienne Dechamps wrote:

> While working on SPTPS UDP relaying I realized that there is one issue
> I didn't account for, which is that the sending node only knows the
> PMTU to the first relay node. It doesn't know the PMTU of the entire
> relay path beyond the first hop, because the relay nodes don't provide
> their own PMTU information over the metaprotocol.
>
> Now, in the legacy protocol this is not really an issue, because TCP
> MSS clamping (which is really what matters in practice) is applied on
> a hop-per-hop basis,

Actually, it's not only MSS clamping but also the generation of ICMP Fragmentation Needed/Packet Too Big packets. But the effect is the same.

> Therefore it appears we need a way to figure out, for a specific node,
> what the "indirect PMTU" (i.e. PMTU when relaying) is. This
> information would presumably be stored in an "indirect_mtu" field (or
> something like that) in node_t, and maintained separately from the
> "mtu" field which is set by the normal discovery process.

I don't think we need another field for this. Instead, I think we should try to do end-to-end PMTU discovery. If SPTPS UDP relaying is implemented, this should be quite easy. This would also solve the freshness issue, as normal PMTU discovery already handles PMTU shrinking or growing. It could be optimised by having intermediate nodes send the equivalent of an ICMP Fragmentation Needed/Packet Too Big message back in case the next hop has a smaller PMTU than the packet that has to be relayed.

Pros: No need to change/amend the meta or UDP protocol. Reuses the existing PMTU code. Same quality as the existing PMTU code.

Cons: More probes are being sent than with the legacy protocol.

What do you think?

-- 
Met vriendelijke groet / with kind regards,
Guus Sliepen <guus at tinc-vpn.org>