TL;DR: a proposal for a new tinc feature that allows nodes to filter
ADD_SUBNET messages based on the metaconnection on which they are
received, so that nodes can't impersonate each other's VPN Subnets.
Similar to StrictSubnets in spirit, but way more flexible.
BACKGROUND: THE ISSUE OF TRUST IN A TINC NETWORK
In terms of metaconnections (I'm not discussing data tunnels here),
one of the most interesting properties of tinc is that users are free
to come up with any topology for the node graph; this makes tinc
extremely flexible. For example, one can come up with a full mesh
graph where every node is connected to every other node, or a
completely centralized graph where all nodes have one connection to a
"central" node, or any hybrid combination of the two.
In my personal use case, I have a small number (currently two) of
redundant central nodes, and all other nodes are "clients" that
establish connections to all these central nodes. This makes the graph
easy to manage and change as the configuration is centralized (as well
as services like DNS), while still providing high reliability since
the network will continue to function even if central nodes go down,
as long as at least one remains.
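For illustration, the metaconnection side of such a client node boils
down to a couple of ConnectTo statements in its tinc.conf (node names
are made up; Name and ConnectTo are standard tinc options):

/etc/tinc/my_network/tinc.conf:
Name = client_node
ConnectTo = central_node_1
ConnectTo = central_node_2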
However, one important limitation of current tinc is that "trust" is a
binary concept: if you're a node inside the graph, you're completely
trusted and can do pretty much anything you want (including
impersonating other nodes in control messages). That's not very
scalable - if you have a 1000-node graph (tinc design goal), an attack
from any one of these 1000 nodes can compromise the entire VPN, which
is a huge security risk. Possible threats of course include a node
getting compromised by outside attackers, but one should also consider
use cases where the legitimate owner of the node is not completely
trusted in the first place. Conceptually, the current situation could
be likened to an operating system that either gives you root access or
no access at all.
AVAILABLE SOLUTIONS FOR DEALING WITH NON-TRUSTED NODES
In this regard, tinc 1.1 improves on 1.0 by introducing end-to-end
security - relay nodes can't decrypt or forge SPTPS packets that pass
through them, assuming neither endpoint was fed forged SPTPS keys
during the initial exchange (which is easy to ensure if you care enough
about it).
However, SPTPS doesn't really solve anything on its own, because it
only ensures the channel itself is secure - it cares about data
packets and node identities, not VPN addresses (i.e. internal IP
addresses if tinc is used in Router mode). Indeed there is nothing
stopping a rogue node from sending an ADD_SUBNET message with another
node's subnet, and (depending on priorities) tinc will happily send
all packets for that subnet to the rogue node. The result is a
laughably simple and effective inside MITM attack.
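To illustrate how low the bar is (names and addresses are invented):
if 10.98.76.54 is the VPN address of some victim node, all the rogue
node has to do is add one Subnet line to its own host file, and tinc
will announce it to the rest of the graph via ADD_SUBNET:

/etc/tinc/my_network/hosts/rogue_node:
# 10.98.76.54 actually belongs to another (victim) node
Subnet = 10.98.76.54/32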
tinc network operators have a few ways of mitigating such an attack.
First of all, they could deploy active monitoring and alert when
"suspicious" subnets appear. However, that is complicated,
user-unfriendly and it's not bulletproof.
A more effective solution is to use the StrictSubnets option, which,
when combined with SPTPS, solves that problem directly and
comprehensively. However, there is a major downside: it operates at
the node level, which means that all nodes need to know about every
other node (and their subnets) in advance, via static configuration
(host files). This is because StrictSubnets basically means "ignore all
ADD_SUBNET messages and use static configuration instead"; if the
subnet list is modified (e.g. a new node is being set up), a node
won't "see" the new subnets unless its configuration is changed.
This makes StrictSubnets completely impractical for large graphs, unless
one deploys custom systems that are able to seamlessly push
configuration changes to all existing nodes, which is a major
undertaking.
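For reference, here is roughly what the StrictSubnets approach looks
like (paths and names are just examples): every node sets
StrictSubnets in its tinc.conf and has to carry a host file, with the
correct Subnet lines, for every other node in the graph:

/etc/tinc/my_network/tinc.conf:
StrictSubnets = yes

/etc/tinc/my_network/hosts/some_client_node:
Subnet = 10.42.42.42/32

... and so on for every node; adding or renumbering a node means
updating these files on every existing node.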
IMPROVING STRICTSUBNETS: A FLAWED APPROACH
I am aware that improving tinc security in the presence of non-trusted
nodes is one of the stated goals of tinc 2.0 (according to the Goals
page). However, tinc 2.0 is not going to materialize any time soon,
and I would like to propose an interim solution that's more flexible
than StrictSubnets.
Of course such an interim solution would be limited in scope to
authenticating VPN addresses, and would not address other issues with
non-trusted nodes, such as other potential security vulnerabilities
that can be exploited from inside the graph (e.g. bugs in control
message parsing, DoS opportunities). I do not envision that such an interim
solution would stop a determined attacker, at least not at first. I'd
just like to improve on the current situation, addressing the
impossible choice between an unmanageable network (StrictSubnets) or
nodes being able to impersonate each other's subnets just by adding
one single line to a configuration file. In other words, this proposal
is less about dealing with "non-trusted nodes" and more about dealing
with "not fully trusted nodes".
In practical terms, with regard to my use case which I described
above, these "not fully trusted" nodes are my client nodes, as opposed
to my central nodes. Central nodes are fully trusted and are
authoritative with regard to which subnets belong to which nodes.
Ideally, a solution would only require me to ensure central nodes have
the correct list of subnets, while client nodes would not require any
configuration changes when that list changes.
The obvious way to enforce this is to introduce a way for these
central nodes to filter ADD_SUBNET messages that they receive, such
that they will not forward ADD_SUBNET messages that are deemed invalid
(i.e. forged). Oddly enough, enabling StrictSubnets on central nodes
is *not* enough to do this, because StrictSubnets only affects the
node itself - ADD_SUBNET messages are still unconditionally forwarded
(see add_subnet_h())! In practice that means that StrictSubnets will
only protect the central nodes themselves; it will not prevent one
client node from attacking another client node.
One quick fix would be to change the behavior of StrictSubnets so that
it not only prevents invalid ADD_SUBNET messages from being processed,
but also prevents them from being forwarded - so that central
nodes (in my use case) act as a "shield" for the client nodes behind
them. That is indeed a possible solution, and I believe it would work.
However, with regard to flexibility and ease of management, I believe
we can do better.
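To make the proposed quick fix concrete, here is a minimal sketch of
what the change would amount to in the ADD_SUBNET handler. This is
deliberately simplified C pseudocode, not actual tinc source;
subnet_matches_local_config(), process_subnet_locally() and
forward_to_other_connections() are hypothetical helpers standing in
for the lookup and forwarding that add_subnet_h() already performs:

static bool handle_add_subnet(connection_t *c, node_t *owner,
                              const subnet_t *announced,
                              const char *request) {
    if(strictsubnets && !subnet_matches_local_config(owner, announced)) {
        /* Current behavior: ignore the subnet locally but still
         * forward the request. Proposed behavior: drop it entirely,
         * so nodes "behind" us never see the forged announcement. */
        return true;
    }

    /* Accepted: process the subnet locally and propagate it over the
     * other metaconnections, as tinc does today. */
    process_subnet_locally(owner, announced);
    forward_to_other_connections(c, request);
    return true;
}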
See, the thing that irritates me about StrictSubnets is that it is
node-based, not metaconnection-based. For my use case that makes no
difference, but in other use cases, that makes StrictSubnets
needlessly complicated. Indeed, even with the fix I just proposed, any
node that uses StrictSubnets would still need to know about every node
and every subnet in the graph. Which means that the only graph
topology in which this is a scalable solution is a topology in which
there are only two "layers" of nodes - "authoritative" nodes (i.e.
"God" nodes that know about everything and use StrictSubnets - central
nodes in my use case) and "non-authoritative" nodes (i.e. the nodes
that need to be protected from each other - client nodes in my use
case). It does not allow for more complex topologies such as some
nodes being authoritative for some subnets but not others.
Let's take a practical example: let's say that another group of people
decides to build a tinc network in a topology similar to mine (i.e.
client nodes and central nodes). Now, let's say that this network and
mine decide to join forces and establish metaconnections between my
central nodes and theirs (e.g. company merger), leveraging the amazing
flexibility that tinc allows for graph topology. However, we would
still like to manage our parts of the graph semi-independently, so my
client nodes would still only establish metaconnections to my central
nodes, and similarly for their client nodes; we would still use a
different subnet space; and we would both manage our subnet lists
independently.
The proposed StrictSubnets behavior doesn't allow for the above use
case. My central nodes (and theirs) either know everything, or they
know nothing. I would need to keep my central nodes in sync with the
subnet lists of *their* part of the graph, which limits scalability.
It's easy to imagine that managing a very large tinc network that way
could quickly become a nightmare because there is no way to
effectively "partition" the network configuration while preserving
subnet security.
METACONNECTION-BASED SUBNET FILTERING: A PROPOSAL
The solution to the above problem is simple: instead of using
node-based subnet filtering, we introduce connection-based subnet
filtering. The difference is that node-based filtering is implemented
as "don't add/forward subnets from that node if they don't match these
conditions", while connection-based filtering is implemented as "don't
add/forward subnets that are received *over the metaconnection from
that node* if they don't match these conditions".
Here's how it solves the aforementioned use case: my central nodes
would apply these rules to connections from client nodes, and it would
behave roughly the same way as the proposed StrictSubnets behavior as
far as directly connected client nodes are concerned. However, my
central nodes would not have to know about any client nodes from the
"other side": instead, I will simply configure my central nodes to
accept the entire subnet of the "other side" over the connections
between *my* central nodes and *their* central nodes.
Voilà: the other half of the network can manage their nodes and
subnets independently without having to talk to me, and vice-versa.
Subnets from my client nodes are authenticated according to "my"
rules, and subnets from their client nodes are authenticated
according to "their" rules. The cherry on top: my part of the network
is protected to some extent against a *central* node of their network
going rogue, because such a node will not be allowed to impersonate my
side's subnets.
In practice, one would want to introduce some kind of subnet ACL
system for full flexibility. Here's how it could look on my central
nodes:
/etc/tinc/my_network/hosts/client_node:
ConnectionSubnetACL = +10.42.42.42 # this client's assigned subnet
ConnectionSubnetACL = -ALL # deny everything else
/etc/tinc/my_network/hosts/other_central_node:
ConnectionSubnetACL = +ALL # trust everything from that node (could be the default)
/etc/tinc/my_network/hosts/central_node_from_other_side:
ConnectionSubnetACL = +10.13.37.0/24 # the other side's subnet space
ConnectionSubnetACL = -ALL # deny everything else
And the resulting behavior:
- tinc will only accept and forward ADD_SUBNET messages received over
a direct connection to client_node if they refer to 10.42.42.42/32
(client nodes can't impersonate other nodes).
- tinc will accept and forward any ADD_SUBNET message received over a
direct connection to one of my other central nodes (they are within
the "fully trusted" ring).
- tinc will only accept and forward ADD_SUBNET messages received over
a direct connection to central_node_from_other_side if they refer to a
subnet contained within 10.13.37.0/24, so that one of "their" nodes
cannot impersonate one of "my" nodes.
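To make the ACL semantics above a bit more concrete, here is a
minimal sketch (hypothetical code, not anything that exists in tinc
today) of a first-match evaluation that would produce the behavior
described above; acl_entry_t and subnet_contains() are invented names:

typedef struct acl_entry_t {
    bool allow;        /* '+' entry vs '-' entry */
    bool match_all;    /* the ALL keyword */
    subnet_t prefix;   /* parsed prefix, e.g. 10.13.37.0/24 */
} acl_entry_t;

static bool acl_permits(const acl_entry_t *acl, size_t count,
                        const subnet_t *announced) {
    for(size_t i = 0; i < count; i++) {
        /* The first entry that matches the announced subnet decides. */
        if(acl[i].match_all || subnet_contains(&acl[i].prefix, announced))
            return acl[i].allow;
    }
    /* No ConnectionSubnetACL configured: keep today's behavior and
     * accept everything. */
    return true;
}

The ADD_SUBNET handler would then simply drop (neither process nor
forward) any announcement received over a connection whose ACL does
not permit it.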
(Of course, we can easily come up with other use cases for this. For
example, let's assume that two of my "client nodes" want to establish
a "special" highly trusted connection to each other, where they don't
even trust my central nodes when it comes to the other client's
subnets. This can be achieved by having both client nodes establish a
metaconnection to each other, and explicitly deny each other's subnets
in the host files they have for my central nodes.)
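For illustration (all names and addresses are invented), that last
scenario could look like this on client_a, whose peer client_b owns
10.42.42.43, so that client_b's subnet is only ever accepted over the
direct metaconnection between the two:

/etc/tinc/my_network/hosts/client_b:
ConnectionSubnetACL = +10.42.42.43 # client_b's own subnet
ConnectionSubnetACL = -ALL

/etc/tinc/my_network/hosts/central_node:
ConnectionSubnetACL = -10.42.42.43 # don't let central nodes claim it
ConnectionSubnetACL = +ALL # but keep learning everything else from them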