------- TL;DR -------

Many of the tinc daemons in a large (~50 node) tinc network tend to hog a lot of memory (>100MB). None of our smaller tinc networks do this. As to which nodes use a lot of memory, I could not find a distinguishing factor: tinc version, OS version, kernel version, being included in ConnectTo, and tinc traffic all seem not to determine memory consumption. Architecture may be a factor, as only x86_64 nodes exhibit high RAM usage, but I cannot change that.

What could cause such high memory usage? Any tips on how to avoid it? Does anyone have experience with larger tinc networks?

------- The Long Version -------

I have three tinc networks: two smaller application-specific ones (10-15 active hosts), and one larger one that spans most servers and VMs in the department (~50 active hosts), mainly for administrative purposes. The larger network consumes a lot of memory on many hosts (but really modest amounts on some hosts). Generally, this would not be a big deal, but having such a process on _all_ VMs results in a cluster-wide memory consumption increase of ~5-10GB, which is quite significant.

At first, I thought the architecture would be the main differentiator, and in some ways it is, but there are only two i686 systems, both with old versions:

- i686
  - small network: 1.5 MB
  - large network: 9-12 MB
- x86_64
  - small network: 1-5 MB
  - large network: 3-450 MB (median ~200MB)

The problem is that I simply cannot switch to i686 on most of the problematic machines.

There are a variety of tinc versions running, all from distribution packages (mostly Debian).
The memory usage varies, but there are some trends (memory usage was measured with: "ps ax -o rss,command | grep tinc[d]"):

- 1.0.13 (only one VM)
  - large network: 3 MB
- 1.0.19
  - small network: 1-2 MB
  - large network: 3-120 MB (only three under 30MB; curiously, only the ones built in 2012 were under 4MB, while the ones built in 2013 were over 10MB, and generally over 80MB)
- 1.0.24
  - small network: 1-4 MB
  - large network: 9-140 MB
- 1.0.31
  - small network: 1.5-4 MB
  - large network: 6-420 MB (median around 200MB)
- 1.0.35
  - large network: 15-450 MB (median around 200MB)

So newer tinc versions tend to use more RAM, but not _necessarily_ that much more. Also, the small networks' usage never goes above 5MB.

Next, I tried to break it up by distribution (only the large network observed):

- Ubuntu
  - 12.04: 3 MB
  - 14.04: 120-160 MB
- Debian
  - 6: 3-140 MB
  - 7: 12-140 MB
  - 8: 9-110 MB
  - 9: 6-420 MB
  - 10: 15-450 MB

I also compared kernel versions, and the numbers were similarly widespread: on some 2.6 kernels tinc eats 140MB, and on some 4.19 kernels it uses only 6MB.

Being in the ConnectTo list does not guarantee high memory consumption either, nor does being absent from the ConnectTo list rule it out.

A tinc restart reduces memory consumption to acceptable levels (under 10MB). Memory consumption seems to need more than an hour to rise again, but I will need more measurements to get something usable.

The above made me curious whether high traffic causes high memory consumption, so I calculated (RX+TX bytes on the tinc interface)/(RSS). The resulting number varies between 1 and 50000, so this also seems inconclusive. I do not know whether one can get the number of bytes forwarded by the tinc daemon (AFAIK the tun device stats do not include this); does anyone know?

So all in all ..... I am stuck. It may very well be that there is a memory leak somewhere that is triggered by strange coincidences that need many hosts in the network.

Any tips to solve this mystery are appreciated!

Thanks in advance:
PP

-----
P.s.: additional information (maybe relevant?):

The big tinc network is installed using a modified ansible-tinc role: https://github.com/pallinger/ansible-tinc, forked from thisismitch/ansible-tinc. The main difference is that I can limit the ConnectTo hosts (all hosts were added as ConnectTo in the original ansible role), as too many ConnectTo hosts caused some tinc versions (the old ones, maybe; I forgot) to spin the CPU for more than 10 minutes at startup while not actually forwarding anything on the VPN. Currently there are 10 ConnectTo hosts on the large network (the ones with public IP addresses), and the network works fine aside from the high memory usage.
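
For the follow-up measurements mentioned above (how fast RSS grows after a restart, and the (RX+TX bytes)/(RSS) ratio), something like the following could be left running on a few hosts. It is only a minimal sketch under several assumptions: Linux hosts, interface byte counters read from /sys/class/net/<iface>/statistics, and RSS taken from the same "ps ax -o rss,command" output as above (ps reports it in KiB). The function names and the "tun0" interface name are made up for illustration, and converting RSS to bytes for the ratio is just one possible unit convention.

```python
#!/usr/bin/env python3
"""Sample tincd RSS and interface byte counters over time (Linux only)."""
import os
import subprocess
import time
from pathlib import Path


def parse_ps_rss(ps_output):
    """Return [(command, rss_kib)] for each tincd line of `ps ax -o rss,command`."""
    rows = []
    for line in ps_output.splitlines():
        fields = line.split(None, 1)
        if len(fields) != 2 or not fields[0].isdigit():
            continue  # header line ("RSS COMMAND") or blank line
        command = fields[1].strip()
        # Match on the executable name so that e.g. "grep tincd" is ignored.
        if os.path.basename(command.split()[0]) == "tincd":
            rows.append((command, int(fields[0])))
    return rows


def bytes_per_rss(rx_bytes, tx_bytes, rss_kib):
    """(RX+TX bytes)/RSS, with RSS converted from KiB to bytes (assumed convention)."""
    return (rx_bytes + tx_bytes) / (rss_kib * 1024)


def read_counter(iface, name):
    """Read one counter (e.g. rx_bytes) of a network interface from sysfs."""
    return int(Path(f"/sys/class/net/{iface}/statistics/{name}").read_text())


def log_forever(iface="tun0", interval=60):
    """Print one tab-separated row per tincd process every `interval` seconds."""
    while True:
        ps_out = subprocess.run(
            ["ps", "ax", "-o", "rss,command"],
            capture_output=True, text=True, check=True,
        ).stdout
        rx = read_counter(iface, "rx_bytes")
        tx = read_counter(iface, "tx_bytes")
        for command, rss_kib in parse_ps_rss(ps_out):
            ratio = bytes_per_rss(rx, tx, rss_kib)
            print(int(time.time()), rss_kib, rx, tx, f"{ratio:.1f}", command,
                  sep="\t", flush=True)
        time.sleep(interval)
```

Calling log_forever() with the interface of the large network and redirecting the output to a file right after a tinc restart should give a time series of RSS that can be compared across hosts. Note that with several tinc networks on one host, every tincd process is paired with the counters of the single interface given, so the ratio is only meaningful for the daemon that owns that interface.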