I am using tinc to connect together VPCs in AWS across multiple regions and accounts to provide secure communication. For the most part, it works great. A few times, I have seen issues where something got into an unstable state that didn't seem to resolve itself. Shutting down tinc on all hosts and restarting seemed to do the trick, but I'd like to see if there is something that I can change with the configuration to mitigate this. Here is what the issue looks like when it happens, with the hostnames/ips changed:

Feb 26 07:39:20 host1 [daemon.info] tinc.network[1022]: host2 (1.1.1.1 port 655) could not flush for 9 seconds (324908 bytes remaining)
Feb 26 07:39:20 host1 [daemon.notice] tinc.network[1022]: Closing connection with host2 (1.1.1.1 port 655)
Feb 26 07:39:20 host1 [daemon.err] tinc.network[1022]: Could not set up a meta connection to host2
Feb 26 07:39:20 host1 [daemon.info] tinc.network[1022]: Trying to connect to host2 (1.1.1.1 port 655)
Feb 26 07:39:20 host1 [daemon.info] tinc.network[1022]: Connected to host2 (1.1.1.1 port 655)
Feb 26 07:39:20 host1 [daemon.notice] tinc.network[1022]: Connection with host2 (1.1.1.1 port 655) activated
Feb 26 07:39:20 host1 [daemon.err] tinc.network[1022]: Flushing meta data to host2 (1.1.1.1 port 655) failed: Connection reset by peer
Feb 26 07:39:20 host1 [daemon.notice] tinc.network[1022]: Closing connection with host2 (1.1.1.1 port 655)
Feb 26 07:39:20 host1 [daemon.err] tinc.network[1022]: Could not set up a meta connection to host2
Feb 26 07:39:20 host1 [daemon.info] tinc.network[1022]: Trying to connect to host2 (1.1.1.1 port 655)
Feb 26 07:39:20 host1 [daemon.info] tinc.network[1022]: Connected to host2 (1.1.1.1 port 655)
Feb 26 07:39:20 host1 [daemon.notice] tinc.network[1022]: Connection with host2 (1.1.1.1 port 655) activated
Feb 26 07:39:20 host1 [daemon.info] tinc.network[1022]: host2 (1.1.1.1 port 655) could not flush for 5 seconds (459308 bytes remaining)
Feb 26 07:39:20 host1 [daemon.notice] tinc.network[1022]: Closing connection with host2 (1.1.1.1 port 655)
Feb 26 07:39:20 host1 [daemon.err] tinc.network[1022]: Could not set up a meta connection to host2
Feb 26 07:39:20 host1 [daemon.info] tinc.network[1022]: Trying to connect to host2 (1.1.1.1 port 655)
Feb 26 07:39:20 host1 [daemon.warning] tinc.network[1022]: Timeout from host2 (1.1.1.1 port 655) during authentication
Feb 26 07:39:20 host1 [daemon.err] tinc.network[1022]: Could not set up a meta connection to host2
Feb 26 07:39:20 host1 [daemon.info] tinc.network[1022]: Trying to connect to host2 (1.1.1.1 port 655)
Feb 26 07:39:20 host1 [daemon.info] tinc.network[1022]: Connected to host2 (1.1.1.1 port 655)
Feb 26 07:39:20 host1 [daemon.notice] tinc.network[1022]: Connection with host2 (1.1.1.1 port 655) activated
Feb 26 07:39:20 host1 [daemon.err] tinc.network[1022]: Flushing meta data to host2 (1.1.1.1 port 655) failed: Connection reset by peer

We see this across all of the hosts with no real pattern.

Our main tinc configuration looks like this:

Name = myname
Device = /dev/net/tun
ConnectTo = host1
ConnectTo = host2
ConnectTo = host3

The per-host configs look like:

Subnet = subnet1
Subnet = subnet2
Subnet = subnet3
Address = eip
public_key_info

We are using tinc-1.0.24. We have roughly 170 hosts running tinc and 907 subnets total.

Please let me know if any other details would help.

Mike
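The "could not flush for N seconds" lines above are tinc giving up on a meta connection whose queued outgoing data has not drained. As a rough illustration only, the Python sketch below models that decision under an assumption not stated in this thread: that tinc 1.0 periodically checks each meta connection and closes it once its output buffer is non-empty and has not been fully flushed for at least PingTimeout seconds (default 5). A periodic check would also be consistent with the log reporting both 5 and 9 seconds. MetaConnection, should_drop and flush_message are hypothetical names, not part of tinc.

```python
from dataclasses import dataclass

# Assumed tinc 1.0 default for PingTimeout, in seconds.
DEFAULT_PING_TIMEOUT = 5


@dataclass
class MetaConnection:
    """Hypothetical model of one tinc meta connection (not tinc's real struct)."""
    name: str
    outbuf_len: int         # bytes queued but not yet written to the TCP socket
    last_flushed_time: int  # time (s) at which the buffer was last fully drained


def should_drop(conn: MetaConnection, now: int,
                ping_timeout: int = DEFAULT_PING_TIMEOUT) -> bool:
    """Return True if the connection would be closed for failing to flush.

    Assumption: data that stays queued for ping_timeout seconds or more
    makes tinc give up on the connection and reconnect.
    """
    return conn.outbuf_len > 0 and now - conn.last_flushed_time >= ping_timeout


def flush_message(conn: MetaConnection, now: int) -> str:
    """Format a message in the style of the log lines quoted above."""
    return (f"{conn.name} could not flush for {now - conn.last_flushed_time} "
            f"seconds ({conn.outbuf_len} bytes remaining)")


if __name__ == "__main__":
    host2 = MetaConnection("host2", outbuf_len=459308, last_flushed_time=100)
    # With the default timeout, the backlog causes a drop after 5 seconds...
    print(should_drop(host2, now=105))                   # True
    # ...while a larger timeout (e.g. 15) leaves more time for it to drain.
    print(should_drop(host2, now=105, ping_timeout=15))  # False
    print(flush_message(host2, now=105))
```

Under this model, a larger PingTimeout only widens the window for the backlog to drain; it does not reduce how much data is queued in the first place.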
On Fri, Mar 20, 2015 at 02:07:56PM +0000, Michael Drzal wrote:
> I am using tinc to connect together VPCs in AWS across multiple regions and
> accounts to provide secure communication. For the most part, it works
> great. A few times, I have seen issues where something got into an
> unstable state that didn't seem to resolve itself. Shutting down tinc on
> all hosts and restarting seemed to do the trick, but I'd like to see if
> there is something that I can change with the configuration to mitigate
> this. Here is what the issue looks like when it happens with the
> hostnames/ips changed:
>
> Feb 26 07:39:20 host1 [daemon.info] tinc.network[1022]: host2 (1.1.1.1 port 655) could not flush for 9 seconds (324908 bytes remaining)
> Feb 26 07:39:20 host1 [daemon.notice] tinc.network[1022]: Closing connection with host2 (1.1.1.1 port 655)

It looks like there is a lot of data that host1 wants to send to host2 via TCP, but host2 is not able to handle all the data in a timely manner. Tinc killed the connection in an attempt to get rid of the problem.

> Feb 26 07:39:20 host1 [daemon.info] tinc.network[1022]: Trying to connect to host2 (1.1.1.1 port 655)
> Feb 26 07:39:20 host1 [daemon.info] tinc.network[1022]: Connected to host2 (1.1.1.1 port 655)
> Feb 26 07:39:20 host1 [daemon.notice] tinc.network[1022]: Connection with host2 (1.1.1.1 port 655) activated
> Feb 26 07:39:20 host1 [daemon.info] tinc.network[1022]: host2 (1.1.1.1 port 655) could not flush for 5 seconds (459308 bytes remaining)

OK, immediately after connecting it tries to send half a megabyte of data... Looking at the number of hosts and subnets, I would expect around 60 kB of data to be sent to synchronise the state between the hosts, not 460 kB. Could it be that there was a lot of VPN traffic going on at the same moment? Maybe broadcast traffic?

I suggest adding the following to your configuration files:

Broadcast = no
PingTimeout = 15

This will prevent broadcast traffic from being forwarded.
It could be that, with the number of hosts you have, forwarding broadcasts adds a significant amount of network traffic. If you don't need that, it's best to disable it. The increased PingTimeout will cause tinc to wait a bit longer before giving up on a connection.

> Please let me know if any other details would help.

If the problem persists, capturing a more detailed log of what tinc is doing would be very helpful. You can send a running tincd the INT signal to raise its debug level to 5, and send it again to revert to the value it had when starting.

-- 
Met vriendelijke groet / with kind regards,
     Guus Sliepen <guus at tinc-vpn.org>
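Sending INT to a running tincd, as suggested above, can also be scripted. The Python sketch below assumes tincd writes its PID to /var/run/tinc.network.pid; that path is an assumption (the real location depends on the build's state directory and on the network name, taken here from the "tinc.network" log tag), and the helper names read_pid and toggle_debug are hypothetical. The kill parameter defaults to os.kill and can be swapped out for testing.

```python
import os
import signal
from pathlib import Path
from typing import Callable

# Assumed PID file location for the network "network" (hypothetical; check
# your build's state directory and the network name tincd is started with).
DEFAULT_PIDFILE = Path("/var/run/tinc.network.pid")


def read_pid(pidfile: Path) -> int:
    """Read the tincd PID, taken as the first whitespace-separated token."""
    return int(pidfile.read_text().split()[0])


def toggle_debug(pidfile: Path = DEFAULT_PIDFILE,
                 kill: Callable[[int, int], None] = os.kill) -> int:
    """Send SIGINT to tincd and return the PID it was sent to.

    Per the advice above, the first INT raises the debug level to 5 and a
    second INT reverts to the level tincd was started with.
    `kill` is injectable so this can be exercised without a real daemon.
    """
    pid = read_pid(pidfile)
    kill(pid, signal.SIGINT)
    return pid
```

For example, call toggle_debug() once while the problem is happening to capture the detailed log, and call it again afterwards to restore the original level.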