Howdy Folks,

I've got a 5 node setup here. My server "home" is the primary server that all other servers connect to. The configs on all the servers look like this:

# cat /etc/tinc/home/hosts/node1
Subnet = 10.2.0.0/16
Address = 192.168.2.1

<RSA KEY>

# cat /etc/tinc/home/hosts/node2
Subnet = 10.3.0.0/16
Address = 192.168.3.1

<RSA KEY>

Etc. All the hosts are set up the same.

# /sbin/tinc -n home dump subnets
10.1.0.0/16 owner home
10.2.0.0/16 owner node1
10.3.0.0/16 owner node2
10.4.0.0/16 owner node3
10.5.0.0/16 owner node4

# cat /etc/tinc/home/tinc-up
ifconfig $INTERFACE 10.2.0.10 netmask 255.0.0.0
ifconfig $INTERFACE up

# cat tinc.conf
Name = node1
ConnectTo = home
Mode = router
AddressFamily = ipv4
PingInterval = 600
PingTimeout = 15

4 out of 5 nodes work just fine. Node 2 however has issues. It does work fine for 5-30m and then pauses my connection to it. It's still up. I can't ping it over the "pause time" with 0% packet loss. Any TCP connection over the link just pauses for a while. The odd thing is it doesn't time out. In an SSH session to the box over the tinc link I'll type "ps -ef" and 10m later I'll get the response. SSH should time out way before then so I'm not sure what's going on. It's not like that all the time. I get maybe 15-30m when it's working just fine and then 10m of network pause. While my SSH session is paused I can see that the app on the server is talking to my primary node over the tunnel. That seems odd.

The app on the node side seems happy and can reach everything it needs to. No sign of issue there. It only seems to be an issue over the tinc tunnel. It kind of feels like maybe something is routing the IP space in a different direction for a period of time and then it comes back. If that were the case my TCP ssh connection would time out well before the connection returns to life.

Has anyone seen anything like this? I've poked at a bunch of things to try and pinpoint the issue. So far no love.

The routing table looks fine and the same on all of them:

Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.3.1     0.0.0.0         UG    0      0        0 eth0
10.0.0.0        0.0.0.0         255.0.0.0       U     0      0        0 home
192.168.3.0     0.0.0.0         255.255.255.0   U     0      0        0 eth0

Nothing else in node2's area uses 10. space.

Any ideas I would appreciate it.

Thanks,

-Matt
My 1st guess would be IP or MAC address conflicts.

On Tue, Jan 21, 2014 at 1:07 AM, Matthew Tolle <matt at night.com> wrote:
> Howdy Folks,
> [...]
Matt,

just a wild guess. I had problems in the past with badly configured IPv6 support and too large MTUs (unrelated).

-rsd

2014-01-21 Matthew Tolle <matt at night.com>
> Howdy Folks,
> [...]
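One way to rule the MTU guess in or out is to send pings across the tunnel with the "don't fragment" bit set and see which sizes get through. The following is only a sketch, assuming Linux iputils `ping` (where `-M do` forbids fragmentation); `payload_for_mtu` and `probe_mtu` are hypothetical helper names, and the target address must be replaced with node2's actual VPN address, which is not given in this thread.

```shell
# ICMP echo payload size that produces an IPv4 packet of exactly $1 bytes:
# 20-byte IPv4 header (no options) + 8-byte ICMP header = 28 bytes overhead.
payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Send three pings of the given total packet size with fragmentation forbidden.
# Assumes Linux iputils ping; other ping implementations use different flags.
probe_mtu() {
    host="$1"
    mtu="$2"
    ping -M do -c 3 -s "$(payload_for_mtu "$mtu")" "$host"
}

# Usage (NODE2_VPN_ADDRESS is a placeholder, not a value from the thread):
#   probe_mtu NODE2_VPN_ADDRESS 1500
#   probe_mtu NODE2_VPN_ADDRESS 1400
```

If the larger size fails while a smaller one succeeds, the path MTU over the tunnel is somewhere in between, which would be worth comparing against the working nodes.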
On Mon, Jan 20, 2014 at 11:07:53PM -0700, Matthew Tolle wrote:

> I've got a 5 node setup here. My server "home" is the primary server that all other servers connect to. The configs on all the servers look like this:
[...]
> # cat tinc.conf
> Name = node1
> ConnectTo = home
> Mode = router
> AddressFamily = ipv4
> PingInterval = 600
> PingTimeout = 15
>
> 4 out of 5 nodes work just fine. Node 2 however has issues. It does work fine for 5-30m and then pauses my connection to it. It's still up. I can't ping it over the "pause time" with 0% packet loss. Any TCP connection over the link just pauses for a while. The odd thing is it doesn't timeout. In an SSH session to the box over the tinc link I'll type "ps -ef" and 10m later I'll get the response. SSH should timeout way before then so I'm not sure what's going on. It's not like that all the time. I get maybe 15-30m when it's working just fine and then 10m of network pause. While my SSH session is paused I can see that the app on the server is talking to my primary node over the tunnel. That seems odd.

Your configuration looks perfectly fine. The 10-minute period might correspond to the PingInterval setting. Try changing that to see if that is true (if so, that helps narrow down the problem).

If you are using 1.1pre9, then it might be an issue with the new protocol, which is enabled by default in that version. You could try disabling it by setting ExperimentalProtocol = no in tinc.conf on all nodes. You could also try going back to tinc 1.0.23 (you don't need to change the configuration files).

Of course, it is strange that it only affects one node. Is there anything different on that node compared to the others?

--
Met vriendelijke groet / with kind regards,
Guus Sliepen <guus at tinc-vpn.org>
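The two experiments suggested above (changing PingInterval, and setting ExperimentalProtocol = no) are both one-line changes to tinc.conf. As a rough sketch, assuming a POSIX shell: `set_tinc_option` is a hypothetical helper, not part of tinc, and the value 60 is an arbitrary test value chosen only because it differs clearly from the 600 seconds (10 minutes) in the posted config.

```shell
# Hypothetical helper: set "Key = Value" in a tinc.conf-style file.
# Replaces the value if the key is already present, otherwise appends the line.
set_tinc_option() {
    conf="$1"
    key="$2"
    value="$3"
    if grep -q "^${key}[[:space:]]*=" "$conf"; then
        # Rewrite via a temporary file so this works without "sed -i".
        sed "s|^${key}[[:space:]]*=.*|${key} = ${value}|" "$conf" > "$conf.tmp" &&
            mv "$conf.tmp" "$conf"
    else
        printf '%s = %s\n' "$key" "$value" >> "$conf"
    fi
}

# Usage, with the config path taken from the original post:
#   set_tinc_option /etc/tinc/home/tinc.conf PingInterval 60
#   set_tinc_option /etc/tinc/home/tinc.conf ExperimentalProtocol no
```

tinc needs to be restarted (or told to reload its configuration) for the change to take effect. If the pause length then tracks the new PingInterval value, that points at the ping/timeout mechanism rather than at routing.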