thr3ads.net - tinc - Network pause issue. [Jan 2014]

Matthew Tolle

2014-Jan-21 06:07 UTC

Network pause issue.

Howdy Folks,

I've got a 5-node setup here. My server "home" is the primary server that all
other servers connect to. The configs on all the servers look like this:

# cat /etc/tinc/home/hosts/node1
Subnet  = 10.2.0.0/16
Address = 192.168.2.1

<RSA KEY>

# cat /etc/tinc/home/hosts/node2
Subnet  = 10.3.0.0/16
Address = 192.168.3.1

<RSA KEY>

Etc. All the hosts are set up the same.

# /sbin/tinc -n home dump subnets
10.1.0.0/16 owner home
10.2.0.0/16 owner node1
10.3.0.0/16 owner node2
10.4.0.0/16 owner node3
10.5.0.0/16 owner node4

# cat /etc/tinc/home/tinc-up
ifconfig $INTERFACE 10.2.0.10 netmask 255.0.0.0
ifconfig $INTERFACE up

# cat tinc.conf
Name = node1
ConnectTo = home
Mode = router
AddressFamily = ipv4
PingInterval = 600
PingTimeout = 15

4 out of 5 nodes work just fine. Node 2, however, has issues. It works fine
for 5-30 minutes and then pauses my connection to it. It's still up. I can't
ping it over the "pause time" with 0% packet loss. Any TCP connection
over the link just pauses for a while. The odd thing is it doesn't time out.
In an SSH session to the box over the tinc link I'll type "ps -ef"
and 10 minutes later I'll get the response. SSH should time out way before
then, so I'm not sure what's going on. It's not like that all the time. I get
maybe 15-30 minutes when it's working just fine and then 10 minutes of network
pause. While my SSH session is paused I can see that the app on the server is
talking to my primary node over the tunnel. That seems odd.

The app on the node side seems happy and can reach everything it needs to. No
sign of an issue there. It only seems to be an issue over the tinc tunnel. It
kind of feels like something is routing the IP space in a different direction
for a period of time and then it comes back. But if that were the case, my SSH
connection would time out well before the connection returns to life.

Has anyone seen anything like this? I've poked at a bunch of things to try
and pinpoint the issue. So far no love.

The routing table looks fine and the same on all of them: 

Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.3.1     0.0.0.0         UG    0      0        0 eth0
10.0.0.0        0.0.0.0         255.0.0.0       U     0      0        0 home
192.168.3.0     0.0.0.0         255.255.255.0   U     0      0        0 eth0

Nothing else in node2's area uses the 10.x space.

I would appreciate any ideas.

Thanks,

-Matt

Donald Pearson

2014-Jan-21 14:05 UTC

My first guess would be IP or MAC address conflicts.
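
One part of that guess can be checked offline: whether any two nodes announce overlapping Subnets inside the VPN. The following is a minimal sketch, assuming the `dump subnets` output has exactly the `SUBNET owner NAME` shape posted in the thread (no weight suffix or other fields). The function names `parse_subnets` and `find_conflicts` are made up for illustration and are not part of tinc. It says nothing about MAC conflicts or about address clashes on the physical LANs.

```python
import ipaddress
from itertools import combinations

# Output of `tinc -n home dump subnets`, copied from the original post.
DUMP = """\
10.1.0.0/16 owner home
10.2.0.0/16 owner node1
10.3.0.0/16 owner node2
10.4.0.0/16 owner node3
10.5.0.0/16 owner node4
"""


def parse_subnets(dump):
    """Return (network, owner) pairs from lines shaped like 'SUBNET owner NAME'."""
    pairs = []
    for line in dump.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1] == "owner":
            pairs.append((ipaddress.ip_network(fields[0]), fields[2]))
    return pairs


def find_conflicts(pairs):
    """Return every pair of entries whose subnets overlap but whose owners differ."""
    return [
        (a, b)
        for a, b in combinations(pairs, 2)
        if a[0].overlaps(b[0]) and a[1] != b[1]
    ]


if __name__ == "__main__":
    # Prints one line per overlapping pair; prints nothing if there are none.
    for (net_a, owner_a), (net_b, owner_b) in find_conflicts(parse_subnets(DUMP)):
        print(f"{net_a} ({owner_a}) overlaps {net_b} ({owner_b})")
```

To use it against a live node, replace `DUMP` with the current output of the same command and run it again after the pause starts, since a conflicting announcement might only appear intermittently.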



Raul Dias

2014-Jan-27 17:20 UTC

Matt, just a wild guess.

I had problems in the past with badly configured IPv6 support and too large
MTUs (unrelated).

-rsd
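
The MTU idea can be probed with pings that forbid fragmentation. Below is a rough sketch of the arithmetic, assuming plain IPv4 with no IP options (20-byte header) and a standard 8-byte ICMP echo header, and assuming Linux iputils `ping`, where `-M do` sets the don't-fragment bit. The target address `10.3.0.1` is hypothetical (some host inside node2's subnet), and the helper names are made up for illustration.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, ICMP echo header


def icmp_payload_for_mtu(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet of size mtu."""
    payload = mtu - IPV4_HEADER - ICMP_HEADER
    if payload < 0:
        raise ValueError(f"MTU {mtu} is too small for an ICMP echo packet")
    return payload


def probe_command(target, mtu, count=3):
    """Build a Linux iputils ping command that forbids fragmentation (-M do)."""
    size = icmp_payload_for_mtu(mtu)
    return ["ping", "-M", "do", "-c", str(count), "-s", str(size), target]


if __name__ == "__main__":
    target = "10.3.0.1"  # hypothetical address inside node2's subnet
    for mtu in (1500, 1400, 1300, 1200):
        print(" ".join(probe_command(target, mtu)))
```

Running the printed commands over the tunnel from largest to smallest shows the point at which packets start getting through. If small pings work during a pause while large ones fail, that would point toward an MTU problem rather than routing.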



Guus Sliepen

2014-Jan-27 22:44 UTC

On Mon, Jan 20, 2014 at 11:07:53PM -0700, Matthew Tolle wrote:

> I've got a 5 node setup here. My server "home" is the primary server that
> all other servers connect to. The configs on all the servers look like this:
[...]
> # cat tinc.conf
> Name = node1
> ConnectTo = home
> Mode = router
> AddressFamily = ipv4
> PingInterval = 600
> PingTimeout = 15
>
> 4 out of 5 nodes work just fine. Node 2 however has issues. It does work
> fine for 5-30m and then pauses my connection to it. It's still up. I can't
> ping it over the "pause time" with 0% packet loss. Any TCP connection over
> the link just pauses for a while. The odd thing is it doesn't timeout. In
> an SSH session to the box over the tinc link I'll type "ps -ef" and 10m
> later I'll get the response. SSH should timeout way before then so I'm not
> sure what's going on. It's not like that all the time. I get maybe 15-30m
> when it's working just fine and then 10m of network pause. While my SSH
> session is paused I can see that the app on the server is talking to my
> primary node over the tunnel. That seems odd.

Your configuration looks perfectly fine.

The 10-minute period might correspond to the PingInterval setting. Try
changing that to see if that is true (if so, that helps narrow down the
problem). If you are using 1.1pre9, then it might be an issue with the new
protocol, which is enabled by default in that version. You could try disabling
it by setting ExperimentalProtocol = no in tinc.conf on all nodes. You could
also try going back to tinc 1.0.23 (you don't need to change the configuration
files).

Of course, it is strange that it only affects one node. Is there anything
different on that node compared to the others?

-- 
Met vriendelijke groet / with kind regards,
     Guus Sliepen <guus at tinc-vpn.org>
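
The PingInterval test suggested above comes down to measuring how long each pause lasts and seeing whether that length follows the setting when it is changed. Here is a minimal sketch of that comparison, assuming you have logged a timestamp (in seconds) each time a command over the tunnel returned. The helper names, the 30-second stall threshold, and the 10% tolerance are arbitrary choices for illustration, and the sample timestamps are invented, not measurements from the thread.

```python
def find_stalls(timestamps, threshold=30.0):
    """Return (start, duration) for each gap between samples longer than threshold seconds."""
    stalls = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        gap = cur - prev
        if gap > threshold:
            stalls.append((prev, gap))
    return stalls


def matches_interval(duration, interval, tolerance=0.1):
    """True if duration is within tolerance (a fraction) of interval."""
    return abs(duration - interval) <= tolerance * interval


if __name__ == "__main__":
    ping_interval = 600  # PingInterval from the posted tinc.conf, in seconds
    # Invented sample: replies every 5 s, then one long gap.
    samples = [0, 5, 10, 15, 620, 625]
    for start, duration in find_stalls(samples):
        verdict = "matches" if matches_interval(duration, ping_interval) else "does not match"
        print(f"stall at t={start}s lasting {duration}s {verdict} PingInterval={ping_interval}")
```

If the stalls shrink to roughly the new value after lowering PingInterval on the affected nodes, that supports the idea that the pause is tied to tinc's ping cycle. If they stay near 10 minutes, the cause is more likely elsewhere.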