Tomasz Chmielewski
2017-Feb-21 03:38 UTC
no connectivity to some hosts behind tinc for the first few seconds
I have the following tinc setup:

    client -- tinc DC1 -- tinc DC2 -- 10.1.2.0/24 subnet

It generally works well; however, there is one issue I'm not able to solve: *sometimes*, connectivity to *some* destinations does not work for the first few seconds.

To demonstrate:

    $ mongo mongo.example.com:27017
    MongoDB shell version: 3.2.12
    connecting to: mongo.example.com:27017/test
    2017-02-21T03:29:30.243+0000 W NETWORK [thread1] Failed to connect to 10.1.2.3:27017 after 5000ms milliseconds, giving up.
    2017-02-21T03:29:30.243+0000 E QUERY [thread1] Error: couldn't connect to server mongo.example.com:27017, connection attempt failed :
    connect at src/mongo/shell/mongo.js:231:14
    @(connect):1:6

    exception: connect failed

Then connect again immediately afterwards: the connection works and, once it's established, there is no packet loss:

    $ mongo mongo.example.com:27017
    MongoDB shell version: 3.2.12
    connecting to: mongo.example.com:27017/test
    tab:SECONDARY> ^C

Any clues what might be causing this issue?

Looks like ARP resolution somewhere, but I'm not sure how to debug this.

Or not ARP resolution, as sometimes the first connection is successful and the next one is not, i.e.:

    $ echo exit | mongo mongo.example.com:27017
    MongoDB shell version: 3.2.12
    connecting to: mongo.example.com:27017/test
    bye

Some 10-20 secs later (ARP should live longer than that):

    $ echo exit | mongo mongo.example.com:27017
    MongoDB shell version: 3.2.12
    connecting to: mongo.example.com:27017/test
    2017-02-21T03:34:55.754+0000 W NETWORK [thread1] Failed to connect to 10.1.2.3:27017 after 5000ms milliseconds, giving up.
    2017-02-21T03:34:55.754+0000 E QUERY [thread1] Error: couldn't connect to server mongo.example.com:27017, connection attempt failed :
    connect at src/mongo/shell/mongo.js:231:14
    @(connect):1:6

    exception: connect failed

Tomasz Chmielewski
https://lxadm.com
Tomasz Chmielewski
2017-Feb-21 07:39 UTC
no connectivity to some hosts behind tinc for the first few seconds
On 2017-02-21 12:38, Tomasz Chmielewski wrote:

> I have the following tinc setup:
>
> client -- tinc DC1 -- tinc DC2 -- 10.1.2.0/24 subnet
>
>
> It generally works well, however, there is one issue I'm not able to
> solve: *sometimes*, connectivity to *some* destinations does not work
> for the first few seconds.

I was able to reproduce it reliably in the following simplified scenario:

    tinc DC1 -- tinc DC2 -- MASQUERADE -- 10.1.2.3 (webserver)

Script to reproduce (it tries to fetch the URL for 5 seconds, exits if it fails):

    #!/bin/bash

    set -e

    i=1
    while true ; do
        echo "Run number $i"
        curl -s -m 5 10.1.2.3/XXXXXXXX >/dev/null
        echo $?
        i=$((i+1))
    done

Usually, it will break after about 30 iterations.

"time curl -s -m 5 10.1.2.3/XXXXXXXX" takes around 0.2-0.3 secs to execute - so 5 seconds should be enough time.

tshark shows "TCP Spurious Retransmission" for cases where curl is not able to fetch any data.

Both tinc servers are running Ubuntu 16.04 (64 bit) with tinc 1.0.26.

DC1 is in Europe (Hetzner); DC2 is in the USA (Amazon AWS).

What's interesting, I don't have these timeouts when I replace tinc with openvpn.

Any help appreciated!

Tomasz Chmielewski
https://lxadm.com
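A note on the reproduction script: because of "set -e", a failing curl ends the script before "echo $?" runs, so the failing exit status itself is never printed. Below is a minimal sketch of a variant that reports which run failed and with what status. The function name run_until_failure and the run limit are hypothetical, not part of the original script, and the curl invocation in the comment is only a suggestion.

```shell
#!/bin/bash

# run_until_failure MAX_RUNS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND up to MAX_RUNS times and stops at the
# first non-zero exit status, printing the run number and the status.
run_until_failure() {
    local max_runs=$1
    shift
    local i=1
    local status
    while [ "$i" -le "$max_runs" ]; do
        "$@"
        status=$?
        if [ "$status" -ne 0 ]; then
            echo "run $i failed with exit status $status"
            return 1
        fi
        i=$((i+1))
    done
    echo "all $max_runs runs succeeded"
    return 0
}

# Suggested usage (the URL path is elided in the original post and kept as-is).
# curl exit status 28 means the operation timed out, 7 means it could not connect.
# run_until_failure 100 curl -s -o /dev/null -m 5 \
#     -w '%{time_connect} %{time_total}\n' 10.1.2.3/XXXXXXXX
```

The -w format in the commented example asks curl to print the connect time and total time for each run, which may help tell a stalled TCP handshake apart from a stalled transfer; that distinction is an assumption about what would be useful here, not something established in this thread.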
Tomasz Chmielewski
2017-Feb-21 11:50 UTC
no connectivity to some hosts behind tinc for the first few seconds
On 2017-02-21 16:39, Tomasz Chmielewski wrote:

> tshark shows "TCP Spurious Retransmission" for cases where curl is not
> able to fetch any data.
>
>
> Both tinc servers are running Ubuntu 16.04 (64 bit) with tinc 1.0.26.
>
> DC1 is Europe (Hetzner); DC2 is in USA (Amazon AWS).
>
>
>
> What's interesting, I don't have these timeouts when I replace tinc
> with openvpn.
>
> Any help appreciated!

Some good news:

- not getting "TCP Spurious Retransmission" with tinc 1.1pre14
- not getting "TCP Spurious Retransmission" with tinc 1.0.31

I didn't test the versions between 1.0.26 (the one with connectivity issues) and 1.0.31 (the one without connectivity issues).

Tomasz Chmielewski
https://lxadm.com