On Mon, Feb 26, 2018 at 5:39 PM, Kip Warner <kip at thevertigo.com> wrote:
> On Mon, 2018-02-26 at 17:31 -0800, Nathan Anderson wrote:
>
>> Is one end or the other behind a NAT of some kind?
>
> Both are behind NATs. I've heard this can have something to do with
> MTUs, but not sure.
If PMTUD (Path MTU Discovery) is broken somewhere between the two
hosts, then sure, it could be an MTU issue. If so, a tcpdump /
Wireshark capture should reveal it pretty quickly.
Also, try sending pings of various sizes with the DF (Don't Fragment)
bit set in the IP header between the two hosts, and see what the
largest packet is that you can send and still get a reply back. (When
you stop getting replies, also note whether the echo request just timed
out or whether you got an ICMP message back. It should be type 3 code 4,
"fragmentation needed and DF set". If you get nothing back then some
netadmin who is wholesale blocking ICMP on a gateway between the two
hosts needs to be shot.)
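
The ping sweep described above is easy to automate. What follows is a
rough Python sketch that binary-searches for the largest ICMP payload
that still gets a reply. It assumes Linux iputils ping, where "-M do"
sets DF and "-s" sets the payload size. The function names and the
1472-byte ceiling (a 1500-byte MTU minus 20 bytes of IPv4 header and 8
bytes of ICMP header) are illustrative choices, not anything from this
thread.

```python
import subprocess

IP_HEADER = 20    # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo request header


def ping_df(host, payload):
    """Send one echo request with DF set; True if a reply came back.

    Assumes Linux iputils ping ("-M do" prohibits fragmentation).
    """
    cmd = ["ping", "-M", "do", "-c", "1", "-W", "1",
           "-s", str(payload), host]
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode == 0


def largest_payload(probe, low=0, high=1472):
    """Binary-search the largest size for which probe(size) is True.

    Returns None if even the smallest size gets no reply.
    """
    if not probe(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2   # round up so the loop terminates
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low


def path_mtu(payload):
    """Convert an ICMP payload size into the on-the-wire packet size."""
    return payload + IP_HEADER + ICMP_HEADER
```

Usage would look like largest_payload(lambda n: ping_df("remote.host", n)),
where "remote.host" is a placeholder. A single lost ping can mislead the
search, so rerun it before trusting the answer.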
If it turns out to be PMTUD breakage, then you can probably work
around it by "clamping" the TCP MSS on one side or the other, forcing
that host to announce to the other side in the TCP SYN that it can
accept TCP segments no larger than X bytes.
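
To put a number on X: the MSS counts only TCP payload, so for IPv4 it
is the path MTU minus 20 bytes of IP header and 20 bytes of TCP header,
assuming no IP options. A minimal sketch of that arithmetic (the
function name is made up for illustration):

```python
IPV4_HEADER = 20  # bytes, no IP options
TCP_HEADER = 20   # bytes, fixed TCP header (options are not counted)


def mss_for_mtu(mtu):
    """Return the TCP MSS that fits in a single packet of size `mtu`."""
    if mtu <= IPV4_HEADER + TCP_HEADER:
        raise ValueError("MTU too small to carry any TCP payload")
    return mtu - IPV4_HEADER - TCP_HEADER


if __name__ == "__main__":
    for mtu in (1500, 1492, 1400):
        print(mtu, "->", mss_for_mtu(mtu))
```

On a Linux router the clamp itself is typically applied with the
netfilter TCPMSS target (--set-mss or --clamp-mss-to-pmtu); other
platforms have their own knobs.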
None of this strictly has anything to do with SSH, of course.
The reason I ask about NAT is that, depending on how this timeout is
manifesting, it could be the NAT. Is it an SSH session that only
times out after it has gone idle for a few minutes? Or does it time
out while it is actively in the middle of a data transfer?
If it's timing out in the middle of a transfer, it *could* be MTU.
If it's timing out after it idles for a set amount of time, that is
almost surely your NAT router deleting the connection tracking
information for that TCP session from its local table. On some
not-crappy routers, these connection tracking timeout values can be
tweaked.
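
As a toy illustration of why an idle session dies while a busy one
survives, here is a small Python model of a connection tracking table
with an idle timeout. The class, its method names, and the 300-second
timeout are all invented for the example; real routers differ in both
their defaults and their behavior.

```python
class ConntrackTable:
    """Toy model of a NAT connection tracking table."""

    def __init__(self, idle_timeout):
        self.idle_timeout = idle_timeout  # seconds
        self.last_seen = {}               # flow id -> time of last packet

    def open(self, flow, now):
        """Record a new connection (think: the initial TCP SYN)."""
        self.last_seen[flow] = now

    def packet(self, flow, now):
        """Handle a mid-stream packet; True if the NAT still forwards it."""
        seen = self.last_seen.get(flow)
        if seen is None or now - seen > self.idle_timeout:
            # Entry missing or expired: the NAT no longer knows this flow.
            self.last_seen.pop(flow, None)
            return False
        self.last_seen[flow] = now        # traffic refreshes the timer
        return True


if __name__ == "__main__":
    nat = ConntrackTable(idle_timeout=300)
    nat.open("ssh", now=0)
    print(nat.packet("ssh", now=100))  # active session
    print(nat.packet("ssh", now=350))  # 250 s since the last packet
    print(nat.packet("ssh", now=700))  # 350 s idle, entry is gone
```

Any traffic on the flow resets the timer, which is why a session in the
middle of a transfer is not affected by this particular failure mode.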
-- Nathan