Hello,

I have a bandwidth-constrained connection that I'd like to run rsync over through an SSH tunnel. I also want to detect any network drops pretty rapidly.

On the servers I'm setting (via sshd_config):

ClientAliveCountMax 5
ClientAliveInterval 1
TCPKeepAlive no

and on the clients I'm setting (via ssh_config):

ServerAliveCountMax 5
ServerAliveInterval 1
TCPKeepAlive no

After about 5 seconds, the connection is being dropped, but during that time the rsync is successfully transferring data near the full bandwidth of the connection.

My understanding is that since the alive mechanism is running inside the encrypted connection, OpenSSH would be able to (and would) prioritize the alive packets over other data. So if any data is able to get through (and it does) the alive packets should be able to as well. But this doesn't seem to be the case.

Is my understanding of how this is supposed to work wrong? If not, could I have a misconfiguration somewhere, or is it possible that this is some old bug? (This is OpenSSH_5.5p1 with OpenSSL 1.0.0a.)

Thanks,
Jeff

Hi,

On Wed, Jan 25, 2012 at 12:26:34PM -0500, Jeff Mitchell wrote:
> My understanding is that since the alive mechanism is running inside the
> encrypted connection, OpenSSH would be able to (and would) prioritize
> the alive packets over other data. So if any data is able to get through
> (and it does) the alive packets should be able to as well. But this
> doesn't seem to be the case.

For that, OpenSSH would need to know that there is congestion, and throttle its send rate to avoid buffers building up *elsewhere*.

If there is 10 seconds worth of data in your DSL router, there is nothing OpenSSH can do to achieve a round-trip time of 1s for its keepalives.

gert
--
USENET is *not* the non-clickable part of WWW!
//www.muc.de/~gert/
Gert Doering - Munich, Germany gert at greenie.muc.de
fax: +49-89-35655025 gert at net.informatik.tu-muenchen.de
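
[Editorial sketch, not part of the original messages.] The arithmetic behind the point above can be written out. This is a rough model only: it assumes the connection is given up after roughly Interval x CountMax seconds without a reply (which matches the "about 5 seconds" reported), and that a keepalive has to wait behind whatever is already queued on the slow link. The link speed and queue size are hypothetical numbers chosen to reproduce the "10 seconds worth of data" case, and the function names are made up for illustration.

```python
def keepalive_deadline_s(interval_s, count_max):
    """Approximate time without a reply before the session is dropped."""
    return interval_s * count_max


def queue_delay_s(buffered_bytes, link_bits_per_s):
    """Time a newly sent packet waits behind data already queued on the link."""
    return buffered_bytes * 8 / link_bits_per_s


def keepalive_survives(interval_s, count_max, buffered_bytes, link_bits_per_s):
    """True if a keepalive can clear the queue before the deadline expires."""
    delay = queue_delay_s(buffered_bytes, link_bits_per_s)
    return delay < keepalive_deadline_s(interval_s, count_max)


# Hypothetical figures: a 1 Mbit/s uplink with 1,250,000 bytes queued
# in the router, i.e. 10 seconds worth of data.
LINK_BPS = 1_000_000
QUEUED_BYTES = 1_250_000

print("queue delay (s):", queue_delay_s(QUEUED_BYTES, LINK_BPS))
print("deadline, Interval 1 / CountMax 5 (s):", keepalive_deadline_s(1, 5))
print("survives with 1/5:", keepalive_survives(1, 5, QUEUED_BYTES, LINK_BPS))
print("survives with 10/2:", keepalive_survives(10, 2, QUEUED_BYTES, LINK_BPS))
```

Under this model the session can be dropped even though bulk data is flowing at full speed, because the keepalive reply is simply late, not lost.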

Old thread, I know, but I have the opposite problem. Maybe SSH was changed in connection with this report? See my recent (Jan 2014) ML thread. I am observing SSH waiting for a TCP-level timeout to occur when the other end has gone away (and is not sending back any data or TCP RST).

Jeff Mitchell wrote:
> I have a bandwidth-constrained connection that I'd like to run rsync
> over through an SSH tunnel. I also want to detect any network drops
> pretty rapidly.

If you are bandwidth constrained, why are you wasting bandwidth on 1-second ping-pongs? What % of your overall data are you wasting on that effort?

Does your usage of the application require connection recovery (for a stalled, non-working connection) within tens of seconds?

So you are in a bandwidth-constrained environment trying to send bulk data, and must know if the other end has become unavailable within 6 seconds of it doing so?

If you're bandwidth constrained I would have thought both ends would be patient when waiting for data, and that turning up Interval (like 10 seconds) and turning down CountMax (like 2) is a better way to go, increasing Interval as necessary.

> After about 5 seconds, the connection is being dropped, but during that
> time the rsync is successfully transferring data near the full bandwidth
> of the connection.

Maybe you can ask the SSH client/server (on both sides, or at least the side with the most data being pushed) to turn down SO_SNDBUF to minimize the kernel buffer. This can be done on a socket-by-socket basis using the C kernel API setsockopt(). So it is something ssh/sshd needs to implement on your behalf.

When the connection is sending, if you run "netstat -tanp" (on Linux) the number of bytes in the kernel buffer will be shown in the Send-Q column. Reducing SO_SNDBUF decreases this value, but with the effect of causing the sending process to wake up more often to refill the kernel buffer.

It sounds like your CPU processing power far exceeds the network throughput, so I do not think this will be a concern in your scenario.

The lowest value for SO_SNDBUF according to the Linux man page is 2048 bytes. Note that if you make this value too low and your CPU does not refill the kernel buffer and it underruns (i.e. the TCP stack could send data but there was none available, as the application did not wake up and write() data quickly enough), it will mess up performance, as TCP slow start congestion control may reset, causing overall measured throughput to drop.

man 7 socket (search SO_SNDBUF)
man 7 tcp

On Linux see also /proc/sys/net/core/wmem_default for the system-wide default, if the SSH application does not have an option. You can 'cat /proc/sys/net/core/wmem_default' to see the current value; going below 32k for a 100mbit (or better) Ethernet system is probably a bad idea. Note that it is on the bandwidth-restricted application that you want to tweak it; setting it too low will have a major effect on the normal performance of a normal Ethernet-based system.

> My understanding is that since the alive mechanism is running inside the
> encrypted connection, OpenSSH would be able to (and would) prioritize
> the alive packets over other data. So if any data is able to get through
> (and it does) the alive packets should be able to as well. But this
> doesn't seem to be the case.

No. While SSH is able to multiplex different streams inside a single TCP connection, the aggregated stream is still subject to kernel Send-Q buffering and then to network latency, congestion and performance metrics.

So what you are doing is taking networking parameters tuned for system memory and Ethernet performance (in the case of Linux again) and trying to use them with bandwidth-restricted connectivity. The default wmem/Send-Q size picked by the OS is based on system memory and other such inter-related params, allowing auto-tune.

Darryl
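
[Editorial sketch, not part of the original messages.] The SO_SNDBUF adjustment described above is a plain setsockopt() call on a socket the program owns. The sketch below shows it from Python on a standalone, unconnected TCP socket. It does not change anything inside ssh or sshd, which, as noted above, would have to make this call themselves. The helper names and the 4096-byte request are assumptions for illustration. On Linux the kernel doubles the requested value and enforces a minimum, so the size reported back is usually not the size asked for.

```python
import socket

# Linux-only location of the system-wide default send buffer size.
WMEM_DEFAULT_PATH = "/proc/sys/net/core/wmem_default"


def system_default_sndbuf(path=WMEM_DEFAULT_PATH):
    """Return the system-wide default send buffer in bytes, or None if unreadable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def shrink_send_buffer(sock, requested_bytes):
    """Request a smaller send buffer and return the size the kernel reports back."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, requested_bytes)
    # On Linux this reads back the doubled (and possibly clamped) value.
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)


with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as demo_sock:
    before = demo_sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    after = shrink_send_buffer(demo_sock, 4096)  # hypothetical request size
    print("system default:", system_default_sndbuf())
    print("socket SO_SNDBUF before:", before)
    print("socket SO_SNDBUF after requesting 4096:", after)
```

A smaller buffer means less data queued ahead of any keepalive in the local kernel, at the cost of more frequent wakeups for the sender. It does nothing about data already buffered in a router further along the path.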