Christoph Anton Mitterer
2012-Mar-25 16:36 UTC
how to speed up OpenSSH command execution (and a speed analysis)
Hi.

I recently investigated how to squeeze the last microseconds out of executing commands on a remote host via OpenSSH (of course I'm using ControlMaster).


MOTIVATION:
I'm introducing Nagios (well, actually Icinga) at the local institute. We have many active checks that must run locally on the remote hosts.
The "best" way to do this is using NRPE (Nagios Remote Plugin Executor), which runs a daemon listening on a port, waiting for commands to be executed.

The problem with NRPE is that it's inherently insecure, even when using the fake-SSL mode it provides (as extensively discussed in [0], [1] and [2]).
Also, executing commands on a remote host is business that "belongs" to SSH, and NRPE more or less duplicates this.
Another reason why NRPE is broken is that the mode in which argument passing (to the check scripts) is enabled is already marked as being insecure.

Why have NRPE then?
- It allows only certain commands to be executed.
  => With SSH this could be done too, I guess, by means such as rssh.
- It's much faster.
  => Which is what I'm trying to "solve" here.

Why not use stunnel + NRPE?
=> This would still allow any local user on the remote host to contact the running NRPE daemon and execute commands. This might be a security risk, e.g. if NRPE has sudo rights or so.

What's the goal?
- Drop NRPE and use SSH instead, if the latter can be made as fast (or nearly as fast) as NRPE.
- Use rssh to restrict the commands that may be run.
- Use SSH keys to allow the Nagios node to log in to the (rssh-restricted) remote host.


USING CONTROLMASTER:
I guess it's inevitable to use ControlMaster for the connections from the Nagios host to the remote hosts.
The actual connections for the commands usually close immediately, so a spawner is required that keeps a connection open for ALL checked hosts.
I.e. something like:
for each host
    ssh -f -N host

Problems here:
- What other options to use (largely for the sake of speed and security)?
  * -o ServerAliveInterval=30 ?
  * -C ?
  * -a -k -x ?
  * others?
- How to spawn that first connection?
  I'd prefer that ssh had another mode, e.g. ControlMaster autospawn, which does roughly the following:
  The first time a "normal" command is executed, e.g.
    ssh example.host.org check_load
  it actually does a
    ssh -f -N host
  and uses that one to do the
    ssh example.host.org check_load
  That way I wouldn't have to take care of
  * spawning the master sessions
  * restarting them when they die for some reason
  and they would only be started when really required for the first time.
  Ideally, there would be a way to time out those automatically spawned master sessions. E.g. when not used for a day, stop it.


ANALYSIS:
I made some tests on the speed of command execution with NRPE, SSH, SSH+NRPE, etc.:
The check_load command was defined as /usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20 in NRPE.
The sshd_config of the remote host is found below [3].
The nagtest user on the remote host has:
- this /etc/passwd entry: nagtest:x:54115:100::/home/nagtest:/bin/bash
- a .bashrc and .profile in its homedir

1) NRPE (with its fake-SSL mode) alone, no SSH or so at all:
# time /usr/lib/nagios/plugins/check_nrpe -H host.example.org -c check_load
OK - load average: 0.00, 0.02, 0.00|load1=0.000;15.000;30.000;0; load5=0.020;10.000;25.000;0; load15=0.000;5.000;20.000;0;

real 0m0.047s
user 0m0.000s
sys 0m0.004s
# time /usr/lib/nagios/plugins/check_nrpe -H host.example.org -c check_load
OK - load average: 0.00, 0.02, 0.00|load1=0.000;15.000;30.000;0; load5=0.020;10.000;25.000;0; load15=0.000;5.000;20.000;0;

real 0m0.008s
user 0m0.004s
sys 0m0.000s
=> The first time it's quite slow, I guess because of the DNS lookup, but subsequent invocations are really fast (0.008s).

2) NRPE (withOUT its fake-SSL mode) alone, no SSH or so at all:
# time /usr/lib/nagios/plugins/check_nrpe -H host.example.org -c check_load -n
OK - load average: 0.00, 0.00, 0.00|load1=0.000;15.000;30.000;0; load5=0.000;10.000;25.000;0;
load15=0.000;5.000;20.000;0;

real 0m0.006s
user 0m0.004s
sys 0m0.000s
=> It's even faster than (1). So given that NRPE's SSL is absolutely useless anyway, one should always just disable it.

3) NRPE (withOUT its fake-SSL mode), tunneling the connection over SSH via port forwarding, NO(!) ControlMaster set:
# ssh nagtest@host.example.org -L 2000:host.example.org:5666 -N
(running everything under the nagtest user; the NRPE daemon listens on port 5666)

(running check_nrpe on localhost:2000 in order to use the port forwarding)
# time /usr/lib/nagios/plugins/check_nrpe -p 2000 -H localhost -c check_load -n
OK - load average: 0.31, 0.07, 0.02|load1=0.310;15.000;30.000;0; load5=0.070;10.000;25.000;0; load15=0.020;5.000;20.000;0;

real 0m0.023s
user 0m0.004s
sys 0m0.000s

real 0m0.010s
user 0m0.004s
sys 0m0.000s

real 0m0.017s
user 0m0.004s
sys 0m0.000s

real 0m0.006s
user 0m0.004s
sys 0m0.000s
=> On the first few invocations, the time varied quite a lot (perhaps the remote system was under load). But then it got as fast as NRPE without SSH tunneling!
This is really interesting, as it shows, I guess, that it's not the encryption layer of SSH that makes things slow.

Side note: Why don't I just stop here and use NRPE tunneled over SSH?
Because NRPE would still be insecure and could be invoked on the localhost by other users.

4) From now on, no more NRPE. Plain SSH, no special options, no ControlMaster, obviously no port forwarding:
# time ssh nagtest@host.example.org /usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
OK - load average: 0.01, 0.05, 0.05|load1=0.010;15.000;30.000;0; load5=0.050;10.000;25.000;0; load15=0.050;5.000;20.000;0;

real 0m0.126s
user 0m0.036s
sys 0m0.000s

real 0m0.169s
user 0m0.036s
sys 0m0.000s
=> Once it was "fast" (0.126s), but all other times I tested, it was around 0.169s.

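Since single `time` runs like the ones above vary noticeably from one invocation to the next, averaging over several runs may give steadier numbers. The following is only a sketch of one way to do that: the function name time_runs is made up for illustration, and it assumes GNU date (for the %N nanosecond format) and awk are available.

```shell
#!/bin/sh
# time_runs N CMD [ARGS...]   (hypothetical helper, not part of OpenSSH or Nagios)
# Run CMD N times and print the mean wall-clock time per run in seconds.
# Assumption: GNU date, which supports %N (nanoseconds).
time_runs() {
    n=$1
    shift
    start=$(date +%s%N)          # nanoseconds since the epoch
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" >/dev/null 2>&1     # discard the check output, only timing matters
        i=$((i + 1))
    done
    end=$(date +%s%N)
    # Convert the elapsed nanoseconds to mean seconds per run.
    awk -v ns="$((end - start))" -v n="$n" 'BEGIN { printf "%.4f\n", ns / n / 1e9 }'
}
```

It could be called as, for example, `time_runs 20 ssh nagtest@host.example.org /usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20`. Note that a slow first run (e.g. because of a DNS lookup) is averaged in with the rest, so a warm-up call beforehand may be worthwhile.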
ControlMaster setup:
Host *
    ControlPath ~/.ssh/master-%l-%r@%h:%p
    ControlMaster auto

5) SSH with ControlMaster:
Opening the background control master:
# ssh -f -N nagtest@host.example.org

# time ssh nagtest@lcg-lrz-dc20.grid.lrz-muenchen.de /usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
OK - load average: 0.00, 0.00, 0.00|load1=0.000;15.000;30.000;0; load5=0.000;10.000;25.000;0; load15=0.000;5.000;20.000;0;

real 0m0.045s
user 0m0.004s
sys 0m0.000s
=> Fastest result with SSH so far.

6) SSH with ControlMaster but dash as shell
I thought maybe it's bash that is slow, so I changed the user's shell to "dash" in /etc/passwd.
First I found out that this only takes effect when the ControlMaster is restarted... why?
But apart from that, it had no impact on speed.

7) SSH with ControlMaster but ash as shell
I made a test with ash as shell, where I actually got down to 0.006s. But I couldn't reproduce this later.

8) SSH with ControlMaster but /bin/true as shell
# time ssh nagtest@lcg-lrz-dc20.grid.lrz-muenchen.de /usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20

real 0m0.040s
user 0m0.004s
sys 0m0.000s
=> As only true is executed, and no shell config files are read, it seems that the problem is not related to shell startup.


MISCELLANEOUS
Are there any further ways to speed things up?
* I think disabling UseDNS isn't of much use, as it only affects the first control master connection, right?
* Any ways, e.g., to speed up the choice of the identity file? Or disabling everything but SSH keys? etc. pp.


So the question in the end is: can I somehow speed things up even more?
If you need any further analysis work, just tell me.

Thanks,
Chris.

[0] http://tracker.nagios.org/view.php?id=90
[1] http://tracker.nagios.org/view.php?id=125
[2] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=547092
[3] AllowUsers root nagtest
ChallengeResponseAuthentication no
PasswordAuthentication no
RSAAuthentication no
Protocol 2
Ciphers aes256-cbc,aes192-cbc,aes128-cbc,aes256-ctr,aes192-ctr,aes128-ctr,blowfish-cbc
MACs hmac-sha1,hmac-ripemd160
ClientAliveInterval 30
TCPKeepAlive no
AcceptEnv LANG LC_*
X11Forwarding yes
Subsystem sftp /usr/lib/openssh/sftp-server

=> I really wouldn't want to change the Ciphers to something weaker!
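
On the stated goal of restricting which commands the Nagios key may run: the message proposes rssh for this. A different technique, named plainly here as an alternative and not as what the author used, is sshd's per-key forced command, where a wrapper script decides what to run based on the SSH_ORIGINAL_COMMAND environment variable that sshd sets. The sketch below is only an illustration: the wrapper path, the function name run_check and the PLUGINS variable are hypothetical, and the thresholds are simply copied from the check_load definition above.

```shell
#!/bin/sh
# Hypothetical wrapper, e.g. installed as /usr/local/bin/nagios-wrapper and
# referenced in ~nagtest/.ssh/authorized_keys in front of the Nagios key:
#   command="/usr/local/bin/nagios-wrapper",no-pty,no-port-forwarding,no-X11-forwarding,no-agent-forwarding <key>
# sshd then runs this script instead of whatever the client requested and
# passes the requested command in SSH_ORIGINAL_COMMAND.
PLUGINS=${PLUGINS:-/usr/lib/nagios/plugins}

# run_check NAME: run a whitelisted check, reject everything else.
run_check() {
    case "$1" in
        check_load)
            "$PLUGINS/check_load" -w 15,10,5 -c 30,25,20
            ;;
        *)
            # 3 is the Nagios plugin exit code for UNKNOWN.
            echo "UNKNOWN - command not allowed: $1"
            return 3
            ;;
    esac
}

# Only act when the client actually requested a command.
if [ -n "${SSH_ORIGINAL_COMMAND:-}" ]; then
    run_check "$SSH_ORIGINAL_COMMAND"
fi
```

With such a setup the Nagios host would call `ssh nagtest@host.example.org check_load`, and arguments would be fixed on the remote side, so nothing is passed through from the client.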
Alan Barrett
2012-Mar-25 17:32 UTC
how to speed up OpenSSH command execution (and a speed analysis)
On Sun, 25 Mar 2012, Christoph Anton Mitterer wrote:
>- How to spawn that first connection?
>  I'd prefer that ssh has another mode, e.g. ControlMaster autospawn,
>  which makes about the following:
>  When the first time a "normal" command is executed, e.g.
>  ssh example.host.org check_load
>  it actually does a
>  ssh -f -N host
>  and uses that one to do the
>  ssh example.host.org check_load

"ControlMaster auto" is supposed to do that.

>  Ideally, there would be a way to timeout those automatically spawned
>  master sessions. E.g. when not used for a day, stop it.

"ControlPersist 24h" is supposed to do that.

--apb (Alan Barrett)
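
Putting the two options from this reply together with the ControlPath from the original message, a client configuration might look like the sketch below. It writes the settings to a separate file so nothing in ~/.ssh/config is touched; the CONF variable and the temporary file are assumptions made for illustration, and ControlPersist needs a reasonably recent OpenSSH (it was added in 5.6).

```shell
#!/bin/sh
# Sketch: a client config that lets ssh start the master on first use
# (ControlMaster auto) and shut it down after 24h without use (ControlPersist).
# CONF is a hypothetical location; for real use the Host block would more
# likely be merged into ~/.ssh/config.
CONF=${CONF:-$(mktemp)}

cat > "$CONF" <<'EOF'
Host *
    ControlMaster auto
    ControlPath ~/.ssh/master-%l-%r@%h:%p
    ControlPersist 24h
EOF

echo "wrote $CONF"

# Usage (not executed here, since it needs a reachable host):
#   ssh -F "$CONF" nagtest@host.example.org /usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
#   ssh -F "$CONF" -O check nagtest@host.example.org   # ask whether a master is running
#   ssh -F "$CONF" -O exit nagtest@host.example.org    # stop the master before the timeout
```

With this in place the first check against a host opens the master in the background, later checks reuse it, and no separate `ssh -f -N` spawner should be needed.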