Hi,

There has been some recent work to improve the speed of the Message
Authentication Codes (MACs) that are used in OpenSSH.

The first improvement is a change from Markus Friedl to reuse the MAC
context, rather than reinitialising it for every packet. This saves two
calls to the underlying hash function (e.g. SHA1) for each packet. My
tests found that this yielded a 12-16% speedup for bulk transfers to
localhost using HMAC-MD5 and arcfour256. HMAC-SHA1 should see an even
bigger improvement, because SHA1 is a more expensive hash function.

The second improvement is Peter Valchev's addition of a new MAC: Ted
Krovetz' UMAC-64[1]. This MAC uses a very different approach than the
HMACs that OpenSSH currently supports, and it comes with a nice security
proof that guarantees its resistance so long as its underlying block
cipher (AES) remains cryptologically intact. Testing (bulk transfers to
localhost using arcfour256) found UMAC-64 to be 20% faster than
HMAC-MD5 and 28% faster than HMAC-SHA1. This new MAC may be selected
by specifying "MACs=umac-64@openssh.com" in a server or client config.

These changes need testing on as many platforms as possible. In particular
we are interested in the following corner cases:

- Old OpenSSL version (0.9.5ish)
- Testing between big and little endian machines (i386 vs. sparc for example)
- Testing between previous OpenSSH versions and -current
- Testing on strict alignment architectures like Alpha and Itanium

Please report your findings to the mailing list.

-d

[1] http://fastcrypto.org/umac/
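To make the context-reuse idea concrete, here is a minimal sketch in Python rather than OpenSSH's C. It is not the actual patch: the names mac_reinit and ReusableMac are made up for illustration, and HMAC-SHA1 is chosen only as an example. It relies on the SSH rule that the MAC covers the 32-bit packet sequence number followed by the unencrypted packet. The keyed context is set up once, and each packet works on a copy of it, so the key-dependent setup is not repeated.

```python
import hashlib
import hmac
import struct


def mac_reinit(key, seqnr, packet):
    """Slow pattern: key a brand new HMAC context for every packet."""
    ctx = hmac.new(key, digestmod=hashlib.sha1)
    ctx.update(struct.pack(">I", seqnr))  # 32-bit big-endian sequence number
    ctx.update(packet)
    return ctx.digest()


class ReusableMac:
    """Fast pattern: key the context once, then copy it per packet.

    Hypothetical helper for illustration only, not OpenSSH's API.
    """

    def __init__(self, key):
        # The key-dependent setup happens here, exactly once.
        self._keyed = hmac.new(key, digestmod=hashlib.sha1)

    def compute(self, seqnr, packet):
        ctx = self._keyed.copy()  # cheap copy of the already-keyed state
        ctx.update(struct.pack(">I", seqnr))
        ctx.update(packet)
        return ctx.digest()


if __name__ == "__main__":
    key = b"k" * 20
    mac = ReusableMac(key)
    for seqnr, packet in enumerate([b"first packet", b"second packet"]):
        # Both patterns must produce identical tags.
        assert mac.compute(seqnr, packet) == mac_reinit(key, seqnr, packet)
    print("tags match")
```

Both functions yield the same tag for the same input; only the amount of per-packet setup differs.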
Should we just use a recent snap or is there a patch to apply against
4.6 canonical?

Damien Miller wrote:
> There has been some recent work to improve the speed of the Message
> Authentication Codes (MACs) that are used in OpenSSH.
> [...]
On Mon, Jun 11, 2007 at 01:01:03PM -0400, Chris Rapier wrote:
> Should we just use a recent snap or is there a patch to apply against
> 4.6 canonical?

Please use a recent snapshot.

-m
These sound like very interesting improvements. Is the 20% improvement
you quote for UMAC-64 relative to 4.6p1 or to a build that has Markus'
improvement?

On Mon, Jun 11, 2007 at 14:43:33 +1000, Damien Miller wrote:
> [...]
> cipher (AES) remains cryptologically intact. Testing (bulk transfers to
> localhost using arcfour256) found UMAC-64 to perform 20% better than
> HMAC-MD5, and 28% faster than HMAC-SHA1. This new MAC may be selected
> by specifying "MACs=umac-64@openssh.com" in a server or client config.
> [...]

--
Iain Morgan
Environment:
OpenSSH 4.6p1
1Gb/s 1.12ms RTT
Linux 2.6.18-web100 to Linux 2.6.16-web100
Autotuning enabled. 2.4Ghz Xeon SMP
20 iterations of 2GB scp transfer. Disk to /dev/null
scp -caes256-cbc -P 42222 ~/2gb rapier@delta:/dev/null

MAC - avg 25.0MB/s (1:22)
STD - avg 24.0MB/s (1:25)

So it's definitely an improvement. If you eliminate the outliers, the
average of the MAC runs improves to 25.1MB/s. I haven't tried the UMAC
yet but when I do I'll post it here.

There are some Itaniums and other architectures I can try it against
here. I'll try to get to those tomorrow.

I've included the raw data below if anyone wants it.

Chris

ps. I did also try the HPN patch. That got me 25.36MB/s. I'm guessing
that might be more due to a change in the scp pipe buffer size I made
than the channel buffer tuning. I'll rerun it tomorrow by catting through
ssh and see if I can factor that out.

STD (avg 24.035 MB/s, 1:25)
file  done  size    MB/s  time
2gb   100%  2048MB  23    1:29
2gb   100%  2048MB  24.4  1:24
2gb   100%  2048MB  22.8  1:30
2gb   100%  2048MB  22.5  1:31
2gb   100%  2048MB  24.4  1:24
2gb   100%  2048MB  24.4  1:24
2gb   100%  2048MB  24.4  1:24
2gb   100%  2048MB  24.4  1:24
2gb   100%  2048MB  24.4  1:24
2gb   100%  2048MB  24.4  1:24
2gb   100%  2048MB  22.8  1:30
2gb   100%  2048MB  23.3  1:28
2gb   100%  2048MB  24.4  1:24
2gb   100%  2048MB  24.4  1:24
2gb   100%  2048MB  24.7  1:23
2gb   100%  2048MB  24.4  1:24
2gb   100%  2048MB  24.4  1:24
2gb   100%  2048MB  24.4  1:24
2gb   100%  2048MB  24.4  1:24
2gb   100%  2048MB  24.4  1:24

MAC (avg 24.9952381 MB/s, 1:22)
file  done  size    MB/s  time
2gb   100%  2048MB  25.3  1:21
2gb   100%  2048MB  24.7  1:23
2gb   100%  2048MB  25.3  1:21
2gb   100%  2048MB  25    1:22
2gb   100%  2048MB  25    1:22
2gb   100%  2048MB  25    1:22
2gb   100%  2048MB  25.3  1:21
2gb   100%  2048MB  22.5  1:31
2gb   100%  2048MB  25    1:22
2gb   100%  2048MB  25    1:22
2gb   100%  2048MB  25    1:22
2gb   100%  2048MB  25    1:22
2gb   100%  2048MB  25.3  1:21
2gb   100%  2048MB  25.3  1:21
2gb   100%  2048MB  25    1:22
2gb   100%  2048MB  25.6  1:20
2gb   100%  2048MB  25.3  1:21
2gb   100%  2048MB  25.3  1:21
2gb   100%  2048MB  25    1:22
2gb   100%  2048MB  24.7  1:23
2gb   100%  2048MB  25.3  1:21

HPN (avg 25.36 MB/s, 1:20)
file  done  size    MB/s  time
2gb   100%  2048MB  25.6  1:20
2gb   100%  2048MB  25.6  1:20
2gb   100%  2048MB  25.6  1:20
2gb   100%  2048MB  23.5  1:27
2gb   100%  2048MB  24.1  1:25
2gb   100%  2048MB  25.9  1:19
2gb   100%  2048MB  25.9  1:19
2gb   100%  2048MB  24.1  1:25
2gb   100%  2048MB  25.6  1:20
2gb   100%  2048MB  25.6  1:20
2gb   100%  2048MB  25.6  1:20
2gb   100%  2048MB  25.3  1:21
2gb   100%  2048MB  25.6  1:20
2gb   100%  2048MB  25.6  1:20
2gb   100%  2048MB  25.6  1:20
2gb   100%  2048MB  25.9  1:19
2gb   100%  2048MB  25.9  1:19
2gb   100%  2048MB  25.3  1:21
2gb   100%  2048MB  25.3  1:21
2gb   100%  2048MB  25.6  1:20
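For anyone who wants to recheck the averages, the following Python sketch recomputes them from the STD and MAC throughput columns above. The message does not say how outliers were eliminated, so the rule used here (drop any run more than 2 MB/s below the median) is an assumption, and drop_low_outliers is a made-up helper name.

```python
from statistics import mean, median

# Throughput in MB/s, copied from the raw data in the message.
std_runs = [23.0, 24.4, 22.8, 22.5, 24.4, 24.4, 24.4, 24.4, 24.4, 24.4,
            22.8, 23.3, 24.4, 24.4, 24.7, 24.4, 24.4, 24.4, 24.4, 24.4]
mac_runs = [25.3, 24.7, 25.3, 25.0, 25.0, 25.0, 25.3, 22.5, 25.0, 25.0,
            25.0, 25.0, 25.3, 25.3, 25.0, 25.6, 25.3, 25.3, 25.0, 24.7,
            25.3]


def drop_low_outliers(samples, margin=2.0):
    """Keep runs no more than `margin` MB/s below the median.

    The margin is an assumed cutoff; the original message does not
    define what counted as an outlier.
    """
    floor = median(samples) - margin
    return [s for s in samples if s >= floor]


std_avg = mean(std_runs)
mac_avg = mean(mac_runs)
mac_trimmed_avg = mean(drop_low_outliers(mac_runs))
relative_gain = mac_avg / std_avg - 1.0

if __name__ == "__main__":
    print(f"STD average:           {std_avg:.3f} MB/s")
    print(f"MAC average:           {mac_avg:.3f} MB/s")
    print(f"MAC average, trimmed:  {mac_trimmed_avg:.2f} MB/s")
    print(f"MAC gain over STD:     {relative_gain:.1%}")
```

With this cutoff only the single 22.5 MB/s MAC run is dropped, which is consistent with the 25.1MB/s figure quoted in the message.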
On Mon, 11 Jun 2007, Chris Rapier wrote:
> Environment:
> OpenSSH 4.6p1
> 1Gb/s 1.12ms RTT
> Linux 2.6.18-web100 to Linux 2.6.16-web100
> Autotuning enabled. 2.4Ghz Xeon SMP
> 20 iterations of 2GB scp transfer. Disk to /dev/null
> scp -caes256-cbc -P 42222 ~/2gb rapier@delta:/dev/null
>
> MAC - avg 25.0MB/s (1:22)
> STD - avg 24.0MB/s (1:25)

You will see the difference more clearly if you use a faster cipher,
like aes128-cbc, aes128-ctr or arcfour256.

-d
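The reasoning behind that advice can be sketched with a toy cost model: if per-byte time is roughly cipher cost plus MAC cost, a fixed MAC speedup shows up more strongly when the cipher takes a smaller share of the total. The cost numbers below are invented purely for illustration and are not measurements of any real cipher or MAC; the function names are hypothetical too.

```python
def transfer_rate(cipher_cost, mac_cost):
    """Throughput in arbitrary units when per-byte costs simply add up."""
    return 1.0 / (cipher_cost + mac_cost)


def overall_gain(cipher_cost, mac_cost, mac_speedup):
    """Fractional end-to-end gain when only the MAC gets faster."""
    before = transfer_rate(cipher_cost, mac_cost)
    after = transfer_rate(cipher_cost, mac_cost / mac_speedup)
    return after / before - 1.0


if __name__ == "__main__":
    # Made-up relative costs: a "slow" cipher costing 3x the MAC,
    # and a "fast" cipher costing the same as the MAC.
    for label, cipher_cost in [("slow cipher", 3.0), ("fast cipher", 1.0)]:
        gain = overall_gain(cipher_cost, mac_cost=1.0, mac_speedup=1.5)
        print(f"{label}: overall gain {gain:.1%}")
```

In this model the same MAC improvement gives a larger end-to-end gain with the cheaper cipher, which is why a test with aes256-cbc may understate the effect.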
On Mon, 11 Jun 2007, Damien Miller wrote:
> These changes need testing on as many platforms as possible. In particular
> we are interested in the following corner cases:
>
> - Old OpenSSL version (0.9.5ish)
> - Testing between big and little endian machines (i386 vs. sparc for example)
> - Testing between previous OpenSSH versions and -current
> - Testing on strict alignment architectures like Alpha and Itanium

One more case:

- Interoperability against non-OpenSSH implementations

This applies mainly to the MAC reuse change, as no other implementations
support UMAC yet.

If other implementors want to support UMAC, there is a specification for
how OpenSSH does it at [1], which is awaiting publication (assuming I have
the IETF boilerplate du jour correct this time). OpenSSH uses a slightly
tweaked version of the UMAC reference implementation[2].

-d

[1] http://www.mindrot.org/~djm/internet-drafts/draft-miller-secsh-umac-00.txt
[2] http://www.fastcrypto.org/umac/2004/code.html
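As background for interoperability testing: in the SSH transport protocol (RFC 4253, section 7.1) each side sends its MAC list in preference order, and the MAC chosen is the first name on the client's list that the server also offers. The Python sketch below illustrates that rule only. The preference lists are made up for the example and are not OpenSSH's actual defaults, and negotiate_mac is a hypothetical name.

```python
def negotiate_mac(client_macs, server_macs):
    """Return the first client-preferred MAC that the server also supports.

    Returns None when there is no common algorithm, in which case a real
    implementation would abort the key exchange.
    """
    for name in client_macs:
        if name in server_macs:
            return name
    return None


if __name__ == "__main__":
    # Hypothetical preference lists, for illustration only.
    new_client = ["umac-64@openssh.com", "hmac-md5", "hmac-sha1"]
    new_server = ["hmac-md5", "hmac-sha1", "umac-64@openssh.com"]
    other_server = ["hmac-sha1", "hmac-md5"]  # implementation without UMAC

    print(negotiate_mac(new_client, new_server))    # UMAC is common to both
    print(negotiate_mac(new_client, other_server))  # falls back to an HMAC
```

A peer that does not know UMAC simply never offers it, so against other implementations the negotiation falls back to an HMAC, and that is where the context reuse change gets exercised.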
On Tue, Jun 12, 2007 at 11:21:32AM +1000, Damien Miller wrote:
> On Mon, 11 Jun 2007, Damien Miller wrote:
>
> > These changes need testing on as many platforms as possible. In particular
> > we are interested in the following corner cases:
> >
> > - Old OpenSSL version (0.9.5ish)
> > - Testing between big and little endian machines (i386 vs. sparc for example)
> > - Testing between previous OpenSSH versions and -current
> > - Testing on strict alignment architectures like Alpha and Itanium
> > - Interoperability against non-OpenSSH implementations

I built openssh-SNAP-20070613.tar.gz on a RHEL5 Itanium system and
was able to successfully connect to an SSH.com server running on a Tru64
Alpha system (SSH-2.0-3.2.0 on v5.1B of Tru64).

I was also able to use umac-64@openssh.com when connecting over localhost
to my Itanium system.

-matt
On Jun 11 14:43, Damien Miller wrote:
> [...]
> These changes need testing on as many platforms as possible. In particular
> we are interested in the following corner cases:
>
> - Old OpenSSL version (0.9.5ish)
> - Testing between big and little endian machines (i386 vs. sparc for example)
> - Testing between previous OpenSSH versions and -current
> - Testing on strict alignment architectures like Alpha and Itanium
>
> Please report your findings to the mailing list.

Builds and runs fine on Cygwin w/ openssl 0.9.8e. Exchanging data with
Cygwin 4.6p1 and Linux 4.5p1 works fine. UMAC works fine between Cygwin
machines.

I see a 14% speed improvement in a default scp with no further options,
relative to 4.6p1. Using umac-64 the speed improvement is 15%.

Corinna

--
Corinna Vinschen
Cygwin Project Co-Leader
Red Hat
I just have a quick question about how channels are managed. As far as I
can tell, all of the channel structs are stored in channels[]. It seems
that ssh cycles through all allocated slots in channels[] to determine
which channels are active and which are just placeholders. Is this about
right, or am I missing something important about channels[]?

Thanks for your time
Chris
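To pin down the pattern being asked about, here is a small Python sketch of a slot table in which empty entries act as placeholders and every allocated slot is scanned to find the live ones. It only illustrates the scheme as described in the question. It is not taken from OpenSSH's source and does not confirm how channels[] actually behaves; all names (ChannelTable, allocate, release, active) are hypothetical.

```python
class ChannelTable:
    """Toy slot table: None marks a free placeholder slot."""

    def __init__(self, initial_slots=4):
        self.slots = [None] * initial_slots

    def allocate(self, name):
        """Store a channel in the first free slot, growing when full."""
        for index, entry in enumerate(self.slots):
            if entry is None:
                self.slots[index] = name
                return index
        self.slots.append(name)
        return len(self.slots) - 1

    def release(self, index):
        """Free a slot; it stays in the table as a placeholder."""
        self.slots[index] = None

    def active(self):
        """Scan every allocated slot and keep only the live channels."""
        return [(i, c) for i, c in enumerate(self.slots) if c is not None]


if __name__ == "__main__":
    table = ChannelTable()
    a = table.allocate("session")
    b = table.allocate("x11")
    c = table.allocate("port-forward")
    table.release(b)
    print(table.active())           # slots a and c remain live
    print(table.allocate("agent"))  # reuses the freed slot index
```

The cost of such a scan grows with the number of allocated slots rather than the number of live channels, which is presumably the concern behind the question.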