Hello Dirk ! As I got a new machine with the AES-NI crypto extensions, I'm getting interested with it and as you may have seen, I've already merged into stable/9 two changesets for AES-NI support in GELI & cryptodev. Now, I'm trying to measure the impact of said AES extentions, I tumbled on a very weird difference in behaviour between our base system openssl and the one in ports. /usr/bin/openssl: OpenSSL 0.9.8y 5 Feb 2013 /usr/local/bin/openssl: OpenSSL 1.0.1e 11 Feb 2013 The one is base is not supposed to have cryptodev (and aesni) support at all as it was added apparently in 1.0.1. Fine. 1. Trying to run both on a machine without the AES-NI extensions, I should have similar results in running speed tests but: 1181 [17:18] roberto at centre:/usr/ports> /usr/bin/openssl speed aes-256-cbc ... OpenSSL 0.9.8y 5 Feb 2013 (9.1-BETA1) built on: date not available options:bn(64,64) md2(int) rc4(ptr,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) compiler: cc available timing options: USE_TOD HZ=128 [sysconf value] timing function used: getrusage The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes aes-256 cbc 58919.92k 62134.88k 62611.08k 62776.47k 62910.03k and 1182 [17:19] roberto at centre:/usr/ports> /usr/local/bin/openssl speed aes-256-cbc ... OpenSSL 1.0.1e 11 Feb 2013 built on: Sun Jul 28 16:36:48 CEST 2013 options:bn(64,64) md2(int) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) compiler: cc -fPIC -DOPENSSL_PIC -DZLIB_SHARED -DZLIB -DOPENSSL_THREADS -pthread -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -DL_ENDIAN -DTERMIOS -O3 -DMD32_REG_T=int -Wall -O -pipe -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes aes-256 cbc 38790.95k 41415.66k 42009.00k 42257.07k 42213.38k Wow, how would you explain the 37% (in the worng direction!) difference? Is there something I could add/change in the port's configuration to fix that? 2. I have another machine with the AES-NI extensions, with a E3-1220 CPU. If I load crypto, aesni and cryptodev, it is indentified as using them: cryptosoft0: <software crypto> on motherboard aesni0: <AES-CBC,AES-XTS> on motherboard Results of openssl speed with the base one are better as you would expect, CPU is faster: % /usr/bin/openssl speed aes-256-cbc ... OpenSSL 0.9.8x 10 May 2012 (9.1-RELEASE) built on: date not available options:bn(64,64) md2(int) rc4(ptr,int) des(idx,cisc,16,int) aes(partial) blowfish(idx) compiler: cc available timing options: USE_TOD HZ=128 [sysconf value] timing function used: getrusage The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes aes-256 cbc 125404.07k 129849.19k 130514.37k 131242.71k 131164.72k but... % /usr/local/bin/openssl speed -engine cryptodev aes-256-cbc engine "cryptodev" set. ... OpenSSL 1.0.1c 10 May 2012 built on: Mon Apr 8 19:45:18 UTC 2013 options:bn(64,64) md2(int) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) compiler: cc -fPIC -DOPENSSL_PIC -DZLIB_SHARED -DZLIB -DOPENSSL_THREADS -pthread -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -DL_ENDIAN -DTERMIOS -O3 -DMD32_REG_T=int -Wall -O2 -pipe -fno-strict-aliasing -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes aes-256 cbc 71203.16k 74667.39k 75631.27k 75975.34k 76090.03k Still 42% diff and no "aesni" usage at all!? I'm guessing we have an issue there... Thanks, -- Ollivier ROBERT -=- FreeBSD: The Power to Serve! -=- roberto at keltia.net In memoriam to Ondine, our 2nd child: http://ondine.keltia.net/
Ollivier Robert wrote this message on Tue, Aug 27, 2013 at 17:32 +0200:> As I got a new machine with the AES-NI crypto extensions, I'm getting interested with it and as you may have seen, I've already merged into stable/9 two changesets for AES-NI support in GELI & cryptodev. > > Now, I'm trying to measure the impact of said AES extentions, I tumbled on a very weird difference in behaviour between our base system openssl and the one in ports. > > /usr/bin/openssl: > OpenSSL 0.9.8y 5 Feb 2013 > > /usr/local/bin/openssl: > OpenSSL 1.0.1e 11 Feb 2013 > > The one is base is not supposed to have cryptodev (and aesni) support at all as it was added apparently in 1.0.1. Fine. > > 1. Trying to run both on a machine without the AES-NI extensions, I should have similar results in running speed tests but: > > 1181 [17:18] roberto at centre:/usr/ports> /usr/bin/openssl speed aes-256-cbc > ... > OpenSSL 0.9.8y 5 Feb 2013 (9.1-BETA1) > built on: date not available > options:bn(64,64) md2(int) rc4(ptr,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) > compiler: cc > available timing options: USE_TOD HZ=128 [sysconf value] > timing function used: getrusage > The 'numbers' are in 1000s of bytes per second processed. > type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes > aes-256 cbc 58919.92k 62134.88k 62611.08k 62776.47k 62910.03k > > and > > 1182 [17:19] roberto at centre:/usr/ports> /usr/local/bin/openssl speed aes-256-cbc > ... > OpenSSL 1.0.1e 11 Feb 2013 > built on: Sun Jul 28 16:36:48 CEST 2013 > options:bn(64,64) md2(int) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) > compiler: cc -fPIC -DOPENSSL_PIC -DZLIB_SHARED -DZLIB -DOPENSSL_THREADS -pthread -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -DL_ENDIAN -DTERMIOS -O3 -DMD32_REG_T=int -Wall -O -pipe -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM > The 'numbers' are in 1000s of bytes per second processed. > type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes > aes-256 cbc 38790.95k 41415.66k 42009.00k 42257.07k 42213.38k > > Wow, how would you explain the 37% (in the worng direction!) difference? Is there something I could add/change in the port's configuration to fix that? > > 2. I have another machine with the AES-NI extensions, with a E3-1220 CPU. If I load crypto, aesni and cryptodev, it is indentified as using them: > > cryptosoft0: <software crypto> on motherboard > aesni0: <AES-CBC,AES-XTS> on motherboard > > Results of openssl speed with the base one are better as you would expect, CPU is faster: > > % /usr/bin/openssl speed aes-256-cbc > ... > OpenSSL 0.9.8x 10 May 2012 (9.1-RELEASE) > built on: date not available > options:bn(64,64) md2(int) rc4(ptr,int) des(idx,cisc,16,int) aes(partial) blowfish(idx) > compiler: cc > available timing options: USE_TOD HZ=128 [sysconf value] > timing function used: getrusage > The 'numbers' are in 1000s of bytes per second processed. > type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes > aes-256 cbc 125404.07k 129849.19k 130514.37k 131242.71k 131164.72k > > but... > > % /usr/local/bin/openssl speed -engine cryptodev aes-256-cbc > engine "cryptodev" set. > ... > OpenSSL 1.0.1c 10 May 2012 > built on: Mon Apr 8 19:45:18 UTC 2013 > options:bn(64,64) md2(int) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) > compiler: cc -fPIC -DOPENSSL_PIC -DZLIB_SHARED -DZLIB -DOPENSSL_THREADS -pthread -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -DL_ENDIAN -DTERMIOS -O3 -DMD32_REG_T=int -Wall -O2 -pipe -fno-strict-aliasing -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM > The 'numbers' are in 1000s of bytes per second processed. > type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes > aes-256 cbc 71203.16k 74667.39k 75631.27k 75975.34k 76090.03k > > Still 42% diff and no "aesni" usage at all!? > > I'm guessing we have an issue there...I discovered a similar issue on HEAD w/ 1.0.1e where openssl speed -engine aes-256-cbc when ktraced would not issue any ioctl's during the speed test... You can see that it opens the device, but then it gets a number of failures: 11466 openssl CALL ioctl(0x4,CIOCGSESSION,0x7fffffffd590) 11466 openssl RET ioctl -1 errno 22 Invalid argument and then you see no more ioctl's As far as I can tell, 1.0.1e doesn't properly detect AES-NI and uses these instructions when present, and cryptodev usage doesn't work, and doesn't warn when it fails... My own program that tests cryptodev out performs openssl because of this.. -- John-Mark Gurney Voice: +1 415 225 5579 "All that I will do, has been done, All that I have, has not."
Ollivier Robert wrote this message on Tue, Aug 27, 2013 at 17:32 +0200:> As I got a new machine with the AES-NI crypto extensions, I'm getting interested with it and as you may have seen, I've already merged into stable/9 two changesets for AES-NI support in GELI & cryptodev. > > Now, I'm trying to measure the impact of said AES extentions, I tumbled on a very weird difference in behaviour between our base system openssl and the one in ports. > > /usr/bin/openssl: > OpenSSL 0.9.8y 5 Feb 2013 > > /usr/local/bin/openssl: > OpenSSL 1.0.1e 11 Feb 2013 > > The one is base is not supposed to have cryptodev (and aesni) support at all as it was added apparently in 1.0.1. Fine. > > 1. Trying to run both on a machine without the AES-NI extensions, I should have similar results in running speed tests but: > > 1181 [17:18] roberto at centre:/usr/ports> /usr/bin/openssl speed aes-256-cbcThis is not a very good way to run the tests... It turns out that if you run it this way, it will NOT go through the EVP OpenSSL system, and instead call a slow AES function directly, which will not be the one that gets used in real applications... To use the EVP system, and see what performance OpenSSH and others will get, use openssl speed -evp aes-256-cbc Now if you have cryptodev+aesni loaded and my aesni patches applied, you'll see something like this: $ openssl speed -evp aes-256-cbc Doing aes-256-cbc for 3s on 16 size blocks: 928863 aes-256-cbc's in 0.20s Doing aes-256-cbc for 3s on 64 size blocks: 880075 aes-256-cbc's in 0.28s Doing aes-256-cbc for 3s on 256 size blocks: 775018 aes-256-cbc's in 0.20s Doing aes-256-cbc for 3s on 1024 size blocks: 490425 aes-256-cbc's in 0.09s Doing aes-256-cbc for 3s on 8192 size blocks: 102189 aes-256-cbc's in 0.05s OpenSSL 1.0.1e-freebsd 11 Feb 2013 built on: Tue Aug 27 11:52:46 PDT 2013 options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) compiler: cc -O The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes aes-256-cbc 76092.46k 200265.96k 1015831.59k 5843725.96k 17858822.14k Man, 17GB/sec! Impressive! Except not... notice the times above.. it only took .05s to do it.. That's because by default, openssl speed only computes user time, not real time and since it's now properly using cryptodev, not much cpu time is spent in the process... The undocumented -elapsed option to the rescue: $ openssl speed -elapsed -evp aes-256-cbc You have chosen to measure elapsed time instead of user CPU time. Doing aes-256-cbc for 3s on 16 size blocks: 962245 aes-256-cbc's in 3.01s Doing aes-256-cbc for 3s on 64 size blocks: 918001 aes-256-cbc's in 3.02s Doing aes-256-cbc for 3s on 256 size blocks: 799186 aes-256-cbc's in 3.05s Doing aes-256-cbc for 3s on 1024 size blocks: 496954 aes-256-cbc's in 3.02s Doing aes-256-cbc for 3s on 8192 size blocks: 102473 aes-256-cbc's in 3.00s OpenSSL 1.0.1e-freebsd 11 Feb 2013 built on: Tue Aug 27 11:52:46 PDT 2013 options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) compiler: cc -O The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes aes-256-cbc 5118.64k 19432.21k 67148.02k 168312.03k 279819.61k This gives a bit more resonable results... And if you specify -decrypt also, you'll see it's faster because w/ my aesni patches, it pipelines cbc decrypt: $ openssl speed -elapsed -decrypt -evp aes-256-cbc You have chosen to measure elapsed time instead of user CPU time. Doing aes-256-cbc for 3s on 16 size blocks: 941128 aes-256-cbc's in 3.02s Doing aes-256-cbc for 3s on 64 size blocks: 950875 aes-256-cbc's in 3.03s Doing aes-256-cbc for 3s on 256 size blocks: 922503 aes-256-cbc's in 3.00s Doing aes-256-cbc for 3s on 1024 size blocks: 750362 aes-256-cbc's in 3.00s Doing aes-256-cbc for 3s on 8192 size blocks: 208038 aes-256-cbc's in 3.03s OpenSSL 1.0.1e-freebsd 11 Feb 2013 built on: Tue Aug 27 11:52:46 PDT 2013 options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) compiler: cc -O The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes aes-256-cbc 4993.34k 20076.21k 78720.26k 256123.56k 562225.91k So, the good news is that cryptodev does properly work for OpenSSL... And even better news (if you unload cryptodev), OpenSSL does work with AESNI: $ openssl speed -evp aes-256-cbc Doing aes-256-cbc for 3s on 16 size blocks: 56809161 aes-256-cbc's in 3.02s Doing aes-256-cbc for 3s on 64 size blocks: 15626726 aes-256-cbc's in 3.01s Doing aes-256-cbc for 3s on 256 size blocks: 4289939 aes-256-cbc's in 3.04s Doing aes-256-cbc for 3s on 1024 size blocks: 1090334 aes-256-cbc's in 3.02s Doing aes-256-cbc for 3s on 8192 size blocks: 136649 aes-256-cbc's in 2.99s OpenSSL 1.0.1e-freebsd 11 Feb 2013 built on: Tue Aug 27 11:52:46 PDT 2013 options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) compiler: cc -O The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes aes-256-cbc 300633.49k 332504.26k 361369.46k 370239.01k 374117.13k and decrypt is even faster: $ openssl speed -decrypt -evp aes-256-cbc Doing aes-256-cbc for 3s on 16 size blocks: 61585473 aes-256-cbc's in 3.00s Doing aes-256-cbc for 3s on 64 size blocks: 60938441 aes-256-cbc's in 3.02s Doing aes-256-cbc for 3s on 256 size blocks: 34632152 aes-256-cbc's in 3.01s Doing aes-256-cbc for 3s on 1024 size blocks: 9779213 aes-256-cbc's in 3.03s Doing aes-256-cbc for 3s on 8192 size blocks: 1320658 aes-256-cbc's in 3.05s OpenSSL 1.0.1e-freebsd 11 Feb 2013 built on: Tue Aug 27 11:52:46 PDT 2013 options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) compiler: cc -O The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes aes-256-cbc 328455.86k 1289942.40k 2947600.93k 3303559.29k 3550795.60k Just for people wondering what CPU: CPU: AMD A10-5700 APU with Radeon(tm) HD Graphics (3393.89-MHz K8-class CPU) I guess now we need to figure out how to teach OpenSSL to use AES-NI natively even when /dev/crypto is available... but at least we did solve the (non-)issue of bad OpenSSL performance... I will submit a patch to OpenSSL to not make the documentation of the -elapsed option dependent on defines... -- John-Mark Gurney Voice: +1 415 225 5579 "All that I will do, has been done, All that I have, has not."