I've got an Atom C2758 system:

CPU: Intel(R) Atom(TM) CPU C2758 @ 2.40GHz (2400.06-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0x406d8  Family = 0x6  Model = 0x4d  Stepping = 8
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x43d8e3bf<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,MOVBE,POPCNT,TSCDLT,AESNI,RDRAND>
  AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
  AMD Features2=0x101<LAHF,Prefetch>
  Standard Extended Features=0x2282<TSCADJ,SMEP,ENHMOVSB>

Enabling aesni seems to make performance much worse:

root@router:~ # openssl speed -evp aes-256-cbc -elapsed
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-256-cbc for 3s on 16 size blocks: 33200486 aes-256-cbc's in 3.01s
Doing aes-256-cbc for 3s on 64 size blocks: 11444626 aes-256-cbc's in 3.01s
Doing aes-256-cbc for 3s on 256 size blocks: 3328753 aes-256-cbc's in 3.02s
Doing aes-256-cbc for 3s on 1024 size blocks: 866523 aes-256-cbc's in 3.02s
Doing aes-256-cbc for 3s on 8192 size blocks: 108891 aes-256-cbc's in 3.00s
OpenSSL 1.0.1e-freebsd 11 Feb 2013
built on: date not available
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
compiler: cc
The 'numbers' are in 1000s of bytes per second processed.
type              16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256-cbc     176609.34k   243517.86k   281851.62k   293480.37k   297345.02k

root@router:~ # kldload aesni
root@router:~ # openssl speed -evp aes-256-cbc -elapsed
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-256-cbc for 3s on 16 size blocks: 881020 aes-256-cbc's in 3.02s
Doing aes-256-cbc for 3s on 64 size blocks: 842078 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 700368 aes-256-cbc's in 3.03s
Doing aes-256-cbc for 3s on 1024 size blocks: 425602 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 76495 aes-256-cbc's in 3.00s
OpenSSL 1.0.1e-freebsd 11 Feb 2013
built on: date not available
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
compiler: cc
The 'numbers' are in 1000s of bytes per second processed.
type              16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256-cbc       4662.35k    17964.33k    59148.60k   145272.15k   208882.35k

Is this expected here, or is something broken?

-- Kevin
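One way to put a number on "much worse" is to divide the two throughput rows Kevin posted, block size by block size. The Python sketch below does just that; the variable names are made up for illustration and the figures are copied from the two tables above. Working through it, the gap is largest at 16-byte blocks (roughly 38x slower after `kldload aesni`) and narrows steadily to roughly 1.4x at 8192-byte blocks.

```python
# Throughput in 1000s of bytes per second, copied from the two
# "openssl speed -evp aes-256-cbc -elapsed" runs above.
BLOCK_SIZES = [16, 64, 256, 1024, 8192]
BEFORE_KLDLOAD = [176609.34, 243517.86, 281851.62, 293480.37, 297345.02]
AFTER_KLDLOAD = [4662.35, 17964.33, 59148.60, 145272.15, 208882.35]


def slowdown_factors(before, after):
    """Ratio before/after per block size; a value > 1 means 'after' is slower."""
    return [b / a for b, a in zip(before, after)]


if __name__ == "__main__":
    factors = slowdown_factors(BEFORE_KLDLOAD, AFTER_KLDLOAD)
    for size, factor in zip(BLOCK_SIZES, factors):
        print(f"{size:>5}-byte blocks: {factor:.1f}x slower after kldload aesni")
```

A single run per configuration is all the data there is here, so these ratios say nothing about run-to-run noise.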
Can you provide the output of freebsd-version and openssl version? It looks like you're using a very old version of OpenSSL. Here's my output as an example:

% freebsd-version
10.1-RELEASE-p10
% openssl version
OpenSSL 1.0.1l-freebsd 15 Jan 2015
% /usr/local/bin/openssl version
OpenSSL 1.0.2a 19 Mar 2015

On Sun, May 24, 2015 at 12:22 PM, Kevin Day <toasty at dragondata.com> wrote:
> Is this expected here, or is something broken?
>
> _______________________________________________
> freebsd-security at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-security
> To unsubscribe, send any mail to "freebsd-security-unsubscribe at freebsd.org"
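For context on "very old": the OpenSSL 1.0.x series marks patch releases with a trailing letter, so Kevin's 1.0.1e (Feb 2013) predates the 1.0.1l (Jan 2015) in the base system above, which in turn predates 1.0.2a from ports. A minimal Python sketch for comparing the version banners quoted in this thread; the helper names are hypothetical, and it assumes only the "OpenSSL X.Y.Z[letters]" banner format seen here, so it is not a general OpenSSL version parser.

```python
import re
import subprocess

# Matches banners such as "OpenSSL 1.0.1e-freebsd 11 Feb 2013".
# Assumption: only the "OpenSSL X.Y.Z[letters]" form seen in this thread is handled.
_BANNER = re.compile(r"OpenSSL (\d+)\.(\d+)\.(\d+)([a-z]*)")


def parse_openssl_version(banner):
    """Return a sortable tuple, e.g. (1, 0, 1, 'e') for 1.0.1e."""
    m = _BANNER.match(banner)
    if m is None:
        raise ValueError(f"unrecognised OpenSSL banner: {banner!r}")
    major, minor, fix, letters = m.groups()
    # Tuples compare element by element; '' sorts before 'a', so 1.0.1 < 1.0.1a.
    return (int(major), int(minor), int(fix), letters)


def installed_banner(binary="openssl"):
    """Run `<binary> version` and return the first line of its output."""
    out = subprocess.run([binary, "version"], capture_output=True, text=True, check=True)
    return out.stdout.strip().splitlines()[0]
```

On the router, `parse_openssl_version(installed_banner())` and `parse_openssl_version(installed_banner("/usr/local/bin/openssl"))` would give tuples that compare directly with `<`.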
Christoph Moench-Tegeder
2015-May-24 20:47 UTC
Atom C2758 - loading aesni(4) reduces performance
## Kevin Day (toasty at dragondata.com):
> Is this expected here, or is something broken?

I'd expect there's something wrong (I don't have access to an AES-NI-capable Atom, but on my i7 there's no such impact). The numbers from the "openssl speed" suite fluctuate heavily even under light load: was this a one-shot test, or is it reproducible on an "unloaded" (yes, I know, system stuff...) system? Can you run multiple tests in each configuration and check the average, median and standard deviation? (Just to make sure the difference is significant.)

Anyway, openssl does not use crypto(4) by default (and therefore cannot use aesni(4)). openssl detects the CPU features by itself and uses the AES-NI instruction set if available, unless told otherwise (see OPENSSL_ia32cap(3)). To make the long manual short: you can force openssl not to use AES-NI by setting the environment variable OPENSSL_ia32cap="~0x0200000000000000". From the few tests I ran, I estimate that this option alone cuts aes-256-cbc throughput by 50 to 60%. Loading (or not loading) aesni(4) has no obvious effect on the numbers in either case (the variations are on the order of the usual noise).

Regards,
Christoph

-- 
Spare Space
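Christoph's two suggestions can be combined: run the benchmark several times with and without the AES-NI capability bit masked, then compare mean, median and standard deviation per block size. A rough Python sketch, assuming `openssl` is on PATH and prints its throughput table to standard output; the helper names are hypothetical. The mask follows OPENSSL_ia32cap(3): bits 32-63 of the capability vector mirror CPUID.1:ECX, where AES-NI is bit 25, so the bit to clear is 57. That reproduces the `~0x0200000000000000` value above and is the same AESNI bit that is set in Kevin's `Features2=0x43d8e3bf` line.

```python
import os
import statistics
import subprocess

# Per OPENSSL_ia32cap(3): bits 32-63 of the vector mirror CPUID.1:ECX,
# and AES-NI is ECX bit 25, hence bit 57 overall.
AESNI_BIT = 32 + 25
NO_AESNI_MASK = f"~0x{1 << AESNI_BIT:016x}"  # "~0x0200000000000000"

# Kevin's dmesg "Features2" value (CPUID.1:ECX); bit 25 is the AESNI flag.
KEVIN_FEATURES2 = 0x43D8E3BF


def parse_speed_table(output, cipher="aes-256-cbc"):
    """Return the throughput row for `cipher` (1000s of bytes/s) as floats."""
    for line in output.splitlines():
        fields = line.split()
        if len(fields) > 1 and fields[0] == cipher and all(f.endswith("k") for f in fields[1:]):
            return [float(f[:-1]) for f in fields[1:]]
    raise ValueError(f"no throughput row for {cipher} found")


def run_speed(mask=None, cipher="aes-256-cbc"):
    """One `openssl speed -evp <cipher> -elapsed` run, optionally with OPENSSL_ia32cap set."""
    env = dict(os.environ)
    if mask is not None:
        env["OPENSSL_ia32cap"] = mask
    result = subprocess.run(
        ["openssl", "speed", "-evp", cipher, "-elapsed"],
        capture_output=True, text=True, check=True, env=env,
    )
    return parse_speed_table(result.stdout, cipher)


def summarize(runs):
    """(mean, median, sample stdev) per block size; needs at least two runs."""
    return [
        (statistics.mean(col), statistics.median(col), statistics.stdev(col))
        for col in zip(*runs)
    ]
```

On the router one might compare `summarize([run_speed() for _ in range(5)])` against `summarize([run_speed(NO_AESNI_MASK) for _ in range(5)])`, once with aesni(4) loaded and once without. Each run takes about 15 seconds (five 3-second tests).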
Kevin Day wrote this message on Sun, May 24, 2015 at 11:22 -0500:
> Is this expected here, or is something broken?

If you have cryptodev loaded, this is to be expected, as OpenSSL will use /dev/crypto instead of the AES-NI instructions. Just don't load cryptodev and you'll be fine.

-- 
  John-Mark Gurney				Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."
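Following John-Mark's diagnosis, the first thing to check on the router is whether cryptodev is present at all. A minimal Python sketch with hypothetical helper names: it treats the existence of the /dev/crypto node as the signal and, as an assumption to verify against kldstat(8) on your release, asks `kldstat -q -m cryptodev` whether the module is loaded.

```python
import os
import subprocess


def cryptodev_node_present(dev_path="/dev/crypto"):
    """True if the cryptodev device node exists on this system."""
    return os.path.exists(dev_path)


def cryptodev_module_loaded():
    """Ask kldstat(8) whether the cryptodev module is loaded (FreeBSD only).

    Assumption: `kldstat -q -m cryptodev` exits with status 0 when the
    module is loaded and with a non-zero status otherwise.
    """
    return subprocess.run(["kldstat", "-q", "-m", "cryptodev"]).returncode == 0
```

If either check reports cryptodev, unloading it (kldunload(8)) or simply not loading it is the change suggested above; rerunning `openssl speed -evp aes-256-cbc -elapsed` afterwards shows whether the slowdown goes away.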