Yuriy M. Kaminskiy
2019-Jan-17 07:51 UTC
[patch 1/2] use chacha20 from openssl (1.1.0+) when possible
On some CPUs the optimized ChaCha implementation in OpenSSL (1.1.0+) is notably faster (and on others it is just faster) than the generic C implementation in OpenSSH.

Sadly, OpenSSL's chacha20-poly1305 (EVP_chacha20_poly1305) uses a different scheme (with padding etc., see RFC 8439), and it looks like it cannot be used in OpenSSH.

OpenSSL 1.1.1+ also exports a "raw" poly1305 primitive, but I have not tried it yet (it was not in 1.1.0).

Trivial benchmark:

  time ssh -c chacha20-poly1305@openssh.com -S none -o Compression=no \
      localhost 'dd if=/dev/zero bs=100000 count=10000' >/dev/null

(comparing "user time" only)

openssh: 7.9p1, self-compiled, based on upstream package from debian/unstable,
hostkey - ecdsa/p256, pubkey auth key - ecdh/p256

Machine: pretty old amd k8 (w/ SSE2, but no SSSE3/AVX/AESNI)
OS: linux/debian/stretch, openssl 1.1.0j-1deb9u1
i386: speed: +8%
amd64: speed: +10%

Machine: raspberry pi 3b+ (BCM2837B0, 4-core Cortex-A53 @1.4GHz)
OS: raspbian/stretch

baseline: armhf/raspbian: unpatched ssh-7.9p1: 30.8s

with openssl 1.1.0j-1deb9u1 from raspbian (compiled for armv6 without neon):
armhf/raspbian: 24.7 seconds, speed: +25%

with openssl 1.1.0j-1deb9u1 from debian/stretch/armhf (compiled for armv7 with neon autodetection):
armhf: 22.2 seconds, speed: +39%

Patches against 7.9p1 (tested) and git master (untested, only resolved configure.ac conflict) attached.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 7.9p1-0001-use-chacha20-from-openssl-when-possible.patch
Type: text/x-diff
Size: 7480 bytes
Desc: not available
URL: <http://lists.mindrot.org/pipermail/openssh-unix-dev/attachments/20190117/e47f4802/attachment-0002.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: master-0001-use-chacha20-from-openssl-when-possible.patch
Type: text/x-diff
Size: 7261 bytes
Desc: not available
URL: <http://lists.mindrot.org/pipermail/openssh-unix-dev/attachments/20190117/e47f4802/attachment-0003.bin>
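The Raspberry Pi percentages above appear to be the ratio of baseline to patched user time, minus one. A minimal sketch of that arithmetic follows; the function name speedup_percent is hypothetical and does not come from the patch or from OpenSSH.

```c
#include <assert.h>

/*
 * Relative speedup, in percent, of a run that takes patched_s seconds
 * over a baseline that takes baseline_s seconds for the same workload.
 * Hypothetical helper, for illustration only.
 */
double speedup_percent(double baseline_s, double patched_s)
{
	assert(baseline_s > 0.0 && patched_s > 0.0);
	return (baseline_s / patched_s - 1.0) * 100.0;
}
```

With the rounded times quoted above, speedup_percent(30.8, 24.7) comes out a little under 25 and speedup_percent(30.8, 22.2) a little under 39, in line with the reported +25% and +39%.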
Yuriy M. Kaminskiy
2019-Feb-06 12:08 UTC
[patch 2/2] use poly1305 from openssl (1.1.1+) when possible
On 01/16/19 13:21, Yuriy M. Kaminskiy wrote:
> On some cpu's optimized chacha implementation in openssl (1.1.0+) is
> notably faster (and on others it is just faster) than generic C
> implementation in openssh.
>
> OpenSSL 1.1.1+ also exports "raw" poly1305 primitive, but I
> have not tried it yet (it was not in 1.1.0).

And here it is.

> Trivial benchmark:
> time ssh -c chacha20-poly1305@openssh.com -S none -o Compression=no \
> localhost 'dd if=/dev/zero bs=100000 count=10000' >/dev/null
> (comparing "user time")
>
> openssh: 7.9p1, self-compiled, based on upstream package from
> debian/unstable, hostkey - ecdsa/p256, pubkey auth key - ecdh/p256
>
> Machine: pretty old amd k8 (SSE2, but no SSSE3/AVX/AESNI)
> OS: debian linux stretch, openssl 1.1.0j-1deb9u1
> i386: +8%
> amd64: +10%
>
> Machine: raspberry pi 3b+ (BCM2837B0, 4-core Cortex-A53 @1.4GHz)
> OS: raspbian/stretch
> baseline: armhf/raspbian: unpatched ssh-7.9p1: 30.8s
> with openssl 1.1.0j-1deb9u1 from raspbian (compiled for armv6 without
> neon):
> armhf/raspbian: 24.7 seconds, speed: +25%
>
> with openssl 1.1.0j-1deb9u1 from debian/stretch/armhf (compiled for
> armv7 with neon autodetection):
> armhf: 22.2 seconds, speed: +39%

openssh: 7.9p1, self-compiled, based on upstream package from debian/unstable, with both chacha20 and poly1305 patches applied, compiled against:
openssl: 1.1.1a, self-compiled, based on upstream package from debian/unstable.

armhf: 12.0 seconds, speed: +155% against original, +85% against chacha20-only version.

Preliminary patches attached (again, tested against 7.9p1, on top of the chacha20 patch). I relied on the presence of EVP_PKEY_POLY1305 for autodetection; it uses the openssl-1.1.0 ABI and can fall back at runtime to the built-in OpenSSH implementation.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: openssl-poly1305-3.patch
Type: text/x-patch
Size: 4522 bytes
Desc: not available
URL: <http://lists.mindrot.org/pipermail/openssh-unix-dev/attachments/20190206/24d78bed/attachment.bin>
Damien Miller
2019-Jul-12 05:54 UTC
[patch 1/2] use chacha20 from openssl (1.1.0+) when possible
On Thu, 17 Jan 2019, Yuriy M. Kaminskiy wrote:
> On some cpu's optimized chacha implementation in openssl (1.1.0+) is
> notably faster (and on others it is just faster) than generic C
> implementation in openssh.
>
> Sadly, openssl's chacha20-poly1305 (EVP_chacha20_poly1305) uses
> different scheme (with padding/etc - see rfc8439) and it looks it is not
> possible to use in openssh.
>
> [...]
>
> Patches against 7.9p1 (tested) and git master (untested, only resolved
> configure.ac conflict) attached.

Thanks for this - it seems to work okay with OpenSSL when patched to -current, but when I adapt it for OpenBSD/LibreSSL the encryption is broken and the connection fails right after KEX.

I expect that there is some difference between OpenSSL and LibreSSL wrt IV lengths or something. OpenSSH does need to support both, so this will take a little figuring out.

One comment on the patch itself: it passes do_encrypt through in a bunch of places and I'm not sure the usage is correct in all of them. In fact I don't think it can even be made consistent for decryption, as the ctx->main_evp has to be used in encryption mode (not decryption) to generate the poly1305 key.

Given this is a stream cipher and there is AFAIK no difference between encryption and decryption, I think it would be better to just fix do_encrypt to 1 to avoid inconsistency.

-d
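The point about a stream cipher having no separate decryption direction can be illustrated with a small self-contained sketch. The code below is a plain-C ChaCha20 keystream XOR written for this illustration, not the OpenSSH or OpenSSL implementation; the name chacha20_xor and the choice to treat the 16-byte IV as the last four state words (a 64-bit block counter followed by a 64-bit nonce, as in the original ChaCha design) are assumptions made here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ROTL32(v, n) (((v) << (n)) | ((v) >> (32 - (n))))
#define QR(a, b, c, d) do {			\
	a += b; d ^= a; d = ROTL32(d, 16);	\
	c += d; b ^= c; b = ROTL32(b, 12);	\
	a += b; d ^= a; d = ROTL32(d, 8);	\
	c += d; b ^= c; b = ROTL32(b, 7);	\
} while (0)

static uint32_t load32_le(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	    ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Produce one 64-byte keystream block from a 16-word input state. */
static void chacha20_block(const uint32_t in[16], uint8_t out[64])
{
	uint32_t x[16];
	int i;

	memcpy(x, in, sizeof(x));
	for (i = 0; i < 10; i++) {
		/* column rounds */
		QR(x[0], x[4], x[8], x[12]);
		QR(x[1], x[5], x[9], x[13]);
		QR(x[2], x[6], x[10], x[14]);
		QR(x[3], x[7], x[11], x[15]);
		/* diagonal rounds */
		QR(x[0], x[5], x[10], x[15]);
		QR(x[1], x[6], x[11], x[12]);
		QR(x[2], x[7], x[8], x[13]);
		QR(x[3], x[4], x[9], x[14]);
	}
	for (i = 0; i < 16; i++) {
		uint32_t v = x[i] + in[i];

		out[4 * i + 0] = (uint8_t)(v & 0xff);
		out[4 * i + 1] = (uint8_t)((v >> 8) & 0xff);
		out[4 * i + 2] = (uint8_t)((v >> 16) & 0xff);
		out[4 * i + 3] = (uint8_t)((v >> 24) & 0xff);
	}
}

/*
 * XOR len bytes of src with the ChaCha20 keystream into dst.
 * The same call both encrypts and decrypts; there is no direction flag.
 */
void chacha20_xor(const uint8_t key[32], const uint8_t iv[16],
    const uint8_t *src, uint8_t *dst, size_t len)
{
	static const uint8_t sigma[] = "expand 32-byte k";
	uint32_t state[16];
	uint8_t ks[64];
	size_t i, n;

	assert(src != NULL && dst != NULL);
	for (i = 0; i < 4; i++)
		state[i] = load32_le(sigma + 4 * i);
	for (i = 0; i < 8; i++)
		state[4 + i] = load32_le(key + 4 * i);
	for (i = 0; i < 4; i++)
		state[12 + i] = load32_le(iv + 4 * i);

	while (len > 0) {
		chacha20_block(state, ks);
		n = len < sizeof(ks) ? len : sizeof(ks);
		for (i = 0; i < n; i++)
			dst[i] = src[i] ^ ks[i];
		src += n;
		dst += n;
		len -= n;
		/* 64-bit block counter in words 12 and 13 */
		if (++state[12] == 0)
			state[13]++;
	}
}
```

Calling chacha20_xor on a plaintext and then again on the result, with the same key and IV, returns the original bytes. That is why an EVP-style do_encrypt flag carries no information for this cipher and could be pinned to one value.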
Jakub Jelen
2020-Jan-16 10:27 UTC
[patch 1/2] use chacha20 from openssl (1.1.0+) when possible
On Fri, 2019-07-12 at 15:54 +1000, Damien Miller wrote:
> On Thu, 17 Jan 2019, Yuriy M. Kaminskiy wrote:
> > [...]
>
> Thanks for this - it seems to work okay with OpenSSL when patched to
> -current, but when I adapt it for OpenBSD/LibreSSL the encryption is
> broken and the connection fails right after KEX.
>
> I expect that there is some difference between OpenSSL and LibreSSL
> wrt IV lengths or something. OpenSSH does need to support both, so this
> will take a little figuring out.
>
> One comment on the patch itself: it passes do_encrypt through in a
> bunch of places and I'm not sure the usage is correct in all of them.
> In fact I don't think it can even be made consistent for decryption,
> as the ctx->main_evp has to be used in encryption mode (not decryption)
> to generate the poly1305 key.
>
> Given this is a stream cipher and there is AFAIK no difference between
> encryption and decryption, I think it would be better just fix
> do_encrypt to 1 to avoid inconsistency.

Hi Damien,
do you have any update on this?

Indeed, it looks like LibreSSL uses a 96-bit IV [1], while OpenSSL uses 128 bits (including the 32-bit counter) [2]. Otherwise, I did not notice any differences.

I have really no experience with OpenBSD, so I do not have a simple way to test my changes, but I believe something like this should address the difference:

diff --git a/cipher-chachapoly.c b/cipher-chachapoly.c
index a58616fb..7e6995f6 100644
--- a/cipher-chachapoly.c
+++ b/cipher-chachapoly.c
@@ -109,7 +109,14 @@ chachapoly_crypt(struct chachapoly_ctx *ctx, u_int seqnr, u_char *dest,
     const u_char *src, u_int len, u_int aadlen, u_int authlen, int do_encrypt)
 {
 #if defined(WITH_OPENSSL) && defined(HAVE_EVP_CHACHA20)
+#if defined(LIBRESSL_VERSION_NUMBER)
+#define CHACHA_IV_OFFSET 4
+	u_char seqbuf[12];
+#else
+#define CHACHA_IV_OFFSET 8
+	/* OpenSSL IV contains also the counter in the first 4 bytes */
 	u_char seqbuf[16];
+#endif
 	int r = SSH_ERR_LIBCRYPTO_ERROR;
 #else
 	u_char seqbuf[8];
@@ -125,7 +132,7 @@ chachapoly_crypt(struct chachapoly_ctx *ctx, u_int seqnr, u_char *dest,
 	memset(poly_key, 0, sizeof(poly_key));
 #if defined(WITH_OPENSSL) && defined(HAVE_EVP_CHACHA20)
 	memset(seqbuf + 0, 0, 8);
-	POKE_U64(seqbuf + 8, seqnr);
+	POKE_U64(seqbuf + CHACHA_IV_OFFSET, seqnr);
 	if (!EVP_CipherInit(ctx->main_evp, NULL, NULL, seqbuf, do_encrypt))
 		goto out;
 	if (EVP_Cipher(ctx->main_evp, poly_key, (u_char *)poly_key, sizeof(poly_key)) < 0)

As for do_encrypt, you are right. ChaCha20 is a stream cipher, so there is no difference between decryption and encryption, but the EVP API requires this argument. For consistency, I would be for using 1 in all cases.

If you have some WIP branch you used for porting to OpenBSD, or something I can test, I guess I can try that.

[1] https://man.openbsd.org/man3/EVP_EncryptInit.3
[2] https://www.openssl.org/docs/man1.1.1/man3/EVP_chacha20_poly1305.html

Regards,
--
Jakub Jelen
Senior Software Engineer
Security Technologies
Red Hat, Inc.
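The two IV layouts that the proposed diff distinguishes can be sketched in isolation. The helpers below are hypothetical stand-ins: poke_u64_be mimics what OpenSSH's POKE_U64 macro does (a big-endian 64-bit store), and build_chacha_iv is a name invented for this illustration. Whether the 12-byte layout is what LibreSSL actually expects is the open question in this thread, so treat the second case as an assumption rather than a verified fact.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store v at p in big-endian byte order (stand-in for POKE_U64). */
void poke_u64_be(uint8_t *p, uint64_t v)
{
	int i;

	for (i = 0; i < 8; i++)
		p[i] = (uint8_t)(v >> (56 - 8 * i));
}

/*
 * Fill iv with leading zero (block counter) bytes followed by the
 * 64-bit big-endian sequence number. iv_len is 16 for the OpenSSL-style
 * layout and 12 for the layout proposed for LibreSSL in the diff.
 * Returns the offset at which the sequence number was written, which
 * corresponds to CHACHA_IV_OFFSET in the diff (8 or 4).
 */
size_t build_chacha_iv(uint8_t *iv, size_t iv_len, uint32_t seqnr)
{
	size_t off;

	assert(iv != NULL);
	assert(iv_len == 16 || iv_len == 12);
	off = iv_len - 8;
	memset(iv, 0, iv_len);
	poke_u64_be(iv + off, seqnr);
	return off;
}
```

For a 16-byte buffer the sequence number occupies bytes 8 to 15. For a 12-byte buffer it occupies bytes 4 to 11, leaving only four leading zero bytes for the counter.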