tsalim at tutanota.com
2020-May-11 19:06 UTC
Request For Opinion on Adding AEGIS-256 to SSH
Dear OpenSSH Mailing List,

Hello! This is my first email to the SSH mailing list.

I have discussed the benefits and disadvantages of ChaCha20-Poly1305 with Frank Denis, the lead developer of libsodium. Frank admitted that the new AEGIS-256 cipher would be a better option than ChaCha20-Poly1305 for most users, and that hardware-accelerated AEGIS-256 would be much faster than even AVX-accelerated ChaCha20-Poly1305.

Are the developers of SSH considering adding AEGIS-256 to SSH?

Appended below is the original response email Frank Denis sent me. I thank the SSH Mailing Team for any responses they send back to me.

Sincerely,
Tanveer Salim

Below is the email Frank sent me:
------------------------------------------------------------------------
Hi Salim,

See https://blog.cloudflare.com/on-the-dangers-of-intels-frequency-scaling/

AVX512 may look great on benchmarks, but it is also going to make everything else slow. In addition to that, AVX512 is only present on high-end Xeons.

The development code of libsodium supports the AEGIS construction instead. It is about 4 times faster, works on most CPUs including ARMs, and doesn't cause thermal issues. This makes it a far more compelling option than ChaPoly.

BLAKE3 is likely to be added, but there is no rush. Besides being faster, it doesn't have any advantages over BLAKE2, has a smaller security margin, and is not available in any libraries yet. Also, KangarooTwelve is about as fast, and is on the standards track, so it may get more adoption than BLAKE3.
------------------------------------------------------------------------
I have also decided to attach the email Kent Ross sent to me:

Throttling concerns around the use of wide SIMD are usually regarding slowdowns and stalls for concurrent operations on the same CPU core from onlining the hot and expensive AVX units.
Skylake CPUs can downclock 20% or more under sustained AVX512 utilization, and there can also be stalls on the order of tens of microseconds to adjust CPU frequency and power licenses. In mixed workloads it *can* have a measurable negative impact, though the net performance is often a wash or still an improvement. As weird as it sounds, there are some staunch opponents of vectorization in software, and at times the downsides I mentioned are overstated.

One example of some further reading: https://arxiv.org/pdf/1901.04982.pdf
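
The trade-off described above (a faster vectorized portion against a lower clock for everything on the core) can be put into rough numbers. The following is only a toy model, not a measurement: it assumes the whole workload runs at the reduced clock while the AVX units are in use, and it ignores the frequency-transition stalls entirely. The 20% downclock figure comes from the email above; the function names, the SIMD speedup, and the vectorized fraction are hypothetical values chosen for illustration.

```python
def relative_runtime(vector_fraction, simd_speedup, downclock):
    """Runtime relative to a scalar baseline of 1.0 (toy model).

    vector_fraction: share of baseline runtime that can use wide SIMD (0..1).
    simd_speedup:    assumed speedup of that share at equal clock (hypothetical).
    downclock:       fractional clock reduction, e.g. 0.20 for 20%.
    """
    if not 0.0 <= vector_fraction <= 1.0:
        raise ValueError("vector_fraction must be between 0 and 1")
    if simd_speedup <= 0.0:
        raise ValueError("simd_speedup must be positive")
    if not 0.0 <= downclock < 1.0:
        raise ValueError("downclock must be in [0, 1)")
    # Work remaining after vectorization, measured in baseline-clock time.
    work = vector_fraction / simd_speedup + (1.0 - vector_fraction)
    # Assumption: everything on the core runs at the reduced clock.
    return work / (1.0 - downclock)


def break_even_fraction(simd_speedup, downclock):
    """Vectorized share at which the toy model is exactly a wash.

    A result above 1.0 means the model never breaks even.
    """
    if simd_speedup <= 1.0:
        raise ValueError("simd_speedup must exceed 1 to break even")
    return downclock / (1.0 - 1.0 / simd_speedup)


if __name__ == "__main__":
    # Hypothetical 2x SIMD speedup with the 20% downclock from the email.
    for share in (0.1, 0.4, 0.8):
        print(share, relative_runtime(share, 2.0, 0.20))
    print("break-even share:", break_even_fraction(2.0, 0.20))
```

Under these assumptions, a workload that is only lightly vectorized ends up slower than the scalar baseline, while a heavily vectorized one still comes out ahead, which is consistent with the "often a wash or still an improvement" remark. Real behavior depends on the CPU model and on how long the core stays in the lower-frequency state.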