Roman Lebedev via llvm-dev
2020-Jul-09 14:41 UTC
[llvm-dev] [RFC] carry-less multiplication instruction
(As per IRC discussion) I understand that the carry-less multiplication algorithm has it's uses since/and it is implemented as an instruction in many architectures and that adding it as a general-purpose intrinsic will allow us to drop target-specific intrinsics as by-product. What i do *NOT* understand is: what is the actual/main goal/driving factor of adding an LLVM intrinsic for it? The use that was mentioned is crypto, and i'm personally not really registering anything else. Am i just misreading it? The crypto use-case doesn't make sense to me, because as of this moment LLVM "explicitly" has zero constant-time guarantees for LLVM IR instructions/intrinsics. I feel like it's a really important question. If there isn't interest in crypto/constant-time here, i think it would be best to explicitly state so. If there is, i think it may be good to hear from Chandler (who IIRC is driving some constant-time work for C++/LLVM, that is not yet widely public) Roman On Wed, Jul 8, 2020 at 7:23 PM Steve (Numerics) Canon via llvm-dev <llvm-dev at lists.llvm.org> wrote:> > FWIW, this seems like a no-brainer to me (as llvm.experimental initially), assuming that it can be designed in such a way that it would eliminate the need for intrinsics on at least two targets (I think it should be possible to do so, with a small amount of back-end work). > > – Steve > > On Jul 5, 2020, at 5:18 AM, Shawn Landden via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > Carry-less multiplication[1] instructions exist (at least optionally) on many architectures: armv8, RISC-V, x86_64, POWER, SPARC, C64x, and possibly more. > > This proposal is to add a llvm.clmul instruction. Or if that is contentious, llvm.experimental.bitmanip.clmul instruction. It takes two integer operands of the same width, and returns an integer with twice the width of the operands. (Is there a good reason to make these the same width, as all the other operations do even when it doesn’t really make sense for the mathematical operation–like multiplication or ctpop/ctlz/cttz?) > > If the CPU does not have a dedication clmul operation, it can be lowered to regular multiplication, by using holes to avoid carrys. > > ==Where is clmul used?=> > While somewhat specialized, the RISC-V manual documents many uses: [2] > > The classic applications forclmulare Cyclic Redundancy Check (CRC) [11, 26] > > and Galois/CounterMode (GCM), but more applications exist, including the following examples.There are obvious applications in hashing and pseudo random number generations. For exam-ple, it has been reported that hashes based on carry-less multiplications can outperform Google’sCityHash [17]. > > clmulof a number with itself inserts zeroes between each input bit. This can be useful for generatingMorton code [23]. > > clmulof a number with -1 calculates the prefix XOR operation. This can be useful for decodinggray codes.Another application of XOR prefix sums calculated withclmulis branchless tracking of quotedstrings in high-performance parsers. [16] > > Carry-less multiply can also be used to implement Erasure code efficiently. [14] > > ==clmul lowering without hardware support=> A 8x8=>16 clmul can also be lowered to a 32x32=>64 multiplication when there is no specialized instruction (also 15x15=>30, to a 60x60=>120, or if bitreverse is available 16x16=>32 to TWO 64x64=>64 multiplications)[3]. > > [1] https://en.wikipedia.org/wiki/Carry-less_product > [2] (page 30) https://raw.githubusercontent.com/riscv/riscv-bitmanip/master/bitmanip-0.92.pdf > [3] https://www.bearssl.org/constanttime.html > > > > (First posted to discord > > -- > Shawn Landden > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Steve (Numerics) Canon via llvm-dev
2020-Jul-09 15:13 UTC
[llvm-dev] [RFC] carry-less multiplication instruction
CLMUL is absolutely useful outside of “crypto” contexts that want/require “constant time” operation. To name just two families of uses, it’s the backbone of many hash/checksum algorithms and error-correcting codes, where the goal is often simply to go as fast as possible, and uArch side-channel resistance is not a concern. – Steve> On Jul 9, 2020, at 10:41 AM, Roman Lebedev via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > > What i do *NOT* understand is: what is the actual/main goal/driving > factor of adding an LLVM intrinsic for it? > > The use that was mentioned is crypto, and i'm personally not really > registering anything else. Am i just misreading it? > The crypto use-case doesn't make sense to me, because > as of this moment LLVM "explicitly" has zero constant-time > guarantees for LLVM IR instructions/intrinsics.-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200709/80863647/attachment.html>
Hal Finkel via llvm-dev
2020-Jul-09 15:24 UTC
[llvm-dev] [RFC] carry-less multiplication instruction
On 7/9/20 10:13 AM, Steve (Numerics) Canon via llvm-dev wrote:> CLMUL is absolutely useful outside of “crypto” contexts that > want/require “constant time” operation. > > To name just two families of uses, it’s the backbone of many > hash/checksum algorithms and error-correcting codes, where the goal is > often simply to go as fast as possible, and uArch side-channel > resistance is not a concern. > > – Steve+1 See, e.g., https://lemire.me/blog/2015/10/26/crazily-fast-hashing-with-carry-less-multiplications/ -- and also, https://en.wikipedia.org/wiki/CLMUL_instruction_set, "One use of these instructions is to improve the speed of applications doing block cipher encryption in Galois/Counter Mode, which depends on finite field GF(2^k) multiplication. Another application is the fast calculation of CRC values, including those used to implement the LZ77 sliding window DEFLATE algorithm in zlib and pngcrush." -Hal> >> On Jul 9, 2020, at 10:41 AM, Roman Lebedev via llvm-dev >> <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote: >> >> >> What i do *NOT* understand is: what is the actual/main goal/driving >> factor of adding an LLVM intrinsic for it? >> >> The use that was mentioned is crypto, and i'm personally not really >> registering anything else. Am i just misreading it? >> The crypto use-case doesn't make sense to me, because >> as of this moment LLVM "explicitly" has zero constant-time >> guarantees for LLVM IR instructions/intrinsics. > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev-- Hal Finkel Lead, Compiler Technology and Programming Languages Leadership Computing Facility Argonne National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200709/191dcfd1/attachment.html>