thr3ads.net - llvm dev - [llvm-dev] [RFC] carry-less multiplication instruction [Jul 2020]

If this information is useful, please help other people find it:
Share via:

Roman Lebedev via llvm-dev

2020-Jul-09 14:41 UTC

[llvm-dev] [RFC] carry-less multiplication instruction

(As per IRC discussion)

I understand that the carry-less multiplication algorithm has it's uses
since/and it is implemented as an instruction in many architectures
and that adding it as a general-purpose intrinsic will allow us
to drop target-specific intrinsics as by-product.

What i do *NOT* understand is: what is the actual/main goal/driving
factor of adding an LLVM intrinsic for it?

The use that was mentioned is crypto, and i'm personally not really
registering anything else. Am i just misreading it?
The crypto use-case doesn't make sense to me, because
as of this moment LLVM "explicitly" has zero constant-time
guarantees for LLVM IR instructions/intrinsics.

I feel like it's a really important question.
If there isn't interest in crypto/constant-time here, i think it would
be best to explicitly state so. If there is, i think it may be good
to hear from Chandler (who IIRC is driving some constant-time
work for C++/LLVM, that is not yet widely public)

Roman

On Wed, Jul 8, 2020 at 7:23 PM Steve (Numerics) Canon via llvm-dev
<llvm-dev at lists.llvm.org> wrote:>
> FWIW, this seems like a no-brainer to me (as llvm.experimental initially),
assuming that it can be designed in such a way that it would eliminate the need
for intrinsics on at least two targets (I think it should be possible to do so,
with a small amount of back-end work).
>
> – Steve
>
> On Jul 5, 2020, at 5:18 AM, Shawn Landden via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
>
> Carry-less multiplication[1] instructions exist (at least optionally) on
many architectures: armv8, RISC-V, x86_64, POWER, SPARC, C64x, and possibly
more.
>
> This proposal is to add a llvm.clmul instruction. Or if that is
contentious, llvm.experimental.bitmanip.clmul instruction. It takes two integer
operands of the same width, and returns an integer with twice the width of the
operands. (Is there a good reason to make these the same width, as all the other
operations do even when it doesn’t really make sense for the mathematical
operation–like multiplication or ctpop/ctlz/cttz?)
>
> If the CPU does not have a dedication clmul operation, it can be lowered to
regular multiplication, by using holes to avoid carrys.
>
> ==Where is clmul used?=>
> While somewhat specialized, the RISC-V manual documents many uses: [2]
>
> The classic applications forclmulare Cyclic Redundancy Check (CRC) [11, 26]
>
> and Galois/CounterMode (GCM), but more applications exist, including the
following examples.There are obvious applications in hashing and pseudo random
number generations. For exam-ple, it has been reported that hashes based on
carry-less multiplications can outperform Google’sCityHash [17].
>
> clmulof a number with itself inserts zeroes between each input bit. This
can be useful for generatingMorton code [23].
>
> clmulof a number with -1 calculates the prefix XOR operation. This can be
useful for decodinggray codes.Another application of XOR prefix sums calculated
withclmulis branchless tracking of quotedstrings in high-performance parsers.
[16]
>
> Carry-less multiply can also be used to implement Erasure code efficiently.
[14]
>
> ==clmul lowering without hardware support=> A 8x8=>16 clmul can also
be lowered to a 32x32=>64 multiplication when there is no specialized
instruction (also 15x15=>30, to a 60x60=>120, or if bitreverse is
available 16x16=>32 to TWO 64x64=>64 multiplications)[3].
>
> [1] https://en.wikipedia.org/wiki/Carry-less_product
> [2] (page 30)
https://raw.githubusercontent.com/riscv/riscv-bitmanip/master/bitmanip-0.92.pdf
> [3] https://www.bearssl.org/constanttime.html
>
>
>
> (First posted to discord
>
> --
> Shawn Landden
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Steve (Numerics) Canon via llvm-dev

2020-Jul-09 15:13 UTC

head link

[llvm-dev] [RFC] carry-less multiplication instruction

CLMUL is absolutely useful outside of “crypto” contexts that want/require
“constant time” operation.

To name just two families of uses, it’s the backbone of many hash/checksum
algorithms and error-correcting codes, where the goal is often simply to go as
fast as possible, and uArch side-channel resistance is not a concern.

– Steve
> On Jul 9, 2020, at 10:41 AM, Roman Lebedev via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
> 
> 
> What i do *NOT* understand is: what is the actual/main goal/driving
> factor of adding an LLVM intrinsic for it?
> 
> The use that was mentioned is crypto, and i'm personally not really
> registering anything else. Am i just misreading it?
> The crypto use-case doesn't make sense to me, because
> as of this moment LLVM "explicitly" has zero constant-time
> guarantees for LLVM IR instructions/intrinsics.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20200709/80863647/attachment.html>

Hal Finkel via llvm-dev

2020-Jul-09 15:24 UTC

head link

[llvm-dev] [RFC] carry-less multiplication instruction

On 7/9/20 10:13 AM, Steve (Numerics) Canon via llvm-dev
wrote:> CLMUL is absolutely useful outside of “crypto” contexts that 
> want/require “constant time” operation.
>
> To name just two families of uses, it’s the backbone of many 
> hash/checksum algorithms and error-correcting codes, where the goal is 
> often simply to go as fast as possible, and uArch side-channel 
> resistance is not a concern.
>
> – Steve

+1

See, e.g., 
https://lemire.me/blog/2015/10/26/crazily-fast-hashing-with-carry-less-multiplications/
-- and also, https://en.wikipedia.org/wiki/CLMUL_instruction_set, "One 
use of these instructions is to improve the speed of applications doing 
block cipher encryption in Galois/Counter Mode, which depends on finite 
field GF(2^k) multiplication. Another application is the fast 
calculation of CRC values, including those used to implement the LZ77 
sliding window DEFLATE algorithm in zlib and pngcrush."

  -Hal

>
>> On Jul 9, 2020, at 10:41 AM, Roman Lebedev via llvm-dev 
>> <llvm-dev at lists.llvm.org <mailto:llvm-dev at
lists.llvm.org>> wrote:
>>
>>
>> What i do *NOT* understand is: what is the actual/main goal/driving
>> factor of adding an LLVM intrinsic for it?
>>
>> The use that was mentioned is crypto, and i'm personally not really
>> registering anything else. Am i just misreading it?
>> The crypto use-case doesn't make sense to me, because
>> as of this moment LLVM "explicitly" has zero constant-time
>> guarantees for LLVM IR instructions/intrinsics.
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20200709/191dcfd1/attachment.html>

llvm dev - Jul 2020 - [RFC] carry-less multiplication instruction

[llvm-dev] [RFC] carry-less multiplication instruction

[llvm-dev] [RFC] carry-less multiplication instruction

[llvm-dev] [RFC] carry-less multiplication instruction