thr3ads.net - llvm dev - [llvm-dev] multiprecision add/sub [Feb 2017]

Bagel via llvm-dev

2017-Feb-15 22:22 UTC

[llvm-dev] multiprecision add/sub

I suggest that LLVM needs intrinsics for add/sub with carry, e.g.

  declare {T, i1} @llvm.addc.T(T %a, T %b, i1 c)

The current multiprecision clang intrinsics example:
  void foo(unsigned *x, unsigned *y, unsigned *z)
  { unsigned carryin = 0;
    unsigned carryout;
    z[0] = __builtin_addc(x[0], y[0], carryin, &carryout);
    carryin = carryout;
    z[1] = __builtin_addc(x[1], y[1], carryin, &carryout);
    carryin = carryout;
    z[2] = __builtin_addc(x[2], y[2], carryin, &carryout);
    carryin = carryout;
    z[3] = __builtin_addc(x[3], y[3], carryin, &carryout);
  }
uses the LLVM intrinsic "llvm.uadd.with.overflow" and generates
horrible code that doesn't use the "adc" x86 instruction.

What is the current thinking on improving multiprecision arithmetic?

Stephen Canon via llvm-dev

2017-Feb-15 22:28 UTC

On Feb 15, 2017, at 2:22 PM, Bagel via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
> I suggest that LLVM needs intrinsics for add/sub with carry, e.g.
> 
>  declare {T, i1} @llvm.addc.T(T %a, T %b, i1 c)
> 
> The current multiprecision clang intrinsics example:
>  void foo(unsigned *x, unsigned *y, unsigned *z)
>  { unsigned carryin = 0;
>    unsigned carryout;
>    z[0] = __builtin_addc(x[0], y[0], carryin, &carryout);
>    carryin = carryout;
>    z[1] = __builtin_addc(x[1], y[1], carryin, &carryout);
>    carryin = carryout;
>    z[2] = __builtin_addc(x[2], y[2], carryin, &carryout);
>    carryin = carryout;
>    z[3] = __builtin_addc(x[3], y[3], carryin, &carryout);
>  }
> uses the LLVM intrinsic "llvm.uadd.with.overflow" and generates
> horrible code that doesn't use the "adc" x86 instruction.
> 
> What is the current thinking on improving multiprecision arithmetic?
Why do you think this requires new intrinsics instead of teaching the optimizer
what to do with the existing intrinsics?

– Steve

David Majnemer via llvm-dev

2017-Feb-15 22:59 UTC

On Wed, Feb 15, 2017 at 2:28 PM, Stephen Canon via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> On Feb 15, 2017, at 2:22 PM, Bagel via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
> > I suggest that LLVM needs intrinsics for add/sub with carry, e.g.
> >
> >  declare {T, i1} @llvm.addc.T(T %a, T %b, i1 c)
> >
> > The current multiprecision clang intrinsics example:
> >  void foo(unsigned *x, unsigned *y, unsigned *z)
> >  { unsigned carryin = 0;
> >    unsigned carryout;
> >    z[0] = __builtin_addc(x[0], y[0], carryin, &carryout);
> >    carryin = carryout;
> >    z[1] = __builtin_addc(x[1], y[1], carryin, &carryout);
> >    carryin = carryout;
> >    z[2] = __builtin_addc(x[2], y[2], carryin, &carryout);
> >    carryin = carryout;
> >    z[3] = __builtin_addc(x[3], y[3], carryin, &carryout);
> >  }
> > uses the LLVM intrinsic "llvm.uadd.with.overflow" and generates
> > horrible code that doesn't use the "adc" x86 instruction.
> >
> > What is the current thinking on improving multiprecision arithmetic?
>
> Why do you think this requires new intrinsics instead of teaching the
> optimizer what to do with the existing intrinsics?
>
In general, it is harder to reason about memory. Also, you are forced to
allocate memory for the carryout even if you are not interested in using it.
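
To make the contrast concrete, here is a minimal C sketch of an add-with-carry that returns both results by value, loosely mirroring the `{T, i1}` pair in the proposed declaration, instead of writing the carry through a pointer as `__builtin_addc` does. The type `addc_result` and the function `addc_by_value` are hypothetical names chosen for illustration; they are not part of Clang or LLVM.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type, loosely mirroring the proposed {T, i1} pair. */
typedef struct {
    uint32_t sum;
    uint32_t carry;   /* always 0 or 1 */
} addc_result;

/* Hypothetical helper: a + b + carry_in, with the carry-out returned by
   value so no caller-side storage has to be handed in by address. */
static addc_result addc_by_value(uint32_t a, uint32_t b, uint32_t carry_in)
{
    assert(carry_in <= 1);
    /* Widen to 64 bits so the carry lands in bit 32. */
    uint64_t wide = (uint64_t)a + b + carry_in;
    addc_result r = { (uint32_t)wide, (uint32_t)(wide >> 32) };
    return r;
}
```

A caller that does not care about the final carry can simply ignore the `carry` field.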


Brian Smith via llvm-dev

2017-Feb-16 00:24 UTC

Stephen Canon via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Why do you think this requires new intrinsics instead of teaching the
> optimizer what to do with the existing intrinsics?
IMO, as a multiprecision math library maker, the "teaching the
optimizer what to do with the existing intrinsics" approach is much
better as long as it can be made to work. If one is careful, MSVC does
optimize its intrinsics into ADC instructions in a reasonable way, so
I think it is probably doable.

(Below, all math is multiprecision.)

There are actually a few different operations besides pure addition
and subtraction, e.g. `a - (b >> 1)` instead of just `a - b`. Also
consider that we sometimes want "a - b" to be side-channel free and
at other times would rather have "a - b" be as fast as possible and
optimized for the case where carries are unlikely to propagate (far).
That's already three different operations, just for subtraction.
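
As a rough illustration of the fused `a - (b >> 1)` case, here is a minimal C sketch assuming 32-bit limbs stored least significant first. The name `mp_sub_shr1` is hypothetical and does not come from any library. The loop has no data-dependent branches in the source, which is the usual starting point for a side-channel-free variant, though a compiler remains free to introduce branches of its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: z = a - (b >> 1) over n limbs, least significant
   limb first. Returns the final borrow (0 or 1). */
static uint32_t mp_sub_shr1(uint32_t *z, const uint32_t *a,
                            const uint32_t *b, size_t n)
{
    assert(n > 0);
    uint32_t borrow = 0;
    for (size_t i = 0; i < n; i++) {
        /* Limb i of (b >> 1): the low bit of the next limb shifts in. */
        uint32_t next = (i + 1 < n) ? b[i + 1] : 0;
        uint32_t s = (b[i] >> 1) | (next << 31);

        uint32_t diff = a[i] - s;        /* first subtract */
        uint32_t b1 = a[i] < s;          /* did it wrap? */
        uint32_t res = diff - borrow;    /* then take the incoming borrow */
        uint32_t b2 = diff < borrow;

        z[i] = res;
        borrow = b1 | b2;                /* the two cannot both be 1 */
    }
    return borrow;
}
```

A variant tuned for speed might instead stop early once the borrow is zero and the remaining limbs of `b` are zero, at the cost of data-dependent timing.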

Cheers,
Brian
-- 
https://briansmith.org/

Bagel via llvm-dev

2017-Feb-16 17:12 UTC

It takes two "llvm.uadd.with.overflow" instances to model the add-with-carry
when there is a carry-in.  Look at the IR generated by the example.
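
A minimal C sketch of that two-step shape, assuming 32-bit limbs stored least significant first. The names `addc_u32` and `mp_add` are hypothetical, and this is plain C written to show the structure, not the IR that Clang actually emits:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a Clang or LLVM API): one limb of add-with-carry,
   written as two separate overflow-checked additions. */
static uint32_t addc_u32(uint32_t a, uint32_t b, uint32_t carry_in,
                         uint32_t *carry_out)
{
    assert(carry_in <= 1);
    uint32_t partial = a + b;            /* first add: a + b */
    uint32_t c1 = partial < a;           /* 1 if the first add wrapped */
    uint32_t sum = partial + carry_in;   /* second add: fold in the carry */
    uint32_t c2 = sum < partial;         /* 1 if the second add wrapped */
    *carry_out = c1 | c2;                /* at most one of the two can be 1 */
    return sum;
}

/* z = x + y over n limbs; returns the carry out of the top limb. */
static uint32_t mp_add(uint32_t *z, const uint32_t *x, const uint32_t *y,
                       size_t n)
{
    uint32_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t carry_out;
        z[i] = addc_u32(x[i], y[i], carry, &carry_out);
        carry = carry_out;
    }
    return carry;
}
```

To emit a single "adc" per limb, the optimizer has to recognize that the two adds and the OR of their carry flags together form one add-with-carry.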

I figured that the optimization of this would be difficult (else it would have
already been done :-)).  And would this optimization have to be done for every
architecture?

On 02/15/2017 04:28 PM, Stephen Canon wrote:
> Why do you think this requires new intrinsics instead of teaching the
> optimizer what to do with the existing intrinsics?
>
> – Steve
