thr3ads.net - llvm dev - [LLVMdev] Proposal: add intrinsics for safe division [Apr 2014]

If this information is useful, please help other people find it:
Share via:

Filip Pizlo

2014-Apr-25 17:19 UTC

[LLVMdev] Proposal: add intrinsics for safe division

On April 25, 2014 at 9:52:35 AM, Eric Christopher (echristo at gmail.com) wrote:

Hi Michael,
> I’d like to propose to extend LLVM IR intrinsics set, adding new ones for
> safe-division. There are intrinsics for detecting overflow errors, like
> sadd.with.overflow, and the intrinsics I’m proposing will augment this set.
>
> The new intrinsics will return a structure with two elements according to
> the following rules:
>
> safe.[us]div(x,0) = safe.[us]rem(x,0) = {0, 1}
> safe.sdiv(min<T>, -1) = safe.srem(min<T>, -1) = {min<T>,
1}
> In other cases: safe.op(x,y) = {x op y, 0}, where op is sdiv, udiv, srem,
or
> urem
>
>
> The use of these intrinsics would be quite the same as it was for
> arith.with.overflow intrinsics. For instance:
> %res = call {i32, i1} @llvm.safe.sdiv.i32(i32 %a, i32 %b)
> %div = extractvalue {i32, i1} %res, 0
> %bit = extractvalue {i32, i1} %res, 1
> br i1 %bit, label %trap, label %normal
>
I notice that the patch is still in ToT even though we're now having a
discussion on whether it should go in. It should be reverted.

That said, I don't see how this is anything except for an optimization
intrinsic and not necessary for correct behavior for any language.
I.e. You can do what the PNaCl people are doing and emit branches
instead. Since this will only happen on a single target and not on all
targets why not just make this a target intrinisic? I don't buy the
argument that it biases the target independent IR since any pass that
uses TTI in the IR level does the same.
The sdiv operation in LLVM IR only makes sense for C and its very direct
relatives.  The amount of control flow necessary to represent a safe division
for any other language is ghastly.  (a/b) becomes something like (b != 0 ? ((a
!= INT_MIN || b != -1) ? a / b : INT_MIN) : 0).  It's more useful to the
optimizer to see this as the single operation that it really is.

Also, this isn't about just a single target.  Any target that has to emulate
integer division (like a lot of embedded systems would do) would benefit from
this since the division emulation can most naturally be written to provide the
"safe" semantics rather than LLVM's default semantics (i.e.
undefined behavior on x/0 and INT_MIN/-1).

I think adding these intrinics to the IR would be a great help for non-C clients
of LLVM.

-Filip




-eric

> Now a few words about their implementation in LLVM. Though the new
> intrinsics look quite similar to the ones with overflow, there are
> significant differences. One of them is that during lowering we need to
> create control-flow for the new ones, while for the existing ones it was
> sufficient to simply compute the overflow flag. The control flow is needed
> to guard the division operation, which otherwise can cause an undefined
> behaviour.
>
> The existing intrinsics are lowered in a back-end, during legalization
> steps. To do the same for the new ones, we’d need a more complicated
> implementation because of the need to create a new control flow. Also, that
> would be needed to be done in every backend.
>
> Another alternative here is to lower the new intrinsics in CodeGenPrepare
> pass. That approach looks more convenient to me, because it allows us to
> have a single implementation for all targets in one place, and it’s easier
> to introduce control-flow at this point.
>
> The patch below implements the second alternative. Along with a
> straight-forward lowering (which is valid and could be used as a base on
all
> platforms), during the lowering some simple optimizations are performed
> (which I think is also easier to implement in CodeGenPrepare, than on
DAGs):
>
> We don’t to generate code for unused part of the result structure.
> If div-instruction on the given platform behaves exactly as needed for the
> intrinsic (e.g. it takes place for ARM64), we don’t guard the div
> instruction. As a result, we could avoid branches at all if the second part
> of the result structure is not used.
> The most expected users of the result structure are extractvalue
> instructions. Having that in mind, we try to propagate the results - in
most
> cases that allows to get rid of all corresponding extractvalues.
>
>
> Attached are two patches: the first one with the described implementation
> and tests, and the second one with the corresponding documentation changes.
>
> The first patch happened to already get to the trunk, but the topic is
open,
> and any suggestions are welcome.
>
>
>
>
> Best regards,
> Michael
>
_______________________________________________
LLVM Developers mailing list
LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140425/ff7e0463/attachment.html>

Reid Kleckner

2014-Apr-25 17:48 UTC

head link

[LLVMdev] Proposal: add intrinsics for safe division

On Fri, Apr 25, 2014 at 10:19 AM, Filip Pizlo <fpizlo at apple.com>
wrote:>
> The sdiv operation in LLVM IR only makes sense for C and its very direct
> relatives.  The amount of control flow necessary to represent a safe
> division for any other language is ghastly.  (a/b) becomes something like
> (b != 0 ? ((a != INT_MIN || b != -1) ? a / b : INT_MIN) : 0).  It's
more
> useful to the optimizer to see this as the single operation that it really
> is.
>This argument makes sense to me. It would be hard for the arm64 backend to
pattern match all this control flow back into a single instruction after
optimizations.

Are there ISAs out there that produce different results without trapping
for these boundary conditions? Should we leave the quotient result
unspecified in the b == 0 || (a == INT_MIN && B == -1) cases? If we do
that, we require frontends to check the "overflow" bit if they want
deterministic cross-platform behavior.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140425/4dbca872/attachment.html>

Philip Reames

2014-Apr-25 18:06 UTC

head link

[LLVMdev] Proposal: add intrinsics for safe division

On 04/25/2014 10:19 AM, Filip Pizlo wrote:>
>
> On April 25, 2014 at 9:52:35 AM, Eric Christopher (echristo at gmail.com 
> <mailto:echristo at gmail.com>) wrote:
>
>> Hi Michael,
>>
>> > I'd like to propose to extend LLVM IR intrinsics set, adding
new
>> ones for
>> > safe-division. There are intrinsics for detecting overflow errors,
like
>> > sadd.with.overflow, and the intrinsics I'm proposing will
augment
>> this set.
>> >
>> > The new intrinsics will return a structure with two elements 
>> according to
>> > the following rules:
>> >
>> > safe.[us]div(x,0) = safe.[us]rem(x,0) = {0, 1}
>> > safe.sdiv(min<T>, -1) = safe.srem(min<T>, -1) =
{min<T>, 1}
>> > In other cases: safe.op(x,y) = {x op y, 0}, where op is sdiv,
udiv,
>> srem, or
>> > urem
>> >
>> >
>> > The use of these intrinsics would be quite the same as it was for
>> > arith.with.overflow intrinsics. For instance:
>> > %res = call {i32, i1} @llvm.safe.sdiv.i32(i32 %a, i32 %b)
>> > %div = extractvalue {i32, i1} %res, 0
>> > %bit = extractvalue {i32, i1} %res, 1
>> > br i1 %bit, label %trap, label %normal
>> >
>>
>> I notice that the patch is still in ToT even though we're now
having a
>> discussion on whether it should go in. It should be reverted.
>>
>> That said, I don't see how this is anything except for an
optimization
>> intrinsic and not necessary for correct behavior for any language.
>> I.e. You can do what the PNaCl people are doing and emit branches
>> instead. Since this will only happen on a single target and not on all
>> targets why not just make this a target intrinisic? I don't buy the
>> argument that it biases the target independent IR since any pass that
>> uses TTI in the IR level does the same.
>
> The sdiv operation in LLVM IR only makes sense for C and its very 
> direct relatives.  The amount of control flow necessary to represent a 
> safe division for any other language is ghastly.  (a/b) becomes 
> something like (b != 0 ? ((a != INT_MIN || b != -1) ? a / b : INT_MIN) 
> : 0).  It's more useful to the optimizer to see this as the single 
> operation that it really is.
>
> Also, this isn't about just a single target.  Any target that has to 
> emulate integer division (like a lot of embedded systems would do) 
> would benefit from this since the division emulation can most 
> naturally be written to provide the "safe" semantics rather than 
> LLVM's default semantics (i.e. undefined behavior on x/0 and
INT_MIN/-1).
>
> I think adding these intrinics to the IR would be a great help for 
> non-C clients of LLVM.
>
> -Filip
>I agree with all of Filip's points above.  I'm in support of including 
these intrinsics in their current form.

I have two suggestions, but both of these can be incremental improvements:
1) On many targets, it might make sense to lower these much earlier than 
codegenprepare.  By lowering them that late, you loose a lot of 
potentially to have branches eliminated.  If there's no special hardware 
support which makes those conditions cheap, getting rid of them early is 
probably the best choice.
2) Extending Reid's suggestion (in a separate response) could we define 
the intrinsics to take the (constant) values to be used in the boundary 
conditions?  llvm.safe.div(x,y, undef, undef) would give Reid's desired 
semantics.  llvm.safe.div(x,y, 0, min<T>) would give the original 
proposed semantics.  This would allow a variety of language semantics 
with a shared implementation.

Philip
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140425/35f11ce0/attachment.html>

Eric Christopher

2014-Apr-25 18:15 UTC

head link

[LLVMdev] Proposal: add intrinsics for safe division

On Fri, Apr 25, 2014 at 10:48 AM, Reid Kleckner <rnk at google.com>
wrote:> On Fri, Apr 25, 2014 at 10:19 AM, Filip Pizlo <fpizlo at apple.com>
wrote:
>>
>> The sdiv operation in LLVM IR only makes sense for C and its very
direct
>> relatives.  The amount of control flow necessary to represent a safe
>> division for any other language is ghastly.  (a/b) becomes something
like (b
>> != 0 ? ((a != INT_MIN || b != -1) ? a / b : INT_MIN) : 0).  It's
more useful
>> to the optimizer to see this as the single operation that it really is.
>
> This argument makes sense to me. It would be hard for the arm64 backend to
> pattern match all this control flow back into a single instruction after
> optimizations.
No objections on this point here either - I agree that this makes
things more verbose for generating IR and that pattern matching it
back might be difficult (CVP should be able to do it, however...)
>
> Are there ISAs out there that produce different results without trapping
for
> these boundary conditions? Should we leave the quotient result unspecified
> in the b == 0 || (a == INT_MIN && B == -1) cases? If we do that, we
require
> frontends to check the "overflow" bit if they want deterministic
> cross-platform behavior.
Agreed, perhaps we want to say "undefined" or "returns the same
value"
or are there other semantics?

-eric

Filip Pizlo

2014-Apr-25 18:49 UTC

head link

[LLVMdev] Proposal: add intrinsics for safe division

On April 25, 2014 at 10:48:18 AM, Reid Kleckner (rnk at google.com) wrote:

On Fri, Apr 25, 2014 at 10:19 AM, Filip Pizlo <fpizlo at apple.com> wrote:
The sdiv operation in LLVM IR only makes sense for C and its very direct
relatives.  The amount of control flow necessary to represent a safe division
for any other language is ghastly.  (a/b) becomes something like (b != 0 ? ((a
!= INT_MIN || b != -1) ? a / b : INT_MIN) : 0).  It's more useful to the
optimizer to see this as the single operation that it really is.

This argument makes sense to me. It would be hard for the arm64 backend to
pattern match all this control flow back into a single instruction after
optimizations.

Are there ISAs out there that produce different results without trapping for
these boundary conditions? Should we leave the quotient result unspecified in
the b == 0 || (a == INT_MIN && B == -1) cases? If we do that, we require
frontends to check the "overflow" bit if they want deterministic
cross-platform behavior.
That's a good question.  I believe that all ISAs go to one of the following
paths:

- Trap on x/0 and INT_MIN/-1 like x86.

- Return 0 for x/0 and INT_MIN for INT_MIN/-1 like ARM.

Also, here's what I know about what different languages do:

C/C++/etc: undef on x/0 and INT_MIN/-1, so they would use LLVM's existing
div instruction.

JavaScript: optimizing compilers will look for places where people say things
like (a/b)|0 - in JavaScript saying x|0 means "truncate to int32", and
they will do an integer division if the inputs were already represented as
integers.  x/0 will produce NaN and NaN|0 produces 0; INT_MIN/-1 produces
-INT_MIN (which is representable as double) and then (-INT_MIN)|0 produces
INT_MIN.  So, the ARM64 semantics match optimized JavaScript semantics.  There
is also the case that you say a/b in JavaScript, and we have inferred that a and
b are integers but there is no |0.  In that case we want to detect when x/0 and
INT_MIN/-1 so that we can promote the result to double.

Java: x/0 throws an exception but INT_MIN/-1 produces INT_MIN.

Ruby and probably a lot of others: provide BigInt semantics but have fast paths
for small ints.  So, these will throw an exception on x/0 and they will inflate
to a BigInt on INT_MIN/-1.

I don't know about other languages.

Looking at this, it makes me *slightly* prefer to generalize these intrinsics a
bit.  Instead of returning an i1 flag for all overflow cases, you could return
an i2 status enum that says 0 for neither overflow condition, 1 for x/0, and 2
for INT_MIN/-1.  Then, you get the following matrix for how Java and JavaScript
would map to this and how it would be lowered:

- Java on x86: use safediv and check if the if the status is 1.  If it's 1
then throw.  This lowers to a bunch of branches fairly late in the backend.

- JavaScript on x86 with |0: use safediv and ignore the status.  Also loweres to
a bunch of branches, which is as good as as it gets.

- JavaScript on x86 without |0: use safediv and deoptimize if status != 0.  Also
a bunch of branches.

- Ruby on x86: use safediv and detect both status==1 and status==2.  This lowers
to a bunch of branches.

- Java on ARM: use safediv as on x86, but not it lowers to just a branch to
detect x/0 and not the other case.  Note that without the safediv intrinsic, you
wouldn't be able to get this to one branch unless you used a target
intrinsic.

- JavaScript on ARM with |0: use safediv and you ignore the status and you get
no branches.  It maps directly to ARM.

- JavaScript on ARM without |0: use safediv and test the status, so you get a
bunch of branches as before.

- Ruby on ARM: this lowers to a bunch of branches as on x86.

So, it seems that for JavaScript and Java on ARM - and probably any environment
with emulated division, these intrinsics would result in tighter code without
requiring target intrinsics.  For all other language/hardware combinations, the
main benefit here is to materialize the gnarly control flow later and so
hopefully reveal more optimization opportunities.  That seems like a pretty good
sweet-spot for the systems that I know about.

-Filip



-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140425/707883f3/attachment.html>

Possibly Parallel Threads

Search for more maybe matching threads

llvm dev - Apr 2014 - [LLVMdev] Proposal: add intrinsics for safe division

[LLVMdev] Proposal: add intrinsics for safe division

[LLVMdev] Proposal: add intrinsics for safe division

[LLVMdev] Proposal: add intrinsics for safe division

[LLVMdev] Proposal: add intrinsics for safe division

[LLVMdev] Proposal: add intrinsics for safe division

Possibly Parallel Threads