thr3ads.net - llvm dev - [LLVMdev] Proposal: add intrinsics for safe division [Apr 2014]

If this information is useful, please help other people find it:
Share via:

Michael Zolotukhin

2014-Apr-24 05:52 UTC

[LLVMdev] Proposal: add intrinsics for safe division

Hi,

I’d like to propose to extend LLVM IR intrinsics set, adding new ones for
safe-division. There are intrinsics for detecting overflow errors, like
sadd.with.overflow, and the intrinsics I’m proposing will augment this set.

The new intrinsics will return a structure with two elements according to the
following rules:
safe.[us]div(x,0) = safe.[us]rem(x,0) = {0, 1}
safe.sdiv(min<T>, -1) = safe.srem(min<T>, -1) = {min<T>, 1}
In other cases: safe.op(x,y) = {x op y, 0}, where op is sdiv, udiv, srem, or
urem

The use of these intrinsics would be quite the same as it was for
arith.with.overflow intrinsics. For instance:
%res = call {i32, i1} @llvm.safe.sdiv.i32(i32 %a, i32 %b)
%div = extractvalue {i32, i1} %res, 0
%bit = extractvalue {i32, i1} %res, 1
br i1 %bit, label %trap, label %normal

Now a few words about their implementation in LLVM. Though the new intrinsics
look quite similar to the ones with overflow, there are significant differences.
One of them is that during lowering we need to create control-flow for the new
ones, while for the existing ones it was sufficient to simply compute the
overflow flag. The control flow is needed to guard the division operation, which
otherwise can cause an undefined behaviour.

The existing intrinsics are lowered in a back-end, during legalization steps. To
do the same for the new ones, we’d need a more complicated implementation
because of the need to create a new control flow. Also, that would be needed to
be done in every backend.

Another alternative here is to lower the new intrinsics in CodeGenPrepare pass.
That approach looks more convenient to me, because it allows us to have a single
implementation for all targets in one place, and it’s easier to introduce
control-flow at this point.

The patch below implements the second alternative. Along with a straight-forward
lowering (which is valid and could be used as a base on all platforms), during
the lowering some simple optimizations are performed (which I think is also
easier to implement in CodeGenPrepare, than on DAGs):
We don’t to generate code for unused part of the result structure.
If div-instruction on the given platform behaves exactly as needed for the
intrinsic (e.g. it takes place for ARM64), we don’t guard the div instruction.
As a result, we could avoid branches at all if the second part of the result
structure is not used.
The most expected users of the result structure are extractvalue instructions.
Having that in mind, we try to propagate the results - in most cases that allows
to get rid of all corresponding extractvalues.

Attached are two patches: the first one with the described implementation and
tests, and the second one with the corresponding documentation changes.

The first patch happened to already get to the trunk, but the topic is open, and
any suggestions are welcome.

Best regards,
Michael
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140424/f5c7111f/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 1-safediv-intrinsics.patch
Type: application/octet-stream
Size: 28806 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140424/f5c7111f/attachment.obj>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140424/f5c7111f/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 2-safediv-intrinsics-docs.patch
Type: application/octet-stream
Size: 6941 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140424/f5c7111f/attachment-0001.obj>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140424/f5c7111f/attachment-0002.html>

JF Bastien

2014-Apr-24 21:05 UTC

head link

[LLVMdev] Proposal: add intrinsics for safe division

Why not also support 8-bit vectors?

I'd be interested in these intrinsics also supporting integer vector types
with the same semantics, and having the vectorization code be able to
understand scalar/vector safe division the same way it understands
scalar/vector division.

FWIW we did something in PNaCl, but only to guarantee that integer
division/modulo by zero reliably traps on all architectures:

https://chromium.googlesource.com/native_client/pnacl-llvm/+/master/lib/Transforms/NaCl/InsertDivideCheck.cpp



On Wed, Apr 23, 2014 at 10:52 PM, Michael Zolotukhin
<mzolotukhin at apple.com>wrote:
> Hi,
>
> I’d like to propose to extend LLVM IR intrinsics set, adding new ones for
> safe-division. There are intrinsics for detecting overflow errors, like
> sadd.with.overflow, and the intrinsics I’m proposing will augment this set.
>
> The new intrinsics will return a structure with two elements according to
> the following rules:
>
>    1. safe.[us]div(x,0) = safe.[us]rem(x,0) = {0, 1}
>    2. safe.sdiv(min<T>, -1) = safe.srem(min<T>, -1) =
{min<T>, 1}
>    3. In other cases: safe.op(x,y) = {x op y, 0}, where op is sdiv, udiv,
>    srem, or urem
>
>
> The use of these intrinsics would be quite the same as it was for
> arith.with.overflow intrinsics. For instance:
>       %res = call {i32, i1} @llvm.safe.sdiv.i32(i32 %a, i32 %b)
>       %div = extractvalue {i32, i1} %res, 0
>       %bit = extractvalue {i32, i1} %res, 1
>       br i1 %bit, label %trap, label %normal
>
> Now a few words about their implementation in LLVM. Though the new
> intrinsics look quite similar to the ones with overflow, there are
> significant differences. One of them is that during lowering we need to
> create control-flow for the new ones, while for the existing ones it was
> sufficient to simply compute the overflow flag. The control flow is needed
> to guard the division operation, which otherwise can cause an undefined
> behaviour.
>
> The existing intrinsics are lowered in a back-end, during legalization
> steps. To do the same for the new ones, we’d need a more complicated
> implementation because of the need to create a new control flow. Also, that
> would be needed to be done in every backend.
>
> Another alternative here is to lower the new intrinsics in CodeGenPrepare
> pass. That approach looks more convenient to me, because it allows us to
> have a single implementation for all targets in one place, and it’s easier
> to introduce control-flow at this point.
>
> The patch below implements the second alternative. Along with a
> straight-forward lowering (which is valid and could be used as a base on
> all platforms), during the lowering some simple optimizations are performed
> (which I think is also easier to implement in CodeGenPrepare, than on
DAGs):
>
>    1. We don’t to generate code for unused part of the result structure.
>    2. If div-instruction on the given platform behaves exactly as needed
>    for the intrinsic (e.g. it takes place for ARM64), we don’t guard the
div
>    instruction. As a result, we could avoid branches at all if the second
part
>    of the result structure is not used.
>    3. The most expected users of the result structure are extractvalue
>    instructions. Having that in mind, we try to propagate the results - in
>    most cases that allows to get rid of all corresponding extractvalues.
>
>
> Attached are two patches: the first one with the described implementation
> and tests, and the second one with the corresponding documentation changes.
>
> The first patch happened to already get to the trunk, but the topic is
> open, and any suggestions are welcome.
>
>
>
>
> Best regards,
> Michael
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140424/6b588a5e/attachment.html>

Nadav Rotem

2014-Apr-25 04:06 UTC

head link

[LLVMdev] Proposal: add intrinsics for safe division

On Apr 24, 2014, at 2:05 PM, JF Bastien <jfb at google.com> wrote:
> Why not also support 8-bit vectors?
> 
> I'd be interested in these intrinsics also supporting integer vector
types with the same semantics, and having the vectorization code be able to
understand scalar/vector safe division the same way it understands scalar/vector
division.
Sounds good JF. I suggest that we start by making scalar save division work and
then move on to supporting vectors.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140424/ca5f3c82/attachment.html>

Eric Christopher

2014-Apr-25 16:51 UTC

head link

[LLVMdev] Proposal: add intrinsics for safe division

Hi Michael,
> I’d like to propose to extend LLVM IR intrinsics set, adding new ones for
> safe-division. There are intrinsics for detecting overflow errors, like
> sadd.with.overflow, and the intrinsics I’m proposing will augment this set.
>
> The new intrinsics will return a structure with two elements according to
> the following rules:
>
> safe.[us]div(x,0) = safe.[us]rem(x,0) = {0, 1}
> safe.sdiv(min<T>, -1) = safe.srem(min<T>, -1) = {min<T>,
1}
> In other cases: safe.op(x,y) = {x op y, 0}, where op is sdiv, udiv, srem,
or
> urem
>
>
> The use of these intrinsics would be quite the same as it was for
> arith.with.overflow intrinsics. For instance:
>       %res = call {i32, i1} @llvm.safe.sdiv.i32(i32 %a, i32 %b)
>       %div = extractvalue {i32, i1} %res, 0
>       %bit = extractvalue {i32, i1} %res, 1
>       br i1 %bit, label %trap, label %normal
>
I notice that the patch is still in ToT even though we're now having a
discussion on whether it should go in. It should be reverted.

That said, I don't see how this is anything except for an optimization
intrinsic and not necessary for correct behavior for any language.
I.e. You can do what the PNaCl people are doing and emit branches
instead. Since this will only happen on a single target and not on all
targets why not just make this a target intrinisic? I don't buy the
argument that it biases the target independent IR since any pass that
uses TTI in the IR level does the same.

-eric

> Now a few words about their implementation in LLVM. Though the new
> intrinsics look quite similar to the ones with overflow, there are
> significant differences. One of them is that during lowering we need to
> create control-flow for the new ones, while for the existing ones it was
> sufficient to simply compute the overflow flag. The control flow is needed
> to guard the division operation, which otherwise can cause an undefined
> behaviour.
>
> The existing intrinsics are lowered in a back-end, during legalization
> steps. To do the same for the new ones, we’d need a more complicated
> implementation because of the need to create a new control flow. Also, that
> would be needed to be done in every backend.
>
> Another alternative here is to lower the new intrinsics in CodeGenPrepare
> pass. That approach looks more convenient to me, because it allows us to
> have a single implementation for all targets in one place, and it’s easier
> to introduce control-flow at this point.
>
> The patch below implements the second alternative. Along with a
> straight-forward lowering (which is valid and could be used as a base on
all
> platforms), during the lowering some simple optimizations are performed
> (which I think is also easier to implement in CodeGenPrepare, than on
DAGs):
>
> We don’t to generate code for unused part of the result structure.
> If div-instruction on the given platform behaves exactly as needed for the
> intrinsic (e.g. it takes place for ARM64), we don’t guard the div
> instruction. As a result, we could avoid branches at all if the second part
> of the result structure is not used.
> The most expected users of the result structure are extractvalue
> instructions. Having that in mind, we try to propagate the results - in
most
> cases that allows to get rid of all corresponding extractvalues.
>
>
> Attached are two patches: the first one with the described implementation
> and tests, and the second one with the corresponding documentation changes.
>
> The first patch happened to already get to the trunk, but the topic is
open,
> and any suggestions are welcome.
>
>
>
>
> Best regards,
> Michael
>

Filip Pizlo

2014-Apr-25 17:19 UTC

head link

[LLVMdev] Proposal: add intrinsics for safe division

On April 25, 2014 at 9:52:35 AM, Eric Christopher (echristo at gmail.com) wrote:

Hi Michael,
> I’d like to propose to extend LLVM IR intrinsics set, adding new ones for
> safe-division. There are intrinsics for detecting overflow errors, like
> sadd.with.overflow, and the intrinsics I’m proposing will augment this set.
>
> The new intrinsics will return a structure with two elements according to
> the following rules:
>
> safe.[us]div(x,0) = safe.[us]rem(x,0) = {0, 1}
> safe.sdiv(min<T>, -1) = safe.srem(min<T>, -1) = {min<T>,
1}
> In other cases: safe.op(x,y) = {x op y, 0}, where op is sdiv, udiv, srem,
or
> urem
>
>
> The use of these intrinsics would be quite the same as it was for
> arith.with.overflow intrinsics. For instance:
> %res = call {i32, i1} @llvm.safe.sdiv.i32(i32 %a, i32 %b)
> %div = extractvalue {i32, i1} %res, 0
> %bit = extractvalue {i32, i1} %res, 1
> br i1 %bit, label %trap, label %normal
>
I notice that the patch is still in ToT even though we're now having a
discussion on whether it should go in. It should be reverted.

That said, I don't see how this is anything except for an optimization
intrinsic and not necessary for correct behavior for any language.
I.e. You can do what the PNaCl people are doing and emit branches
instead. Since this will only happen on a single target and not on all
targets why not just make this a target intrinisic? I don't buy the
argument that it biases the target independent IR since any pass that
uses TTI in the IR level does the same.
The sdiv operation in LLVM IR only makes sense for C and its very direct
relatives.  The amount of control flow necessary to represent a safe division
for any other language is ghastly.  (a/b) becomes something like (b != 0 ? ((a
!= INT_MIN || b != -1) ? a / b : INT_MIN) : 0).  It's more useful to the
optimizer to see this as the single operation that it really is.

Also, this isn't about just a single target.  Any target that has to emulate
integer division (like a lot of embedded systems would do) would benefit from
this since the division emulation can most naturally be written to provide the
"safe" semantics rather than LLVM's default semantics (i.e.
undefined behavior on x/0 and INT_MIN/-1).

I think adding these intrinics to the IR would be a great help for non-C clients
of LLVM.

-Filip




-eric

> Now a few words about their implementation in LLVM. Though the new
> intrinsics look quite similar to the ones with overflow, there are
> significant differences. One of them is that during lowering we need to
> create control-flow for the new ones, while for the existing ones it was
> sufficient to simply compute the overflow flag. The control flow is needed
> to guard the division operation, which otherwise can cause an undefined
> behaviour.
>
> The existing intrinsics are lowered in a back-end, during legalization
> steps. To do the same for the new ones, we’d need a more complicated
> implementation because of the need to create a new control flow. Also, that
> would be needed to be done in every backend.
>
> Another alternative here is to lower the new intrinsics in CodeGenPrepare
> pass. That approach looks more convenient to me, because it allows us to
> have a single implementation for all targets in one place, and it’s easier
> to introduce control-flow at this point.
>
> The patch below implements the second alternative. Along with a
> straight-forward lowering (which is valid and could be used as a base on
all
> platforms), during the lowering some simple optimizations are performed
> (which I think is also easier to implement in CodeGenPrepare, than on
DAGs):
>
> We don’t to generate code for unused part of the result structure.
> If div-instruction on the given platform behaves exactly as needed for the
> intrinsic (e.g. it takes place for ARM64), we don’t guard the div
> instruction. As a result, we could avoid branches at all if the second part
> of the result structure is not used.
> The most expected users of the result structure are extractvalue
> instructions. Having that in mind, we try to propagate the results - in
most
> cases that allows to get rid of all corresponding extractvalues.
>
>
> Attached are two patches: the first one with the described implementation
> and tests, and the second one with the corresponding documentation changes.
>
> The first patch happened to already get to the trunk, but the topic is
open,
> and any suggestions are welcome.
>
>
>
>
> Best regards,
> Michael
>
_______________________________________________
LLVM Developers mailing list
LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140425/ff7e0463/attachment.html>

llvm dev - Apr 2014 - [LLVMdev] Proposal: add intrinsics for safe division

[LLVMdev] Proposal: add intrinsics for safe division

[LLVMdev] Proposal: add intrinsics for safe division

[LLVMdev] Proposal: add intrinsics for safe division

[LLVMdev] Proposal: add intrinsics for safe division

[LLVMdev] Proposal: add intrinsics for safe division