thr3ads.net - llvm dev - [LLVMdev] Proposal: add intrinsics for safe division [May 2014]

If this information is useful, please help other people find it:
Share via:

Andrew Trick

2014-May-02 23:46 UTC

[LLVMdev] Proposal: add intrinsics for safe division

On May 2, 2014, at 4:25 PM, Philip Reames <listmail at philipreames.com>
wrote:
> 
> I'm still not happy with hiding the explicit control flow, but I can
achieve the same effect as:
> if( div by zero ) throw
> (r, o, d) = safe.div(a,b);
> if( o ) {
>   r = a;
> }
> 
> i.e. emit a separate guard branch for the interesting condition and not
utilize the value from the safe div.
Oh, I was just assuming you would deopt on either div-by-zero or overflow in
order to optimize the common case.

So, the point of safe.div is that you don’t need control flow in the IR except
to indicate the throw.

(r, o, d) = safe.div(a,b)
if (d) { patchpoint(throw, state) }

You could also explicitly check div-by-zero but it would be harder to fold away
that extra check during lowering.

-Andy
> 
> Assuming the codegen patterns are right, this should give me options for
expressing all of the patterns I might like to explore.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140502/6517cf61/attachment.html>

Philip Reames

2014-May-03 00:01 UTC

head link

[LLVMdev] Proposal: add intrinsics for safe division

On 05/02/2014 04:46 PM, Andrew Trick wrote:>
> On May 2, 2014, at 4:25 PM, Philip Reames <listmail at philipreames.com 
> <mailto:listmail at philipreames.com>> wrote:
>
>>
>> I'm still not happy with hiding the explicit control flow, but I
can
>> achieve the same effect as:
>> if( div by zero ) throw
>> (r, o, d) = safe.div(a,b);
>> if( o ) {
>>   r = a;
>> }
>>
>> i.e. emit a separate guard branch for the interesting condition and 
>> not utilize the value from the safe div.
>
> Oh, I was just assuming you would deopt on either div-by-zero or 
> overflow in order to optimize the common case.
>
> So, the point of safe.div is that you don’t need control flow in the 
> IR except to indicate the throw.
>
> (r, o, d) = safe.div(a,b)
> if (d) { patchpoint(throw, state) }
>
> You could also explicitly check div-by-zero but it would be harder to 
> fold away that extra check during lowering.I think I may have worded this badly.  The motivation for the explicit 
guard is to enable LICM and other branch simplifications. This point is 
completely separate from the choice to deopt or not.

Consider the example below.  This is specifically for x86 with it's 
trapping divide.  Assume I am not utilizing traps, but need to guard the 
divide to handle the two special cases.  The exact handling of the edge 
cases isn't really significant, but the guards must exist.

Given:
void foo(int[] a, int d) {
   for i = 0....a.length {
     a[i] = a[i]/d
   }
}

I want:
void foo(int[]a, int d) {
   // A single check here, not one per iteration
   if( a.length > 0 && d == 0 ) throw;
   for i = 0...a.length {
     // arguably, it might be better to split the loop for d == -1
     if( a[i] == INT_MIN && d == -1 ) {
        continue;
     }
     a[i] = x86_div(a[i], d);
   }
}

Not:
void foo(int[]a, int d) {

   for i = 0...a.length {
     if( a[i] == INT_MIN && d == -1 ) {
        continue;
     }
     if( d == 0 ) throw;
     a[i] = x86_div(a[i], d);
   }
}

Hopefully, this helps clarify my point.

Philip

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140502/8b1637e0/attachment.html>

Andrew Trick

2014-May-03 00:45 UTC

head link

[LLVMdev] Proposal: add intrinsics for safe division

On May 2, 2014, at 5:01 PM, Philip Reames <listmail at philipreames.com>
wrote:
> 
> On 05/02/2014 04:46 PM, Andrew Trick wrote:
>> 
>> On May 2, 2014, at 4:25 PM, Philip Reames <listmail at
philipreames.com> wrote:
>> 
>>> 
>>> I'm still not happy with hiding the explicit control flow, but
I can achieve the same effect as:
>>> if( div by zero ) throw
>>> (r, o, d) = safe.div(a,b);
>>> if( o ) {
>>>   r = a;
>>> }
>>> 
>>> i.e. emit a separate guard branch for the interesting condition and
not utilize the value from the safe div.
>> 
>> Oh, I was just assuming you would deopt on either div-by-zero or
overflow in order to optimize the common case.
>> 
>> So, the point of safe.div is that you don’t need control flow in the IR
except to indicate the throw.
>> 
>> (r, o, d) = safe.div(a,b)
>> if (d) { patchpoint(throw, state) }
>> 
>> You could also explicitly check div-by-zero but it would be harder to
fold away that extra check during lowering.
> I think I may have worded this badly.  The motivation for the explicit
guard is to enable LICM and other branch simplifications.  This point is
completely separate from the choice to deopt or not.
> 
> Consider the example below.  This is specifically for x86 with it's
trapping divide.  Assume I am not utilizing traps, but need to guard the divide
to handle the two special cases.  The exact handling of the edge cases isn't
really significant, but the guards must exist.
Yes, I completely understand what you want. I would like to hoist runtime traps
even more aggressively than what LLVM will do when the control flow is explicit,
but that has nothing to do with safe.div. We’ll discuss it later in a patchpoint
thread.

The issue of loop unswitching came up earlier in the thread. All these options
are good
- generate fully explicit control flow + sdiv
- generate partially explicit control flow + safe.div
- expand safe.div early based on TTI
- teach unswitching about safe.div

The only thing I did not particularly like was the need for frontends to
generate target specific intrinsics for a common operation.

-Andy
> Given:
> void foo(int[] a, int d) {
>   for i = 0....a.length {
>     a[i] = a[i]/d
>   }
> }
> 
> I want:
> void foo(int[]a, int d) {
>   // A single check here, not one per iteration
>   if( a.length > 0 && d == 0 ) throw;
>   for i = 0...a.length {
>     // arguably, it might be better to split the loop for d == -1
>     if( a[i] == INT_MIN && d == -1 ) {
>        continue;
>     }
>     a[i] = x86_div(a[i], d);
>   }
> }
> 
> Not:
> void foo(int[]a, int d) {
> 
>   for i = 0...a.length {
>     if( a[i] == INT_MIN && d == -1 ) {
>        continue;
>     }
>     if( d == 0 ) throw;
>     a[i] = x86_div(a[i], d);
>   }
> }
> 
> Hopefully, this helps clarify my point.  
> 
> Philip
> 
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140502/41b6334d/attachment.html>

Michael Zolotukhin

2014-May-05 12:58 UTC

head link

[LLVMdev] Proposal: add intrinsics for safe division

Hi Andy, Philip, Filip, Eric and others,

I didn’t want to talk about patchpoints, because I don’t know much about them.

Let me comment some remarks regarding the current proposal though:

I do request the following changes:
- Mark it clearly as experimental.
I agree.

- Either don't specify the value computed in the edge cases, or allow those
values to be specified as constant arguments to the call.  This allows efficient
lowering to x86's div instruction if you want to make use of the trapping
semantics.
Actually, I don’t see much use of this. Frontends like Java and Go will still
have to generate calls with explicit constant for the edge cases, making them
equivalent to the currently proposed intrinsics.

Given:
void foo(int[] a, int d) {
  for i = 0....a.length {
    a[i] = a[i]/d
  }
}
I want:
void foo(int[]a, int d) {
  // A single check here, not one per iteration
  if( a.length > 0 && d == 0 ) throw;
  for i = 0...a.length {
    // arguably, it might be better to split the loop for d == -1
    if( a[i] == INT_MIN && d == -1 ) {
       continue;
    }
    a[i] = x86_div(a[i], d);
  }
}
I understand your concern about the checks on x86 - in this example it really
might be useful to have explicit CFG to help LICM to hoist the check out of the
loop. I suggest to address this concern by adding mechanism to lower the
intrinsics early (as proposed by Andy as well). Along with that, we can try to
teach loop optimizations to deal with the intrinsics, and maybe later we won’t
need this early-lowering at all (but while it’s needed it’ll be there). Thus, at
the beginning, when the optimizations know nothing about the intrinsics, we’ll
just lower them early for some targets (x86) and keep until CGP for the others
(ARM64). Also, that will allow us to compare performance with early lowering vs
CGP lowering.

Does that sound reasonable?

Thanks,
Michael

On May 3, 2014, at 4:45 AM, Andrew Trick <atrick at apple.com> wrote:
> 
> On May 2, 2014, at 5:01 PM, Philip Reames <listmail at
philipreames.com> wrote:
> 
>> 
>> On 05/02/2014 04:46 PM, Andrew Trick wrote:
>>> 
>>> On May 2, 2014, at 4:25 PM, Philip Reames <listmail at
philipreames.com> wrote:
>>> 
>>>> 
>>>> I'm still not happy with hiding the explicit control flow,
but I can achieve the same effect as:
>>>> if( div by zero ) throw
>>>> (r, o, d) = safe.div(a,b);
>>>> if( o ) {
>>>>   r = a;
>>>> }
>>>> 
>>>> i.e. emit a separate guard branch for the interesting condition
and not utilize the value from the safe div.
>>> 
>>> Oh, I was just assuming you would deopt on either div-by-zero or
overflow in order to optimize the common case.
>>> 
>>> So, the point of safe.div is that you don’t need control flow in
the IR except to indicate the throw.
>>> 
>>> (r, o, d) = safe.div(a,b)
>>> if (d) { patchpoint(throw, state) }
>>> 
>>> You could also explicitly check div-by-zero but it would be harder
to fold away that extra check during lowering.
>> I think I may have worded this badly.  The motivation for the explicit
guard is to enable LICM and other branch simplifications.  This point is
completely separate from the choice to deopt or not.
>> 
>> Consider the example below.  This is specifically for x86 with it's
trapping divide.  Assume I am not utilizing traps, but need to guard the divide
to handle the two special cases.  The exact handling of the edge cases isn't
really significant, but the guards must exist.
> 
> Yes, I completely understand what you want. I would like to hoist runtime
traps even more aggressively than what LLVM will do when the control flow is
explicit, but that has nothing to do with safe.div. We’ll discuss it later in a
patchpoint thread.
> 
> The issue of loop unswitching came up earlier in the thread. All these
options are good
> - generate fully explicit control flow + sdiv
> - generate partially explicit control flow + safe.div
> - expand safe.div early based on TTI
> - teach unswitching about safe.div
> 
> The only thing I did not particularly like was the need for frontends to
generate target specific intrinsics for a common operation.
> 
> -Andy
> 
>> Given:
>> void foo(int[] a, int d) {
>>   for i = 0....a.length {
>>     a[i] = a[i]/d
>>   }
>> }
>> 
>> I want:
>> void foo(int[]a, int d) {
>>   // A single check here, not one per iteration
>>   if( a.length > 0 && d == 0 ) throw;
>>   for i = 0...a.length {
>>     // arguably, it might be better to split the loop for d == -1
>>     if( a[i] == INT_MIN && d == -1 ) {
>>        continue;
>>     }
>>     a[i] = x86_div(a[i], d);
>>   }
>> }
>> 
>> Not:
>> void foo(int[]a, int d) {
>> 
>>   for i = 0...a.length {
>>     if( a[i] == INT_MIN && d == -1 ) {
>>        continue;
>>     }
>>     if( d == 0 ) throw;
>>     a[i] = x86_div(a[i], d);
>>   }
>> }
>> 
>> Hopefully, this helps clarify my point.  
>> 
>> Philip
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140505/89730cee/attachment.html>

llvm dev - May 2014 - [LLVMdev] Proposal: add intrinsics for safe division

[LLVMdev] Proposal: add intrinsics for safe division

[LLVMdev] Proposal: add intrinsics for safe division

[LLVMdev] Proposal: add intrinsics for safe division

[LLVMdev] Proposal: add intrinsics for safe division