thr3ads.net - llvm dev - [llvm-dev] Speculative execution of FP divide Instructions

If this information is useful, please help other people find it:
Share via:

Kaylor, Andrew via llvm-dev

2017-Mar-15 18:35 UTC

[llvm-dev] Speculative execution of FP divide Instructions - Call-Graph Simplify

It’s true, I am working on this.  I have committed the initial patch to add
constrained intrinsics for the basic FP operations.  This has the desired effect
of preventing optimizations that would violate strict FP semantics with regard
to rounding mode and exception status, but it also prevents many safe
optimizations.  Later this year I’ll be going through the code base and trying
to teach various optimizations to handle the constrained intrinsics safely.

Nothing has been added to clang yet to generate these intrinsics, though I have
some rough code to do so that’s just waiting for an implementation of the much
harder task of figuring out when such restrictions are needed.  If anyone did
have a front end that generated these intrinsics, everything is in place to get
them through to code generation (though they currently become at least
theoretically unrestricted again after ISel).  I have an experimental pass that
converts all basic FP operations to the constrained versions and I’ve been able
to successfully compile and run real-world FP-using applications with that pass
in place.

I’m currently working on a patch to add constrained versions of the standard
library FP intrinsics (sin, cos, pow, etc.).

If anyone is interested in getting pieces of the work-in-progress I’ve mentioned
above to test drive, let me know.  I’d be happy to share.

-Andy


From: Sanjay Patel [mailto:spatel at rotateright.com]
Sent: Wednesday, March 15, 2017 8:16 AM
To: Samuel Antão <samuelfantao at gmail.com>; Kaylor, Andrew
<andrew.kaylor at intel.com>
Cc: llvm-dev <llvm-dev at lists.llvm.org>; acjacob at us.ibm.com
Subject: Re: [llvm-dev] Speculative execution of FP divide Instructions -
Call-Graph Simplify
> - is there any target for which fp division does not have side effects?
Yes - all of them. This goes back to the fact that clang/llvm do not support
changing the FPENV:
https://bugs.llvm.org/show_bug.cgi?id=8100
There has been some effort to change that recently, so maybe this is (partly)
fixed? (cc'ing Andy Kaylor)
These reports may also provide info / answers / work-arounds:
https://bugs.llvm.org/show_bug.cgi?id=6050
https://bugs.llvm.org/show_bug.cgi?id=24343
https://bugs.llvm.org/show_bug.cgi?id=24818

On Wed, Mar 15, 2017 at 3:41 AM, Samuel Antão via llvm-dev <llvm-dev at
lists.llvm.org<mailto:llvm-dev at lists.llvm.org>> wrote:
Hi all,

I came across an issue caused by the Call-Graph Simplify Pass. Here is a a small
repro:

```
define double @foo(double %a1, double %a2, double %a3) #0 {
entry:
  %a_mul = fmul double %a1, %a2
  %a_cmp = fcmp ogt double %a3, %a_mul
  br i1 %a_cmp, label %a.then, label %a.end

a.then:
  %a_div = fdiv double %a_mul, %a3
  br label %a.end

a.end:
  %a_factor = phi double [ %a_div, %a.then ], [ 1.000000e+00, %entry ]
  ret double %a_factor
}
```
Here, the conditional is guarding a possible division by zero. However if I run
CGSimplify on this I get:
```
define double @foo(double %a1, double %a2, double %a3) local_unnamed_addr #0 {
entry:
  %a_mul = fmul double %a1, %a2
  %a_cmp = fcmp olt double %a_mul, %a3
  %a_div = fdiv double %a_mul, %a3
  %a_factor = select i1 %a_cmp, double %a_div, double 1.000000e+00
  ret double %a_factor
}
```
This will cause a FP arithmetic exception, and the application will get a SIGFPE
signal. The code that drives the change in the optimizer relies on
`llvm::isSafeToSpeculativelyExecute` to decide whether the division should be
performed speculatively. Right now, this function returns true. One could do
something similar to integer divisions and add a case there (this solved the
issue for me):
```
diff --git a/lib/Analysis/ValueTracking.cpp b/lib/Analysis/ValueTracking.cpp
index 1761dac..c61fefd 100644
--- a/lib/Analysis/ValueTracking.cpp
+++ b/lib/Analysis/ValueTracking.cpp
@@ -3352,6 +3352,21 @@ bool llvm::isSafeToSpeculativelyExecute(const Value *V,
     // The numerator *might* be MinSignedValue.
     return false;
   }
+  case Instruction::FDiv:
+  case Instruction::FRem:{
+    const Value *Denominator = Inst->getOperand(1);
+    // x / y is undefined if y == 0
+    // The denominator is not a constant, so there is nothing we can do to
prove
+    // it is non-zero.
+    if (auto *VV = dyn_cast<ConstantVector>(Denominator))
+      Denominator = VV->getSplatValue();
+    if (!isa<ConstantFP>(Denominator))
+      return false;
+    // The denominator is a zero constant - we can't speculate here.
+    if (m_AnyZero().match(Denominator))
+      return false;
+    return true;
+  }
   case Instruction::Load: {
     const LoadInst *LI = cast<LoadInst>(Inst);
     if (!LI->isUnordered() ||
```
I did my tests using a powerpc64le target, but I couldn't find any target
specific login involved in this transform. In any case, I wanted to drop the
questions:

- is there any target that would benefit from speculative fp divisions?
- is there any target for which fp division does not have side effects?

If not, I can go ahead and post the patch above for review.

Many thanks!
Sam

_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170315/cd00a6a1/attachment.html>

Hal Finkel via llvm-dev

2017-Mar-15 20:55 UTC

head link

[llvm-dev] Speculative execution of FP divide Instructions - Call-Graph Simplify

On 03/15/2017 01:35 PM, Kaylor, Andrew via llvm-dev
wrote:>
> It’s true, I am working on this.  I have committed the initial patch 
> to add constrained intrinsics for the basic FP operations.  This has 
> the desired effect of preventing optimizations that would violate 
> strict FP semantics with regard to rounding mode and exception status, 
> but it also prevents many safe optimizations.  Later this year I’ll be 
> going through the code base and trying to teach various optimizations 
> to handle the constrained intrinsics safely.
>
> Nothing has been added to clang yet to generate these intrinsics, 
> though I have some rough code to do so that’s just waiting for an 
> implementation of the much harder task of figuring out when such 
> restrictions are needed.
>
Can you elaborate on this? What do you mean by figuring out where the 
restrictions are needed?

  -Hal
> If anyone did have a front end that generated these intrinsics, 
> everything is in place to get them through to code generation (though 
> they currently become at least theoretically unrestricted again after 
> ISel).  I have an experimental pass that converts all basic FP 
> operations to the constrained versions and I’ve been able to 
> successfully compile and run real-world FP-using applications with 
> that pass in place.
>
> I’m currently working on a patch to add constrained versions of the 
> standard library FP intrinsics (sin, cos, pow, etc.).
>
> If anyone is interested in getting pieces of the work-in-progress I’ve 
> mentioned above to test drive, let me know.  I’d be happy to share.
>
> -Andy
>
> *From:*Sanjay Patel [mailto:spatel at rotateright.com]
> *Sent:* Wednesday, March 15, 2017 8:16 AM
> *To:* Samuel Antão <samuelfantao at gmail.com>; Kaylor, Andrew 
> <andrew.kaylor at intel.com>
> *Cc:* llvm-dev <llvm-dev at lists.llvm.org>; acjacob at us.ibm.com
> *Subject:* Re: [llvm-dev] Speculative execution of FP divide 
> Instructions - Call-Graph Simplify
>
> > - is there any target for which fp division does not have side
effects?
>
>
> Yes - all of them. This goes back to the fact that clang/llvm do not 
> support changing the FPENV:
> https://bugs.llvm.org/show_bug.cgi?id=8100
>
> There has been some effort to change that recently, so maybe this is 
> (partly) fixed? (cc'ing Andy Kaylor)
>
> These reports may also provide info / answers / work-arounds:
>
> https://bugs.llvm.org/show_bug.cgi?id=6050
> https://bugs.llvm.org/show_bug.cgi?id=24343
>
> https://bugs.llvm.org/show_bug.cgi?id=24818
>
> On Wed, Mar 15, 2017 at 3:41 AM, Samuel Antão via llvm-dev 
> <llvm-dev at lists.llvm.org <mailto:llvm-dev at
lists.llvm.org>> wrote:
>
>     Hi all,
>
>     I came across an issue caused by the Call-Graph Simplify Pass.
>     Here is a a small repro:
>
>     ```
>
>     define double @foo(double %a1, double %a2, double %a3) #0 {
>
>     entry:
>
>       %a_mul = fmul double %a1, %a2
>
>       %a_cmp = fcmp ogt double %a3, %a_mul
>
>       br i1 %a_cmp, label %a.then, label %a.end
>
>     a.then:
>
>       %a_div = fdiv double %a_mul, %a3
>
>       br label %a.end
>
>     a.end:
>
>       %a_factor = phi double [ %a_div, %a.then ], [ 1.000000e+00, %entry ]
>
>       ret double %a_factor
>
>     }
>
>     ```
>
>     Here, the conditional is guarding a possible division by zero.
>     However if I run CGSimplify on this I get:
>
>     ```
>
>     define double @foo(double %a1, double %a2, double %a3)
>     local_unnamed_addr #0 {
>
>     entry:
>
>       %a_mul = fmul double %a1, %a2
>
>       %a_cmp = fcmp olt double %a_mul, %a3
>
>       %a_div = fdiv double %a_mul, %a3
>
>       %a_factor = select i1 %a_cmp, double %a_div, double 1.000000e+00
>
>       ret double %a_factor
>
>     }
>
>     ```
>
>     This will cause a FP arithmetic exception, and the application
>     will get a SIGFPE signal. The code that drives the change in the
>     optimizer relies on `llvm::isSafeToSpeculativelyExecute` to decide
>     whether the division should be performed speculatively. Right now,
>     this function returns true. One could do something similar to
>     integer divisions and add a case there (this solved the issue for me):
>
>     ```
>
>     diff --git a/lib/Analysis/ValueTracking.cpp
>     b/lib/Analysis/ValueTracking.cpp
>
>     index 1761dac..c61fefd 100644
>
>     --- a/lib/Analysis/ValueTracking.cpp
>
>     +++ b/lib/Analysis/ValueTracking.cpp
>
>     @@ -3352,6 +3352,21 @@ bool
>     llvm::isSafeToSpeculativelyExecute(const Value *V,
>
>          // The numerator *might* be MinSignedValue.
>
>          return false;
>
>        }
>
>     +  case Instruction::FDiv:
>
>     +  case Instruction::FRem:{
>
>     +    const Value *Denominator = Inst->getOperand(1);
>
>     +    // x / y is undefined if y == 0
>
>     +    // The denominator is not a constant, so there is nothing we
>     can do to prove
>
>     +    // it is non-zero.
>
>     +    if (auto *VV = dyn_cast<ConstantVector>(Denominator))
>
>     +  Denominator = VV->getSplatValue();
>
>     +    if (!isa<ConstantFP>(Denominator))
>
>     +      return false;
>
>     +    // The denominator is a zero constant - we can't speculate
here.
>
>     +    if (m_AnyZero().match(Denominator))
>
>     +      return false;
>
>     +    return true;
>
>     +  }
>
>        case Instruction::Load: {
>
>          const LoadInst *LI = cast<LoadInst>(Inst);
>
>          if (!LI->isUnordered() ||
>
>     ```
>
>     I did my tests using a powerpc64le target, but I couldn't find any
>     target specific login involved in this transform. In any case, I
>     wanted to drop the questions:
>
>     - is there any target that would benefit from speculative fp
>     divisions?
>
>     - is there any target for which fp division does not have side
>     effects?
>
>     If not, I can go ahead and post the patch above for review.
>
>     Many thanks!
>
>     Sam
>
>
>     _______________________________________________
>     LLVM Developers mailing list
>     llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
>     http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170315/00e73ca5/attachment-0001.html>

Kaylor, Andrew via llvm-dev

2017-Mar-15 21:24 UTC

head link

[llvm-dev] Speculative execution of FP divide Instructions - Call-Graph Simplify

>Can you elaborate on this? What do you mean by figuring out where the
restrictions are needed?
Sure.  I’m just referring to having the front end support the FENV_ACCESS pragma
(or whatever else) to decide for whatever scope it is translating whether or not
it needs the constrained intrinsics.  I found one of the places in the front end
where it was emitting an fadd instruction, and I wrote some code of the form ‘if
(true) emit the intrinsic instead’ to test out an end-to-end code path.  So
what’s missing is a replacement of that ‘if (true)’ with something that actually
checks the compilation context to decide what should happen.

I might have been exaggerating a bit when I described this as ‘much harder’ but
generating the intrinsic is fairly trivial.

-Andy

From: Hal Finkel [mailto:hfinkel at anl.gov]
Sent: Wednesday, March 15, 2017 1:55 PM
To: Kaylor, Andrew <andrew.kaylor at intel.com>; Sanjay Patel <spatel
at rotateright.com>; Samuel Antão <samuelfantao at gmail.com>
Cc: llvm-dev <llvm-dev at lists.llvm.org>; acjacob at us.ibm.com
Subject: Re: [llvm-dev] Speculative execution of FP divide Instructions -
Call-Graph Simplify



On 03/15/2017 01:35 PM, Kaylor, Andrew via llvm-dev wrote:
It’s true, I am working on this.  I have committed the initial patch to add
constrained intrinsics for the basic FP operations.  This has the desired effect
of preventing optimizations that would violate strict FP semantics with regard
to rounding mode and exception status, but it also prevents many safe
optimizations.  Later this year I’ll be going through the code base and trying
to teach various optimizations to handle the constrained intrinsics safely.

Nothing has been added to clang yet to generate these intrinsics, though I have
some rough code to do so that’s just waiting for an implementation of the much
harder task of figuring out when such restrictions are needed.

Can you elaborate on this? What do you mean by figuring out where the
restrictions are needed?

 -Hal


If anyone did have a front end that generated these intrinsics, everything is in
place to get them through to code generation (though they currently become at
least theoretically unrestricted again after ISel).  I have an experimental pass
that converts all basic FP operations to the constrained versions and I’ve been
able to successfully compile and run real-world FP-using applications with that
pass in place.

I’m currently working on a patch to add constrained versions of the standard
library FP intrinsics (sin, cos, pow, etc.).

If anyone is interested in getting pieces of the work-in-progress I’ve mentioned
above to test drive, let me know.  I’d be happy to share.

-Andy


From: Sanjay Patel [mailto:spatel at rotateright.com]
Sent: Wednesday, March 15, 2017 8:16 AM
To: Samuel Antão <samuelfantao at gmail.com><mailto:samuelfantao at
gmail.com>; Kaylor, Andrew <andrew.kaylor at
intel.com><mailto:andrew.kaylor at intel.com>
Cc: llvm-dev <llvm-dev at lists.llvm.org><mailto:llvm-dev at
lists.llvm.org>; acjacob at us.ibm.com<mailto:acjacob at us.ibm.com>
Subject: Re: [llvm-dev] Speculative execution of FP divide Instructions -
Call-Graph Simplify
> - is there any target for which fp division does not have side effects?
Yes - all of them. This goes back to the fact that clang/llvm do not support
changing the FPENV:
https://bugs.llvm.org/show_bug.cgi?id=8100
There has been some effort to change that recently, so maybe this is (partly)
fixed? (cc'ing Andy Kaylor)
These reports may also provide info / answers / work-arounds:
https://bugs.llvm.org/show_bug.cgi?id=6050
https://bugs.llvm.org/show_bug.cgi?id=24343
https://bugs.llvm.org/show_bug.cgi?id=24818

On Wed, Mar 15, 2017 at 3:41 AM, Samuel Antão via llvm-dev <llvm-dev at
lists.llvm.org<mailto:llvm-dev at lists.llvm.org>> wrote:
Hi all,

I came across an issue caused by the Call-Graph Simplify Pass. Here is a a small
repro:

```
define double @foo(double %a1, double %a2, double %a3) #0 {
entry:
  %a_mul = fmul double %a1, %a2
  %a_cmp = fcmp ogt double %a3, %a_mul
  br i1 %a_cmp, label %a.then, label %a.end

a.then:
  %a_div = fdiv double %a_mul, %a3
  br label %a.end

a.end:
  %a_factor = phi double [ %a_div, %a.then ], [ 1.000000e+00, %entry ]
  ret double %a_factor
}
```
Here, the conditional is guarding a possible division by zero. However if I run
CGSimplify on this I get:
```
define double @foo(double %a1, double %a2, double %a3) local_unnamed_addr #0 {
entry:
  %a_mul = fmul double %a1, %a2
  %a_cmp = fcmp olt double %a_mul, %a3
  %a_div = fdiv double %a_mul, %a3
  %a_factor = select i1 %a_cmp, double %a_div, double 1.000000e+00
  ret double %a_factor
}
```
This will cause a FP arithmetic exception, and the application will get a SIGFPE
signal. The code that drives the change in the optimizer relies on
`llvm::isSafeToSpeculativelyExecute` to decide whether the division should be
performed speculatively. Right now, this function returns true. One could do
something similar to integer divisions and add a case there (this solved the
issue for me):
```
diff --git a/lib/Analysis/ValueTracking.cpp b/lib/Analysis/ValueTracking.cpp
index 1761dac..c61fefd 100644
--- a/lib/Analysis/ValueTracking.cpp
+++ b/lib/Analysis/ValueTracking.cpp
@@ -3352,6 +3352,21 @@ bool llvm::isSafeToSpeculativelyExecute(const Value *V,
     // The numerator *might* be MinSignedValue.
     return false;
   }
+  case Instruction::FDiv:
+  case Instruction::FRem:{
+    const Value *Denominator = Inst->getOperand(1);
+    // x / y is undefined if y == 0
+    // The denominator is not a constant, so there is nothing we can do to
prove
+    // it is non-zero.
+    if (auto *VV = dyn_cast<ConstantVector>(Denominator))
+      Denominator = VV->getSplatValue();
+    if (!isa<ConstantFP>(Denominator))
+      return false;
+    // The denominator is a zero constant - we can't speculate here.
+    if (m_AnyZero().match(Denominator))
+      return false;
+    return true;
+  }
   case Instruction::Load: {
     const LoadInst *LI = cast<LoadInst>(Inst);
     if (!LI->isUnordered() ||
```
I did my tests using a powerpc64le target, but I couldn't find any target
specific login involved in this transform. In any case, I wanted to drop the
questions:

- is there any target that would benefit from speculative fp divisions?
- is there any target for which fp division does not have side effects?

If not, I can go ahead and post the patch above for review.

Many thanks!
Sam

_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev





_______________________________________________

LLVM Developers mailing list

llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>

http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev



--

Hal Finkel

Lead, Compiler Technology and Programming Languages

Leadership Computing Facility

Argonne National Laboratory
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170315/da42e216/attachment.html>

llvm dev - Mar 2017 - Speculative execution of FP divide Instructions - Call-Graph Simplify

[llvm-dev] Speculative execution of FP divide Instructions - Call-Graph Simplify

[llvm-dev] Speculative execution of FP divide Instructions - Call-Graph Simplify

[llvm-dev] Speculative execution of FP divide Instructions - Call-Graph Simplify