Sanjoy Das
2015-Mar-26 21:54 UTC
[LLVMdev] `llvm.$op.with.overflow`, InstCombine and ScalarEvolution
I've run into cases where, because not all of LLVM's optimizations understand the semantics of the `llvm.$op.with.overflow` intrinsics, canonicalizing compares to `llvm.$op.with.overflow` ends up preventing optimization.

For instance, running the following snippet through `opt -indvars` optimizes `%to.optimize` to `true`, but running it through `opt -instcombine -indvars` does not.

```
declare void @side_effect(i1)

define void @foo(i8 %x, i8 %y) {
entry:
  %sum = add i8 %x, %y
  %e = icmp ugt i8 %x, %sum
  br i1 %e, label %exit, label %loop

loop:
  %idx = phi i8 [ %x, %entry ], [ %idx.inc, %loop ]
  %idx.inc = add i8 %idx, 1
  %to.optimize = icmp ule i8 %idx, %sum
  call void @side_effect(i1 %to.optimize)
  %c = icmp ule i8 %idx.inc, %sum
  br i1 %c, label %loop, label %exit

exit:
  ret void
}
```

This happens because `-instcombine` does the following transform:

```
entry:
  %uadd = call { i8, i1 } @llvm.uadd.with.overflow.i8(i8 %x, i8 %y)
  %0 = extractvalue { i8, i1 } %uadd, 0
  %e = extractvalue { i8, i1 } %uadd, 1
  br i1 %e, label %exit, label %loop.preheader
```

and ScalarEvolution can no longer see through the `extractvalue` of the call to `llvm.uadd.with.overflow.i8`.

The right way to fix this depends on the intended purpose of the `llvm.$op.with.overflow` intrinsics. Three solutions I can think of:

 * if they're a convenience useful only for better codegen, can the transform that instcombine is doing currently (transforming compares to `extractvalue` of a `.with.overflow` intrinsic) be moved to CodeGenPrep?

 * if they're something more fundamental, then maybe they should be promoted to an instruction? They've been around since at least llvm 2.6 as far as I can tell. Personally, I seriously doubt this is a good idea, given that the semantics of these intrinsics can be completely described by a DAG composed of existing instructions.

 * add rules to ScalarEvolution to have it understand these intrinsics (or maybe even expand them before `-indvars` -- I think `-liv-reduce` tries to do this in some cases), but I'd vote for keeping this complexity out of ScalarEvolution unless there are good reasons why the `.with.overflow` calls need to be introduced before codegenprep.

What do you think?

-- Sanjoy
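For reference, a minimal sketch (not from the original mail) of the equivalence the second bullet relies on: the unsigned i8 add case expressed purely with existing instructions. The function name is illustrative.

```
; Illustrative expansion of llvm.uadd.with.overflow.i8 using existing
; instructions: the value result is a plain add, and the overflow bit is
; an unsigned wrap check (the sum is smaller than an operand iff it wrapped).
define { i8, i1 } @uadd_with_overflow_expanded(i8 %a, i8 %b) {
  %sum = add i8 %a, %b
  %ov  = icmp ult i8 %sum, %a
  %r0  = insertvalue { i8, i1 } undef, i8 %sum, 0
  %r1  = insertvalue { i8, i1 } %r0, i1 %ov, 1
  ret { i8, i1 } %r1
}
```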
David Majnemer
2015-Mar-26 22:18 UTC
[LLVMdev] `llvm.$op.with.overflow`, InstCombine and ScalarEvolution
On Thu, Mar 26, 2015 at 2:54 PM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote:

> I've run into cases where, because not all of LLVM's optimizations
> understand the semantics of the `llvm.$op.with.overflow` intrinsics,
> canonicalizing compares to `llvm.$op.with.overflow` ends up preventing
> optimization.
>
> For instance, running the following snippet through `opt -indvars`
> optimizes `%to.optimize` to `true`, but running it through
> `opt -instcombine -indvars` does not.
>
> [...]
>
> and ScalarEvolution can no longer see through the `extractvalue` of
> the call to `llvm.uadd.with.overflow.i8`.
>
> The right way to fix this depends on the intended purpose of the
> `llvm.$op.with.overflow` intrinsics. Three solutions I can think of:
>
> * if they're a convenience useful only for better codegen, can the
>   transform that instcombine is doing currently (transforming
>   compares to `extractvalue` of a `.with.overflow` intrinsic) be
>   moved to CodeGenPrep?

Unfortunately, they are not a mere convenience. I've personally spent effort trying to optimize away overflow checks in InstCombine; it does this in its InstCombineCalls visitor.

> * if they're something more fundamental, then maybe they should be
>   promoted to an instruction? They've been around since at least
>   llvm 2.6 as far as I can tell. Personally, I seriously doubt this
>   is a good idea, given that the semantics of these intrinsics can be
>   completely described by a DAG composed of existing instructions.

I think promoting them would be quite disruptive. Most passes don't care about these operations and can't do much with them.

> * add rules to ScalarEvolution to have it understand these intrinsics
>   (or maybe even expand them before `-indvars` -- I think
>   `-liv-reduce` tries to do this in some cases), but I'd vote for
>   keeping this complexity out of ScalarEvolution unless there are
>   good reasons why the `.with.overflow` calls need to be introduced
>   before codegenprep.

If we don't care about trying to optimize out overflow checks in InstCombine, I'd go with moving the complexity to CGP. However, I think InstCombine is doing the right thing here by forming these. I think there are two choices that keep our optimizations:

- Teach SCEV, one way or another, about *_with_overflow.
- Don't form overflow intrinsics in InstCombine but instead teach ProcessUGT_ADDCST_ADD, et al. to use WillNotOverflowUnsignedAdd, et al. to optimize away the compare.

The second approach sounds much easier but I don't know what we lose by doing it.

> What do you think?
>
> -- Sanjoy
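To make the second choice concrete, here is an illustration (not from the thread, names and shapes assumed) of a check that a WillNotOverflowUnsignedAdd-style analysis can already fold without ever forming the intrinsic: both operands are masked so the add provably cannot wrap, so the wrap compare folds to false.

```
; Illustrative only: with both operands masked to 7 bits, %x + %y is at most
; 254 and cannot wrap an i8, so the unsigned-wrap check is provably false and
; can be folded away without forming uadd.with.overflow.
define i1 @provably_no_wrap(i8 %a, i8 %b) {
  %x = and i8 %a, 127
  %y = and i8 %b, 127
  %sum = add i8 %x, %y
  %ov = icmp ugt i8 %x, %sum
  ret i1 %ov
}
```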
Andrew Trick
2015-Mar-26 23:37 UTC
[LLVMdev] `llvm.$op.with.overflow`, InstCombine and ScalarEvolution
> On Mar 26, 2015, at 2:54 PM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote:
>
> I've run into cases where, because not all of LLVM's optimizations
> understand the semantics of the `llvm.$op.with.overflow` intrinsics,
> canonicalizing compares to `llvm.$op.with.overflow` ends up preventing
> optimization.
>
> For instance, running the following snippet through `opt -indvars`
> optimizes `%to.optimize` to `true`, but running it through
> `opt -instcombine -indvars` does not.
>
> [...]
>
> This happens because `-instcombine` does the following transform:
>
> ```
> entry:
>   %uadd = call { i8, i1 } @llvm.uadd.with.overflow.i8(i8 %x, i8 %y)
>   %0 = extractvalue { i8, i1 } %uadd, 0
>   %e = extractvalue { i8, i1 } %uadd, 1
>   br i1 %e, label %exit, label %loop.preheader
> ```

That’s new to me. I guess I missed that.

> and ScalarEvolution can no longer see through the `extractvalue` of
> the call to `llvm.uadd.with.overflow.i8`.
>
> The right way to fix this depends on the intended purpose of the
> `llvm.$op.with.overflow` intrinsics. Three solutions I can think of:
>
> * if they're a convenience useful only for better codegen, can the
>   transform that instcombine is doing currently (transforming
>   compares to `extractvalue` of a `.with.overflow` intrinsic) be
>   moved to CodeGenPrep?

The intrinsics tightly couple the operation with the branch, preventing the optimizer from obscuring the relationship, thereby effectively guaranteeing good codegen. If the intrinsics are needed to expose mid-level optimizations, I’m not aware of it. I think mid-level optimization should generally work on the canonical compare-and-branch form.

I don't understand why we would want to generate intrinsics early unless the frontend does it. It seems that if InstCombine can pattern match these cases, then it can just as easily optimize the comparison directly. Isn’t it already doing that?

> * if they're something more fundamental, then maybe they should be
>   promoted to an instruction? They've been around since at least
>   llvm 2.6 as far as I can tell. Personally, I seriously doubt this
>   is a good idea, given that the semantics of these intrinsics can be
>   completely described by a DAG composed of existing instructions.

Right. They are a way for the frontend to communicate directly with codegen, *preventing* optimization from getting in the way. That’s a pretty standard use for intrinsics.

> * add rules to ScalarEvolution to have it understand these intrinsics
>   (or maybe even expand them before `-indvars` -- I think
>   `-liv-reduce` tries to do this in some cases), but I'd vote for
>   keeping this complexity out of ScalarEvolution unless there are
>   good reasons why the `.with.overflow` calls need to be introduced
>   before codegenprep.
>
> What do you think?

I don’t think ScalarEvolution can possibly represent overflow conditions. -liv-reduce clones the data flow, decoupling the overflow check from the arithmetic, just hoping that ISEL can put it back together! I don’t think anyone actively uses it at this point, so basically loop optimization is disabled if you use these intrinsics.
If cloning+expanding the intrinsic (-liv-reduce) is really the best way to expose these to SCEV, then clearly InstCombine should not be fusing them to begin with.

An alternative that avoids cloning+expanding the intrinsic would be to allow SCEV to analyze the expression for the intrinsic’s value result, but avoid replacing the intrinsic with a materialized SCEV expression (we would have no way to replace the overflow result anyway). To do that, I think the current uses of SCEVExpander need to be surveyed to ensure that they will handle lazily cloning the intrinsic when materializing the expression for the intrinsic’s value result. This would be wrong for LSR, where it assumes SCEV knows about all uses of the value, but that could probably be worked around or we could just bail on LSR.

So:

- I don’t understand why InstCombine is materializing the intrinsic
- But it is still a good idea to teach SCEV clients how to handle code with the intrinsic

Andy

> -- Sanjoy
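A rough before/after sketch (assumed shapes, not actual pass output) of the cloning described above: the value result is re-expanded into a plain add that SCEV can analyze, while the intrinsic is kept only for the overflow bit, leaving ISel to re-fuse the two.

```
declare { i8, i1 } @llvm.uadd.with.overflow.i8(i8, i8)

; Before cloning: both results come from the intrinsic, so SCEV sees neither.
define i8 @before(i8 %x, i8 %y) {
  %uadd = call { i8, i1 } @llvm.uadd.with.overflow.i8(i8 %x, i8 %y)
  %sum = extractvalue { i8, i1 } %uadd, 0   ; opaque to SCEV
  %ov = extractvalue { i8, i1 } %uadd, 1
  %r = select i1 %ov, i8 0, i8 %sum
  ret i8 %r
}

; After cloning: the value result is recomputed with a plain add, which SCEV
; understands; the intrinsic survives only for the overflow bit.
define i8 @after(i8 %x, i8 %y) {
  %uadd = call { i8, i1 } @llvm.uadd.with.overflow.i8(i8 %x, i8 %y)
  %sum = add i8 %x, %y                      ; analyzable by SCEV
  %ov = extractvalue { i8, i1 } %uadd, 1
  %r = select i1 %ov, i8 0, i8 %sum
  ret i8 %r
}
```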
Sanjoy Das
2015-Mar-27 06:39 UTC
[LLVMdev] `llvm.$op.with.overflow`, InstCombine and ScalarEvolution
> The intrinsics tightly couple the operation with the branch, preventing the optimizer from obscuring the relationship, thereby effectively guaranteeing good codegen. If the intrinsics are needed to expose mid-level optimizations, I’m not aware of it. I think mid-level optimization should generally work on the canonical compare-and-branch form.

The places within opt (other than InstCombine) that special case the _with_overflow intrinsics are ConstantFolding, ValueTracking and GVN. Their extent of "special casing" is just pretending "extractvalue 0 (smul_with_overflow A B)" is "A * B", and the like, so I don't think we'll lose optimization potential by not transforming IR to use the _with_overflow intrinsics in -instcombine.

The _with_overflow intrinsics are also matched and handled in some places within SelectionDAG, ISel and some TTIs. I'm more worried about these, as it is possible that missing out on materializing _with_overflow may affect codegen quality. But we can always fix this by moving the "introduce _with_overflow" phase into CodeGenPrep.

> An alternative that avoids cloning+expanding the intrinsic would be to allow SCEV to analyze the expression for the intrinsic’s value result, but avoid replacing the intrinsic with a materialized SCEV expression (we would have no way to replace the overflow result anyway).

If I understand you correctly, I suspect this will be hard to do right because SCEV expressions will have to track state that they don't track currently (which Value* did this AddRecExpr come from?). We'll also end up with weird uniquing issues -- is "extractvalue 0 (uadd.with.overflow(A, B))" the same SCEVAddExpr* as "A + B"?

-- Sanjoy
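A small illustration (assumed, not from the mail) of the extent of that special casing: on constant operands the value result of the intrinsic folds exactly as the plain operation would.

```
declare { i8, i1 } @llvm.smul.with.overflow.i8(i8, i8)

; Illustration only: constant folding treats the value result of the intrinsic
; as the plain multiply, so %v folds to 12 just as `mul i8 3, 4` would.
define i8 @fold_value_result() {
  %m = call { i8, i1 } @llvm.smul.with.overflow.i8(i8 3, i8 4)
  %v = extractvalue { i8, i1 } %m, 0
  ret i8 %v
}
```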
Sanjoy Das
2015-Mar-27 06:52 UTC
[LLVMdev] `llvm.$op.with.overflow`, InstCombine and ScalarEvolution
> If we don't care about trying to optimize out overflow checks in
> InstCombine, I'd go with moving the complexity to CGP.

I think instcombine should optimize out overflow checks (as it does today) without introducing _with_overflow calls. Are there reasons why such an approach would not work?

> However, I think
> InstCombine is doing the right thing here by forming these.

I don't quite agree with you on this -- by materializing these intrinsics InstCombine is making every pass that runs after it less effective. All of them have to know about the _with_overflow intrinsics to optimize IR that they could have otherwise optimized.

Another example is GVN -- `opt -gvn` optimizes away %to.optimize but `opt -instcombine -gvn` does not.

```
declare void @side_effect(i1)

define void @foo(i8 %x, i8 %y) {
entry:
  %sum = add i8 %x, %y
  %e = icmp ugt i8 %x, %sum
  br i1 %e, label %yes, label %no

yes:
  %to.optimize = icmp ugt i8 %x, %sum
  call void @side_effect(i1 %to.optimize)
  br label %no

no:
  ret void
}
```
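For reference, roughly the shape this snippet takes after `-instcombine` (assumed; the exact output may differ): `%e` becomes the intrinsic's overflow bit while `%to.optimize` stays a plain `icmp` against the extracted sum, so GVN can no longer see that `%to.optimize` recomputes the branch condition `%e`.

```
declare void @side_effect(i1)
declare { i8, i1 } @llvm.uadd.with.overflow.i8(i8, i8)

define void @foo(i8 %x, i8 %y) {
entry:
  %uadd = call { i8, i1 } @llvm.uadd.with.overflow.i8(i8 %x, i8 %y)
  %sum = extractvalue { i8, i1 } %uadd, 0
  %e = extractvalue { i8, i1 } %uadd, 1
  br i1 %e, label %yes, label %no

yes:
  ; still a plain compare, no longer syntactically the same as %e
  %to.optimize = icmp ugt i8 %x, %sum
  call void @side_effect(i1 %to.optimize)
  br label %no

no:
  ret void
}
```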