Sanjoy Das
2015-Mar-26 21:54 UTC
[LLVMdev] `llvm.$op.with.overflow`, InstCombine and ScalarEvolution
I've run into cases where, because not all of LLVM's optimizations understand the semantics of the `llvm.$op.with.overflow` intrinsics, canonicalizing compares to `llvm.$op.with.overflow` ends up preventing optimization.

For instance, running the following snippet through `opt -indvars` optimizes `%to.optimize` to `true`, but running it through `opt -instcombine -indvars` does not.

```
declare void @side_effect(i1)

define void @foo(i8 %x, i8 %y) {
entry:
  %sum = add i8 %x, %y
  %e = icmp ugt i8 %x, %sum
  br i1 %e, label %exit, label %loop

loop:
  %idx = phi i8 [ %x, %entry ], [ %idx.inc, %loop ]
  %idx.inc = add i8 %idx, 1
  %to.optimize = icmp ule i8 %idx, %sum
  call void @side_effect(i1 %to.optimize)
  %c = icmp ule i8 %idx.inc, %sum
  br i1 %c, label %loop, label %exit

exit:
  ret void
}
```

This happens because `-instcombine` does the following transform:

```
entry:
  %uadd = call { i8, i1 } @llvm.uadd.with.overflow.i8(i8 %x, i8 %y)
  %0 = extractvalue { i8, i1 } %uadd, 0
  %e = extractvalue { i8, i1 } %uadd, 1
  br i1 %e, label %exit, label %loop.preheader
```

and ScalarEvolution can no longer see through the `extractvalue` of the call to `llvm.uadd.with.overflow.i8`.

The right way to fix this depends on the intended purpose of the `llvm.$op.with.overflow` intrinsics. Three solutions I can think of:

 * if they're a convenience useful only for better codegen, can the transform that instcombine is doing currently (transforming compares to `extractvalue` of a `.with.overflow` intrinsic) be moved to CodeGenPrep?

 * if they're something more fundamental, then maybe they should be promoted to an instruction? They've been around since at least llvm 2.6 as far as I can tell. Personally, I seriously doubt this is a good idea, given that the semantics of these intrinsics can be completely described by a DAG composed of existing instructions.

 * add rules to ScalarEvolution to have it understand these intrinsics (or maybe even expand them before `-indvars` -- I think `-liv-reduce` tries to do this in some cases), but I'd vote for keeping this complexity out of ScalarEvolution unless there are good reasons why the `.with.overflow` calls need to be introduced before codegenprep.

What do you think?

-- Sanjoy
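For reference, a minimal sketch (not from the original mail) of the equivalence the second bullet relies on: the unsigned i8 add case expressed purely with existing instructions. The function name is illustrative.

```
; Illustrative expansion of llvm.uadd.with.overflow.i8 using existing
; instructions: the value result is a plain add, and the overflow bit is
; an unsigned wrap check (the sum is smaller than an operand iff it wrapped).
define { i8, i1 } @uadd_with_overflow_expanded(i8 %a, i8 %b) {
  %sum = add i8 %a, %b
  %ov  = icmp ult i8 %sum, %a
  %r0  = insertvalue { i8, i1 } undef, i8 %sum, 0
  %r1  = insertvalue { i8, i1 } %r0, i1 %ov, 1
  ret { i8, i1 } %r1
}
```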
David Majnemer
2015-Mar-26 22:18 UTC
[LLVMdev] `llvm.$op.with.overflow`, InstCombine and ScalarEvolution
On Thu, Mar 26, 2015 at 2:54 PM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote:

> I've run into cases where, because not all of LLVM's optimizations
> understand the semantics of the `llvm.$op.with.overflow` intrinsics,
> canonicalizing compares to `llvm.$op.with.overflow` ends up preventing
> optimization.
>
> For instance, running the following snippet through `opt -indvars`
> optimizes `%to.optimize` to `true`, but running it through
> `opt -instcombine -indvars` does not.
>
> [...]
>
> and ScalarEvolution can no longer see through the `extractvalue` of
> the call to `llvm.uadd.with.overflow.i8`.
>
> The right way to fix this depends on the intended purpose of the
> `llvm.$op.with.overflow` intrinsics. Three solutions I can think of:
>
> * if they're a convenience useful only for better codegen, can the
>   transform that instcombine is doing currently (transforming
>   compares to `extractvalue` of a `.with.overflow` intrinsic) be
>   moved to CodeGenPrep?

Unfortunately, they are not a mere convenience. I've personally spent effort trying to optimize away overflow checks in InstCombine; it does this in its InstCombineCalls visitor.

> * if they're something more fundamental, then maybe they should be
>   promoted to an instruction? They've been around since at least
>   llvm 2.6 as far as I can tell. Personally, I seriously doubt this
>   is a good idea, given that the semantics of these intrinsics can be
>   completely described by a DAG composed of existing instructions.

I think promoting them would be quite disruptive. Most passes don't care about these operations and can't do much with them.

> * add rules to ScalarEvolution to have it understand these intrinsics
>   (or maybe even expand them before `-indvars` -- I think
>   `-liv-reduce` tries to do this in some cases), but I'd vote for
>   keeping this complexity out of ScalarEvolution unless there are
>   good reasons why the `.with.overflow` calls need to be introduced
>   before codegenprep.

If we don't care about trying to optimize out overflow checks in InstCombine, I'd go with moving the complexity to CGP. However, I think InstCombine is doing the right thing here by forming these. I think there are two choices that keep our optimizations:

- Teach SCEV, one way or another, about *_with_overflow.
- Don't form overflow intrinsics in InstCombine but instead teach ProcessUGT_ADDCST_ADD, et al. to use WillNotOverflowUnsignedAdd, et al. to optimize away the compare.

The second approach sounds much easier but I don't know what we lose by doing it.

> What do you think?
>
> -- Sanjoy
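To make the second choice concrete, here is an illustration (not from the thread, names and shapes assumed) of a check that a WillNotOverflowUnsignedAdd-style analysis can already fold without ever forming the intrinsic: both operands are masked so the add provably cannot wrap, so the wrap compare folds to false.

```
; Illustrative only: with both operands masked to 7 bits, %x + %y is at most
; 254 and cannot wrap an i8, so the unsigned-wrap check is provably false and
; can be folded away without forming uadd.with.overflow.
define i1 @provably_no_wrap(i8 %a, i8 %b) {
  %x = and i8 %a, 127
  %y = and i8 %b, 127
  %sum = add i8 %x, %y
  %ov = icmp ugt i8 %x, %sum
  ret i1 %ov
}
```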
Andrew Trick
2015-Mar-26 23:37 UTC
[LLVMdev] `llvm.$op.with.overflow`, InstCombine and ScalarEvolution
> On Mar 26, 2015, at 2:54 PM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote:
>
> I've run into cases where, because not all of LLVM's optimizations
> understand the semantics of the `llvm.$op.with.overflow` intrinsics,
> canonicalizing compares to `llvm.$op.with.overflow` ends up preventing
> optimization.
>
> For instance, running the following snippet through `opt -indvars`
> optimizes `%to.optimize` to `true`, but running it through
> `opt -instcombine -indvars` does not.
>
> [...]
>
> This happens because `-instcombine` does the following transform:
>
> ```
> entry:
>   %uadd = call { i8, i1 } @llvm.uadd.with.overflow.i8(i8 %x, i8 %y)
>   %0 = extractvalue { i8, i1 } %uadd, 0
>   %e = extractvalue { i8, i1 } %uadd, 1
>   br i1 %e, label %exit, label %loop.preheader
> ```

That’s new to me. I guess I missed that.

> and ScalarEvolution can no longer see through the `extractvalue` of
> the call to `llvm.uadd.with.overflow.i8`.
>
> The right way to fix this depends on the intended purpose of the
> `llvm.$op.with.overflow` intrinsics. Three solutions I can think of:
>
> * if they're a convenience useful only for better codegen, can the
>   transform that instcombine is doing currently (transforming
>   compares to `extractvalue` of a `.with.overflow` intrinsic) be
>   moved to CodeGenPrep?

The intrinsics tightly couple the operation with the branch, preventing the optimizer from obscuring the relationship, thereby effectively guaranteeing good codegen. If the intrinsics are needed to expose mid-level optimizations, I’m not aware of it. I think mid-level optimization should generally work on the canonical compare-and-branch form.

I don't understand why we would want to generate intrinsics early unless the frontend does it. It seems that if InstCombine can pattern match these cases, then it can just as easily optimize the comparison directly. Isn’t it already doing that?

> * if they're something more fundamental, then maybe they should be
>   promoted to an instruction? They've been around since at least
>   llvm 2.6 as far as I can tell. Personally, I seriously doubt this
>   is a good idea, given that the semantics of these intrinsics can be
>   completely described by a DAG composed of existing instructions.

Right. They are a way for the frontend to communicate directly with codegen, *preventing* optimization from getting in the way. That’s a pretty standard use for intrinsics.

> * add rules to ScalarEvolution to have it understand these intrinsics
>   (or maybe even expand them before `-indvars` -- I think
>   `-liv-reduce` tries to do this in some cases), but I'd vote for
>   keeping this complexity out of ScalarEvolution unless there are
>   good reasons why the `.with.overflow` calls need to be introduced
>   before codegenprep.
>
> What do you think?

I don’t think ScalarEvolution can possibly represent overflow conditions. -liv-reduce clones the data flow, decoupling the overflow check from the arithmetic, just hoping that ISEL can put it back together! I don’t think anyone actively uses it at this point, so basically loop optimization is disabled if you use these intrinsics.
If cloning+expanding the intrinsic (-liv-reduce) is really the best way to expose these to SCEV, then clearly InstCombine should not be fusing them to begin with.

An alternative that avoids cloning+expanding the intrinsic would be to allow SCEV to analyze the expression for the intrinsic’s value result, but avoid replacing the intrinsic with a materialized SCEV expression (we would have no way to replace the overflow result anyway). To do that, I think the current uses of SCEVExpander need to be surveyed to ensure that they will handle lazily cloning the intrinsic when materializing the expression for the intrinsic’s value result. This would be wrong for LSR, where it assumes SCEV knows about all uses of the value, but that could probably be worked around or we could just bail on LSR.

So:

- I don’t understand why InstCombine is materializing the intrinsic
- But it is still a good idea to teach SCEV clients how to handle code with the intrinsic

Andy

> -- Sanjoy
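A rough before/after sketch (assumed shapes, not actual pass output) of the cloning described above: the value result is re-expanded into a plain add that SCEV can analyze, while the intrinsic is kept only for the overflow bit, leaving ISel to re-fuse the two.

```
declare { i8, i1 } @llvm.uadd.with.overflow.i8(i8, i8)

; Before cloning: both results come from the intrinsic, so SCEV sees neither.
define i8 @before(i8 %x, i8 %y) {
  %uadd = call { i8, i1 } @llvm.uadd.with.overflow.i8(i8 %x, i8 %y)
  %sum = extractvalue { i8, i1 } %uadd, 0   ; opaque to SCEV
  %ov = extractvalue { i8, i1 } %uadd, 1
  %r = select i1 %ov, i8 0, i8 %sum
  ret i8 %r
}

; After cloning: the value result is recomputed with a plain add, which SCEV
; understands; the intrinsic survives only for the overflow bit.
define i8 @after(i8 %x, i8 %y) {
  %uadd = call { i8, i1 } @llvm.uadd.with.overflow.i8(i8 %x, i8 %y)
  %sum = add i8 %x, %y                      ; analyzable by SCEV
  %ov = extractvalue { i8, i1 } %uadd, 1
  %r = select i1 %ov, i8 0, i8 %sum
  ret i8 %r
}
```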
Sanjoy Das
2015-Mar-27 06:39 UTC
[LLVMdev] `llvm.$op.with.overflow`, InstCombine and ScalarEvolution
> The intrinsics tightly couple the operation with the branch, preventing the optimizer from obscuring the relationship, thereby effectively guaranteeing good codegen. If the intrinsics are needed to expose mid-level optimizations, I’m not aware of it. I think mid-level optimization should generally work on the canonical compare-and-branch form.

The places within opt (other than InstCombine) that special case the _with_overflow intrinsics are ConstantFolding, ValueTracking and GVN. Their extent of "special casing" is just pretending "extractvalue 0 (smul_with_overflow A B)" is "A * B", and the like, so I don't think we'll lose optimization potential by not transforming IR to use the _with_overflow intrinsics in -instcombine.

The _with_overflow intrinsics are also matched and handled in some places within SelectionDAG, ISel and some TTIs. I'm more worried about these, as it is possible that missing out on materializing _with_overflow may affect codegen quality. But we can always fix this by moving the "introduce _with_overflow" phase into CodeGenPrep.

> An alternative that avoids cloning+expanding the intrinsic would be to allow SCEV to analyze the expression for the intrinsic’s value result, but avoid replacing the intrinsic with a materialized SCEV expression (we would have no way to replace the overflow result anyway).

If I understand you correctly, I suspect this will be hard to do right because SCEV expressions will have to track state that they don't track currently (which Value* did this AddRecExpr come from?). We'll also end up with weird uniquing issues -- is "extractvalue 0 (uadd.with.overflow(A, B))" the same SCEVAddExpr* as "A + B"?

-- Sanjoy
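A small illustration (assumed, not from the mail) of the extent of that special casing: on constant operands the value result of the intrinsic folds exactly as the plain operation would.

```
declare { i8, i1 } @llvm.smul.with.overflow.i8(i8, i8)

; Illustration only: constant folding treats the value result of the intrinsic
; as the plain multiply, so %v folds to 12 just as `mul i8 3, 4` would.
define i8 @fold_value_result() {
  %m = call { i8, i1 } @llvm.smul.with.overflow.i8(i8 3, i8 4)
  %v = extractvalue { i8, i1 } %m, 0
  ret i8 %v
}
```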
Sanjoy Das
2015-Mar-27 06:52 UTC
[LLVMdev] `llvm.$op.with.overflow`, InstCombine and ScalarEvolution
> If we don't care about trying to optimize out overflow checks in
> InstCombine, I'd go with moving the complexity to CGP.

I think instcombine should optimize out overflow checks (as it does today) without introducing _with_overflow calls. Are there reasons why such an approach would not work?

> However, I think
> InstCombine is doing the right thing here by forming these.

I don't quite agree with you on this -- by materializing these intrinsics InstCombine is making every pass that runs after it less effective. All of them have to know about the _with_overflow intrinsics to optimize IR that they could have otherwise optimized.

Another example is GVN -- `opt -gvn` optimizes away %to.optimize but `opt -instcombine -gvn` does not.

```
declare void @side_effect(i1)

define void @foo(i8 %x, i8 %y) {
entry:
  %sum = add i8 %x, %y
  %e = icmp ugt i8 %x, %sum
  br i1 %e, label %yes, label %no

yes:
  %to.optimize = icmp ugt i8 %x, %sum
  call void @side_effect(i1 %to.optimize)
  br label %no

no:
  ret void
}
```
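For reference, roughly the shape this snippet takes after `-instcombine` (assumed; the exact output may differ): `%e` becomes the intrinsic's overflow bit while `%to.optimize` stays a plain `icmp` against the extracted sum, so GVN can no longer see that `%to.optimize` recomputes the branch condition `%e`.

```
declare void @side_effect(i1)
declare { i8, i1 } @llvm.uadd.with.overflow.i8(i8, i8)

define void @foo(i8 %x, i8 %y) {
entry:
  %uadd = call { i8, i1 } @llvm.uadd.with.overflow.i8(i8 %x, i8 %y)
  %sum = extractvalue { i8, i1 } %uadd, 0
  %e = extractvalue { i8, i1 } %uadd, 1
  br i1 %e, label %yes, label %no

yes:
  ; still a plain compare, no longer syntactically the same as %e
  %to.optimize = icmp ugt i8 %x, %sum
  call void @side_effect(i1 %to.optimize)
  br label %no

no:
  ret void
}
```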