<cc'ing llvmdev> On Nov 10, 2010, at 6:38 AM, Duncan Sands wrote:> Hi Chris, in commit 12800 you introduced the following transform: > > Implement select.ll:test12* > > This transforms code like this: > > %C = or %A, %B > %D = select %cond, %C, %A > into: > %C = select %cond, %B, 0 > %D = or %A, %C > > Since B is often a constant, the select can often be eliminated. In any case, > this reduces the usage count of A, allowing subsequent optimizations to > happen.Right, I still think this is goodness.> > Now consider the following example: > > %z = and i32 %x, %y > %s = select i1 %cond, i32 %y, i32 %z > %r = and i32 %x, %s > > Here %r could be simplified to > %r = and i32 %x, %y > (i.e. the same as %z) so replaced with %z everywhere. > > We even have logic to do this (ThreadBinOpOverSelect, which I just added) in > InstructionSimplify. The nice thing about InstructionSimplify is that it > doesn't ever create any new instructions, so is pretty much always a win. > The problem is that the logic never fires: your instcombine logic gets there > first, and transforms the example into > > %z = select i1 %cond, i32 -1, i32 %x > %s = and i32 %z, %y > %r = and i32 %s, %x > > which is not nearly as good.Yeah, that's not nearly as good as "%r = and i32 %x, %y" :) Note that: %z = select i1 %cond, i32 -1, i32 %x %s = and i32 %z, %y Is a "conditional and". It would be interesting to know if the ARM backend gets this as a single predicated 'and' instruction (similarly for the 'or' and 'xor' version of these patterns). I bet not, which is bad if instcombine is canonicalizing this way.> Do you have any thoughts for how to get the best of both worlds? How about > only applying your transform when one of the operands (eg B) is a constant?I don't think that's a great solution, because reducing # uses of a value is general goodness (increases odds that "hasOneUse()" xforms will kick in for example).> Alternatively, it could look at uses of the select and try to simplify those > first using InstructionSimplify (not sure if instcombine is allowed to > simplify a different instruction to the one it is visiting), or check to see > if some or all uses of the select would be simplified by InstructionSimplify, > and not perform the transform if so.Instead of turning this into a phase ordering issue, I'd rather increase the power of the folding logic to catch the general form. In this case, instead of handling this in instsimplify, why not handle this in reassociate? This is conceptually no different than (x & y & x) -> (x & y), it's just that in this case you have: (x & y & (x or -1)) which should fold to (x & y) just the same. -Chris
Hi Chris,>> We even have logic to do this (ThreadBinOpOverSelect, which I just added) in >> InstructionSimplify. The nice thing about InstructionSimplify is that it >> doesn't ever create any new instructions, so is pretty much always a win. >> The problem is that the logic never fires: your instcombine logic gets there >> first, and transforms the example into >> >> %z = select i1 %cond, i32 -1, i32 %x >> %s = and i32 %z, %y >> %r = and i32 %s, %x >> >> which is not nearly as good....> Instead of turning this into a phase ordering issue, I'd rather increase the power of the folding logic to catch the general form. In this case, instead of handling this in instsimplify, why not handle this in reassociate? > > This is conceptually no different than (x& y& x) -> (x& y), it's just that in this case you have: (x& y& (x or -1)) which should fold to (x& y) just the same.I actually came to the same conclusion, only since I didn't know about the reassociate pass I planned to do this in instcombine :) I will give it a go. Thanks for your help, Duncan.
Hi Chris,> Instead of turning this into a phase ordering issue, I'd rather increase the power of the folding logic to catch the general form. In this case, instead of handling this in instsimplify, why not handle this in reassociate? > > This is conceptually no different than (x& y& x) -> (x& y), it's just that in this case you have: (x& y& (x or -1)) which should fold to (x& y) just the same.in fact this already works, at least for the given example: -instcombine -reassociate -instcombine results in: original: %z = and i32 %x, %y %s = select i1 %cond, i32 %y, i32 %z %r = and i32 %x, %s after first instcombine: %z = select i1 %cond, i32 -1, i32 %x %s = and i32 %z, %y %r = and i32 %s, %x after reassociate: %z = select i1 %cond, i32 -1, i32 %x %s = and i32 %y, %x %r = and i32 %s, %z after second instcombine: %s = and i32 %y, %x This might be a coincidence, but I am happy for the moment :) Ciao, Duncan.
On Nov 11, 2010, at 8:30 AM, Duncan Sands wrote:> Hi Chris, > >> Instead of turning this into a phase ordering issue, I'd rather increase the power of the folding logic to catch the general form. In this case, instead of handling this in instsimplify, why not handle this in reassociate? >> >> This is conceptually no different than (x& y& x) -> (x& y), it's just that in this case you have: (x& y& (x or -1)) which should fold to (x& y) just the same. > > in fact this already works, at least for the given example: > -instcombine -reassociate -instcombine results in: > > original: > %z = and i32 %x, %y > %s = select i1 %cond, i32 %y, i32 %z > %r = and i32 %x, %s > > after first instcombine: > %z = select i1 %cond, i32 -1, i32 %x > %s = and i32 %z, %y > %r = and i32 %s, %x > > after reassociate: > %z = select i1 %cond, i32 -1, i32 %x > %s = and i32 %y, %x > %r = and i32 %s, %z > > after second instcombine: > %s = and i32 %y, %x > > This might be a coincidence, but I am happy for the moment :)Ok! If this "conditional and" idiom is common, it would still be worth teaching reassociate about it, so we don't run into phase order issues. -Chris
On Nov 10, 2010, at 11:39 PM, Chris Lattner wrote:>> > > Yeah, that's not nearly as good as "%r = and i32 %x, %y" :) > > Note that: > > %z = select i1 %cond, i32 -1, i32 %x > %s = and i32 %z, %y > > Is a "conditional and". It would be interesting to know if the ARM backend gets this as a single predicated 'and' instruction (similarly for the 'or' and 'xor' version of these patterns). I bet not, which is bad if instcombine is canonicalizing this way.On ARM, an instruction predicated on false predicate is still executed so it's frequently undesirable. Because llvm canonicalize to select instruction, it already generates significantly more predicated instructions than gcc. We have seen some regressions due to overly aggressive select formation. By definition select requires both source operands to be evaluated. Given how good branch predicators are these days, I'm not surprised it often turns branching code performs better. ICC also almost never generates conditional moves. Evan> >> Do you have any thoughts for how to get the best of both worlds? How about >> only applying your transform when one of the operands (eg B) is a constant? > > I don't think that's a great solution, because reducing # uses of a value is general goodness (increases odds that "hasOneUse()" xforms will kick in for example). > >> Alternatively, it could look at uses of the select and try to simplify those >> first using InstructionSimplify (not sure if instcombine is allowed to >> simplify a different instruction to the one it is visiting), or check to see >> if some or all uses of the select would be simplified by InstructionSimplify, >> and not perform the transform if so. > > Instead of turning this into a phase ordering issue, I'd rather increase the power of the folding logic to catch the general form. In this case, instead of handling this in instsimplify, why not handle this in reassociate? > > This is conceptually no different than (x & y & x) -> (x & y), it's just that in this case you have: (x & y & (x or -1)) which should fold to (x & y) just the same. > > -Chris > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20101112/aa8632e9/attachment.html>
On Nov 12, 2010, at 11:04 AM, Evan Cheng wrote:>> %z = select i1 %cond, i32 -1, i32 %x >> %s = and i32 %z, %y >> >> Is a "conditional and". It would be interesting to know if the ARM backend gets this as a single predicated 'and' instruction (similarly for the 'or' and 'xor' version of these patterns). I bet not, which is bad if instcombine is canonicalizing this way. > > On ARM, an instruction predicated on false predicate is still executed so it's frequently undesirable. Because llvm canonicalize to select instruction, it already generates significantly more predicated instructions than gcc. We have seen some regressions due to overly aggressive select formation. > > By definition select requires both source operands to be evaluated. Given how good branch predicators are these days, I'm not surprised it often turns branching code performs better. ICC also almost never generates conditional moves.That may be, but I can't imagine that: $ cat t.ll define i32 @test(i32 %a, i32 %b, i32 %x, i32 %y) nounwind { %cond = icmp slt i32 %a, %b %z = select i1 %cond, i32 -1, i32 %x %s = and i32 %z, %y ret i32 %s } $ llc t.ll -o - -march=arm _test: @ @test @ BB#0: cmp r0, r1 mvn r12, #0 movlt r2, r12 and r0, r2, r3 bx lr is better than a cmp + conditional and + bx. -Chris