On Nov 12, 2010, at 11:04 AM, Evan Cheng wrote:

>>   %z = select i1 %cond, i32 -1, i32 %x
>>   %s = and i32 %z, %y
>>
>> Is a "conditional and". It would be interesting to know if the ARM backend gets this as a single predicated 'and' instruction (similarly for the 'or' and 'xor' versions of these patterns). I bet not, which is bad if instcombine is canonicalizing this way.
>
> On ARM, an instruction predicated on a false predicate is still executed, so it's frequently undesirable. Because llvm canonicalizes to the select instruction, it already generates significantly more predicated instructions than gcc. We have seen some regressions due to overly aggressive select formation.
>
> By definition select requires both source operands to be evaluated. Given how good branch predictors are these days, I'm not surprised that branching code often performs better. ICC also almost never generates conditional moves.

That may be, but I can't imagine that:

$ cat t.ll
define i32 @test(i32 %a, i32 %b, i32 %x, i32 %y) nounwind {
  %cond = icmp slt i32 %a, %b
  %z = select i1 %cond, i32 -1, i32 %x
  %s = and i32 %z, %y
  ret i32 %s
}
$ llc t.ll -o - -march=arm
_test:                                  @ @test
@ BB#0:
        cmp     r0, r1
        mvn     r12, #0
        movlt   r2, r12
        and     r0, r2, r3
        bx      lr

is better than a cmp + conditional and + bx.

-Chris

On Nov 12, 2010, at 11:09 AM, Chris Lattner wrote:

> On Nov 12, 2010, at 11:04 AM, Evan Cheng wrote:
>
>>>   %z = select i1 %cond, i32 -1, i32 %x
>>>   %s = and i32 %z, %y
>>>
>>> Is a "conditional and". It would be interesting to know if the ARM backend gets this as a single predicated 'and' instruction (similarly for the 'or' and 'xor' versions of these patterns). I bet not, which is bad if instcombine is canonicalizing this way.
>>
>> On ARM, an instruction predicated on a false predicate is still executed, so it's frequently undesirable. Because llvm canonicalizes to the select instruction, it already generates significantly more predicated instructions than gcc. We have seen some regressions due to overly aggressive select formation.
>>
>> By definition select requires both source operands to be evaluated. Given how good branch predictors are these days, I'm not surprised that branching code often performs better. ICC also almost never generates conditional moves.
>
> That may be, but I can't imagine that:
>
> $ cat t.ll
> define i32 @test(i32 %a, i32 %b, i32 %x, i32 %y) nounwind {
>   %cond = icmp slt i32 %a, %b
>   %z = select i1 %cond, i32 -1, i32 %x
>   %s = and i32 %z, %y
>   ret i32 %s
> }
> $ llc t.ll -o - -march=arm
> _test:                                  @ @test
> @ BB#0:
>         cmp     r0, r1
>         mvn     r12, #0
>         movlt   r2, r12
>         and     r0, r2, r3
>         bx      lr
>
> is better than a cmp + conditional and + bx.

This should be

        cmp     r0, r1
        movlt.w r2, #-1         @ or mvnlt r2, #0
        and.w   r0, r2, r3
        bx      lr

which we get right in Thumb2 mode (I need to check why it's not matching in ARM mode). How can we use a conditional and here? The result is either (y & -1) or (y & x); the "and" is not conditional.

Evan

> -Chris

On Nov 12, 2010, at 11:46 AM, Evan Cheng wrote:

> This should be
>
>         cmp     r0, r1
>         movlt.w r2, #-1         @ or mvnlt r2, #0
>         and.w   r0, r2, r3
>         bx      lr
>
> which we get right in Thumb2 mode (I need to check why it's not matching in ARM mode). How can we use a conditional and here? The result is either (y & -1) or (y & x); the "and" is not conditional.

y&-1 == y. There is no need to materialize -1 as a constant.

-Chris
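
The identity in the last message (y & -1 == y) is what lets the select feeding the 'and' be read as an 'and' that is only applied when the condition is false. The Python sketch below is not from the thread: it models i32 values as unsigned 32-bit integers, and the function names are made up for illustration. It only checks that the two forms agree; it says nothing about which one a given backend should emit.

```python
MASK32 = 0xFFFFFFFF  # model i32 values as unsigned 32-bit integers


def select_then_and(cond: bool, x: int, y: int) -> int:
    """Form from the thread: %z = select %cond, -1, %x ; %s = and %z, %y."""
    z = MASK32 if cond else (x & MASK32)  # i32 -1 is the all-ones pattern
    return z & y & MASK32


def conditional_and(cond: bool, x: int, y: int) -> int:
    """Hypothetical folded form: start from y, apply 'and x' only if cond is false."""
    s = y & MASK32
    if not cond:
        s &= x  # when cond is true the 'and' is skipped, since y & -1 == y
    return s


def forms_agree(samples) -> bool:
    """Return True if both forms give the same result on every (x, y) sample."""
    return all(
        select_then_and(c, x, y) == conditional_and(c, x, y)
        for c in (True, False)
        for x, y in samples
    )
```

Calling forms_agree on a handful of (x, y) pairs, including negative values and the all-ones pattern, is a quick sanity check of the equivalence. No -1 constant appears anywhere in conditional_and, which is the point of the remark above.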