Matthew Curtis
2012-Dec-18 17:56 UTC
[LLVMdev] [PATCH] Teaching ScalarEvolution to handle IV=add(zext(trunc(IV)), Step)
Dan, Thanks for the response ... On 12/17/2012 1:53 PM, Dan Gohman wrote:> On Mon, Dec 10, 2012 at 2:13 PM, Matthew Curtis <mcurtis at codeaurora.org> wrote: >> Hello all, >> >> I wanted to get some feedback on this patch for ScalarEvolution. >> >> It addresses a performance problem I am seeing for simple benchmark. >> >> Starting with this C code: >> >> 01: signed char foo(void) >> 02: { >> 03: const int count = 8000; >> 04: signed char result = 0; >> 05: int j; >> 06: >> 07: for (j = 0; j < count; ++j) { >> 08: result += (result_t)(3); >> 09: } >> 10: >> 11: return result; >> 12: } > FWIW, this code is probably not doing what it was intended to do; it's > result is implementation-defined or an implementation-defined signal is > raised (6.3.1.3).Yes. The code is somewhat contrived. It's a much simplified equivalent of a benchmark. :)>> I end up with this IR feeding into Induction Variable Simplification: >> >> 01: define signext i8 @foo() nounwind readnone { >> 02: entry: >> 03: br label %for.body >> 04: >> 05: for.body: ; preds = %entry, >> %for.body >> 06: %j.04 = phi i32 [ 0, %entry ], [ %inc, %for.body ] >> 07: %result.03 = phi i32 [ 0, %entry ], [ %add, %for.body ] >> 08: %conv2 = and i32 %result.03, 255 >> 09: %add = add nsw i32 %conv2, 3 >> 10: %inc = add nsw i32 %j.04, 1 >> 11: %cmp = icmp slt i32 %inc, 8000 >> 12: br i1 %cmp, label %for.body, label %for.end >> 13: >> 14: for.end: ; preds = %for.body >> 15: %conv1 = trunc i32 %add to i8 >> 16: ret i8 %conv1 >> 17: } > I'm confused about how you got this IR from that C testcase. What is > result_t? Did you do anything special?Sorry, result_t is signed char. I was experimenting with other result types and mistakenly copied an intermediate version of the code. The IR was produced with an internal build of clang for hexagon (with -O2). The IR produced by the current clang trunk is not quite the same.>> Unfortunately, the 'and' on line 8 prevents Scalar Evolution from >> being able to create an expression for '%add' that it knows how to >> evaluate. >> >> The patch detects this pattern in createNodeForPHI and creates an >> equivalent expression that can be evaluated. >> >> Note that SCEV translates the 'and' into >> ZeroExtend(Truncate(%result.03, i8), i32) >> >> So in terms of SCEV expressions, we essentially have >> %add[n] = Add(ZeroExtend(Truncate(%add[n-1], i8), i32), 3) >> >> (BTW, I'm no scholar here, just a layman, so my terminology is >> probably all wrong). > Going with your LLVM IR testcase, this looks right so far. > >> The patch basically moves the ZeroExtend and Truncate outside of the >> recurrence, though it must take into account that for iteration n, the >> ZeroExtend and Truncate apply to the value at iteration n-1. >> >> For a constant step this is accomplished by subtracting one step from >> the recurrence, performing the Truncate and ZeroExtend, and then >> adding the step to the result. In other words: >> >> %add[n] = Add( >> ZeroExtend( >> Truncate( >> Minus(AddRec(Start=0,Step=3)[n], 3), >> i8), >> i32), >> 3) >> >> It's a little more complicated when the Step is another recurrence, >> but essentially the same. > Something is wrong here. On the first iteration, the %add value is 3. > On the first iteration, your replacement expression's value is 256. > > DanHere's how I'm evaluating the expression (in my head): 00: Add(ZeroExtend(Truncate(Minus(AddRec(Start=0,Step=3)[n],3), i8), i32),3) | 01: Add(ZeroExtend(Truncate(Minus(AddRec(Start=0,Step=3)[0],3), i8), i32),3) | 02: Add(ZeroExtend(Truncate(Minus(3,3), i8), i32),3) | 03: Add(ZeroExtend(Truncate(0, i8), i32),3) | 04: Add(ZeroExtend(0, i32),3) | 05: Add(0,3) | 06: 3 Thanks again. Matthew Curtis -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
Dan Gohman
2012-Dec-18 18:54 UTC
[LLVMdev] [PATCH] Teaching ScalarEvolution to handle IV=add(zext(trunc(IV)), Step)
On Tue, Dec 18, 2012 at 9:56 AM, Matthew Curtis <mcurtis at codeaurora.org> wrote:> > Here's how I'm evaluating the expression (in my head): > > 00: Add(ZeroExtend(Truncate(Minus(AddRec(Start=0,Step=3)[n],3), i8), i32),3) > | > 01: Add(ZeroExtend(Truncate(Minus(AddRec(Start=0,Step=3)[0],3), i8), i32),3) > | > 02: Add(ZeroExtend(Truncate(Minus(3,3), i8), i32),3)This step is wrong. The start value of the addrec is 0. Dan> | > 03: Add(ZeroExtend(Truncate(0, i8), i32),3) > | > 04: Add(ZeroExtend(0, i32),3) > | > 05: Add(0,3) > | > 06: 3 > > Thanks again. > Matthew Curtis
Matthew Curtis
2012-Dec-20 16:36 UTC
[LLVMdev] [PATCH] Teaching ScalarEvolution to handle IV=add(zext(trunc(IV)), Step)
Ok, so I think I've mis-represented what's really happening. Ignore my previous statements concerning %add :) Again, given: 05: for.body: ; preds = %entry, %for.body 06: %j.04 = phi i32 [ 0, %entry ], [ %inc, %for.body ] 07: %result.03 = phi i32 [ 0, %entry ], [ %add, %for.body ] 08: %conv2 = and i32 %result.03, 255 09: %add = add nsw i32 %conv2, 3 10: %inc = add nsw i32 %j.04, 1 11: %cmp = icmp slt i32 %inc, 8000 12: br i1 %cmp, label %for.body, label %for.end LLVM executes the following: 01: createSCEV(%conv2 = and i32 %result.03, 255) 02: calls getSCEV(%result.03) 03: returns (3 + (zext i8 {-3,+,3}<%for.body> to i32)) 04: calls getTruncateExpr((3 + (zext i8 {-3,+,3}<%for.body> to i32)), i8) 05: calls getTruncateExpr(3) 06: returns 3 07: calls getTruncateExpt((zext i8 {-3,+,3}<%for.body> to i32)) 08: returns {-3,+,3}<%for.body> 09: returns {0,+,3}<%for.body> 10: calls getZeroExtendExpr({0,+,3}<%for.body>, i32); 11: returns (zext i8 {0,+,3}<%for.body> to i32) 12: returns (zext i8 {0,+,3}<%for.body> to i32) This expression is (I believe) correct for %conv2. The intent of the patch is to construct the correct (evaluatable) expression for %result.03 being fed into the computation of %conv2. Does that make more sense? Matthew C. On 12/18/2012 12:54 PM, Dan Gohman wrote:> On Tue, Dec 18, 2012 at 9:56 AM, Matthew Curtis <mcurtis at codeaurora.org> wrote: >> Here's how I'm evaluating the expression (in my head): >> >> 00: Add(ZeroExtend(Truncate(Minus(AddRec(Start=0,Step=3)[n],3), i8), i32),3) >> | >> 01: Add(ZeroExtend(Truncate(Minus(AddRec(Start=0,Step=3)[0],3), i8), i32),3) >> | >> 02: Add(ZeroExtend(Truncate(Minus(3,3), i8), i32),3) > This step is wrong. The start value of the addrec is 0. > > Dan > >> | >> 03: Add(ZeroExtend(Truncate(0, i8), i32),3) >> | >> 04: Add(ZeroExtend(0, i32),3) >> | >> 05: Add(0,3) >> | >> 06: 3 >> >> Thanks again. >> Matthew Curtis-- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
Reasonably Related Threads
- [LLVMdev] [PATCH] Teaching ScalarEvolution to handle IV=add(zext(trunc(IV)), Step)
- [LLVMdev] [PATCH] Teaching ScalarEvolution to handle IV=add(zext(trunc(IV)), Step)
- [LLVMdev] [PATCH] Teaching ScalarEvolution to handle IV=add(zext(trunc(IV)), Step)
- [LLVMdev] [PATCH] Teaching ScalarEvolution to handle IV=add(zext(trunc(IV)), Step)
- [LLVMdev] [PATCH] Teaching ScalarEvolution to handle IV=add(zext(trunc(IV)), Step)