thr3ads.net - llvm dev - [LLVMdev] [PATCH] Teaching ScalarEvolution to handle IV=add(zext(trunc(IV)), Step) [Dec 2012]

If this information is useful, please help other people find it:
Share via:

Matthew Curtis

2012-Dec-18 17:56 UTC

[LLVMdev] [PATCH] Teaching ScalarEvolution to handle IV=add(zext(trunc(IV)), Step)

Dan,

Thanks for the response ...

On 12/17/2012 1:53 PM, Dan Gohman wrote:> On Mon, Dec 10, 2012 at 2:13 PM, Matthew Curtis <mcurtis at
codeaurora.org> wrote:
>> Hello all,
>>
>> I wanted to get some feedback on this patch for ScalarEvolution.
>>
>> It addresses a performance problem I am seeing for simple benchmark.
>>
>> Starting with this C code:
>>
>> 01: signed char foo(void)
>> 02: {
>> 03:   const int count = 8000;
>> 04:   signed char result = 0;
>> 05:   int j;
>> 06:
>> 07:   for (j = 0; j < count; ++j) {
>> 08:     result += (result_t)(3);
>> 09:   }
>> 10:
>> 11:   return result;
>> 12: }
> FWIW, this code is probably not doing what it was intended to do; it's
> result is implementation-deﬁned or an implementation-deﬁned signal is
> raised (6.3.1.3).Yes. The code is somewhat contrived. It's a much simplified equivalent 
of a benchmark. :)
>> I end up with this IR feeding into Induction Variable Simplification:
>>
>> 01: define signext i8 @foo() nounwind readnone {
>> 02: entry:
>> 03:   br label %for.body
>> 04:
>> 05: for.body:                                         ; preds = %entry,
>> %for.body
>> 06:   %j.04 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
>> 07:   %result.03 = phi i32 [ 0, %entry ], [ %add, %for.body ]
>> 08:   %conv2 = and i32 %result.03, 255
>> 09:   %add = add nsw i32 %conv2, 3
>> 10:   %inc = add nsw i32 %j.04, 1
>> 11:   %cmp = icmp slt i32 %inc, 8000
>> 12:   br i1 %cmp, label %for.body, label %for.end
>> 13:
>> 14: for.end:                                          ; preds =
%for.body
>> 15:   %conv1 = trunc i32 %add to i8
>> 16:   ret i8 %conv1
>> 17: }
> I'm confused about how you got this IR from that C testcase. What is
> result_t? Did you do anything special?
Sorry, result_t is signed char. I was experimenting with other result 
types and mistakenly copied an intermediate version of the code.

The IR was produced with an internal build of clang for hexagon (with 
-O2). The IR produced by the current clang trunk is not quite the same.
>> Unfortunately, the 'and' on line 8 prevents Scalar Evolution
from
>> being able to create an expression for '%add' that it knows how
to
>> evaluate.
>>
>> The patch detects this pattern in createNodeForPHI and creates an
>> equivalent expression that can be evaluated.
>>
>> Note that SCEV translates the 'and' into
>>    ZeroExtend(Truncate(%result.03, i8), i32)
>>
>> So in terms of SCEV expressions, we essentially have
>>    %add[n] = Add(ZeroExtend(Truncate(%add[n-1], i8), i32), 3)
>>
>> (BTW, I'm no scholar here, just a layman, so my terminology is
>> probably all wrong).
> Going with your LLVM IR testcase, this looks right so far.
>
>> The patch basically moves the ZeroExtend and Truncate outside of the
>> recurrence, though it must take into account that for iteration n, the
>> ZeroExtend and Truncate apply to the value at iteration n-1.
>>
>> For a constant step this is accomplished by subtracting one step from
>> the recurrence, performing the Truncate and ZeroExtend, and then
>> adding the step to the result. In other words:
>>
>>    %add[n] = Add(
>>                ZeroExtend(
>>                  Truncate(
>>                    Minus(AddRec(Start=0,Step=3)[n], 3),
>>                    i8),
>>                  i32),
>>                3)
>>
>> It's a little more complicated when the Step is another recurrence,
>> but essentially the same.
> Something is wrong here. On the first iteration, the %add value is 3.
> On the first iteration, your replacement expression's value is 256.
>
> Dan
Here's how I'm evaluating the expression (in my head):

00: Add(ZeroExtend(Truncate(Minus(AddRec(Start=0,Step=3)[n],3), i8), i32),3)
                                                          |
01: Add(ZeroExtend(Truncate(Minus(AddRec(Start=0,Step=3)[0],3), i8), i32),3)
                                   |
02: Add(ZeroExtend(Truncate(Minus(3,3), i8), i32),3)
                             |
03: Add(ZeroExtend(Truncate(0, i8), i32),3)
                    |
04: Add(ZeroExtend(0, i32),3)
         |
05: Add(0,3)
     |
06: 3

Thanks again.
Matthew Curtis

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The
Linux Foundation

Dan Gohman

2012-Dec-18 18:54 UTC

head link

[LLVMdev] [PATCH] Teaching ScalarEvolution to handle IV=add(zext(trunc(IV)), Step)

On Tue, Dec 18, 2012 at 9:56 AM, Matthew Curtis <mcurtis at
codeaurora.org> wrote:>
> Here's how I'm evaluating the expression (in my head):
>
> 00: Add(ZeroExtend(Truncate(Minus(AddRec(Start=0,Step=3)[n],3), i8),
i32),3)
>                                                          |
> 01: Add(ZeroExtend(Truncate(Minus(AddRec(Start=0,Step=3)[0],3), i8),
i32),3)
>                                   |
> 02: Add(ZeroExtend(Truncate(Minus(3,3), i8), i32),3)
This step is wrong. The start value of the addrec is 0.

Dan
>                             |
> 03: Add(ZeroExtend(Truncate(0, i8), i32),3)
>                    |
> 04: Add(ZeroExtend(0, i32),3)
>         |
> 05: Add(0,3)
>     |
> 06: 3
>
> Thanks again.
> Matthew Curtis

Matthew Curtis

2012-Dec-20 16:36 UTC

head link

[LLVMdev] [PATCH] Teaching ScalarEvolution to handle IV=add(zext(trunc(IV)), Step)

Ok, so I think I've mis-represented what's really happening.
Ignore my previous statements concerning %add :)

Again, given:
05:  for.body:                                         ; preds = %entry, 
%for.body
06:    %j.04 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
07:    %result.03 = phi i32 [ 0, %entry ], [ %add, %for.body ]
08:    %conv2 = and i32 %result.03, 255
09:    %add = add nsw i32 %conv2, 3
10:    %inc = add nsw i32 %j.04, 1
11:    %cmp = icmp slt i32 %inc, 8000
12:    br i1 %cmp, label %for.body, label %for.end

LLVM executes the following:

01:  createSCEV(%conv2 = and i32 %result.03, 255)
02:    calls getSCEV(%result.03)
03:       returns (3 + (zext i8 {-3,+,3}<%for.body> to i32))
04:    calls getTruncateExpr((3 + (zext i8 {-3,+,3}<%for.body> to i32)),
i8)
05:       calls getTruncateExpr(3)
06:          returns 3
07:       calls getTruncateExpt((zext i8 {-3,+,3}<%for.body> to i32))
08:          returns {-3,+,3}<%for.body>
09:       returns {0,+,3}<%for.body>
10:    calls getZeroExtendExpr({0,+,3}<%for.body>, i32);
11:       returns (zext i8 {0,+,3}<%for.body> to i32)
12:    returns (zext i8 {0,+,3}<%for.body> to i32)

This expression is (I believe) correct for %conv2.

The intent of the patch is to construct the correct (evaluatable)
expression for %result.03 being fed into the computation of %conv2.

Does that make more sense?

Matthew C.

On 12/18/2012 12:54 PM, Dan Gohman wrote:> On Tue, Dec 18, 2012 at 9:56 AM, Matthew Curtis <mcurtis at
codeaurora.org> wrote:
>> Here's how I'm evaluating the expression (in my head):
>>
>> 00: Add(ZeroExtend(Truncate(Minus(AddRec(Start=0,Step=3)[n],3), i8),
i32),3)
>>                                                           |
>> 01: Add(ZeroExtend(Truncate(Minus(AddRec(Start=0,Step=3)[0],3), i8),
i32),3)
>>                                    |
>> 02: Add(ZeroExtend(Truncate(Minus(3,3), i8), i32),3)
> This step is wrong. The start value of the addrec is 0.
>
> Dan
>
>>                              |
>> 03: Add(ZeroExtend(Truncate(0, i8), i32),3)
>>                     |
>> 04: Add(ZeroExtend(0, i32),3)
>>          |
>> 05: Add(0,3)
>>      |
>> 06: 3
>>
>> Thanks again.
>> Matthew Curtis

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The
Linux Foundation

Reasonably Related Threads

Search for more possibly parallel threads

llvm dev - Dec 2012 - [LLVMdev] [PATCH] Teaching ScalarEvolution to handle IV=add(zext(trunc(IV)), Step)

[LLVMdev] [PATCH] Teaching ScalarEvolution to handle IV=add(zext(trunc(IV)), Step)

[LLVMdev] [PATCH] Teaching ScalarEvolution to handle IV=add(zext(trunc(IV)), Step)

[LLVMdev] [PATCH] Teaching ScalarEvolution to handle IV=add(zext(trunc(IV)), Step)

Reasonably Related Threads