Matthew Curtis
2012-Dec-10  22:13 UTC
[LLVMdev] [PATCH] Teaching ScalarEvolution to handle IV=add(zext(trunc(IV)), Step)
Hello all,
I wanted to get some feedback on this patch for ScalarEvolution.
It addresses a performance problem I am seeing for simple benchmark.
Starting with this C code:
01: signed char foo(void)
02: {
03:   const int count = 8000;
04:   signed char result = 0;
05:   int j;
06:
07:   for (j = 0; j < count; ++j) {
08:     result += (result_t)(3);
09:   }
10:
11:   return result;
12: }
I end up with this IR feeding into Induction Variable Simplification:
01: define signext i8 @foo() nounwind readnone {
02: entry:
03:   br label %for.body
04:
05: for.body:                                         ; preds = %entry, 
%for.body
06:   %j.04 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
07:   %result.03 = phi i32 [ 0, %entry ], [ %add, %for.body ]
08:   %conv2 = and i32 %result.03, 255
09:   %add = add nsw i32 %conv2, 3
10:   %inc = add nsw i32 %j.04, 1
11:   %cmp = icmp slt i32 %inc, 8000
12:   br i1 %cmp, label %for.body, label %for.end
13:
14: for.end:                                          ; preds = %for.body
15:   %conv1 = trunc i32 %add to i8
16:   ret i8 %conv1
17: }
Unfortunately, the 'and' on line 8 prevents Scalar Evolution from
being able to create an expression for '%add' that it knows how to
evaluate.
The patch detects this pattern in createNodeForPHI and creates an
equivalent expression that can be evaluated.
Note that SCEV translates the 'and' into
   ZeroExtend(Truncate(%result.03, i8), i32)
So in terms of SCEV expressions, we essentially have
   %add[n] = Add(ZeroExtend(Truncate(%add[n-1], i8), i32), 3)
(BTW, I'm no scholar here, just a layman, so my terminology is
probably all wrong).
The patch basically moves the ZeroExtend and Truncate outside of the
recurrence, though it must take into account that for iteration n, the
ZeroExtend and Truncate apply to the value at iteration n-1.
For a constant step this is accomplished by subtracting one step from
the recurrence, performing the Truncate and ZeroExtend, and then
adding the step to the result. In other words:
   %add[n] = Add(
               ZeroExtend(
                 Truncate(
                   Minus(AddRec(Start=0,Step=3)[n], 3),
                   i8),
                 i32),
               3)
It's a little more complicated when the Step is another recurrence,
but essentially the same.
Thoughts?
Matthew C.
-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The
Linux Foundation
-------------- next part -------------->From e9ca53595ee7a274308c823d14cccd5a8598814d Mon Sep 17 00:00:00 2001
From: Matthew Curtis <mcurtis at codeaurora.org>
Date: Mon, 3 Dec 2012 17:25:20 -0600
Subject: [PATCH] Teach ScalarEvolution to handle IV=add(zext(trunc(IV)),
 Accum)
Code generated for 8-bit and 16-bit induction variables commonly
produces this pattern (on hexagon), where the add is performed as a
32-bit operation, but is preceeded by a truncate and zero-extend (or
equivalent bitwise and).
This change teaches ScalarEvolution to recognize this pattern and
create the right SCEVExpr's so that the induction variable values can
be computed at compile-time.
---
 include/llvm/Analysis/ScalarEvolution.h            |    4 +
 lib/Analysis/ScalarEvolution.cpp                   |   69 +++++++++--
 .../ScalarEvolution/2012-11-30-AddOfAnd.ll         |  125 ++++++++++++++++++++
 3 files changed, 188 insertions(+), 10 deletions(-)
 create mode 100644 test/Analysis/ScalarEvolution/2012-11-30-AddOfAnd.ll
diff --git a/include/llvm/Analysis/ScalarEvolution.h
b/include/llvm/Analysis/ScalarEvolution.h
index 235adca..79e545d 100644
--- a/include/llvm/Analysis/ScalarEvolution.h
+++ b/include/llvm/Analysis/ScalarEvolution.h
@@ -654,6 +654,10 @@ namespace llvm {
     const SCEV *getMinusSCEV(const SCEV *LHS, const SCEV *RHS,
                              SCEV::NoWrapFlags Flags = SCEV::FlagAnyWrap);
 
+    /// Return a SCEV that truncates 'V' to 'Ty' *and* then
zero extends that
+    /// value back to 'V's original type.
+    const SCEV *getTruncateAndZeroExtend(const SCEV *V, Type *Ty);
+
     /// getTruncateOrZeroExtend - Return a SCEV corresponding to a conversion
     /// of the input value to the specified type.  If the type must be
     /// extended, it is zero extended.
diff --git a/lib/Analysis/ScalarEvolution.cpp b/lib/Analysis/ScalarEvolution.cpp
index 7bc848a..5ce4fd2 100644
--- a/lib/Analysis/ScalarEvolution.cpp
+++ b/lib/Analysis/ScalarEvolution.cpp
@@ -2770,6 +2770,17 @@ const SCEV *ScalarEvolution::getMinusSCEV(const SCEV
*LHS, const SCEV *RHS,
   return getAddExpr(LHS, getNegativeSCEV(RHS), Flags);
 }
 
+const SCEV *
+ScalarEvolution::getTruncateAndZeroExtend(const SCEV *V, Type *Ty) {
+  Type *SrcTy = V->getType();
+  assert((SrcTy->isIntegerTy() || SrcTy->isPointerTy()) &&
+         (Ty->isIntegerTy() || Ty->isPointerTy()) &&
+         "Cannot truncate and zero extend with non-integer
arguments!");
+  const SCEV *trunc = getTruncateExpr(V, Ty);
+  const SCEV *zext = getZeroExtendExpr(trunc, V->getType());
+  return zext;
+}
+
 /// getTruncateOrZeroExtend - Return a SCEV corresponding to a conversion of
the
 /// input value to the specified type.  If the type must be extended, it is
zero
 /// extended.
@@ -3029,13 +3040,22 @@ const SCEV *ScalarEvolution::createNodeForPHI(PHINode
*PN) {
         if (const SCEVAddExpr *Add = dyn_cast<SCEVAddExpr>(BEValue)) {
           // If there is a single occurrence of the symbolic value, replace it
           // with a recurrence.
+          const SCEVZeroExtendExpr *zext = 0;
+          const SCEVTruncateExpr *trunc = 0;
           unsigned FoundIndex = Add->getNumOperands();
-          for (unsigned i = 0, e = Add->getNumOperands(); i != e; ++i)
-            if (Add->getOperand(i) == SymbolicName)
-              if (FoundIndex == e) {
-                FoundIndex = i;
-                break;
-              }
+          for (unsigned i = 0, e = Add->getNumOperands(); i != e; ++i) {
+            const SCEV *Op = Add->getOperand(i);
+            if (Op == SymbolicName) {
+              FoundIndex = i;
+              break;
+            }
+            if ((zext = dyn_cast<SCEVZeroExtendExpr>(Op)) &&
+                (trunc =
dyn_cast<SCEVTruncateExpr>(zext->getOperand())) &&
+                (trunc->getOperand() == SymbolicName)) {
+              FoundIndex = i;
+              break;
+            }
+          }
 
           if (FoundIndex != Add->getNumOperands()) {
             // Create an add with everything but the specified operand.
@@ -3044,10 +3064,10 @@ const SCEV *ScalarEvolution::createNodeForPHI(PHINode
*PN) {
               if (i != FoundIndex)
                 Ops.push_back(Add->getOperand(i));
             const SCEV *Accum = getAddExpr(Ops);
-
+            bool AccumIsLoopInvariant = isLoopInvariant(Accum, L);
             // This is not a valid addrec if the step amount is varying each
             // loop iteration, but is not itself an addrec in this loop.
-            if (isLoopInvariant(Accum, L) ||
+            if (AccumIsLoopInvariant ||
                 (isa<SCEVAddRecExpr>(Accum) &&
                  cast<SCEVAddRecExpr>(Accum)->getLoop() == L)) {
               SCEV::NoWrapFlags Flags = SCEV::FlagAnyWrap;
@@ -3071,11 +3091,40 @@ const SCEV *ScalarEvolution::createNodeForPHI(PHINode
*PN) {
               }
 
               const SCEV *StartVal = getSCEV(StartValueV);
-              const SCEV *PHISCEV = getAddRecExpr(StartVal, Accum, L, Flags);
+              const SCEV *PHISCEV;
+              if (zext && trunc) {
+                // We have an induction variable 'iv' of the form:
+                //   iv=add(zext(trunc(iv)), Accum)
+                // The basic strategy is to turn this into
+                //   iv'=add(iv',Accum)
+                //   iv=zext(trunc(iv'-Accum))+Accum
+                if (AccumIsLoopInvariant) {
+                  // Accum is a simple constant, just adjust the start value.
+                  PHISCEV = getMinusSCEV(StartVal, Accum);
+                  PHISCEV = getAddRecExpr(PHISCEV, Accum, L, Flags);
+                } else if (const SCEVAddRecExpr *AccumAddRec +                 
dyn_cast<SCEVAddRecExpr>(Accum)) {
+                  // Accum is another recurrence, apply some recurrence algebra
+                  //       iv': {StartVal,+,Accum}
+                  //     Accum: {a0,+,a1}
+                  // iv'-accum: {StartVal-a0,+,Accum-a1}
+                  const SCEV *SV +                    getMinusSCEV(StartVal,
AccumAddRec->getOperand(0));
+                  const SCEV *Ac +                    getMinusSCEV(Accum,
AccumAddRec->getStepRecurrence(*this));
+                  PHISCEV = getAddRecExpr(SV, Ac, L, Flags);
+                } else {
+                  assert("Unexpected type of Accum");
+                }
+                PHISCEV = getTruncateAndZeroExtend(PHISCEV,
trunc->getType());
+                PHISCEV = getAddExpr(PHISCEV, Accum);
+              } else {
+                PHISCEV = getAddRecExpr(StartVal, Accum, L, Flags);
+              }
 
               // Since the no-wrap flags are on the increment, they apply to
the
               // post-incremented value as well.
-              if (isLoopInvariant(Accum, L))
+              if (AccumIsLoopInvariant)
                 (void)getAddRecExpr(getAddExpr(StartVal, Accum),
                                     Accum, L, Flags);
 
diff --git a/test/Analysis/ScalarEvolution/2012-11-30-AddOfAnd.ll
b/test/Analysis/ScalarEvolution/2012-11-30-AddOfAnd.ll
new file mode 100644
index 0000000..2e8ebd4
--- /dev/null
+++ b/test/Analysis/ScalarEvolution/2012-11-30-AddOfAnd.ll
@@ -0,0 +1,125 @@
+; RUN: opt < %s -S -scalar-evolution -analyze | FileCheck %s
+;
+; Verify that ScalarEvolution handles induction variables of the form
+;   iv=add(and(iv,(2^n)-1), Accum)
+;
+
+; CHECK: @foo_6
+; CHECK: %add +; CHECK-NEXT: Exits: 64
+define i32 @foo_6() nounwind readnone {
+entry:
+  br label %for.body
+
+for.body:                                         ; preds = %entry, %for.body
+  %storemerge2 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
+  %0 = phi i32 [ 0, %entry ], [ %add, %for.body ]
+  %conv = and i32 %0, 63
+  %add = add nsw i32 %conv, 3
+  %inc = add nsw i32 %storemerge2, 1
+  %cmp = icmp slt i32 %inc, 8000
+  br i1 %cmp, label %for.body, label %for.end
+
+for.end:                                          ; preds = %for.body
+  %fold = add i32 %0, 3
+  %conv2 = and i32 %fold, 63
+  ret i32 %conv2
+}
+
+; CHECK: @foo_8
+; CHECK: %add +; CHECK-NEXT: Exits: 192
+define i32 @foo_8() nounwind readnone {
+entry:
+  br label %for.body
+
+for.body:                                         ; preds = %entry, %for.body
+  %storemerge2 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
+  %0 = phi i32 [ 0, %entry ], [ %add, %for.body ]
+  %conv = and i32 %0, 255
+  %add = add nsw i32 %conv, 3
+  %inc = add nsw i32 %storemerge2, 1
+  %cmp = icmp slt i32 %inc, 8000
+  br i1 %cmp, label %for.body, label %for.end
+
+for.end:                                          ; preds = %for.body
+  %fold = add i32 %0, 3
+  %conv2 = and i32 %fold, 255
+  ret i32 %conv2
+}
+
+; CHECK: @foo_11
+; CHECK: %add +; CHECK-NEXT: Exits: 1472
+define i32 @foo_11() nounwind readnone {
+entry:
+  br label %for.body
+
+for.body:                                         ; preds = %entry, %for.body
+  %storemerge2 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
+  %0 = phi i32 [ 0, %entry ], [ %add, %for.body ]
+  %conv = and i32 %0, 2047
+  %add = add nsw i32 %conv, 3
+  %inc = add nsw i32 %storemerge2, 1
+  %cmp = icmp slt i32 %inc, 8000
+  br i1 %cmp, label %for.body, label %for.end
+
+for.end:                                          ; preds = %for.body
+  %fold = add i32 %0, 3
+  %conv2 = and i32 %fold, 2047
+  ret i32 %conv2
+}
+
+
+; CHECK: @foo_16
+; CHECK: %add +; CHECK-NEXT: Exits: 24000
+define i32 @foo_16() nounwind readnone {
+entry:
+  br label %for.body
+
+for.body:                                         ; preds = %entry, %for.body
+  %storemerge2 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
+  %0 = phi i32 [ 0, %entry ], [ %add, %for.body ]
+  %conv = and i32 %0, 65535
+  %add = add nsw i32 %conv, 3
+  %inc = add nsw i32 %storemerge2, 1
+  %cmp = icmp slt i32 %inc, 8000
+  br i1 %cmp, label %for.body, label %for.end
+
+for.end:                                          ; preds = %for.body
+  %fold = add i32 %0, 3
+  %conv2 = and i32 %fold, 65535
+  ret i32 %conv2
+}
+
+; CHECK: @kernel
+; CHECK: %add +; CHECK-NEXT: Exits: (9 + (zext i8 (36 + (trunc i32 %0 to i8))
to i32))
+define signext i8 @kernel() nounwind readnone {
+entry:
+  br label %for.cond1.preheader
+
+for.cond1.preheader:                              ; preds = %entry, %for.inc7
+  %storemerge7 = phi i32 [ 0, %entry ], [ %inc8, %for.inc7 ]
+  %0 = phi i32 [ 0, %entry ], [ %add, %for.inc7 ]
+  br label %for.body3
+
+for.body3:                                        ; preds =
%for.cond1.preheader, %for.body3
+  %storemerge15 = phi i32 [ 0, %for.cond1.preheader ], [ %inc, %for.body3 ]
+  %1 = phi i32 [ %0, %for.cond1.preheader ], [ %add, %for.body3 ]
+  %conv2 = and i32 %1, 255
+  %add = add nsw i32 %conv2, %storemerge15
+  %conv6 = trunc i32 %add to i8
+  %inc = add nsw i32 %storemerge15, 1
+  %cmp2 = icmp slt i32 %inc, 10
+  br i1 %cmp2, label %for.body3, label %for.inc7
+
+for.inc7:                                         ; preds = %for.body3
+  %inc8 = add nsw i32 %storemerge7, 1
+  %cmp = icmp slt i32 %inc8, 10
+  br i1 %cmp, label %for.cond1.preheader, label %for.end9
+
+for.end9:                                         ; preds = %for.inc7
+  ret i8 %conv6
+}
-- 
1.7.8.3
Matthew Curtis
2012-Dec-17  12:46 UTC
[LLVMdev] [PATCH] Teaching ScalarEvolution to handle IV=add(zext(trunc(IV)), Step)
Any ScalarEvolution experts out there have any thought about this patch? Thanks, Matthew Curtis. On 12/10/2012 4:13 PM, Matthew Curtis wrote:> Hello all, > > I wanted to get some feedback on this patch for ScalarEvolution. > > It addresses a performance problem I am seeing for simple benchmark. > > Starting with this C code: > > 01: signed char foo(void) > 02: { > 03: const int count = 8000; > 04: signed char result = 0; > 05: int j; > 06: > 07: for (j = 0; j < count; ++j) { > 08: result += (result_t)(3); > 09: } > 10: > 11: return result; > 12: } > > I end up with this IR feeding into Induction Variable Simplification: > > 01: define signext i8 @foo() nounwind readnone { > 02: entry: > 03: br label %for.body > 04: > 05: for.body: ; preds = > %entry, %for.body > 06: %j.04 = phi i32 [ 0, %entry ], [ %inc, %for.body ] > 07: %result.03 = phi i32 [ 0, %entry ], [ %add, %for.body ] > 08: %conv2 = and i32 %result.03, 255 > 09: %add = add nsw i32 %conv2, 3 > 10: %inc = add nsw i32 %j.04, 1 > 11: %cmp = icmp slt i32 %inc, 8000 > 12: br i1 %cmp, label %for.body, label %for.end > 13: > 14: for.end: ; preds = %for.body > 15: %conv1 = trunc i32 %add to i8 > 16: ret i8 %conv1 > 17: } > > Unfortunately, the 'and' on line 8 prevents Scalar Evolution from > being able to create an expression for '%add' that it knows how to > evaluate. > > The patch detects this pattern in createNodeForPHI and creates an > equivalent expression that can be evaluated. > > Note that SCEV translates the 'and' into > ZeroExtend(Truncate(%result.03, i8), i32) > > So in terms of SCEV expressions, we essentially have > %add[n] = Add(ZeroExtend(Truncate(%add[n-1], i8), i32), 3) > > (BTW, I'm no scholar here, just a layman, so my terminology is > probably all wrong). > > The patch basically moves the ZeroExtend and Truncate outside of the > recurrence, though it must take into account that for iteration n, the > ZeroExtend and Truncate apply to the value at iteration n-1. > > For a constant step this is accomplished by subtracting one step from > the recurrence, performing the Truncate and ZeroExtend, and then > adding the step to the result. In other words: > > %add[n] = Add( > ZeroExtend( > Truncate( > Minus(AddRec(Start=0,Step=3)[n], 3), > i8), > i32), > 3) > > It's a little more complicated when the Step is another recurrence, > but essentially the same. > > Thoughts? > > Matthew C. > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev-- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20121217/9e9f6799/attachment.html>
Dan Gohman
2012-Dec-17  19:53 UTC
[LLVMdev] [PATCH] Teaching ScalarEvolution to handle IV=add(zext(trunc(IV)), Step)
On Mon, Dec 10, 2012 at 2:13 PM, Matthew Curtis <mcurtis at codeaurora.org> wrote:> Hello all, > > I wanted to get some feedback on this patch for ScalarEvolution. > > It addresses a performance problem I am seeing for simple benchmark. > > Starting with this C code: > > 01: signed char foo(void) > 02: { > 03: const int count = 8000; > 04: signed char result = 0; > 05: int j; > 06: > 07: for (j = 0; j < count; ++j) { > 08: result += (result_t)(3); > 09: } > 10: > 11: return result; > 12: }FWIW, this code is probably not doing what it was intended to do; it's result is implementation-defined or an implementation-defined signal is raised (6.3.1.3).> > I end up with this IR feeding into Induction Variable Simplification: > > 01: define signext i8 @foo() nounwind readnone { > 02: entry: > 03: br label %for.body > 04: > 05: for.body: ; preds = %entry, > %for.body > 06: %j.04 = phi i32 [ 0, %entry ], [ %inc, %for.body ] > 07: %result.03 = phi i32 [ 0, %entry ], [ %add, %for.body ] > 08: %conv2 = and i32 %result.03, 255 > 09: %add = add nsw i32 %conv2, 3 > 10: %inc = add nsw i32 %j.04, 1 > 11: %cmp = icmp slt i32 %inc, 8000 > 12: br i1 %cmp, label %for.body, label %for.end > 13: > 14: for.end: ; preds = %for.body > 15: %conv1 = trunc i32 %add to i8 > 16: ret i8 %conv1 > 17: }I'm confused about how you got this IR from that C testcase. What is result_t? Did you do anything special?> Unfortunately, the 'and' on line 8 prevents Scalar Evolution from > being able to create an expression for '%add' that it knows how to > evaluate. > > The patch detects this pattern in createNodeForPHI and creates an > equivalent expression that can be evaluated. > > Note that SCEV translates the 'and' into > ZeroExtend(Truncate(%result.03, i8), i32) > > So in terms of SCEV expressions, we essentially have > %add[n] = Add(ZeroExtend(Truncate(%add[n-1], i8), i32), 3) > > (BTW, I'm no scholar here, just a layman, so my terminology is > probably all wrong).Going with your LLVM IR testcase, this looks right so far.> > The patch basically moves the ZeroExtend and Truncate outside of the > recurrence, though it must take into account that for iteration n, the > ZeroExtend and Truncate apply to the value at iteration n-1. > > For a constant step this is accomplished by subtracting one step from > the recurrence, performing the Truncate and ZeroExtend, and then > adding the step to the result. In other words: > > %add[n] = Add( > ZeroExtend( > Truncate( > Minus(AddRec(Start=0,Step=3)[n], 3), > i8), > i32), > 3) > > It's a little more complicated when the Step is another recurrence, > but essentially the same.Something is wrong here. On the first iteration, the %add value is 3. On the first iteration, your replacement expression's value is 256. Dan
Matthew Curtis
2012-Dec-18  17:56 UTC
[LLVMdev] [PATCH] Teaching ScalarEvolution to handle IV=add(zext(trunc(IV)), Step)
Dan, Thanks for the response ... On 12/17/2012 1:53 PM, Dan Gohman wrote:> On Mon, Dec 10, 2012 at 2:13 PM, Matthew Curtis <mcurtis at codeaurora.org> wrote: >> Hello all, >> >> I wanted to get some feedback on this patch for ScalarEvolution. >> >> It addresses a performance problem I am seeing for simple benchmark. >> >> Starting with this C code: >> >> 01: signed char foo(void) >> 02: { >> 03: const int count = 8000; >> 04: signed char result = 0; >> 05: int j; >> 06: >> 07: for (j = 0; j < count; ++j) { >> 08: result += (result_t)(3); >> 09: } >> 10: >> 11: return result; >> 12: } > FWIW, this code is probably not doing what it was intended to do; it's > result is implementation-defined or an implementation-defined signal is > raised (6.3.1.3).Yes. The code is somewhat contrived. It's a much simplified equivalent of a benchmark. :)>> I end up with this IR feeding into Induction Variable Simplification: >> >> 01: define signext i8 @foo() nounwind readnone { >> 02: entry: >> 03: br label %for.body >> 04: >> 05: for.body: ; preds = %entry, >> %for.body >> 06: %j.04 = phi i32 [ 0, %entry ], [ %inc, %for.body ] >> 07: %result.03 = phi i32 [ 0, %entry ], [ %add, %for.body ] >> 08: %conv2 = and i32 %result.03, 255 >> 09: %add = add nsw i32 %conv2, 3 >> 10: %inc = add nsw i32 %j.04, 1 >> 11: %cmp = icmp slt i32 %inc, 8000 >> 12: br i1 %cmp, label %for.body, label %for.end >> 13: >> 14: for.end: ; preds = %for.body >> 15: %conv1 = trunc i32 %add to i8 >> 16: ret i8 %conv1 >> 17: } > I'm confused about how you got this IR from that C testcase. What is > result_t? Did you do anything special?Sorry, result_t is signed char. I was experimenting with other result types and mistakenly copied an intermediate version of the code. The IR was produced with an internal build of clang for hexagon (with -O2). The IR produced by the current clang trunk is not quite the same.>> Unfortunately, the 'and' on line 8 prevents Scalar Evolution from >> being able to create an expression for '%add' that it knows how to >> evaluate. >> >> The patch detects this pattern in createNodeForPHI and creates an >> equivalent expression that can be evaluated. >> >> Note that SCEV translates the 'and' into >> ZeroExtend(Truncate(%result.03, i8), i32) >> >> So in terms of SCEV expressions, we essentially have >> %add[n] = Add(ZeroExtend(Truncate(%add[n-1], i8), i32), 3) >> >> (BTW, I'm no scholar here, just a layman, so my terminology is >> probably all wrong). > Going with your LLVM IR testcase, this looks right so far. > >> The patch basically moves the ZeroExtend and Truncate outside of the >> recurrence, though it must take into account that for iteration n, the >> ZeroExtend and Truncate apply to the value at iteration n-1. >> >> For a constant step this is accomplished by subtracting one step from >> the recurrence, performing the Truncate and ZeroExtend, and then >> adding the step to the result. In other words: >> >> %add[n] = Add( >> ZeroExtend( >> Truncate( >> Minus(AddRec(Start=0,Step=3)[n], 3), >> i8), >> i32), >> 3) >> >> It's a little more complicated when the Step is another recurrence, >> but essentially the same. > Something is wrong here. On the first iteration, the %add value is 3. > On the first iteration, your replacement expression's value is 256. > > DanHere's how I'm evaluating the expression (in my head): 00: Add(ZeroExtend(Truncate(Minus(AddRec(Start=0,Step=3)[n],3), i8), i32),3) | 01: Add(ZeroExtend(Truncate(Minus(AddRec(Start=0,Step=3)[0],3), i8), i32),3) | 02: Add(ZeroExtend(Truncate(Minus(3,3), i8), i32),3) | 03: Add(ZeroExtend(Truncate(0, i8), i32),3) | 04: Add(ZeroExtend(0, i32),3) | 05: Add(0,3) | 06: 3 Thanks again. Matthew Curtis -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
Maybe Matching Threads
- [LLVMdev] [PATCH] Teaching ScalarEvolution to handle IV=add(zext(trunc(IV)), Step)
- [LLVMdev] [PATCH] Teaching ScalarEvolution to handle IV=add(zext(trunc(IV)), Step)
- [LLVMdev] [PATCH] Teaching ScalarEvolution to handle IV=add(zext(trunc(IV)), Step)
- [LLVMdev] [PATCH] Teaching ScalarEvolution to handle IV=add(zext(trunc(IV)), Step)
- nlminb and optim