Gerolf Hoflehner via llvm-dev
2015-Sep-11 02:51 UTC
[llvm-dev] [RFC] New pass: LoopExitValues
Hi Steve,

It seems the general consensus is that the patch feels like a work-around
for a problem with LSR (and possibly other loop transformations) that
introduces redundant instructions. It is probably best to file a bug and
attach a few of your test cases.

Thanks
Gerolf

> On Sep 10, 2015, at 4:37 PM, Steve King via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> On Thu, Sep 10, 2015 at 3:43 PM, Jake VanAdrighem <jvanadrighem at gmail.com> wrote:
>> Which cases does this pass handle which aren't otherwise optimized out
>> by passes like GlobalValueNumbering or DeadCodeElimination?
>
> The case uniquely handled here is one in which a computation after a
> loop happens to be redundant with a value left over from loop execution.
> At each nesting level of N nested loops, the pass compares computable
> SCEVs looking for matches. This sounds specialized to me, but I'm not
> knowledgeable enough in LLVM's many passes to know which one comes
> closest to this sort of optimization. My colleague who originally wrote
> the pass noted that redundancies found by this pass pop up after loop
> transforms like LSR.
>
> Previously, I showed examples of the type of optimization performed by
> this pass -- they're bulky so I won't cut and paste them here. In
> particular, check out the before-and-after IR example James asked for.
>
> Regards,
> -steve
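To make the comparison Steve describes in the quoted message concrete, here
is a rough sketch of the idea against LLVM's C++ API -- not the posted
patch, just an illustration under my own simplifying assumptions
(single-exit loop, LCSSA form, and a helper name I made up):

  #include "llvm/ADT/SmallVector.h"
  #include "llvm/Analysis/LoopInfo.h"
  #include "llvm/Analysis/ScalarEvolution.h"
  #include "llvm/IR/Instructions.h"
  using namespace llvm;

  // For one loop: compare the SCEVs of computations that run after the
  // loop against the SCEVs of values the loop leaves behind, and reuse
  // the live-out value when the expressions match.
  static bool reuseLoopExitValues(Loop *L, ScalarEvolution &SE) {
    BasicBlock *Exit = L->getExitBlock(); // single-exit loops only, for brevity
    if (!Exit)
      return false;

    // In LCSSA form, every value computed in the loop and used afterwards
    // is funneled through a phi in the exit block; those phis are the
    // candidate "values left over from loop execution".
    SmallVector<std::pair<const SCEV *, PHINode *>, 8> LiveOuts;
    for (Instruction &I : *Exit) {
      auto *PN = dyn_cast<PHINode>(&I);
      if (!PN)
        break; // phis are grouped at the top of the block
      if (SE.isSCEVable(PN->getType()))
        LiveOuts.push_back({SE.getSCEV(PN), PN});
    }

    // A later computation whose SCEV matches a live-out value recomputes
    // something the loop already produced.  SCEV nodes are uniqued, so
    // pointer equality means the expressions are identical.
    bool Changed = false;
    for (Instruction &I : *Exit) {
      if (isa<PHINode>(&I) || !SE.isSCEVable(I.getType()))
        continue;
      const SCEV *S = SE.getSCEV(&I);
      for (auto &LiveOut : LiveOuts) {
        if (LiveOut.first == S && LiveOut.second->getType() == I.getType()) {
          I.replaceAllUsesWith(LiveOut.second); // leave the dead instr for DCE
          Changed = true;
          break;
        }
      }
    }
    return Changed;
  }

The real pass would of course have to walk every nesting level and every
exit, and weigh profitability; this only shows the shape of the SCEV
comparison.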
Hi Steve,

I believe that others have been looking recently at a very similar set of
problems. For example, Wei Mi posted this patch:

http://reviews.llvm.org/D12090

and I wonder if this would also address your use cases. If not, I think
we'd like to understand why.

Also, if these redundant expressions involve induction variables, then
that's something that IndVarSimplify is already supposed to do, and if it
is missing cases, then we should improve that pass (and, thus, folding what
you've done into IndVarSimplify might be the right way to go).

 -Hal

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
On Fri, Sep 11, 2015 at 10:06 AM, Hal Finkel <hfinkel at anl.gov> wrote:
> Also, if these redundant expressions involve induction variables, then
> that's something that IndVarSimplify is already supposed to do, and if it
> is missing cases, then we should improve that pass (and, thus, folding
> what you've done into IndVarSimplify might be the right way to go).

Hal, thanks for the info. It turns out IndVarSimplify intentionally creates
redundant calculations of loop exit values! From the pass's header comment:

// 2. Any use outside of the loop of an expression derived from the indvar
//    is changed to compute the derived value outside of the loop,
//    eliminating the dependence on the exit value of the induction
//    variable. If the only purpose of the loop is to compute the exit
//    value of some derived expression, this transformation will make the
//    loop dead.

I understand the stated purpose for the redundancy, but there is no mention
of undoing the damage. The LoopExitValues pass seems to be exactly the
cleanup that was missing. Do passes that come after IndVarSimplify, like
LSR, need the redundant calculations in place to identify dead loops, or
should IndVarSimplify have cleaned up after itself immediately?

Regards,
-steve
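For anyone who has not looked at that transformation before, the rewrite
the comment describes looks roughly like this (a toy example of my own, not
taken from the pass's tests):

  /* Before: the value used after the loop is an expression derived from
     the induction variable i.                                           */
  unsigned before(unsigned n) {
    unsigned p = 0;
    for (unsigned i = 0; i < n; ++i)
      p = 4 * i + 4;      /* derived from the indvar                     */
    return p;             /* use of p outside the loop                   */
  }

  /* After exit-value rewriting: the use outside the loop is computed
     directly from the trip count.  Nothing the loop computes is used any
     more, so the loop itself becomes dead and can be deleted.           */
  unsigned after(unsigned n) {
    for (unsigned i = 0; i < n; ++i)
      ;                   /* now dead                                    */
    return 4 * n;         /* recomputed outside the loop                 */
  }

When the loop cannot be deleted (because it has other side effects), the
same rewrite leaves the value computed twice -- once incrementally inside
the loop and once from the trip count outside it -- which is exactly the
kind of redundancy being discussed here.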
On Fri, Sep 11, 2015 at 10:06 AM, Hal Finkel <hfinkel at anl.gov> wrote:
> I believe that others have been looking recently at a very similar set
> of problems. For example, Wei Mi posted this patch:
>
> http://reviews.llvm.org/D12090
>
> and I wonder if this would also address your use cases. If not, I think
> we'd like to understand why.

I tried the patch on the matrix_mul test case and saw where the redundancy
was removed. My patch in http://reviews.llvm.org/D12090 cannot handle this
case; I feel like it is a different problem.

void matrix_mul(unsigned int Size, unsigned int *Dst,
                unsigned int *Src, unsigned int Val) {
  for (int Outer = 0; Outer < Size; ++Outer)
    for (int Inner = 0; Inner < Size; ++Inner)
      Dst[Outer * Size + Inner] = Src[Outer * Size + Inner] * Val;
}

"Outer * Size + Inner" is an induction variable of the inner loop. Its SCEV
is {{0,+,%Size}<outerloop>,+,1}<innerloop>, so the initial value of the
induction variable for the inner loop is {0,+,%Size}<outerloop>, i.e., an
"lsr.iv.next = lsr.iv + %Size" will be generated at the end of every
iteration of the outer loop to prepare the initial value of the induction
variable for the next outer-loop iteration.

However, the trip count of the inner loop happens to be "Size" too, so the
exit value of "Outer * Size + Inner" from the inner loop happens to be the
same as the initial value of "Outer * Size + Inner" in the next iteration
of the outer loop. The exit value of the induction variable can therefore
be reused, and the "lsr.iv.next = lsr.iv + %Size" that computes the initial
value of "Outer * Size + Inner" can be omitted.

If I change "Inner < Size" to "Inner < Size1", the compiler generates the
same code with or without the patches. I tried gcc, and gcc couldn't remove
the redundancy either.

It looks like the optimization doesn't fit the current LSR very well,
because it needs to consider the relationship between the initial and exit
values of the induction variable across iterations of the outer loop. This
is interesting work, because the pattern of implementing a multi-dimensional
array as a single linear array is common.

Thanks,
Wei.
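In source form, the reuse Wei describes amounts to something like the
following hand-written equivalent (for illustration only -- my own
rendering, not compiler output, and the names are made up):

  void matrix_mul_reuse(unsigned int Size, unsigned int *Dst,
                        unsigned int *Src, unsigned int Val) {
    unsigned int Idx = 0;      /* plays the role of "Outer * Size + Inner" */
    for (unsigned int Outer = 0; Outer < Size; ++Outer) {
      /* No "Idx = Outer * Size" and no "Idx += Size" per outer iteration:
         the value Idx holds when the inner loop exits is already
         (Outer + 1) * Size, which is exactly the initial value the next
         outer iteration needs -- but only because the inner trip count is
         also Size.                                                       */
      for (unsigned int End = Idx + Size; Idx != End; ++Idx)
        Dst[Idx] = Src[Idx] * Val;
    }
  }

If the inner bound were some unrelated Size1 instead of Size, as in Wei's
modified test, the exit value and the next initial value would no longer
coincide and the extra add would genuinely be needed.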
On Mon, Sep 14, 2015 at 2:10 PM, Wei Mi <wmi at google.com> wrote:
> However, the trip count of the inner loop happens to be "Size" too, so the
> exit value of "Outer * Size + Inner" from the inner loop happens to be the
> same as the initial value of "Outer * Size + Inner" in the next iteration
> of the outer loop. The exit value of the induction variable can therefore
> be reused, and the "lsr.iv.next = lsr.iv + %Size" that computes the
> initial value of "Outer * Size + Inner" can be omitted.

Thanks for experimenting with the patch! If I understand correctly, you
show that the redundancy is inherent in the loop and not a byproduct of
other passes.

Following this insight, changing just one line of the IR produces assembly
*identical* to the effect of the new LEV pass. Specifically, in the outer
loop, the multiply can be replaced with a PHI node that recycles the %add
value coming from the inner loop. Should the front end be responsible for
this sort of optimization?

Original:

  %mul = mul i32 %Outer.026, %Size

Replacing it with this line has the same effect as the LEV pass:

  %mul = phi i32 [ %add, %for.cond.cleanup.3 ], [ 0, %entry ]

The original IR for the matrix_mul.c test case:

define void @matrix_mul(i32 %Size, i32* nocapture %Dst, i32* nocapture readonly %Src, i32 %Val) #0 {
entry:
  %cmp.25 = icmp eq i32 %Size, 0
  br i1 %cmp.25, label %for.cond.cleanup, label %for.body.4.lr.ph

for.body.4.lr.ph:                       ; preds = %entry, %for.cond.cleanup.3
  %Outer.026 = phi i32 [ %inc10, %for.cond.cleanup.3 ], [ 0, %entry ]
;****************
  %mul = mul i32 %Outer.026, %Size
;****************
  br label %for.body.4

for.cond.cleanup:                       ; preds = %for.cond.cleanup.3, %entry
  ret void

for.cond.cleanup.3:                     ; preds = %for.body.4
  %inc10 = add nuw nsw i32 %Outer.026, 1
  %exitcond28 = icmp eq i32 %inc10, %Size
  br i1 %exitcond28, label %for.cond.cleanup, label %for.body.4.lr.ph

for.body.4:                             ; preds = %for.body.4, %for.body.4.lr.ph
  %Inner.024 = phi i32 [ 0, %for.body.4.lr.ph ], [ %inc, %for.body.4 ]
  %add = add i32 %Inner.024, %mul
  %arrayidx = getelementptr inbounds i32, i32* %Src, i32 %add
  %0 = load i32, i32* %arrayidx, align 4, !tbaa !1
  %mul5 = mul i32 %0, %Val
  %arrayidx8 = getelementptr inbounds i32, i32* %Dst, i32 %add
  store i32 %mul5, i32* %arrayidx8, align 4, !tbaa !1
  %inc = add nuw nsw i32 %Inner.024, 1
  %exitcond = icmp eq i32 %inc, %Size
  br i1 %exitcond, label %for.cond.cleanup.3, label %for.body.4
}
Hi Folks,

Let's keep this optimization alive. To summarize:

- Several folks voiced general support, but with questions about why
  existing optimizations do not already catch this case.
- A deep dive by Wei Mi showed that the optimization is most likely not a
  clean-up of LSR sloppiness, but something new.
- My follow-up confirmed that the redundancy eliminated by the
  LoopExitValues pass exists in the original IR from Clang.

As a next step, can we come to consensus on where the LoopExitValues
optimization belongs?

Regards,
-steve