Philip Reames via llvm-dev
2018-Jul-24 23:04 UTC
[llvm-dev] Two ideas for general extensions of the loop unswitch cost model
This email is mostly to record an idea. If I get lucky enough to nerd snipe someone else before I have a chance to get back to this, even better. :)

I was taking a look at Chandler's new loop unswitch pass. One of the really key differences in the new pass is that it models the approximate cost of the actual unswitch. At the moment, the cost model just considers whether particular blocks exist in only one of the two resulting loops, but since the old unswitcher just assumed every instruction had to be duplicated, that's a huge improvement right there. Thinking about this while trying to understand the likely implications for idiomatic code led me to the following ideas.

*First Iteration Cost* - We end up with a lot of loops which contain range checks before loop invariant conditions. Almost always, we can prove that the range check isn't going to fail on the first iteration. As a result, we can ignore the early exit (or conditionally reached block) cost in one copy of the loop as long as the branch in question dominates the latch. The main value of this is eliminating a pass order dependency; we'd otherwise have to handle the range check (via indvarsimplify, irce, or loop-predication) before rerunning unswitch. Example:

// example is C
for (int i = 0; i < len; i++) {
  if (i >= (1 << 20)) throw_A();  // range check: provably does not fail on iteration 0
  if (a == null) throw_B();       // We can unswitch this at low cost
  sum += a[i];
}

This could be fairly simply implemented as a special case of trivial unswitch where we advance through branches which are known to be constant on the first iteration. But that interacts badly with the second idea and gives up generality.

*Data Flow Cost* - We could model the cost of producing the loop exit values. Since we have LCSSA, we know we can cheaply identify values used in the exits. For any value which is otherwise unused in the loop, we end up producing only one copy of the computation which produces the value. If we computed a per-block "cost to produce values used in this block", we could model this in the cost model. The interesting bit is that this seems to subsume the current definition of trivial unswitch, since the cost of a trivial unswitch as considered by the full unswitcher becomes zero.

// example, in C, using predicated loads/stores
for (int i = 0; i < len; i++) {
  int v = 0;
  if (b[i]) { v = *p; }
  if (C) { break; }               // loop-invariant condition we'd unswitch
  if (b[i]) { *p = v + i; }
}

Philip
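For concreteness, here is a hand-written C++ sketch of what unswitching the a == null test in the first example would produce (illustrative only, not actual pass output; the wrapper function and its names are made up). In the copy specialized for a == null, the range check provably cannot fire on iteration 0, so that copy reaches throw_B() on its first trip through the body and never returns to the latch -- which is why the cost model can discount the conditionally reached throw_A() block in that copy.

// Hand-written sketch of the unswitched form of the first example above.
[[noreturn]] static void throw_A() { throw "range check failed"; }
[[noreturn]] static void throw_B() { throw "null pointer"; }

long sum_elements(const int *a, int len) {
  long sum = 0;
  if (a == nullptr) {
    // Copy specialized for a == null.  The range check cannot fire on
    // iteration 0, so this copy reaches throw_B() on its very first
    // iteration and never returns to the latch; the throw_A() block is
    // effectively free here and the cost model can ignore it.
    for (int i = 0; i < len; i++) {
      if (i >= (1 << 20)) throw_A();
      throw_B();
    }
  } else {
    // Copy specialized for a != null: the null test and throw_B() vanish.
    for (int i = 0; i < len; i++) {
      if (i >= (1 << 20)) throw_A();
      sum += a[i];
    }
  }
  return sum;
}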
Chandler Carruth via llvm-dev
2018-Jul-24 23:27 UTC
[llvm-dev] Two ideas for general extensions of the loop unswitch cost model
On Tue, Jul 24, 2018 at 4:05 PM Philip Reames <listmail at philipreames.com> wrote:

FWIW, your second idea is exactly how loop unrolling works. I'm a big fan. I think of it as the "non-dead cost" because the algorithm ends up looking like classical DCE -- you start w/ roots (side effects reachable prior to the exit + the loop exit SSA values) and walk up the def/use graph marking the set of things in the loop still live. The cost is the cost of the marked set. Loop unrolling *also* does the further thing that gets you the above property (and more) by folding everything that it can based on a fixed iteration.

A couple of observations:

- While functionally this subsumes trivial unswitching, the complexity of computing this cost will likely make it worthwhile to keep trivial unswitching separate purely as a phase-ordering / compile time win.

- This only applies to an important subset of non-trivial unswitching: non-trivial unswitching of a terminator in an exiting block.
I agree w/ you that this is likely an important subset worth custom (and much more precise) cost modeling.

- Given the way the cost modeling works, my suggestion would be to change it to work more like the following:

  1) Compute the non-exiting clone cost. (what it does today)
  2) Scale this by the number of non-exiting edges from the terminator.
  3) For each exiting edge, compute the non-dead cost for that exit edge and its LCSSA phis.

This should memoize well, so that instead of doing O(|loop-size| * |loop-exit-edges|) work we do O(|loop-size| + |loop-exit-edges|). The idea will be to forward simulate the folded iteration 0 cost from header to exiting block, and then reverse-lookup the required parts of this cost from the roots (where the in-loop roots are the same and can be computed once, but the LCSSA exit value roots are per exit edge).

This should then give you a strong approximation of the unswitched cost that can see through a huge number of idiomatic patterns, both from DRY coding idioms and from code generator idioms.

Would be really excited to see someone chase this. One particularly useful component would be to lift the current forward simulation (complete with SCEV recurrence folding combined w/ instsimplify) used by loop unroll into a utility that could be shared here. Then we might be able to also re-use it when computing the inlined cost of loops where the loop trip count becomes a constant only for one inline candidate.

-Chandler
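To make the "non-dead cost" walk above concrete, here is a rough sketch against the LLVM C++ API (the function name is made up, the unit cost per instruction stands in for a real TargetTransformInfo query, and the iteration-0 branch folding described above is omitted):

#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/Analysis/LoopInfo.h"
#include "llvm/IR/Instructions.h"

using namespace llvm;

// Rough sketch: estimate the cost of the instructions that remain live if
// the loop only has to produce its side effects and its LCSSA exit values.
// Everything not reached by the backwards walk is assumed to fold away.
static unsigned nonDeadLoopCost(Loop &L) {
  SmallPtrSet<Instruction *, 16> Live;
  SmallVector<Instruction *, 16> Worklist;

  auto MarkLive = [&](Value *V) {
    auto *I = dyn_cast<Instruction>(V);
    if (I && L.contains(I) && Live.insert(I).second)
      Worklist.push_back(I);
  };

  // Roots #1: side-effecting instructions (stores, calls, ...).  Control
  // dependence is ignored in this sketch; a real version would also keep
  // the branches needed to reach the live set, after folding any branch
  // that is constant on iteration 0.
  for (BasicBlock *BB : L.blocks())
    for (Instruction &I : *BB)
      if (I.mayHaveSideEffects())
        MarkLive(&I);

  // Roots #2: values flowing out of the loop through LCSSA exit phis.
  SmallVector<BasicBlock *, 4> ExitBlocks;
  L.getExitBlocks(ExitBlocks);
  for (BasicBlock *Exit : ExitBlocks)
    for (PHINode &PN : Exit->phis())
      for (Value *Incoming : PN.incoming_values())
        MarkLive(Incoming);

  // Classic DCE-style walk up the def/use graph from the roots.
  while (!Worklist.empty())
    for (Value *Op : Worklist.pop_back_val()->operands())
      MarkLive(Op);

  // Unit cost per live instruction; a real cost model would query TTI.
  return Live.size();
}

This is the whole-loop flavor of the metric; the per-exit-edge variant in steps 1-3 above would seed the LCSSA roots separately for each exit edge while reusing the in-loop marking, which is where the O(|loop-size| + |loop-exit-edges|) behavior comes from.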