thr3ads.net - llvm dev - [llvm-dev] LICM as canonical form [Dec 2021]

Michael Kruse via llvm-dev

2021-Dec-04 21:54 UTC

[llvm-dev] LICM as canonical form

I am in support of your proposal. Checking whether an llvm::Value is
not defined by an instruction in the loop is the most straightforward
way to determine whether a value is loop-invariant. Regarding D87551:
the profile information should probably be used in LoopSink or a
similar pass to determine whether the only use of an instruction is so
rare that it is worth moving the instruction into the conditionally
executed code in the loop, with the added benefit that this would also
sink instructions that were not in the loop in the first place.
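
The invariance check described above can be sketched on a toy model. The types below (BasicBlock, Value, Loop) and the helper hoistToPreheader are hypothetical stand-ins written for this illustration, not the real LLVM classes; the point is only that once a definition has been hoisted, invariance becomes a simple "is it defined inside the loop?" query.

```cpp
#include <cassert>
#include <set>

// Toy stand-ins for IR concepts; these are NOT the real LLVM classes.
struct BasicBlock {};

struct Value {
    // Null for arguments and constants; set for instructions.
    const BasicBlock *parent = nullptr;
    bool isInstruction() const { return parent != nullptr; }
};

struct Loop {
    std::set<const BasicBlock *> blocks;

    bool contains(const BasicBlock *bb) const { return blocks.count(bb) != 0; }

    // A value is loop-invariant if it is not an instruction at all, or if
    // the instruction that defines it lives outside the loop.
    bool isLoopInvariant(const Value &v) const {
        return !v.isInstruction() || !contains(v.parent);
    }
};

// Hypothetical helper: models hoisting by re-parenting a definition.
void hoistToPreheader(Value &v, const BasicBlock &preheader, const Loop &loop) {
    assert(!loop.contains(&preheader) && "preheader must be outside the loop");
    v.parent = &preheader;
}
```

In this model a load that sits in the loop body is reported as variant until it is hoisted, which is why a hoistable-but-unhoisted value looks the same to the query as a genuinely varying one.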

Michael

On Fri, Dec 3, 2021 at 17:28, Philip Reames via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> Later today, I'm going to be reverting D87551. I first raised serious
> concerns on said review back in Oct, but this is a bit of an unusual case
> because the change landed roughly a year before that.
>
> This patch introduced a profile-driven heuristic to selectively disable
> hoisting of instructions out of loops. By doing so, it changes a
> long-standing design element without broad consensus following discussion
> on llvm-dev. However, this email isn't really about the revert per se.
>
> In the course of the discussion leading to this point, I realized we
> didn't really have a citable resource describing the historical design.
> This email is an attempt to provide that, and to highlight some of the
> issues which need to be addressed if we do decide we want to change it.
>
> LICM as canonical form
>
> We have for many years treated hoisting instructions out of loops as a
> canonical form. That is, hoisting is not done because it is profitable
> (though it often is), but is instead done so that other parts of the
> optimizer can rely on it.
>
> We assume that an unprofitable hoist will be undone. Historically, we have
> generally assumed this to be done in the backend, but more recently,
> LoopSink has also been added towards the end of the IR pipeline with the
> same goal.
>
> Why does this matter?
>
> Other transforms depend on us having hoisted instructions out of loops for
> effectiveness. The largest source of such assumptions is that SCEV is
> unable to compute trip counts for any exit condition involving a
> loop-varying load. Almost all of our loop transformations depend on SCEV's
> trip count logic, so failing to hoist an otherwise hoistable load is a
> severe pessimization.
>
> For illustration purposes, consider this toy example:
>
>   for (int i = 0; ; i++) {
>     sum += a[i] + *b;
>     length = a.length;
>     if (i >= length) break;
>   }
>
> This example involves a typical for-loop for which the exit test depends on
> a loop-varying load.
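
As a rough source-level illustration of the difference, the sketch below writes the toy loop twice by hand: once with the length re-loaded inside the loop, and once with both invariant loads done before the loop. IntArray is a hypothetical stand-in for the array object, and the exit test is placed before the element access so the sketch stays in bounds; this is an analogue of what LICM does on IR, not compiler output.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the array object in the toy example.
struct IntArray {
    const int *data;
    std::size_t length;
};

// Un-hoisted form: the exit test re-loads a->length on every iteration,
// so the bound is not visibly fixed when the loop is entered.
long sumReloading(const IntArray *a, const int *b) {
    assert(a != nullptr && b != nullptr);
    long sum = 0;
    for (std::size_t i = 0;; ++i) {
        std::size_t length = a->length;  // load inside the loop
        if (i >= length) break;
        sum += a->data[i] + *b;
    }
    return sum;
}

// Hoisted form: both invariant loads happen once, before the loop, and
// the trip count is simply `length`.
long sumHoisted(const IntArray *a, const int *b) {
    assert(a != nullptr && b != nullptr);
    const std::size_t length = a->length;
    const int addend = *b;
    long sum = 0;
    for (std::size_t i = 0; i < length; ++i) sum += a->data[i] + addend;
    return sum;
}
```

Both functions compute the same result; only the second exposes a bound that is fixed on loop entry, which is the property the trip count logic relies on.
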
>
> Here are a couple of examples:
>
> In unrolling, the form above is not unrollable. The trip count is
> unknowable. We might be able to use profile information to do a bounded
> full unroll if this loop is short running, but all other forms of unrolling
> (exact full, partial, and runtime) will be impossible.
>
> In the vectorizer, we will be unable to establish a trip count, and thus
> will not vectorize. Additionally, even if we can compute a trip count, the
> cost model handling for uniformity depends on hoisting.
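
To see why unrolling needs the trip count up front, here is a hand-written analogue of unrolling by a factor of four with a remainder loop. The function names and the factor are assumptions made for the illustration; the actual unroller works on IR and may arrange the remainder differently.

```cpp
#include <cassert>
#include <cstddef>

// Reference loop: one element per iteration.
long sumSimple(const int *a, std::size_t n) {
    long sum = 0;
    for (std::size_t i = 0; i < n; ++i) sum += a[i];
    return sum;
}

// Hand-written analogue of unrolling by four. The split into a main loop
// and a remainder loop is computed from n before the first iteration,
// which is only valid if n cannot change while the loop runs.
long sumUnrolledBy4(const int *a, std::size_t n) {
    const std::size_t mainIters = n / 4;  // full groups of four
    long sum = 0;
    std::size_t i = 0;
    for (std::size_t g = 0; g < mainIters; ++g, i += 4) {
        sum += a[i];
        sum += a[i + 1];
        sum += a[i + 2];
        sum += a[i + 3];
    }
    assert(n - i < 4 && "remainder must be shorter than the unroll factor");
    for (; i < n; ++i) sum += a[i];  // remainder loop
    return sum;
}
```

If the bound were re-loaded on every iteration, as in the toy example, the division into groups of four could not be computed ahead of time.
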
>
> Other impacts worth noting
>
> In loop idiom recognize, we will fail to recognize most counted idioms
> (e.g. popcount, cttz). Additionally, things like memset recognition will
> not happen if the value being stored was hoistable, but not hoisted.
>
> Our ability to analyze dominating conditions (e.g. cvp, valuetracking,
> SCEV's isKnownPredicateAt) will be crippled by the inability to
> recognize that values are loop invariant. When the RHS of a comparison is
> a potentially different value every time it runs, it really limits our
> ability to derive useful knowledge from that comparison or cross-correlate
> comparisons.
>
> But what about an unprofitable hoist?
>
> There are examples where hoisting is not profitable. Here's one such
> example:
>
>   for (int i = 0; i < N; i++) {
>     if (dynamically_always_taken) continue;
>     sum += a[i];
>     length = a.length;
>     if (i >= length) break;
>   }
>
> Our general posture has been that we will perform hoisting in the middle
> end, and then undo that hoisting if needed later in the pass pipeline. The
> basic reason for that is that it is nearly impossible to distinguish
> profitable from unprofitable cases because the profitability of the
> transform depends too heavily on which following transforms might run.
>
> Here's a small example which might at first seem unprofitable -
> inspired by the patch being reverted - but where hoisting is in fact the
> far more profitable outcome.
>
>   for (int i = 0; i < N; i++) {
>     i8* addr = a;
>     if (invariant_cond_usually_false) {
>        // very, very rare block e.g. 1 in 100 million
>        addr = a + 1;
>     }
>     *addr = 0;
>   }
>
> Subtly, this example should be profitable to hoist even if the rare
> condition is not invariant. We still know this loop writes to at most two
> memory locations. While we might not exploit that fact today, an extended
> form of load-store promotion could do so. If we don't hoist the addressing
> expression under the rare branch, we cannot (in general) determine that at
> most two locations are written.
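
A source-level sketch of the two forms may help here. The per-iteration flag array `rare` is a hypothetical stand-in for a condition that need not be loop-invariant, and both function names are made up for the illustration.

```cpp
#include <cassert>
#include <cstddef>

// Address computed under the rare branch, as in the quoted loop.
void storeInBranch(unsigned char *a, const bool *rare, std::size_t n) {
    assert(a != nullptr && rare != nullptr);
    for (std::size_t i = 0; i < n; ++i) {
        unsigned char *addr = a;
        if (rare[i]) addr = a + 1;  // address arithmetic inside the loop
        *addr = 0;
    }
}

// Address expression hoisted: both candidate addresses exist before the
// loop, so the set of locations written is visibly {a, a + 1}.
void storeHoisted(unsigned char *a, const bool *rare, std::size_t n) {
    assert(a != nullptr && rare != nullptr);
    unsigned char *const first = a;
    unsigned char *const second = a + 1;
    for (std::size_t i = 0; i < n; ++i) {
        unsigned char *addr = rare[i] ? second : first;
        *addr = 0;
    }
}
```

The two functions behave identically; in the hoisted form the loop body only selects between two values defined outside the loop, which is the fact a later promotion-style transform would need.
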
>
> I will note that LoopSink appears to be a bit restricted in practice.
> Someone with unprofitable examples could reasonably push this much further.
>
> How would we change this?
>
> I want to be very explicit about saying this design is only one reasonable
> design. It would also be entirely reasonable to build a design around a
> profit-driven LICM. That's simply not what we have today. The remainder of
> this section is about expanding on the work which would need to be
> undertaken to make such a change.
>
> First, we would need a clear set of examples where LICM was truly
> unprofitable. These examples would need to be publicly accessible. They
> would also need to be fairly minimal. In particular, there must not be
> other obvious optimizations which, if implemented, make the hoisting
> profitable after all.
>
> Second, we would need a proposal to llvm-dev which directly engages with
> the fact that SCEV (and thus most of our loop passes) depends on having
> loads hoisted for analysis quality. We could build a mechanism in SCEV to
> model possible load hoisting. There are some tricky bits in doing so, but
> it should be possible in theory.
>
> The main problem with modeling possible hoists in SCEV is the need to query
> both memory analysis and fault legality efficiently. Figuring out how to
> make that available for all users of SCEV without introducing nasty
> invalidation bugs or degenerate compile times is a hard design problem, but
> might be feasible. (I think that MemorySSA gives an interesting building
> block here, but have not deeply considered this.)
>
> There's also an API design problem in making sure that analysis results
> can't be consumed without committing to the hoisting in the IR. A transform
> which e.g. assumes a trip count without hoisting the relevant load would be
> subtly incorrect.
>
> Third, a consensus must be built that the resulting additional complexity
> for the mechanism built to address the previous point is worthwhile for the
> project as a whole. This will be a judgement call and would depend heavily
> on the solution chosen for the previous point.
>
> Finally, to be explicit, I am using the SCEV use case by way of an example
> only. There may be other ways that we depend on hoisting as a canonical
> form that I did not happen to think of when writing this email. The burden
> is on the person or persons proposing a change to identify any other
> dependencies, and to convince the community they have done so. Discussion
> of a testing strategy to find those dependencies should be a first-class
> concern in any proposal.
>
> Philip
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Sjoerd Meijer via llvm-dev

2021-Dec-06 14:11 UTC

[llvm-dev] LICM as canonical form

Hi Philip,

Many thanks for writing this up! I have been wanting to kick off this discussion
for some time, but didn't get round to it. I am in the "profit driven
LICM" camp because I see LICM as a canonical form leading to some poor
results in benchmarks. The problem as I see it is that LICM is a canonical form,
but we don't have the mechanisms to undo it (in the backend). For example,
LoopSink keeps being mentioned, but it only works when profile data is
available:

    // Enable LoopSink only when runtime profile is available.
    // With static profile, the sinking decision may be sub-optimal.

The other candidate that could do this is MachineSink, but it can't sink
back into loops. The result is a canonical transform that is as
aggressive as it can be and doesn't make a profitability call, and that we
can't undo later, which obviously leads to suboptimal results in some cases.
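
To make the role of profile data concrete, here is a toy sketch of the kind of frequency comparison a sinking pass could make. The function name and the plain integer frequencies are assumptions for the illustration, not LoopSink's actual interface; the idea is only that sinking pays off when the blocks receiving a copy run, in total, less often than the preheader.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Toy profitability test for sinking one instruction from a loop
// preheader into the loop blocks that use it. Frequencies are relative
// execution counts; all names here are hypothetical.
bool worthSinking(std::uint64_t preheaderFreq,
                  const std::vector<std::uint64_t> &useBlockFreqs) {
    assert(!useBlockFreqs.empty() && "an unused instruction needs no sinking");
    // One copy is placed in each use block, so the cost after sinking is
    // the sum of those blocks' frequencies.
    const std::uint64_t sunkCost = std::accumulate(
        useBlockFreqs.begin(), useBlockFreqs.end(), std::uint64_t{0});
    return sunkCost < preheaderFreq;
}
```

With measured frequencies this comparison is meaningful; with statically estimated ones the inputs are guesses, which is consistent with the quoted comment's reason for gating the pass on a runtime profile.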

Here be dragons, I think. Adding profitability analysis to LICM is going to be
tricky on IR (e.g. register pressure), but what I want to be explicit about is
that reversing LICM in the backend on MIR (MachineSink) is also very
tricky. For example, alias analysis, and moving instructions around in
general, is harder there. I can't back this up with numbers, but letting LICM
serve a purpose such as enabling SCEV or loop idiom recognition, and otherwise
letting it hoist only when profitable, intuitively makes more sense than it
being a canonical form that (at the moment) we can't undo. And please note
that we also have MachineCSE, which performs hoisting on MIR.

I completely agree with your "How would we change this?" section, and I
am happy that with your mail/proposal we can discuss different approaches and
not just get the "it's a canonical transform" answer. I
accept that the burden is on the person proposing a change; this being a
massive project is, IMHO, what has kept me from kicking off this discussion
earlier.

That's why I would like to discuss here how we could best facilitate this
discussion, and what I mean by that is developing this "proof"
upstream. If we allow LICM to be profile driven under options that are
off by default, then this proof could be developed incrementally and upstream,
in the open, which is a far more attractive development model than doing
all of this downstream first and then trying to convince the community. The
obvious benefits are that the design can be discussed, others can contribute
and test it, etc. The obvious disadvantage is adding code that is not enabled
by default, but given that this serves the goal of an experiment and redesign,
that seems reasonable to me.

Kind regards,
Sjoerd.


