thr3ads.net - llvm dev - [llvm-dev] How to best deal with undesirable Induction Variable Simplification? [Aug 2019]

Danila Malyutin via llvm-dev

2019-Aug-08 17:36 UTC

[llvm-dev] How to best deal with undesirable Induction Variable Simplification?

Hello,
Recently I've come across two instances where Induction Variable
Simplification led to noticeable performance regressions.
In one case, the removal of an extra IV led to the inability to reschedule
instructions in a tight loop to reduce stalls. In that case, there were enough
registers to spare, so using an extra register for an extra induction variable
was preferable since it reduced dependencies in the loop.
In the second case, there was a big nested loop made even bigger after
unswitching. However, the inner loop body was rather simple, of the form:
loop {
  p += n;
  ...
  p += n;
  ...
}
use p

Due to unswitching, there were several such loops, each with a different number
of p+=n ops, so when the IndVars pass rewrote all exit values, it added a lot of
slightly different offsets to the main loop header that couldn't fit in the
available registers, which led to unnecessary spills/reloads.
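
A minimal C model of the exit-value rewrite described above (plain C, not LLVM code; the function names `walk_loop` and `walk_rewritten` are hypothetical). It assumes a loop body with `k` statements of the form `p += n` and a trip count `trips`. The first function keeps the pointer the loop already maintains; the second recomputes the exit value in closed form as base + trips * k * n, which is a new offset value that has to be computed and kept live somewhere outside the loop.

```c
#include <stddef.h>

/* Original shape: the exit value of p is simply the last value of the
 * pointer the loop already carries, so no extra value is needed. */
const char *walk_loop(const char *p, size_t trips, size_t k, size_t n) {
    for (size_t i = 0; i < trips; ++i) {
        for (size_t j = 0; j < k; ++j) {
            p += n; /* models each "p += n" statement in the body */
        }
        /* ... other work in the loop body ... */
    }
    return p; /* "use p" */
}

/* Rewritten shape (what exit-value rewriting conceptually produces):
 * the use after the loop becomes base + offset, where the offset is
 * trip count * increments per iteration * stride. If the loop has other
 * side effects it still runs; only this use of p is replaced. */
const char *walk_rewritten(const char *p, size_t trips, size_t k, size_t n) {
    size_t offset = trips * (k * n); /* new value, live outside the loop */
    const char *exit_p = p + offset; /* GEP-like recomputation */
    return exit_p;
}
```

Both forms return the same pointer; the difference that matters in this thread is where `offset` gets materialized and how long it stays live.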

I am wondering what the usual strategy is for dealing with such
"pessimizations". Is it possible to somehow modify the IndVarSimplify
pass to take those issues into account (for example, tell it that adding an offset
computation + GEP is potentially more expensive than simply reusing the last var
from the loop), or should it be recovered in some later pass? If so, is there an
easy way to revert IV elimination? Has anyone dealt with similar issues before?

--
Danila

Michael Kruse via llvm-dev

2019-Aug-08 23:21 UTC

On Thu, Aug 8, 2019 at 12:37 PM Danila Malyutin via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> Hello,
> Recently I've come across two instances where Induction Variable
> Simplification led to noticeable performance regressions.
>
> In one case, the removal of an extra IV led to the inability to reschedule
> instructions in a tight loop to reduce stalls. In that case, there were enough
> registers to spare, so using an extra register for an extra induction variable
> was preferable since it reduced dependencies in the loop.

Since r139579, IndVarSimplify (the pass) should not normalize
induction variables without a reason anymore (a reason would be that
the loop can be deleted). Could you file a bug report, attach a
minimal .ll file, and mention what output you would expect?


> Due to unswitching, there were several such loops, each with a different
> number of p+=n ops, so when the IndVars pass rewrote all exit values, it added a
> lot of slightly different offsets to the main loop header that couldn't fit in
> the available registers, which led to unnecessary spills/reloads.

Since after unswitching only one of the resulting loops is executed,
the register usage should be the maximum of those loops, which ideally
is at most the register usage of the pre-unswitched loop. In your
case, p could be in the same register in all unswitched loops.
However, other optimizations might increase register pressure again
and the register allocation is not optimal in all cases.

Again, could you file a bug report, include a minimal reproducer and
what output you expect?

> I am wondering what the usual strategy is for dealing with such
> "pessimizations". Is it possible to somehow modify the IndVarSimplify pass to
> take those issues into account (for example, tell it that adding an offset
> computation + GEP is potentially more expensive than simply reusing the last var
> from the loop), or should it be recovered in some later pass? If so, is there an
> easy way to revert IV elimination? Has anyone dealt with similar issues before?

Ideally, we prefer that such pessimizations do not occur in the first
place, which is what r139579 did. However, the transformation might also
be an IR normalization that enables other transformations. In that case,
another pass down the pipeline would transform the normalized form into
an optimized one. For instance, LoopSimplify inserts a loop preheader
that CFGSimplify would remove again. What counts as normalization depends
on the case. If you can show that a change generally improves performance
(not just for your code) and has at most minor regressions, then any
approach is worth considering.

Michael

Danila Malyutin via llvm-dev

2019-Aug-09 12:32 UTC

> Since r139579, IndVarSimplify (the pass) should not normalize induction
> variables without a reason anymore (a reason would be that the loop can be
> deleted). Could you file a bug report, attach a minimal .ll file and mention
> what output you would expect?

The IV is removed there by replaceCongruentIVs. That is probably what I'd
expect when looking at the IR alone, but, as I've mentioned, it prevents
latency masking later down the line, since certain ops now use a single
common register.
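
For readers unfamiliar with the term: two induction variables are congruent when they provably step through the same sequence (same start, same step), and eliminating one of them means rewriting its users to use the other. The C model below is only a sketch of that idea with hypothetical function names, not the code of replaceCongruentIVs; a real C compiler would merge these itself, so it illustrates semantics rather than codegen.

```c
#include <stddef.h>

/* Before: i and j are separate induction variables with identical start
 * and step, i.e. they are congruent. Each one has its own increment, so
 * the two address computations do not share a source register. */
long sum_two_ivs(const long *a, const long *b, size_t len) {
    long s = 0;
    size_t j = 0;
    for (size_t i = 0; i < len; ++i, ++j) {
        s += a[i] + b[j];
    }
    return s;
}

/* After: j has been replaced by i. The IR is simpler (one phi, one add),
 * but both address computations now depend on the same IV. */
long sum_one_iv(const long *a, const long *b, size_t len) {
    long s = 0;
    for (size_t i = 0; i < len; ++i) {
        s += a[i] + b[i];
    }
    return s;
}
```

Both functions compute the same result; the shared dependency in the second form is the property described above as hampering latency masking, and whether it matters depends on the target's scheduler.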

> Since after unswitching only one of the resulting loops is executed, the
> register usage should be the maximum of those loops, which ideally is at most
> the register usage of the pre-unswitched loop. In your case, p could be in the
> same register in all unswitched loops. However, other optimizations might
> increase register pressure again and the register allocation is not optimal in
> all cases.

It looks like, for some reason, when IndVars rewrote all loop exit values (which
were just pointers incremented in the loop body) from simple single-value phis
to GEPs with a recomputed offset (backedge count * increment inside the loop), it
expanded this offset computation in the main outermost loop (pre?)header, even
when the value was used only at the exit of one of the unswitched loops. Later
passes also failed to sink these computations, for whatever reason, so in the
end, instead of max(unswitched loop regs), register usage became
max(unswitched loop regs) + Const * number of loops (for the offsets, even
though many of them were shared).
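
A rough C model of the register-pressure effect described above, assuming three hypothetical unswitched versions that perform 1, 2, or 3 `p += n` per iteration. In `exits_hoisted`, every version's exit offset is computed up front, mimicking expansion in the outer loop header, so all offsets are live at once; `exits_sunk` computes only the offset of the version that actually runs, which is the placement one would hope later passes recover. Both names are illustrative, and at the C level a compiler is free to sink the computations itself, so this shows the IR shape rather than actual codegen.

```c
#include <stddef.h>

/* Hoisted: all exit offsets are materialized before dispatch, so in the
 * IR they are simultaneously live until the selected version finishes. */
const char *exits_hoisted(const char *p, int variant, size_t trips, size_t n) {
    size_t off1 = trips * 1 * n; /* version with one p += n per iteration */
    size_t off2 = trips * 2 * n; /* version with two */
    size_t off3 = trips * 3 * n; /* version with three */
    switch (variant) {
    case 0:  return p + off1;
    case 1:  return p + off2;
    default: return p + off3;
    }
}

/* Sunk: each exit computes only its own offset, so at most one offset is
 * live at a time, matching max(unswitched loop regs). */
const char *exits_sunk(const char *p, int variant, size_t trips, size_t n) {
    switch (variant) {
    case 0:  return p + trips * 1 * n;
    case 1:  return p + trips * 2 * n;
    default: return p + trips * 3 * n;
    }
}
```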

I'll see if I can come up with a minimal reproducer for some in-tree target.

--
Danila

Philip Reames via llvm-dev

2019-Aug-09 23:00 UTC

On 8/8/19 10:36 AM, Danila Malyutin via llvm-dev wrote:
> Hello,
> Recently I've come across two instances where Induction Variable
> Simplification led to noticeable performance regressions.
>
> In one case, the removal of an extra IV led to the inability to
> reschedule instructions in a tight loop to reduce stalls. In that
> case, there were enough registers to spare, so using an extra register
> for an extra induction variable was preferable since it reduced
> dependencies in the loop.

This one I'd phrase as a deficiency in the backend. Arguably LSR, but
in general our transforms that rewrite code to reduce scheduling
pressure have room for improvement. I ran across a case of this with an
add reduction recently as well.

Removing a redundant IV is clearly the "right answer" in terms of
producing simpler, easier to optimize IR. 

> In the second case, there was a big nested loop made even bigger after
> unswitching. However, the inner loop body was rather simple, of the form:
>
> loop {
>   p += n;
>   ...
>   p += n;
>   ...
> }
> use p
>
> Due to unswitching, there were several such loops, each with a
> different number of p+=n ops, so when the IndVars pass rewrote all
> exit values, it added a lot of slightly different offsets to the main
> loop header that couldn't fit in the available registers, which led to
> unnecessary spills/reloads.

I have to ask a further question here. Why are the spills/fills
problematic? If they happened *outside* said loops (as you'd expect
from the example), at worst there is a code size impact. Is there
something more going on? (i.e., are the loops very short-running or
something?)

> I am wondering what the usual strategy is for dealing with such
> "pessimizations". Is it possible to somehow modify the IndVarSimplify
> pass to take those issues into account (for example, tell it that
> adding an offset computation + GEP is potentially more expensive than
> simply reusing the last var from the loop), or should it be recovered in
> some later pass? If so, is there an easy way to revert IV elimination?
> Has anyone dealt with similar issues before?

My answer: IndVars did the right thing in both of these cases. The IR
is definitely much cleaner, easier to optimize by other transforms,
etc. Unfortunately, it's not uncommon for a good transform to produce
output that reveals other deficiencies in the optimizer/backend. We
can and should fix those where we find them.

(JFYI, there's honest disagreement about the philosophy here.)