thr3ads.net - llvm dev - [llvm-dev] RFC for a design change in LoopStrengthReduce / ScalarEvolution [Aug 2015]

If this information is useful, please help other people find it:
Share via:

Sanjoy Das via llvm-dev

2015-Aug-17 00:42 UTC

[llvm-dev] RFC for a design change in LoopStrengthReduce / ScalarEvolution

This is related to an issue in loop strength reduction [1] that I've
been trying to fix on and off for a while.  [1] has a more detailed
description of the issue and an example, but briefly put, I want LSR
to consider formulae that have "Zext T" as base and/or scale
registers, and to appropriately rate such formulae.

My first attempt[2] at fixing this was buggy and had to be reverted.
While it is certainly possible to "fix" the bug in [2], the fixed
version is ugly and fragile, and I'd prefer having something cleaner.

The key issue here is that currently there is no way to represent
"Zext T" as a *distinct computation* within SCEV -- a SCEV expression
of the form "Zext T" (any SCEV expression, actually) is always
maximally simplified on construction.  This means I cannot "factor
out" a Zext from a base register (say) and force SCEV to maintain it
as a "SCEVZeroExtend T"; SCEV is free to re-simplify
"SCEVZeroExtend
T" to something equivalent on construction.  This is not just an issue
in theory -- factoring out a 'SCEVZeroExtend' from an expression
'E'
requires us to prove that 'E' == 'SCEVZeroExtend T'.  SCEV can
use
that exact same fact to "simplify" 'SCEVZeroExtend T' back to
'E' on
construction.  There is an analogous issue w.r.t. sign extension
expressions.

There are two possibilities I see here -- the problem can be solved within
LoopStrengthReduce or within ScalarEvolution.

# Within LoopStrengthReduce

Solving it within LoopStrengthReduce is tricky: due to the
representation issue in SCEV mentioned above, the fact that a scale or
base register is actually a narrower than needed and has to be zero /
sign extended before use needs to be represented separately from the
SCEV*.  Every place in LoopStrengthReduce that analyzes and
manipulates registers needs to be updated to understand this bit of
side information ([2] was buggy because not every place in LSR had
been updated this way in the change).

The cleanest approach I could come up with is to introduce a `class
Register` that wraps a SCEV and some associated metadata; and change
LSR to reason about registers using this `Register` class (and not raw
SCEV pointers).  All of the "this i64 register is actually a zero
extended i32 register" can then be made to live within this `Register`
class, making the structure of the code more obvious.

# Within ScalarEvolution

Another way to fix the problem is to introduce a new kind of SCEV node
-- a `SCEVOpaqueExpr`.  `SCEVOpaqueExpr` wraps another SCEV
expression, and prevents optimizations from optimizing across the
`SCEVOpaqueExpr` boundary.  This will allow us to keep representing
registers as SCEV pointers as they are today while solving the zext
representation problem.  After factoring a SCEV expression 'E' into
'SCEVZeroExtend T', 'SCEVZeroExtend T' can be "frozen"
by representing
it as 'SCEVZeroExtend (SCEVOpaqueExpr T)'.  Since 'SCEVZeroExtend
(SCEVOpaqueExpr T)' computes the same value as 'SCEVZeroExtend T'
='E', there is no need to update the parts of LSR that analyze / modify
registers to preserve correctness.

Note: this type of "opaque" node also can be used to force the SCEV
expander to generate a particular sequence of expressions; something
that is achieved by the "expand(getSCEVUnknown(expand(Op)))" idiom in
LSR currently.



So far I'm leaning towards the second approach, but given that it would
be a substantial change (though potentially useful elsewhere), I'd like to
hear what others think before diving in.

-- Sanjoy


[1]: http://lists.llvm.org/pipermail/llvm-dev/2015-April/084434.html
[2]: http://reviews.llvm.org/rL243939

Hal Finkel via llvm-dev

2015-Aug-17 01:11 UTC

head link

[llvm-dev] RFC for a design change in LoopStrengthReduce / ScalarEvolution

----- Original Message -----> From: "Sanjoy Das via llvm-dev" <llvm-dev at
lists.llvm.org>
> To: "llvm-dev" <llvm-dev at lists.llvm.org>, "Andrew
Trick" <atrick at apple.com>, "Quentin Colombet"
> <qcolombet at apple.com>, "Dan Gohman" <dan433584 at
gmail.com>
> Sent: Sunday, August 16, 2015 7:42:16 PM
> Subject: [llvm-dev] RFC for a design change in LoopStrengthReduce /
ScalarEvolution
> 
> This is related to an issue in loop strength reduction [1] that I've
> been trying to fix on and off for a while.  [1] has a more detailed
> description of the issue and an example, but briefly put, I want LSR
> to consider formulae that have "Zext T" as base and/or scale
> registers, and to appropriately rate such formulae.
> 
> My first attempt[2] at fixing this was buggy and had to be reverted.
> While it is certainly possible to "fix" the bug in [2], the fixed
> version is ugly and fragile, and I'd prefer having something cleaner.
> 
> The key issue here is that currently there is no way to represent
> "Zext T" as a *distinct computation* within SCEV -- a SCEV
expression
> of the form "Zext T" (any SCEV expression, actually) is always
> maximally simplified on construction.  This means I cannot "factor
> out" a Zext from a base register (say) and force SCEV to maintain it
> as a "SCEVZeroExtend T"; SCEV is free to re-simplify
"SCEVZeroExtend
> T" to something equivalent on construction.  This is not just an
> issue
> in theory -- factoring out a 'SCEVZeroExtend' from an expression
'E'
> requires us to prove that 'E' == 'SCEVZeroExtend T'.  SCEV
can use
> that exact same fact to "simplify" 'SCEVZeroExtend T'
back to 'E' on
> construction.  There is an analogous issue w.r.t. sign extension
> expressions.
I don't understand why you want to factor out the information, exactly. It
seems like what you need is a function like:

  unsigned getMinLeadingZeros(const SCEV *);

then, if you want to get the non-extended expression, you can just apply an
appropriate truncation. I assume, however, that I'm missing something.

Adding simplification barriers into SCEV expressions seems like a dangerous
idea, especially since LSR is not the only user of SCEV.

 -Hal
> 
> There are two possibilities I see here -- the problem can be solved
> within
> LoopStrengthReduce or within ScalarEvolution.
> 
> # Within LoopStrengthReduce
> 
> Solving it within LoopStrengthReduce is tricky: due to the
> representation issue in SCEV mentioned above, the fact that a scale
> or
> base register is actually a narrower than needed and has to be zero /
> sign extended before use needs to be represented separately from the
> SCEV*.  Every place in LoopStrengthReduce that analyzes and
> manipulates registers needs to be updated to understand this bit of
> side information ([2] was buggy because not every place in LSR had
> been updated this way in the change).
> 
> The cleanest approach I could come up with is to introduce a `class
> Register` that wraps a SCEV and some associated metadata; and change
> LSR to reason about registers using this `Register` class (and not
> raw
> SCEV pointers).  All of the "this i64 register is actually a zero
> extended i32 register" can then be made to live within this
> `Register`
> class, making the structure of the code more obvious.
> 
> # Within ScalarEvolution
> 
> Another way to fix the problem is to introduce a new kind of SCEV
> node
> -- a `SCEVOpaqueExpr`.  `SCEVOpaqueExpr` wraps another SCEV
> expression, and prevents optimizations from optimizing across the
> `SCEVOpaqueExpr` boundary.  This will allow us to keep representing
> registers as SCEV pointers as they are today while solving the zext
> representation problem.  After factoring a SCEV expression 'E' into
> 'SCEVZeroExtend T', 'SCEVZeroExtend T' can be
"frozen" by
> representing
> it as 'SCEVZeroExtend (SCEVOpaqueExpr T)'.  Since
'SCEVZeroExtend
> (SCEVOpaqueExpr T)' computes the same value as 'SCEVZeroExtend
T' => 'E', there is no need to update the parts of LSR that
analyze /
> modify
> registers to preserve correctness.
> 
> Note: this type of "opaque" node also can be used to force the
SCEV
> expander to generate a particular sequence of expressions; something
> that is achieved by the "expand(getSCEVUnknown(expand(Op)))"
idiom in
> LSR currently.
> 
> 
> 
> So far I'm leaning towards the second approach, but given that it
> would
> be a substantial change (though potentially useful elsewhere), I'd
> like to
> hear what others think before diving in.
> 
> -- Sanjoy
> 
> 
> [1]: http://lists.llvm.org/pipermail/llvm-dev/2015-April/084434.html
> [2]: http://reviews.llvm.org/rL243939
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org         http://llvm.cs.uiuc.edu
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> 
-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory

Sanjoy Das via llvm-dev

2015-Aug-17 02:43 UTC

head link

[llvm-dev] RFC for a design change in LoopStrengthReduce / ScalarEvolution

> I don't understand why you want to factor out the information,
> exactly. It seems like what you need is a function like:
>
>   unsigned getMinLeadingZeros(const SCEV *);
>
> then, if you want to get the non-extended expression, you can just
> apply an appropriate truncation. I assume, however, that I'm missing
> something.
The problem is not about how to codegen a specific SCEV expression
efficiently, but about the overall solution LSR converges to (which
includes considering how it will codegen *other* SCEVs).  In the
latter sense there is a difference between `X` and `(trunc (zext X))`
even though both of them produce the same value (and possibly are
equally efficient).

The example in [1] is roughly like this at the high level:

You have an i32 induction variable `V` == `{VStart,+,1}`.
`{VStart,+,1}` does not unsigned-overflow given the loop bounds.  Two
interesting uses in the loop are something like (in this mail by zext
I'll always mean zext from i32 to i64) `GEP @Global, zext(V)` and
`V ult limit`.

Because SCEV can prove that `{VStart,+,1}` does not unsigned-overflow,
`GEP @Global, zext(V)` can be computed as `GEP (@Global + zext
VStart), {i64 0,+,1}`.  Similarly, `V` == `trunc (zext V)` =`trunc {zext
VStart,+,1}` == `trunc {zext VStart,+,1}` =`trunc({i64 0,+,1}) + trunc (zext
VStart)` =`trunc({i64 0,+,1}) + VStart`.  This is the solution LSR ends up
choosing.  However, the cheaper solution here is

`GEP @Global, zext(V)` == `GEP @Global, zext({VStart,+,1})`
and
`V ult limit` == `{VStart,+,1} ult limit`

(i.e. pretty much the direct naive lowering).

This naive lowering is better because we can use the increment in
`{VStart,+,1}` to zero extend the value implicitly on x86 (you can
find the generated machine code at [1]).

The reason why LSR does not pick this cheaper solution is because it
"forgets" that `{zext VStart,+,1}` == `zext({VStart,+,1})`.  I'm
trying to fix this by adding a rule that replaces `{zext X,+,zext Y}`
with `zext({X,+,Y})` when possible on architectures that have "free"
zero extensions, so that if there is a solution involving the narrower
induction variable, it will be found.

Now I could use the trick you mentioned in the place in LSR where it
computes the cost of a solution (i.e. when looking for "common"
registers see if one register is the zero extension of another).  I
have not tried it, but I suspect it will be fairly tricky -- a given
solution will now have more than one "cost" depending on how you
interpret specific registers (as truncations of some extended value or
as a native wide value) and you'd have to search over the possible
interpretations of a specific solution.
> Adding simplification barriers into SCEV expressions seems like a
> dangerous idea, especially since LSR is not the only user of SCEV.
So I don't intend for SCEV to produce SCEVOpaqueExpr nodes itself.
There would instead be a 'const SCEV *getOpaqueExpr(const SCEV *)'
function that clients of SCEV can use to create an opaque expression.
Then most of the simplification routines SCEV will be changed to have
the moral equivalent of 'if (isa<SCEVOpaqueExpr>(E)) continue;'.

-- Sanjoy

[1]: http://lists.llvm.org/pipermail/llvm-dev/2015-April/084434.html

Possibly Parallel Threads

Search for more maybe matching threads

llvm dev - Aug 2015 - RFC for a design change in LoopStrengthReduce / ScalarEvolution

[llvm-dev] RFC for a design change in LoopStrengthReduce / ScalarEvolution

[llvm-dev] RFC for a design change in LoopStrengthReduce / ScalarEvolution

[llvm-dev] RFC for a design change in LoopStrengthReduce / ScalarEvolution

Possibly Parallel Threads