Sanjoy Das via llvm-dev
2015-Aug-17 00:42 UTC
[llvm-dev] RFC for a design change in LoopStrengthReduce / ScalarEvolution
This is related to an issue in loop strength reduction [1] that I've been trying to fix on and off for a while. [1] has a more detailed description of the issue and an example, but briefly put, I want LSR to consider formulae that have "Zext T" as base and/or scale registers, and to appropriately rate such formulae. My first attempt[2] at fixing this was buggy and had to be reverted. While it is certainly possible to "fix" the bug in [2], the fixed version is ugly and fragile, and I'd prefer having something cleaner. The key issue here is that currently there is no way to represent "Zext T" as a *distinct computation* within SCEV -- a SCEV expression of the form "Zext T" (any SCEV expression, actually) is always maximally simplified on construction. This means I cannot "factor out" a Zext from a base register (say) and force SCEV to maintain it as a "SCEVZeroExtend T"; SCEV is free to re-simplify "SCEVZeroExtend T" to something equivalent on construction. This is not just an issue in theory -- factoring out a 'SCEVZeroExtend' from an expression 'E' requires us to prove that 'E' == 'SCEVZeroExtend T'. SCEV can use that exact same fact to "simplify" 'SCEVZeroExtend T' back to 'E' on construction. There is an analogous issue w.r.t. sign extension expressions. There are two possibilities I see here -- the problem can be solved within LoopStrengthReduce or within ScalarEvolution. # Within LoopStrengthReduce Solving it within LoopStrengthReduce is tricky: due to the representation issue in SCEV mentioned above, the fact that a scale or base register is actually a narrower than needed and has to be zero / sign extended before use needs to be represented separately from the SCEV*. Every place in LoopStrengthReduce that analyzes and manipulates registers needs to be updated to understand this bit of side information ([2] was buggy because not every place in LSR had been updated this way in the change). The cleanest approach I could come up with is to introduce a `class Register` that wraps a SCEV and some associated metadata; and change LSR to reason about registers using this `Register` class (and not raw SCEV pointers). All of the "this i64 register is actually a zero extended i32 register" can then be made to live within this `Register` class, making the structure of the code more obvious. # Within ScalarEvolution Another way to fix the problem is to introduce a new kind of SCEV node -- a `SCEVOpaqueExpr`. `SCEVOpaqueExpr` wraps another SCEV expression, and prevents optimizations from optimizing across the `SCEVOpaqueExpr` boundary. This will allow us to keep representing registers as SCEV pointers as they are today while solving the zext representation problem. After factoring a SCEV expression 'E' into 'SCEVZeroExtend T', 'SCEVZeroExtend T' can be "frozen" by representing it as 'SCEVZeroExtend (SCEVOpaqueExpr T)'. Since 'SCEVZeroExtend (SCEVOpaqueExpr T)' computes the same value as 'SCEVZeroExtend T' ='E', there is no need to update the parts of LSR that analyze / modify registers to preserve correctness. Note: this type of "opaque" node also can be used to force the SCEV expander to generate a particular sequence of expressions; something that is achieved by the "expand(getSCEVUnknown(expand(Op)))" idiom in LSR currently. So far I'm leaning towards the second approach, but given that it would be a substantial change (though potentially useful elsewhere), I'd like to hear what others think before diving in. -- Sanjoy [1]: http://lists.llvm.org/pipermail/llvm-dev/2015-April/084434.html [2]: http://reviews.llvm.org/rL243939
Hal Finkel via llvm-dev
2015-Aug-17 01:11 UTC
[llvm-dev] RFC for a design change in LoopStrengthReduce / ScalarEvolution
----- Original Message -----> From: "Sanjoy Das via llvm-dev" <llvm-dev at lists.llvm.org> > To: "llvm-dev" <llvm-dev at lists.llvm.org>, "Andrew Trick" <atrick at apple.com>, "Quentin Colombet" > <qcolombet at apple.com>, "Dan Gohman" <dan433584 at gmail.com> > Sent: Sunday, August 16, 2015 7:42:16 PM > Subject: [llvm-dev] RFC for a design change in LoopStrengthReduce / ScalarEvolution > > This is related to an issue in loop strength reduction [1] that I've > been trying to fix on and off for a while. [1] has a more detailed > description of the issue and an example, but briefly put, I want LSR > to consider formulae that have "Zext T" as base and/or scale > registers, and to appropriately rate such formulae. > > My first attempt[2] at fixing this was buggy and had to be reverted. > While it is certainly possible to "fix" the bug in [2], the fixed > version is ugly and fragile, and I'd prefer having something cleaner. > > The key issue here is that currently there is no way to represent > "Zext T" as a *distinct computation* within SCEV -- a SCEV expression > of the form "Zext T" (any SCEV expression, actually) is always > maximally simplified on construction. This means I cannot "factor > out" a Zext from a base register (say) and force SCEV to maintain it > as a "SCEVZeroExtend T"; SCEV is free to re-simplify "SCEVZeroExtend > T" to something equivalent on construction. This is not just an > issue > in theory -- factoring out a 'SCEVZeroExtend' from an expression 'E' > requires us to prove that 'E' == 'SCEVZeroExtend T'. SCEV can use > that exact same fact to "simplify" 'SCEVZeroExtend T' back to 'E' on > construction. There is an analogous issue w.r.t. sign extension > expressions.I don't understand why you want to factor out the information, exactly. It seems like what you need is a function like: unsigned getMinLeadingZeros(const SCEV *); then, if you want to get the non-extended expression, you can just apply an appropriate truncation. I assume, however, that I'm missing something. Adding simplification barriers into SCEV expressions seems like a dangerous idea, especially since LSR is not the only user of SCEV. -Hal> > There are two possibilities I see here -- the problem can be solved > within > LoopStrengthReduce or within ScalarEvolution. > > # Within LoopStrengthReduce > > Solving it within LoopStrengthReduce is tricky: due to the > representation issue in SCEV mentioned above, the fact that a scale > or > base register is actually a narrower than needed and has to be zero / > sign extended before use needs to be represented separately from the > SCEV*. Every place in LoopStrengthReduce that analyzes and > manipulates registers needs to be updated to understand this bit of > side information ([2] was buggy because not every place in LSR had > been updated this way in the change). > > The cleanest approach I could come up with is to introduce a `class > Register` that wraps a SCEV and some associated metadata; and change > LSR to reason about registers using this `Register` class (and not > raw > SCEV pointers). All of the "this i64 register is actually a zero > extended i32 register" can then be made to live within this > `Register` > class, making the structure of the code more obvious. > > # Within ScalarEvolution > > Another way to fix the problem is to introduce a new kind of SCEV > node > -- a `SCEVOpaqueExpr`. `SCEVOpaqueExpr` wraps another SCEV > expression, and prevents optimizations from optimizing across the > `SCEVOpaqueExpr` boundary. This will allow us to keep representing > registers as SCEV pointers as they are today while solving the zext > representation problem. After factoring a SCEV expression 'E' into > 'SCEVZeroExtend T', 'SCEVZeroExtend T' can be "frozen" by > representing > it as 'SCEVZeroExtend (SCEVOpaqueExpr T)'. Since 'SCEVZeroExtend > (SCEVOpaqueExpr T)' computes the same value as 'SCEVZeroExtend T' => 'E', there is no need to update the parts of LSR that analyze / > modify > registers to preserve correctness. > > Note: this type of "opaque" node also can be used to force the SCEV > expander to generate a particular sequence of expressions; something > that is achieved by the "expand(getSCEVUnknown(expand(Op)))" idiom in > LSR currently. > > > > So far I'm leaning towards the second approach, but given that it > would > be a substantial change (though potentially useful elsewhere), I'd > like to > hear what others think before diving in. > > -- Sanjoy > > > [1]: http://lists.llvm.org/pipermail/llvm-dev/2015-April/084434.html > [2]: http://reviews.llvm.org/rL243939 > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org http://llvm.cs.uiuc.edu > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >-- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory
Sanjoy Das via llvm-dev
2015-Aug-17 02:43 UTC
[llvm-dev] RFC for a design change in LoopStrengthReduce / ScalarEvolution
> I don't understand why you want to factor out the information, > exactly. It seems like what you need is a function like: > > unsigned getMinLeadingZeros(const SCEV *); > > then, if you want to get the non-extended expression, you can just > apply an appropriate truncation. I assume, however, that I'm missing > something.The problem is not about how to codegen a specific SCEV expression efficiently, but about the overall solution LSR converges to (which includes considering how it will codegen *other* SCEVs). In the latter sense there is a difference between `X` and `(trunc (zext X))` even though both of them produce the same value (and possibly are equally efficient). The example in [1] is roughly like this at the high level: You have an i32 induction variable `V` == `{VStart,+,1}`. `{VStart,+,1}` does not unsigned-overflow given the loop bounds. Two interesting uses in the loop are something like (in this mail by zext I'll always mean zext from i32 to i64) `GEP @Global, zext(V)` and `V ult limit`. Because SCEV can prove that `{VStart,+,1}` does not unsigned-overflow, `GEP @Global, zext(V)` can be computed as `GEP (@Global + zext VStart), {i64 0,+,1}`. Similarly, `V` == `trunc (zext V)` =`trunc {zext VStart,+,1}` == `trunc {zext VStart,+,1}` =`trunc({i64 0,+,1}) + trunc (zext VStart)` =`trunc({i64 0,+,1}) + VStart`. This is the solution LSR ends up choosing. However, the cheaper solution here is `GEP @Global, zext(V)` == `GEP @Global, zext({VStart,+,1})` and `V ult limit` == `{VStart,+,1} ult limit` (i.e. pretty much the direct naive lowering). This naive lowering is better because we can use the increment in `{VStart,+,1}` to zero extend the value implicitly on x86 (you can find the generated machine code at [1]). The reason why LSR does not pick this cheaper solution is because it "forgets" that `{zext VStart,+,1}` == `zext({VStart,+,1})`. I'm trying to fix this by adding a rule that replaces `{zext X,+,zext Y}` with `zext({X,+,Y})` when possible on architectures that have "free" zero extensions, so that if there is a solution involving the narrower induction variable, it will be found. Now I could use the trick you mentioned in the place in LSR where it computes the cost of a solution (i.e. when looking for "common" registers see if one register is the zero extension of another). I have not tried it, but I suspect it will be fairly tricky -- a given solution will now have more than one "cost" depending on how you interpret specific registers (as truncations of some extended value or as a native wide value) and you'd have to search over the possible interpretations of a specific solution.> Adding simplification barriers into SCEV expressions seems like a > dangerous idea, especially since LSR is not the only user of SCEV.So I don't intend for SCEV to produce SCEVOpaqueExpr nodes itself. There would instead be a 'const SCEV *getOpaqueExpr(const SCEV *)' function that clients of SCEV can use to create an opaque expression. Then most of the simplification routines SCEV will be changed to have the moral equivalent of 'if (isa<SCEVOpaqueExpr>(E)) continue;'. -- Sanjoy [1]: http://lists.llvm.org/pipermail/llvm-dev/2015-April/084434.html
Apparently Analagous Threads
- RFC for a design change in LoopStrengthReduce / ScalarEvolution
- RFC for a design change in LoopStrengthReduce / ScalarEvolution
- RFC for a design change in LoopStrengthReduce / ScalarEvolution
- [ScalarEvolution][SCEV] no-wrap flags dependent on order of getSCEV() calls
- [ScalarEvolution][SCEV] no-wrap flags dependent on order of getSCEV() calls