Sanjoy Das via llvm-dev
2015-Aug-18 05:35 UTC
[llvm-dev] RFC for a design change in LoopStrengthReduce / ScalarEvolution
> Of course, and the point is that, for example, on x86_64, the zext
> here is free. I'm still trying to understand the problem...
>
> In the example you provided in your previous e-mail, we choose the
> solution:
>
>   `GEP @Global, zext(V)` -> `GEP (@Global + zext VStart), {i64 0,+,1}`
>   `V` -> `trunc({i64 0,+,1}) + VStart`
>
> instead of the actually-better solution:
>
>   `GEP @Global, zext(V)` -> `GEP @Global, zext({VStart,+,1})`
>   `V` -> `{VStart,+,1}`
>
> where LSR never considers the latter case because it transforms:
>
>   `zext({VStart,+,1})` to `{zext VStart,+,1}`
>
> and, thus, never considers the formula with zext on the outside?
> Your proposed solution is that LSR should be able to create:
>
>   zext(opaque({VStart,+,1}))
>
> forcing it to keep all extensions as the outermost operators. Is that right?

Yes, precisely.

> Will this interfere with SCEV's ability to prove wrapping properties
> of those expressions? If so, does that matter?

SCEV won't be able (by design) to prove no-overflow for
`opaque({VStart,+,1})`, but it should still be able to prove no-overflow
for `{VStart,+,1}` as usual.

> Sure, but fixing LSR by creating a local-handicapping mechanism for
> SCEV seems somehow unfortunate. Are we going to oscillate from picking
> from one part of the solution space to picking from a different part,
> without particular regard for which might be better?

While I found the SCEVOpaqueExpr idea easier to reason with, by no means
am I suggesting that it is clearly the better solution. :)

Would you rather this be solved fully within LSR instead?

-- Sanjoy
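A minimal C sketch of the kind of loop under discussion, with illustrative names (Global, VStart, N, sum) that are not taken from the thread: a 32-bit induction variable starting at VStart whose zero-extension indexes a global array, which is where `GEP @Global, zext(V)` and `{VStart,+,1}` come from. The first LSR solution above introduces a fresh `{i64 0,+,1}` counter and rematerializes `V` as `trunc(...) + VStart` (an extra add per iteration); the second keeps `{VStart,+,1}` as the only induction variable and pays only the zext, which is free on x86_64.

    #include <stdint.h>

    int32_t Global[1 << 20]; /* stand-in for the @Global array in the example */

    /* V = {VStart,+,1} is a 32-bit IV; on a 64-bit target Global[V] is
     * addressed through GEP @Global, zext(V). The caller is assumed to keep
     * VStart .. VStart+N-1 within bounds. */
    int64_t sum(uint32_t VStart, uint32_t N) {
        int64_t Acc = 0;
        for (uint32_t V = VStart; V != VStart + N; ++V)
            Acc += Global[V]; /* zext(V) feeds the address computation */
        return Acc;
    }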
Hal Finkel via llvm-dev
2015-Aug-18 19:38 UTC
[llvm-dev] RFC for a design change in LoopStrengthReduce / ScalarEvolution
----- Original Message -----
> From: "Sanjoy Das" <sanjoy at playingwithpointers.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "llvm-dev" <llvm-dev at lists.llvm.org>, "Andrew Trick" <atrick at apple.com>, "Quentin Colombet" <qcolombet at apple.com>, "Dan Gohman" <dan433584 at gmail.com>
> Sent: Tuesday, August 18, 2015 12:35:50 AM
> Subject: Re: [llvm-dev] RFC for a design change in LoopStrengthReduce / ScalarEvolution
>
> > Of course, and the point is that, for example, on x86_64, the zext
> > here is free. I'm still trying to understand the problem...
> >
> > In the example you provided in your previous e-mail, we choose the
> > solution:
> >
> >   `GEP @Global, zext(V)` -> `GEP (@Global + zext VStart), {i64 0,+,1}`
> >   `V` -> `trunc({i64 0,+,1}) + VStart`
> >
> > instead of the actually-better solution:
> >
> >   `GEP @Global, zext(V)` -> `GEP @Global, zext({VStart,+,1})`
> >   `V` -> `{VStart,+,1}`
> >
> > where LSR never considers the latter case because it transforms:
> >
> >   `zext({VStart,+,1})` to `{zext VStart,+,1}`
> >
> > and, thus, never considers the formula with zext on the outside?
> > Your proposed solution is that LSR should be able to create:
> >
> >   zext(opaque({VStart,+,1}))
> >
> > forcing it to keep all extensions as the outermost operators. Is that right?
>
> Yes, precisely.
>
> > Will this interfere with SCEV's ability to prove wrapping properties
> > of those expressions? If so, does that matter?
>
> SCEV won't be able (by design) to prove no-overflow for
> `opaque({VStart,+,1})`, but it should still be able to prove no-overflow
> for `{VStart,+,1}` as usual.
>
> > Sure, but fixing LSR by creating a local-handicapping mechanism for
> > SCEV seems somehow unfortunate. Are we going to oscillate from picking
> > from one part of the solution space to picking from a different part,
> > without particular regard for which might be better?
>
> While I found the SCEVOpaqueExpr idea easier to reason with, by no means
> am I suggesting that it is clearly the better solution. :)
>
> Would you rather this be solved fully within LSR instead?

I don't claim to understand LSR well enough to have an informed preference here, and of course, this might also depend on *how* you solve it in LSR. That having been said, in an abstract sense, solving the problem without blocking analysis within SCEV seems like something that should at least be explored. Also, it seems like doing this in LSR can provide benefits the SCEV-based approach can't (but I could have misunderstood, and the two proposed solutions may search equivalent parts of the solution space).

Thanks again,
Hal

> -- Sanjoy

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
Ramanarayanan, Ramshankar via llvm-dev
2015-Aug-21 16:22 UTC
[llvm-dev] RFC for a design change in LoopStrengthReduce / ScalarEvolution
I also need some fixes in LSR, though they may or may not be the same as Sanjoy's. One of the TODOs from the file is: "More sophistication in the way Formulae are generated and filtered." I guess that `{zext VStart,+,1}` appears during induction-variable simplification.

Some cases with address calculations and SIB:

  Reg1 = sext (offset) to i64
  Loop:
    Load* BaseAddr [BaseReg,Reg1,Scale]
    Reg1 = Reg1 + 1
    Backedge

Or:

  Reg1 = sext (offset)*Scale to i64
  Loop:
    Load* BaseAddr [0,Reg1,1]
    Reg1 = Reg1 + Scale
    Backedge

Or:

  Reg1 = sext (offset1) to i64
  Reg1 = Scale*Reg1
  Reg2 = offset2
  Loop:
    Reg3 = sext(Reg2)
    Load* BaseAddr [Reg1,Reg3,Scale]
    Reg2 = <somethingelse>
    Reg1 = Reg1 + Stride*Scale
    Backedge

If LSR finds and optimizes `Reg1 = Reg1 + Stride*Scale`, a sext on Start looks like a better thing to have, since the IV gets promoted to i64, though maybe this is not all that important. If NW (no-wrap) propagation didn't help, the sext would be inside the loop. The cost of this may vary between targets: on x86_64, the register cost of a zext is 0, and while a sext may or may not need an extra register, it needs an extra instruction (movslq or cltq).

Another issue is that LSR only looks at innermost loops, instead of at all array accesses at the current loop level. Also, it looks like we need to feed "thick" addrec expressions to LSR: for example, for C1 * (X + {Y,+,C2}), C1 would need to be distributed into the add to obtain an n-ary addrec prior to the LSR optimization; otherwise LSR is only going to consider {Y,+,C2} for replacement.

I am not able to make the connection between SIB and all this, since we have addrecs. An addrec is {X,+,Y}, i.e. X + {0,+,Y}, so basically LSR will insert a new add instruction based on {0,+,Y}. I suppose that if there is more than one such addrec subexpression in a loop, i.e. involving different step values, LSR may not be doing the better thing versus keeping an offset register in a SIB instruction. Maybe I am missing something. Thoughts?
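A C analogue of the first SIB case above (names and bounds are illustrative, not taken from the mail): a signed 32-bit starting offset whose sign-extension supplies the index register of a [BaseReg,Reg1,Scale] address. Whether the sext can be hoisted to a single `Reg1 = sext(offset)` before the loop, as in the pseudo-code, depends on proving the IV does not sign-wrap; if that no-wrap fact is lost, the sext stays inside the loop and costs the extra movslq on every iteration.

    #include <stdint.h>

    /* IV = {offset,+,1}; each base[i] use needs sext(i) to form the 64-bit
     * address, i.e. Load BaseAddr [BaseReg,Reg1,8] with Reg1 = sext(i).
     * Assumes offset + n does not overflow int32_t and stays in bounds. */
    int64_t sum_slice(const int64_t *base, int32_t offset, int32_t n) {
        int64_t acc = 0;
        for (int32_t i = offset; i < offset + n; ++i)
            acc += base[i]; /* sext(i) supplies the SIB index register */
        return acc;
    }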