Sanjoy Das via llvm-dev
2015-Aug-18 05:35 UTC
[llvm-dev] RFC for a design change in LoopStrengthReduce / ScalarEvolution
> Of course, and the point is that, for example, on x86_64, the zext here
> is free. I'm still trying to understand the problem...
>
> In the example you provided in your previous e-mail, we choose the
> solution:
>
> `GEP @Global, zext(V)` -> `GEP (@Global + zext VStart), {i64 0,+,1}`
> `V` -> `trunc({i64 0,+,1}) + VStart`
>
> instead of the actually-better solution:
>
> `GEP @Global, zext(V)` -> `GEP @Global, zext({VStart,+,1})`
> `V` -> `{VStart,+,1}`
>
> where LSR never considers the latter case because it transforms:
>
> `zext({VStart,+,1})` to `{zext VStart,+,1}`
>
> and, thus, never considers the formula with zext on the outside? Your
> proposed solution is that LSR should be able to create:
>
> zext(opaque({VStart,+,1}))
>
> forcing it to keep all extensions as the outermost operators. Is that
> right?

Yes, precisely.

> Will this interfere with SCEV's ability to prove wrapping properties of
> those expressions? If so, does that matter?

SCEV won't be able (by design) to prove no-overflow for
`opaque({VStart,+,1})` but it should still be able to prove no overflow
for `{VStart,+,1}` as usual.

> Sure, but fixing LSR by creating a local-handicapping mechanism for
> SCEV seems somehow unfortunate. Are we going to oscillate from picking
> from one part of the solution space to picking from a different part,
> without particular regard for which might be better?

While I found the SCEVOpaqueExpr idea easier to reason with, by no means
am I suggesting that it is clearly the better solution. :)

Would you rather this be solved fully within LSR instead?

-- Sanjoy
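For concreteness, here is a minimal C++ sketch (my own example, not the one from the earlier mail) of the kind of loop being discussed: a 32-bit IV `V` that is zero-extended for addressing into a global array and also has a direct use, so LSR has to choose between the two solutions above.

extern char Global[];  // stand-in for the @Global array

char sum_from(unsigned VStart, unsigned N) {
  char Acc = 0;
  for (unsigned I = 0; I < N; ++I) {
    unsigned V = VStart + I;                 // V is {VStart,+,1} in SCEV terms
    Acc += Global[(unsigned long long)V];    // the address uses zext(V)
    // (V may also have direct, non-address uses that LSR must rewrite)
  }
  return Acc;
}

With the second (better) solution LSR keeps `V == {VStart,+,1}` and only wraps that addrec in a zext for the address; with the first it runs a separate i64 IV and reconstructs `V` from it with a trunc.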
Hal Finkel via llvm-dev
2015-Aug-18 19:38 UTC
[llvm-dev] RFC for a design change in LoopStrengthReduce / ScalarEvolution
----- Original Message -----
> From: "Sanjoy Das" <sanjoy at playingwithpointers.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "llvm-dev" <llvm-dev at lists.llvm.org>, "Andrew Trick" <atrick at apple.com>,
>     "Quentin Colombet" <qcolombet at apple.com>, "Dan Gohman" <dan433584 at gmail.com>
> Sent: Tuesday, August 18, 2015 12:35:50 AM
> Subject: Re: [llvm-dev] RFC for a design change in LoopStrengthReduce / ScalarEvolution
>
> > Of course, and the point is that, for example, on x86_64, the zext
> > here is free. I'm still trying to understand the problem...
> >
> > In the example you provided in your previous e-mail, we choose the
> > solution:
> >
> > `GEP @Global, zext(V)` -> `GEP (@Global + zext VStart), {i64 0,+,1}`
> > `V` -> `trunc({i64 0,+,1}) + VStart`
> >
> > instead of the actually-better solution:
> >
> > `GEP @Global, zext(V)` -> `GEP @Global, zext({VStart,+,1})`
> > `V` -> `{VStart,+,1}`
> >
> > where LSR never considers the latter case because it transforms:
> >
> > `zext({VStart,+,1})` to `{zext VStart,+,1}`
> >
> > and, thus, never considers the formula with zext on the outside?
> > Your proposed solution is that LSR should be able to create:
> >
> > zext(opaque({VStart,+,1}))
> >
> > forcing it to keep all extensions as the outermost operators. Is
> > that right?
>
> Yes, precisely.
>
> > Will this interfere with SCEV's ability to prove wrapping
> > properties of those expressions? If so, does that matter?
>
> SCEV won't be able (by design) to prove no-overflow for
> `opaque({VStart,+,1})` but it should still be able to prove no
> overflow for `{VStart,+,1}` as usual.
>
> > Sure, but fixing LSR by creating a local-handicapping mechanism for
> > SCEV seems somehow unfortunate. Are we going to oscillate from
> > picking from one part of the solution space to picking from a
> > different part, without particular regard for which might be better?
>
> While I found the SCEVOpaqueExpr idea easier to reason with, by no
> means am I suggesting that it is clearly the better solution. :)
>
> Would you rather this be solved fully within LSR instead?

I don't claim to understand LSR well enough to have an informed preference
here, and of course, this might also depend on *how* you solve it in LSR.
That having been said, in an abstract sense, solving the problem without
blocking analysis within SCEV seems like something that should at least be
explored. Also, it seems like doing this in LSR can provide benefits the
SCEV-based approach can't (but I could have misunderstood, and the two
proposed solutions search equivalent parts of the solution space).

Thanks again,
Hal

>
> -- Sanjoy
>

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
Ramanarayanan, Ramshankar via llvm-dev
2015-Aug-21 16:22 UTC
[llvm-dev] RFC for a design change in LoopStrengthReduce / ScalarEvolution
I also need some fixes in LSR, though they may or may not be the same as
Sanjoy's. One of the TODOs in the file is: "More sophistication in the way
Formulae are generated and filtered."
I guess that `{zext VStart,+,1}` appears during induction-variable simplification.
Here are some cases with address calculations and SIB (scale-index-base)
addressing; a rough C rendering of the first one follows the list:
Case 1:
  Reg1 = sext(offset) to i64
  Loop:
    Load* BaseAddr[BaseReg, Reg1, Scale]
    Reg1 = Reg1 + 1
  Backedge

Case 2:
  Reg1 = sext(offset) * Scale to i64
  Loop:
    Load* BaseAddr[0, Reg1, 1]
    Reg1 = Reg1 + Scale
  Backedge

Case 3:
  Reg1 = sext(offset1) to i64
  Reg1 = Scale * Reg1
  Reg2 = offset2
  Loop:
    Reg3 = sext(Reg2)
    Load* BaseAddr[Reg1, Reg3, Scale]
    Reg2 = <something else>
    Reg1 = Reg1 + Stride * Scale
  Backedge
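As a hedged illustration (names like BaseAddr, offset, and N are my own
stand-ins), the first case might arise from a loop like this, where a
sign-extended 32-bit offset serves as the index register of a
base + index*scale (SIB) address and is bumped by 1 each iteration:

// Roughly Case 1: sext(offset) is the index register; the element size
// supplies the SIB scale, and the index is incremented by 1 per iteration.
int case1(const int *BaseAddr, int offset, int N) {
  int Sum = 0;
  long long Reg1 = (long long)offset;   // Reg1 = sext(offset) to i64
  for (int I = 0; I < N; ++I) {
    Sum += BaseAddr[Reg1];              // Load BaseAddr[BaseReg, Reg1, Scale]
    Reg1 += 1;                          // Reg1 = Reg1 + 1
  }
  return Sum;
}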
If LSR finds and optimizes `Reg1 = Reg1 + Stride*Scale`, a sext on the Start
value looks like the better thing to have, since the IV gets promoted to i64,
though maybe this is not all that important. If no-wrap (NW) propagation did
not help, the sext would end up inside the loop. The cost of this varies with
the target: on x86_64, the register cost of a zext is 0, and while a sext may
or may not need an extra register, it does need an extra instruction (movslq
or cltq).
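A small C++ sketch of that x86_64 cost asymmetry (my own illustration; on
x86_64 a write to a 32-bit register already zero-extends into the full 64-bit
register, whereas sign extension typically lowers to a movslq/cltq):

#include <cstdint>

// Usually compiles to a plain 32-bit register move (the zext is implicit).
uint64_t zext_index(uint32_t i) { return static_cast<uint64_t>(i); }

// Usually needs an explicit sign-extending instruction such as movslq.
int64_t sext_index(int32_t i) { return static_cast<int64_t>(i); }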
Another issue is that LSR just looks at innermost loops, instead of all array
accesses at the current loop level. Also, it looks like we need to feed thick
addrec expressions into LSR: for example, for `C1 * (X + {Y,+,C2})`, C1 would
need to be distributed into the add to obtain an n-ary addrec prior to the LSR
optimization. Otherwise it is going to consider only `{Y,+,C2}` for LSR
replacement.
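A short worked sketch of that distribution in SCEV notation (ignoring wrap
flags; this is my reading of the example, not text from the original mail):

  C1 * (X + {Y,+,C2})
    = C1*X + C1 * {Y,+,C2}
    = C1*X + {C1*Y,+,C1*C2}
    = {C1*X + C1*Y,+,C1*C2}

i.e. the whole expression folds into one affine addrec with start
C1*X + C1*Y and step C1*C2, which LSR can then treat as a single candidate
instead of only the inner {Y,+,C2}.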
I am not able to make the connection between SIB and all this, as we have
addrecs. An addrec is {X,+,Y}, i.e. X + {0,+,Y}, so basically LSR will insert
a new add instruction based on {0,+,Y}. I suppose that if there is more than
one such addrec subexpression in a loop, i.e. involving different step values,
LSR may not be doing the better thing versus keeping an offset register in a
SIB instruction. Maybe I am missing something. Thoughts?
-----Original Message-----
From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of Hal
Finkel via llvm-dev
Sent: Tuesday, August 18, 2015 3:38 PM
To: Sanjoy Das
Cc: llvm-dev
Subject: Re: [llvm-dev] RFC for a design change in LoopStrengthReduce /
ScalarEvolution
----- Original Message -----
> From: "Sanjoy Das" <sanjoy at playingwithpointers.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "llvm-dev" <llvm-dev at lists.llvm.org>, "Andrew Trick" <atrick at apple.com>,
>     "Quentin Colombet" <qcolombet at apple.com>, "Dan Gohman" <dan433584 at gmail.com>
> Sent: Tuesday, August 18, 2015 12:35:50 AM
> Subject: Re: [llvm-dev] RFC for a design change in LoopStrengthReduce
> / ScalarEvolution
>
> > Of course, and the point is that, for example, on x86_64, the zext
> > here is free. I'm still trying to understand the problem...
> >
> > In the example you provided in your previous e-mail, we choose the
> > solution:
> >
> > `GEP @Global, zext(V)` -> `GEP (@Global + zext VStart), {i64
> > 0,+,1}`
> > `V` -> `trunc({i64 0,+,1}) + VStart`
> >
> > instead of the actually-better solution:
> >
> > `GEP @Global, zext(V)` -> `GEP @Global, zext({VStart,+,1})`
> > `V` -> `{VStart,+,1}`
> >
> > where LSR never considers the latter case because it transforms:
> >
> > `zext({VStart,+,1})` to `{zext VStart,+,1}`
> >
> > and, thus, never considers the formula with zext on the outside?
> > Your proposed solution is that LSR should be able to create:
> >
> > zext(opaque({VStart,+,1}))
> >
> > forcing it to keep all extensions as the outermost operators. Is
> > that right?
>
> Yes, precisely.
>
> > Will this interfere with SCEV's ability to prove wrapping properties
> > of those expressions? If so, does that matter?
>
> SCEV won't be able (by design) to prove no-overflow for
> `opaque({VStart,+,1})` but it should still be able to prove no
> overflow for `{VStart,+,1}` as usual.
>
> > Sure, but fixing LSR by creating a local-handicapping mechanism for
> > SCEV seems somehow unfortunate. Are we going to oscillate from
> > picking from one part of the solution space to picking from a
> > different part, without particular regard for which might be better?
>
> While I found the SCEVOpaqueExpr idea easier to reason with, by no
> means am I suggesting that it is clearly the better solution. :)
>
> Would you rather this be solved fully within LSR instead?
I don't claim to understand LSR well enough to have an informed preference
here, and of course, this might also depend on *how* you solve it in LSR.
That having been said, in an abstract sense, solving the problem without
blocking analysis within SCEV seems like something that should at least be
explored. Also, it seems like doing this in LSR can provide benefits the
SCEV-based approach can't (but I could have misunderstood, and the two
proposed solutions search equivalent parts of the solution space).
Thanks again,
Hal
>
> -- Sanjoy
>
--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory