[+CC Andy]

Hi Elena,

I don't have any fundamental issues with teaching SCEV about floating
point types, but given this will be a major change, I think a high
level roadmap should be discussed on llvm-dev before we start
reviewing and committing changes.

Here are some issues that I think are worth discussing:

 - Core motivation: why do we even care about optimizing floating
   point induction variables? What situations are they common in? Do
   programmers _expect_ compilers to optimize them well? (I haven't
   worked on our vectorizers, so pardon the possibly stupid question.)
   In the example you gave, why do you need SCEV to analyze the
   increment to vectorize the loop (i.e. how does it help)? What are
   some other concrete cases you'll want to optimize?

 - I presume you'll want SCEV expressions for `sitofp` and `uitofp`.
   (The most important question:) With these in the game, what is the
   canonical representation of SCEV expressions that can be expressed
   as, say, both `sitofp(A + B)` and `sitofp(A) + sitofp(B)`?

   Will we have a way to mark expressions (like we have `nsw` and
   `nuw` for `sext` and `zext`) over which we can distribute `sitofp`
   and `uitofp`?

   Same questions for `fptosi` and `fptoui`.

 - How will you partition the logic between floating point and integer
   expressions in SCEV-land? Will you have, say, `SCEVAddExpr` do
   different things based on type, or will you split it into
   `SCEVIAddExpr` and `SCEVFAddExpr`? [0]

   * There are likely to be similarities too -- e.g. the "inductive"
     or "control flow" aspect of `SCEVAddRecExpr` is likely to be
     common between floating point add recurrences [1] and integer
     add recurrences; and part of figuring out the partitioning is
     also figuring out how to re-use these bits of logic.

[0]: I'd prefer the latter since, e.g., integer addition is
associative but floating point addition isn't; and it is better to
force programmers to handle the two operations differently.

[1]: For instance, things like this:
https://github.com/llvm-mirror/llvm/blob/master/lib/Analysis/ScalarEvolution.cpp#L7564
are likely to stay common between floating point and integer add recs.

-- Sanjoy
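To make the question in [0] concrete, here is a minimal hand-written C++
sketch of the second option. SCEVIAddExpr and SCEVFAddExpr are the
hypothetical names from the question above, not classes that exist in LLVM,
and the layout is purely illustrative:

  // Hypothetical sketch only: SCEVIAddExpr/SCEVFAddExpr do not exist in
  // LLVM. It illustrates splitting integer and FP adds into separate node
  // kinds so that integer-only rewrites (e.g. reassociation under nsw/nuw)
  // cannot silently apply to FP nodes.
  #include <vector>

  struct SCEV {
    enum Kind { IAdd, FAdd /* ... other expression kinds ... */ };
    Kind K;
    explicit SCEV(Kind K) : K(K) {}
    virtual ~SCEV() = default;
  };

  // Integer add: associative/commutative, may carry nsw/nuw-style flags.
  struct SCEVIAddExpr : SCEV {
    std::vector<const SCEV *> Ops;
    bool HasNSW = false, HasNUW = false;
    SCEVIAddExpr() : SCEV(IAdd) {}
  };

  // FP add: not associative in general; any reassociation or distribution
  // of sitofp/uitofp would need explicit fast-math-style flags here.
  struct SCEVFAddExpr : SCEV {
    std::vector<const SCEV *> Ops;
    bool AllowReassoc = false; // e.g. only set under fast-math
    SCEVFAddExpr() : SCEV(FAdd) {}
  };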
> On May 16, 2016, at 2:42 PM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote:
>
> I don't have any fundamental issues with teaching SCEV about floating
> point types, but given this will be a major change, I think a high
> level roadmap should be discussed on llvm-dev before we start
> reviewing and committing changes.

I don't think it makes sense to consider a floating point induction
variable an "add recurrence" unless the inductive value can be promoted
to an integer. (The computational properties of chained recurrences are
defined on integers.)

IndVarSimplify's handleFloatingPointIV is supposed to promote your FP
induction variables to integers so that SCEV can analyze them. After
all the loop opts run, LSR is supposed to clean up any casts during
optimizeShadowIV.

-Andy
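For reference, a hand-written before/after sketch of the promotion Andy
describes; this is illustrative C++ rather than actual IndVarSimplify
output, and the real transform (roughly) only applies when the FP start,
step, and exit values are integer-valued constants:

  // Before: floating point induction variable.
  void before(float *A) {
    for (double d = 0.0; d < 100.0; d += 1.0)
      A[(int)d] = 1.0f;
  }

  // After: the IV is an integer, so SCEV can analyze it; LSR's
  // optimizeShadowIV can later clean up casts that feed any FP uses.
  void after(float *A) {
    for (int i = 0; i < 100; ++i)
      A[i] = 1.0f;
  }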
> On May 16, 2016, at 3:03 PM, Andrew Trick <atrick at apple.com> wrote:
>
> IndVarSimplify's handleFloatingPointIV is supposed to promote your FP
> induction variables to integers so that SCEV can analyze them. After
> all the loop opts run, LSR is supposed to clean up any casts during
> optimizeShadowIV.

That said, there are probably many reasons (e.g. a non-integer initial
value) why you can't promote an FP IV to an integer, but you may still
want to vectorize the loop in fast-math mode. I'll leave it to MichaelZ
and Adam to comment on how best to prepare such loops for vectorization.

-Andy
Demikhovsky, Elena via llvm-dev
2016-May-17 10:14 UTC
[llvm-dev] Working on FP SCEV Analysis
Hi Sanjoy,

Please see my answers below:

> - Core motivation: why do we even care about optimizing floating
>   point induction variables? What situations are they common in? Do
>   programmers _expect_ compilers to optimize them well? In the example
>   you gave, why do you need SCEV to analyze the increment to vectorize
>   the loop (i.e. how does it help)? What are some other concrete cases
>   you'll want to optimize?

[Demikhovsky, Elena] I gave an example of a loop that can be vectorized
in fast-math mode. The ICC compiler vectorizes loops with *primary* and
*secondary* IVs.

This is an example of a *primary* induction:

  (1) for (float i = 0.5; i < 0.75; i += 0.05) {}   // i is a "primary" IV

And of a *secondary* one:

  (2) for (int i = 0, float x = start; i < N; i++, x += delta) {}   // x is a "secondary" IV

Right now I'm working only on (2).

> - I presume you'll want SCEV expressions for `sitofp` and `uitofp`.

[Demikhovsky, Elena] I'm adding these expressions, of course. They are
similar to "truncate" and "zext" in terms of implementation.

>   (The most important question:) With these in the game, what is the
>   canonical representation of SCEV expressions that can be expressed
>   as, say, both `sitofp(A + B)` and `sitofp(A) + sitofp(B)`?

[Demikhovsky, Elena] Meanwhile I have (start + delta * sitofp(i)). I
don't know how far we can go with FP simplification and under what
flags. The first implementation does not assume that sitofp(A + B) is
equal to sitofp(A) + sitofp(B).

>   Will we have a way to mark expressions (like we have `nsw` and
>   `nuw` for `sext` and `zext`) which we can distribute `sitofp` and
>   `uitofp` over?

[Demikhovsky, Elena] I assume that sitofp and uitofp should be 2
different operations.

>   Same questions for `fptosi` and `fptoui`.

[Demikhovsky, Elena] The same answer as above, because I don't want to
combine these operations.

> - How will you partition the logic between floating and integer
>   expressions in SCEV-land? Will you have, say, `SCEVAddExpr` do
>   different things based on type, or will you split it into
>   `SCEVIAddExpr` and `SCEVFAddExpr`? [0]

[Demikhovsky, Elena] Yes, I'm introducing SCEVFAddExpr and SCEVFMulExpr
for (start + delta * sitofp(i)).

>   * There are likely to be similarities too -- e.g. the "inductive"
>     or "control flow" aspect of `SCEVAddRecExpr` is likely to be
>     common between floating point add recurrences[1], and integer add
>     recurrences; and part of figuring out the partitioning is also
>     figuring out how to re-use these bits of logic.

[Demikhovsky, Elena] I'm adding SCEVFAddRecExpr to describe the
recurrence of the FP IV.

> [0]: I'll prefer the latter since e.g. integer addition is
> associative, but floating point addition isn't; and it is better to
> force programmers to handle the two operations differently.
>
> [1]: For instance, things like this:
> https://github.com/llvm-mirror/llvm/blob/master/lib/Analysis/ScalarEvolution.cpp#L7564
> are likely to stay common between floating point and integer add recs.
>
> -- Sanjoy
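For illustration only (not from Elena's patch): a hand-written C++ sketch of
what analyzing the secondary IV as (start + delta * sitofp(i)) enables,
namely replacing the loop-carried FP add chain with its closed form. The
function and its strip-mined inner loop are made up for the example and
stand in for real vector code; the rewrite is only valid under
fast-math-style reassociation:

  void widened(float *out, float start, float delta, int N) {
    // Scalar form carries x as a secondary FP IV:
    //   for (int i = 0; i < N; i++, x += delta) out[i] = x;
    // Vectorized-by-4 form uses the closed form x(i) = start + delta * i:
    int i = 0;
    for (; i + 4 <= N; i += 4) {
      for (int lane = 0; lane < 4; ++lane)   // stands in for one vector op
        out[i + lane] = start + delta * (float)(i + lane);
    }
    for (; i < N; ++i)                       // scalar remainder loop
      out[i] = start + delta * (float)i;
  }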
> What situations are they common in?

The ICC vectorizer made a paradigm shift a while ago: if there isn't a
clear reason why something can't be vectorized, we should try our best
to vectorize. The rest is a performance modeling (and implementation
priority) question, not a capability question. We believe this is a good
paradigm to follow in vectorizer development. It was a big departure
from "vectorize only when everything looks nice to the vectorizer".

We shouldn't give up on vectorizing simply because the programmer wrote
FP induction code. (*) The next question is then what the best solution
for that problem is, and extending SCEV appears to be one of the obvious
directions to explore.

Thanks,
Hideki Saito
Intel Compilers and Languages

----------------------
(*) Quick (and dirty) overview of vectorization legality:
Vectorization is a cross-iteration optimization, so we need a solution
for cross-iteration dependences. Forward dependences are considered
"safe for vectorization" since the vector execution order naturally
satisfies them. Backward dependences are unsafe unless the vectorizer
knows how to "break" them. Induction is a cyclic dependence by nature
and as such is considered unsafe for vectorization, unless the
vectorizer knows how to break it. [Given a CFG that executes from top
to bottom, a forward dependence is a downward data dependence edge.]
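A hand-written illustration of the legality overview in (*); the loops and
array names are made up for the example, not taken from the thread:

  void forward_dep(float *A, int N) {
    // A[i] is written before a later iteration reads it through A[i - 1]:
    // a forward (downward) dependence. Executing iterations in vector
    // groups, statement by statement, still satisfies it.
    for (int i = 1; i < N; ++i) {
      A[i] = 2.0f;
      A[i - 1] += 1.0f;
    }
  }

  void backward_dep(float *A, int N) {
    // A[i] depends on A[i - 1] produced by the previous iteration: a
    // backward (cyclic) dependence the vectorizer must break or give up on.
    for (int i = 1; i < N; ++i)
      A[i] = A[i - 1] + 1.0f;
  }

  void fp_induction(float *out, float start, float delta, int N) {
    // An FP induction is a cyclic dependence on x itself; it becomes
    // vectorizable once it can be rewritten through its closed form under
    // fast-math, which is where FP SCEV analysis would help.
    float x = start;
    for (int i = 0; i < N; ++i, x += delta)
      out[i] = x;
  }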
> On May 16, 2016, at 2:42 PM, Sanjoy Das via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> - Core motivation: why do we even care about optimizing floating
>   point induction variables? What situations are they common in? Do
>   programmers _expect_ compilers to optimize them well?

Graphics shading languages like GLSL or HLSL have floating point types
as the "default" types. Integers weren't even added until later
revisions of GLSL. In that world, it's not especially strange to
imagine a loop counter written in floating point.

--Owen
Chandler Carruth via llvm-dev
2016-May-18 04:27 UTC
[llvm-dev] Working on FP SCEV Analysis
On Tue, May 17, 2016 at 8:49 PM Owen Anderson <resistor at mac.com> wrote:

> Graphics shading languages like GLSL or HLSL have floating point types
> as the "default" types. Integers weren't even added until later
> revisions of GLSL. In that world, it's not especially strange to
> imagine a loop counter written in floating point.

But most of those are convertible to integer IVs, which, as Andy pointed
out up the thread, is something IndVarSimplify tries to do.

A potential way to show a motivating use case is what Andy already
outlined: cases where non-integer values are fundamentally used as part
of the IV.

Even then, I'd personally want to see further evidence of why the
correct solution is to model the floating point IV in SCEV rather than
to find a more powerful way of converting the IV to an integer that
models the non-integer values taken on by the IV.

As an example, if the use case is the following code with appropriate
flags to relax IEEE semantics so this looks like normal algebra, etc.:

  for (float f = 0.01f; f < 1.0f; f += 0.01f)
    ...

I'd rather see us cleverly turn it into:

  float f = 0.01f;
  for (int i = 1; i < 100; i += 1, f += 0.01f)
    ...

(or whatever the transform would be; I've not spent lots of time
thinking about the exact way to map this onto a synthetic "adding that
value N times crosses the threshold, so let's replace the IV with a
counter to N")

-Chandler
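A rough sketch of the trip-count computation such a transform would need;
this is hand-written helper code, not an existing LLVM utility, and it
assumes exact (fast-math style) algebra:

  #include <cmath>

  // For a loop of the form
  //   for (float f = start; f < bound; f += step)   // step > 0
  // compute, assuming exact algebra, the smallest N with start + N*step >=
  // bound; the loop can then be driven by an integer counter from 0 to N.
  // With real IEEE arithmetic the accumulated rounding in `f += step` can
  // shift the count by an iteration, which is exactly the kind of subtlety
  // such a transform would have to reason about.
  long long syntheticTripCount(double start, double bound, double step) {
    if (!(start < bound))
      return 0;
    return (long long)std::ceil((bound - start) / step);
  }

For Chandler's example, exact algebra gives 99 iterations (f = 0.01 through
0.99), matching the rewritten loop where i runs from 1 to 99.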