>What situations are they common in?

ICC Vectorizer made a paradigm shift a while ago. If there isn’t a clear reason why something can’t be vectorized, we should try our best to vectorize. The rest is a performance modeling (and implementation priority) question, not a capability question. We believe this is a good paradigm to follow in vectorizer development. It was a big departure from “vectorize when all things look nice to vectorizer”.

We shouldn’t give up vectorizing simply because the programmer wrote FP induction code.(*) Then, the next question is what’s the best solution for that problem, and extending SCEV appears to be one of the obvious directions to explore.

Thanks,
Hideki Saito
Intel Compilers and Languages

----------------------
(*) Quick (and dirty) overview of vectorization legality

Vectorization is a cross-iteration optimization. We need to have a solution for cross-iteration dependences. Forward dependences are considered “safe for vectorization” since vector execution order naturally satisfies them. Backward dependences are unsafe, unless the vectorizer knows how to “break” them. Induction is a cyclic dependence by nature and as such is considered unsafe for vectorization, unless the vectorizer knows how to break it.
[Given a CFG that executes from top to bottom, a forward dependence is a downward data dependence edge.]
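The footnote's remark that induction is a cyclic dependence the vectorizer must "break" can be illustrated with a small C sketch. This is only an illustration under stated assumptions, not what ICC or LLVM actually emits: the function names are hypothetical, and replacing the recurrence with a closed form is assumed to be acceptable only under fast-math-style rules, since the two forms can round differently.

```c
#include <stddef.h>

/* Hypothetical helper: scalar form of a secondary FP induction.
   x carries a cyclic (loop-carried) dependence, because every
   iteration reads the x produced by the previous iteration. */
void fill_recurrence(float *out, size_t n, float start, float delta) {
    float x = start;
    for (size_t i = 0; i < n; ++i) {
        out[i] = x;
        x += delta;
    }
}

/* Hypothetical helper: the same values computed from i alone.
   No iteration depends on another, so the cycle is broken and
   several iterations could be evaluated in one vector operation.
   Assumption: the caller accepts fast-math semantics, because
   start + delta * i is not bit-identical to repeated addition
   for arbitrary inputs. */
void fill_closed_form(float *out, size_t n, float start, float delta) {
    for (size_t i = 0; i < n; ++i)
        out[i] = start + delta * (float)i;
}
```

For inputs where every intermediate value is exactly representable (for example start = 1.0f, delta = 0.5f, n = 8) the two functions produce identical arrays. For a delta such as 0.05f they may differ in the last bits, which is why this rewrite is tied to fast-math.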
_____________________________________________
From: Demikhovsky, Elena
Sent: Tuesday, May 17, 2016 3:15 AM
To: Sanjoy Das <sanjoy at playingwithpointers.com>; Chandler Carruth <chandlerc at google.com>
Cc: llvm-dev <llvm-dev at lists.llvm.org>; Hal Finkel (hfinkel at anl.gov) <hfinkel at anl.gov>; Adam Nemet (anemet at apple.com) <anemet at apple.com>; Andrew Trick <atrick at apple.com>; mzolotukhin at apple.com; Zaks, Ayal <ayal.zaks at intel.com>; Saito, Hideki <hideki.saito at intel.com>
Subject: RE: [llvm-dev] Working on FP SCEV Analysis

Hi Sanjoy,

Please see my answers below:

- Core motivation: why do we even care about optimizing floating point induction variables? What situations are they common in? Do programmers _expect_ compilers to optimize them well? (I haven't worked on our vectorizers so pardon the possibly stupid question) in the example you gave, why do you need SCEV to analyze the increment to vectorize the loop (i.e. how does it help)? What are some other concrete cases you'll want to optimize?

[Demikhovsky, Elena] I gave an example of a loop that can be vectorized in fast-math mode. The ICC compiler vectorizes loops with *primary* and *secondary* IVs.
This is the example for *primary* induction:
(1) for (float i = 0.5; i < 0.75; i+=0.05) {} → i is a "primary" IV
And for *secondary*:
(2) for (int i = 0, float x = start; i < N; i++, x += delta) {} → x is a "secondary" IV
Now I'm working only on (2).

- I presume you'll want SCEV expressions for `sitofp` and `uitofp`.

[Demikhovsky, Elena] I'm adding these expressions, of course. They are similar to "truncate" and "zext", in terms of implementation.

(The most important question:) With these in the game, what is the canonical representation of SCEV expressions that can be expressed as, say, both `sitofp(A + B)` and `sitofp(A) + sitofp(B)`?

[Demikhovsky, Elena] Meanwhile I have (start + delta * sitofp(i)). I don't know how far we can go with FP simplification and under what flags.
The first implementation does not assume that sitofp(A + B) is equal to sitofp(A) + sitofp(B).

Will we have a way to mark expressions (like we have `nsw` and `nuw` for `sext` and `zext`) which we can distribute `sitofp` and `uitofp` over?

[Demikhovsky, Elena] I assume that sitofp and uitofp should be 2 different operations.

Same questions for `fptosi` and `fptoui`.

[Demikhovsky, Elena] The same answer as above, because I don’t want to combine these operations.

- How will you partition the logic between floating and integer expressions in SCEV-land? Will you have, say, `SCEVAddExpr` do different things based on type, or will you split it into `SCEVIAddExpr` and `SCEVFAddExpr`? [0]

[Demikhovsky, Elena] Yes, I’m introducing SCEVFAddExpr and SCEVFMulExpr - (start + delta * sitofp(i))

* There are likely to be similarities too -- e.g. the "inductive" or "control flow" aspect of `SCEVAddRecExpr` is likely to be common between floating point add recurrences[1], and integer add recurrences; and part of figuring out the partitioning is also figuring out how to re-use these bits of logic.

[Demikhovsky, Elena] I’m adding SCEVFAddRecExpr to describe the recurrence of FP IV.

[0]: I'll prefer the latter since e.g. integer addition is associative, but floating point addition isn't; and it is better to force programmers to handle the two operations differently.

[1]: For instance, things like this: https://github.com/llvm-mirror/llvm/blob/master/lib/Analysis/ScalarEvolution.cpp#L7564 are likely to stay common between floating point and integer add recs.

-- Sanjoy
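The caution above about not assuming sitofp(A + B) equals sitofp(A) + sitofp(B) can be checked with a small C sketch. The function names are hypothetical, and the sketch assumes an IEEE-754 binary32 `float` with the default round-to-nearest-even mode; it models the two orderings with plain C casts rather than with any SCEV API.

```c
#include <stdint.h>

/* Hypothetical helper modelling sitofp(a + b):
   add as integers first, then convert once.
   Assumption: a + b does not overflow int32_t. */
float convert_after_add(int32_t a, int32_t b) {
    return (float)(a + b);
}

/* Hypothetical helper modelling sitofp(a) + sitofp(b):
   convert each operand (each conversion may round),
   then add as floats (which may round again). */
float convert_before_add(int32_t a, int32_t b) {
    return (float)a + (float)b;
}
```

For small operands such as 3 and 4 both functions return 7. With a = 16777217 (2^24 + 1) and b = 2, converting after the add rounds 16777219 up to 16777220, while converting first rounds a down to 16777216 and then yields 16777218. The two orderings therefore disagree without any integer overflow, which is one reason to keep the conversion un-distributed unless some flag explicitly permits it.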
On Tue, May 17, 2016 at 5:17 PM, Saito, Hideki via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> >What situations are they common in?
>
> ICC Vectorizer made a paradigm shift a while ago.
> If there aren’t a clear reason why something can’t be vectorized, we
> should try our best to vectorize.
> The rest is a performance modeling (and priority to implement) question,
> not a capability question.
> We believe this is a good paradigm to follow in a vectorizer development.

In some sense, yes, but not at all possible costs.
There needs to be some actual motivating case to make it worth even writing the code for.

> It was a big departure from
> “vectorize when all things look nice to vectorizer”.

These are not diametrically opposed.

I mean, it may not be worth the cost of maintaining the *compiler code* to do it.
This isn't the same as "when things look nice to the vectorizer", it's more "we're willing to vectorize whatever we can, as long as someone is going to actually use it".

Nobody here has provided a useful set of cases/applications/etc that suggests it should be done.
I'm not saying there are none, i'm saying, literally, nobody has motivated this use case yet :)

> We shouldn’t give up vectorizing simply because programmer wrote a FP
> induction code.(*)

We shouldn't add code to the compiler just because we can.

I would similarly be against, for example, vectorizing loops with binary coded decimal induction variables, and adding an entire BCD SCEV infrastructure, without some motivating case *somewhere*.

So i suggest y'all start from: "Here are the cases we care about making faster, and why we care about making them faster".

In compilers, building infrastructure first, then finding customers works a lot worse than figuring out what customers want, and then building infrastructure for them :)
Gerolf Hoflehner via llvm-dev
2016-May-18 02:07 UTC
[llvm-dev] Working on FP SCEV Analysis
> On May 17, 2016, at 6:14 PM, Daniel Berlin via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> On Tue, May 17, 2016 at 5:17 PM, Saito, Hideki via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> > >What situations are they common in?
> >
> > ICC Vectorizer made a paradigm shift a while ago.
> > If there aren’t a clear reason why something can’t be vectorized, we should try our best to vectorize.
> > The rest is a performance modeling (and priority to implement) question, not a capability question.
> > We believe this is a good paradigm to follow in a vectorizer development.
>
> In some sense, yes, but not at all possible costs.
> There needs to be some actual motivating case to make it worth even writing the code for.

This paradigm can have far-reaching consequences. The vectorizer is the performance cow to milk at the IR level. So under that paradigm, followed religiously, one would plug in any loop transformation (polyhedral or non-polyhedral), cost models, etc. to morph code into a vectorizable form. And when that is not sufficient one would probably start adding large numbers of run-time checks, multi-versioned code, etc. This might be a good paradigm to follow from the peak performance angle, but not so from the compile-time or code size angle. It seems best to pursue a paradigm like this with a peak performance library rather than mainstream llvm.

> > It was a big departure from
> > “vectorize when all things look nice to vectorizer”.
>
> These are not diametrically opposed.
>
> I mean, it may be not worth the cost of mainintaing the *compiler code* to do o it.
> This isn't the same as "when things look nice to the vectorizer", it's more "we're willing to vectorize whatever we can, as long as someone is going to actually use it".
>
> Nobody has here provided a useful set of cases/applications/etc that suggests it should be done.
> I'm not saying there are none, i'm saying, literally, nobody has motivated this use case yet :)
>
> > We shouldn’t give up vectorizing simply because programmer wrote a FP induction code.(*)
>
> We shouldn't add code to the compiler just because we can.
>
> I would similarly be against, for example, vectorizing loops with binary coded decimal induction variables, and adding an entire BCD SCEV infrastructure, without some motivating case *somewhere*.
>
> So i suggest y'all start from: "Here are the cases we care about making faster, and why we care about making them faster”.

+1

I think a lot of people would be very interested in non-toy examples that show big performance differences between icc and clang. That would also allow digging deeper into questions such as whether it is “vectorizer capability, dependence analysis and/or supporting transformations and/or ???” that explains the gap.

> In compilers, building infrastructure first, then finding customers works a lot worse than figuring out what customers want, and then building infrastructure for them :)