thr3ads.net - llvm dev - [llvm-dev] Working on FP SCEV Analysis [May 2016]


Saito, Hideki via llvm-dev

2016-May-18 00:17 UTC

[llvm-dev] Working on FP SCEV Analysis

>What situations are they common in?
ICC Vectorizer made a paradigm shift a while ago.
If there isn’t a clear reason why something can’t be vectorized, we should try
our best to vectorize.
The rest is a performance modeling (and priority to implement) question, not a
capability question.
We believe this is a good paradigm to follow in vectorizer development. It was
a big departure from “vectorize when all things look nice to vectorizer”.

We shouldn’t give up vectorizing simply because the programmer wrote FP induction
code.(*)
Then the next question is what the best solution for that problem is, and
extending SCEV appears to be one of the obvious directions to explore.

Thanks,
Hideki Saito
Intel Compilers and Languages
----------------------
(*) Quick (and dirty) overview of vectorization legality
Vectorization is a cross-iteration optimization. We need to have a solution for
cross-iteration dependences.
Forward dependences are considered “safe for vectorization” since vector
execution order naturally satisfies them.
Backward dependences are unsafe, unless the vectorizer knows how to “break” them.
Induction is a cyclic dependence by nature and as such is considered unsafe for
vectorization, unless the vectorizer knows how to break it.
[Given a CFG that executes from top to bottom, a forward dependence is a
downward data dependence edge.]
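
The forward/backward distinction above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the loop body, the vector width kVF = 4, and the names runLoop, sameAsScalar and selfCheckDependences are hypothetical and do not come from ICC or LLVM. "Vector-like" execution is modeled by running the first statement for all lanes of a chunk before the second statement runs for any lane.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed vector width for the simulation (hypothetical, not from the thread).
constexpr std::size_t kVF = 4;

// Runs a two-statement loop body over i = 1 .. n-1 and returns c.
//   produce(i): a[i] = b[i] + 1   (writes a[i])
//   consume(i): c[i] = a[i - 1]   (reads the value written one iteration earlier)
// producerFirst == true  -> the write sits above the read (forward dependence).
// producerFirst == false -> the read sits above the write (backward dependence).
// vectorLike == true runs the first statement for every lane of a chunk
// before the second statement runs for any lane of that chunk.
std::vector<int> runLoop(const std::vector<int>& b, bool producerFirst,
                         bool vectorLike) {
    const std::size_t n = b.size();
    std::vector<int> a(n, 0), c(n, 0);
    auto produce = [&](std::size_t i) { a[i] = b[i] + 1; };
    auto consume = [&](std::size_t i) { c[i] = a[i - 1]; };
    const std::size_t step = vectorLike ? kVF : 1;
    for (std::size_t base = 1; base < n; base += step) {
        const std::size_t end = std::min(base + step, n);
        for (std::size_t i = base; i < end; ++i) {
            if (producerFirst) produce(i); else consume(i);
        }
        for (std::size_t i = base; i < end; ++i) {
            if (producerFirst) consume(i); else produce(i);
        }
    }
    return c;
}

// True when the vector-like order computes the same c as the scalar order.
bool sameAsScalar(const std::vector<int>& b, bool producerFirst) {
    return runLoop(b, producerFirst, false) == runLoop(b, producerFirst, true);
}

void selfCheckDependences() {
    const std::vector<int> b = {1, 2, 3, 4, 5, 6, 7, 8, 9};
    assert(sameAsScalar(b, true));    // forward dependence: order is preserved
    assert(!sameAsScalar(b, false));  // backward dependence: stale values are read
}
```

In the backward case the vector-like order reads a[i - 1] before the earlier lane has written it, which is the situation a vectorizer has to "break" or reject.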

_____________________________________________
From: Demikhovsky, Elena
Sent: Tuesday, May 17, 2016 3:15 AM
To: Sanjoy Das <sanjoy at playingwithpointers.com>; Chandler Carruth
<chandlerc at google.com>
Cc: llvm-dev <llvm-dev at lists.llvm.org>; Hal Finkel <hfinkel at anl.gov>;
Adam Nemet <anemet at apple.com>; Andrew Trick <atrick at apple.com>;
mzolotukhin at apple.com; Zaks, Ayal <ayal.zaks at intel.com>; Saito, Hideki
<hideki.saito at intel.com>
Subject: RE: [llvm-dev] Working on FP SCEV Analysis


Hi Sanjoy,

Please see my answers below:

  - Core motivation: why do we even care about optimizing floating
    point induction variables?  What situations are they common in?  Do
    programmers _expect_ compilers to optimize them well?  (I haven't
    worked on our vectorizers so pardon the possibly stupid question)
    in the example you gave, why do you need SCEV to analyze the
    increment to vectorize the loop (i.e how does it help)?  What are
    some other concrete cases you'll want to optimize?

[Demikhovsky, Elena] I gave an example of a loop that can be vectorized in
fast-math mode. The ICC compiler vectorizes loops with *primary* and *secondary*
IVs.
This is an example of a *primary* induction variable:

(1) for (float i = 0.5; i < 0.75; i+=0.05) {}   → i is a "primary" IV

And of a *secondary* one:

(2) float x = start; for (int i = 0; i < N; i++, x += delta) {}   → x is a "secondary" IV

Right now I'm working only on (2).

  - I presume you'll want SCEV expressions for `sitofp` and `uitofp`.

[Demikhovsky, Elena] I'm adding these expressions, of course. They are
similar to "truncate" and "zext" in terms of implementation.

    (The most important question:) With these in the game, what is the
    canonical representation of SCEV expressions that can be expressed
    as, say, both `sitofp(A + B)` and `sitofp(A) + sitofp(B)`?
[Demikhovsky, Elena] For now I have (start + delta * sitofp(i)).
I don't know how far we can go with FP simplification and under what flags.
The first implementation does not assume that sitofp(A + B) is equal to
sitofp(A) + sitofp(B).
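
A small sketch of why the two forms cannot be assumed equal, even without integer overflow. It assumes `float` is IEEE-754 binary32 with round-to-nearest-even and that float expressions are evaluated in float precision (the usual situation on x86-64 and AArch64). The function names are hypothetical; the `static_cast<float>` from a signed integer plays the role of `sitofp`.

```cpp
#include <cassert>
#include <cstdint>

// sitofp(A + B): add in the integer domain, then convert once.
// Assumes a + b does not overflow int32_t.
float addThenConvert(std::int32_t a, std::int32_t b) {
    return static_cast<float>(a + b);
}

// sitofp(A) + sitofp(B): convert each operand, then add in float.
float convertThenAdd(std::int32_t a, std::int32_t b) {
    const float fa = static_cast<float>(a);
    const float fb = static_cast<float>(b);
    const float sum = fa + fb;  // rounded to float precision
    return sum;
}

void selfCheckSitofp() {
    // 16777217 = 2^24 + 1 is not representable in binary32.
    // A + B = 16777218 is representable, so converting the sum is exact.
    assert(addThenConvert(16777217, 1) == 16777218.0f);
    // Converting A first rounds it to 16777216; adding 1.0f rounds back again.
    assert(convertThenAdd(16777217, 1) == 16777216.0f);
}
```

For small operands the two forms agree, so any distribution of the conversion over addition would need range information or an explicit flag to be valid.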


    Will we have a way to mark expressions (like we have `nsw` and
    `nuw` for `sext` and `zext`) which we can distribute `sitofp` and
    `uitofp` over?
[Demikhovsky, Elena] I assume that sitofp and uitofp should be 2 different
operations.

    Same questions for `fptosi` and `fptoui`.
[Demikhovsky, Elena] The same answer as above, because I don’t want to combine
these operations.

  - How will you partition the logic between floating and integer
    expressions in SCEV-land?  Will you have, say, `SCEVAddExpr` do
    different things based on type, or will you split it into
    `SCEVIAddExpr` and `SCEVFAddExpr`? [0]

[Demikhovsky, Elena] Yes, I’m introducing SCEVFAddExpr and SCEVFMulExpr to
represent (start + delta * sitofp(i)).
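
To make the expression concrete, here is a minimal sketch comparing the loop-carried form of example (2) with the closed form (start + delta * sitofp(i)). The function names are hypothetical and the code is plain C++, not SCEV itself. The point is that the closed form depends only on i, so each vector lane could be computed independently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Loop-carried form: x at iteration i depends on x at iteration i - 1.
std::vector<float> secondaryIvLoop(float start, float delta, int n) {
    assert(n >= 0);
    std::vector<float> out;
    out.reserve(static_cast<std::size_t>(n));
    float x = start;
    for (int i = 0; i < n; ++i, x += delta) {
        out.push_back(x);
    }
    return out;
}

// Closed form: start + delta * sitofp(i), no dependence between iterations.
std::vector<float> secondaryIvClosedForm(float start, float delta, int n) {
    assert(n >= 0);
    std::vector<float> out(static_cast<std::size_t>(n));
    for (int i = 0; i < n; ++i) {
        out[static_cast<std::size_t>(i)] = start + delta * static_cast<float>(i);
    }
    return out;
}
```

With values that are exactly representable (for example start = 1.0f, delta = 0.25f, n = 8) both functions return the same sequence. With a delta such as 0.1f the repeated additions round differently from the single multiply-add, so the results may differ in the last bits, which is presumably why the rewrite is tied to fast-math mode.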

    * There are likely to be similarities too -- e.g. the "inductive"
      or "control flow" aspect of `SCEVAddRecExpr` is likely to be
      common between floating point add recurrences[1], and integer add
      recurrences; and part of figuring out the partitioning is also
      figuring out how to re-use these bits of logic.
[Demikhovsky, Elena] I’m adding SCEVFAddRecExpr to describe the recurrence of an
FP IV.


[0]: I'll prefer the latter since e.g. integer addition is associative, but
floating point addition isn't; and it is better to force programmers to
handle the two operations differently.
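
A short illustration of the associativity point in [0], as a sketch assuming IEEE-754 binary32 `float`; the function names are hypothetical.

```cpp
#include <cassert>

// (a + b) + c
float sumLeftFirst(float a, float b, float c) {
    const float ab = a + b;
    return ab + c;
}

// a + (b + c)
float sumRightFirst(float a, float b, float c) {
    const float bc = b + c;
    return a + bc;
}

void selfCheckAssociativity() {
    // The large terms cancel first, so the small term survives.
    assert(sumLeftFirst(1e20f, -1e20f, 1.0f) == 1.0f);
    // 1.0f is absorbed by -1e20f before the cancellation happens.
    assert(sumRightFirst(1e20f, -1e20f, 1.0f) == 0.0f);
}
```

The same regrouping on integers (with wrapping arithmetic) always gives identical results, which is why a simplifier that freely reorders the operands of an integer add cannot be reused unchanged for a floating point add.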

[1]: For instance, things like this:
https://github.com/llvm-mirror/llvm/blob/master/lib/Analysis/ScalarEvolution.cpp#L7564
are likely to stay common between floating point and integer add recs.

-- Sanjoy


Daniel Berlin via llvm-dev

2016-May-18 01:14 UTC


[llvm-dev] Working on FP SCEV Analysis

On Tue, May 17, 2016 at 5:17 PM, Saito, Hideki via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
>
> >What situations are they common in?
>
> ICC Vectorizer made a paradigm shift a while ago.
> If there isn’t a clear reason why something can’t be vectorized, we
> should try our best to vectorize.
> The rest is a performance modeling (and priority to implement) question,
> not a capability question.
> We believe this is a good paradigm to follow in vectorizer development.
>
In some sense, yes, but not at all possible costs.
There needs to be some actual motivating case to make it worth even writing
the code for.

> It was a big departure from
> “vectorize when all things look nice to vectorizer”.
>
These are not diametrically opposed.

I mean, it may not be worth the cost of maintaining the *compiler code* to
do it.
This isn't the same as "when things look nice to the vectorizer", it's more
"we're willing to vectorize whatever we can, as long as someone is going to
actually use it".

Nobody here has provided a useful set of cases/applications/etc. that
suggests it should be done. I'm not saying there are none, I'm saying,
literally, nobody has motivated this use case yet :)

>
> We shouldn’t give up vectorizing simply because the programmer wrote FP
> induction code.(*)
>
We shouldn't add code to the compiler just because we can.

I would similarly be against, for example, vectorizing loops with binary
coded decimal induction variables, and adding an entire BCD SCEV
infrastructure, without some motivating case *somewhere*.

So I suggest y'all start from: "Here are the cases we care about making
faster, and why we care about making them faster".

In compilers, building infrastructure first, then finding customers works a
lot worse than figuring out what customers want, and then building
infrastructure for them :)

Gerolf Hoflehner via llvm-dev

2016-May-18 02:07 UTC


[llvm-dev] Working on FP SCEV Analysis

> On May 17, 2016, at 6:14 PM, Daniel Berlin via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
> On Tue, May 17, 2016 at 5:17 PM, Saito, Hideki via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
> >What situations are they common in?
>
> ICC Vectorizer made a paradigm shift a while ago.
> If there isn’t a clear reason why something can’t be vectorized, we should
> try our best to vectorize.
> The rest is a performance modeling (and priority to implement) question,
> not a capability question.
> We believe this is a good paradigm to follow in vectorizer development.
>
> In some sense, yes, but not at all possible costs.
> There needs to be some actual motivating case to make it worth even writing
> the code for.

This paradigm can have far-reaching consequences. The vectorizer is the
performance cow to milk at the IR level. So under that paradigm - followed
religiously - one would plug in any loop transformation, polyhedral or
non-polyhedral, any cost model, etc., to make code vectorizable. And when that
is not sufficient, one would probably start adding large numbers of run-time
checks, multi-versioned code, etc. This might be a good paradigm to follow from
the peak performance angle, but not so from the compile-time or code size angle.
It seems best to pursue a paradigm like this with a peak performance library
rather than mainstream LLVM.

>
> It was a big departure from
> “vectorize when all things look nice to vectorizer”.
>
> These are not diametrically opposed.
>
> I mean, it may not be worth the cost of maintaining the *compiler code* to
> do it.
> This isn't the same as "when things look nice to the vectorizer", it's more
> "we're willing to vectorize whatever we can, as long as someone is going to
> actually use it".
> 
> Nobody here has provided a useful set of cases/applications/etc. that
> suggests it should be done. I'm not saying there are none, I'm saying,
> literally, nobody has motivated this use case yet :)
> 
>  
>  
> We shouldn’t give up vectorizing simply because the programmer wrote FP
> induction code.(*)
>
> We shouldn't add code to the compiler just because we can.
>
> I would similarly be against, for example, vectorizing loops with binary
> coded decimal induction variables, and adding an entire BCD SCEV
> infrastructure, without some motivating case *somewhere*.
> 
> So I suggest y'all start from: "Here are the cases we care about making
> faster, and why we care about making them faster".

+1  I think a lot of people would be very interested in non-toy examples that
show big performance differences between icc and clang. That would also allow us
to dig deeper into whether it is “vectorizer capability, dependence analysis
and/or supporting transformations and/or ???” that explains the gap.
> 
> In compilers, building infrastructure first, then finding customers works a
> lot worse than figuring out what customers want, and then building
> infrastructure for them :)
> 
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
