Björn Pettersson A via llvm-dev
2020-Jun-24 15:21 UTC
[llvm-dev] Loop vectorization and unsafe floating point math
Hi llvm-dev!

We are doing some fuzz testing using C program generators, and one
question that came up when generating a program with both floating
point arithmetic and loop pragmas was: is the loop vectorizer really
allowed to vectorize a loop when it can't prove that it is safe to
reorder fp math, even if there is a loop pragma that hints about a
preferred width?

When reading here

  http://clang.llvm.org/docs/LanguageExtensions.html#extensions-for-loop-hint-optimizations

it says "Loop hints can be specified before any loop and will be
ignored if the optimization is not safe to apply.".

But given this example (see also https://godbolt.org/z/fzRHsp )

//------------------------------------------------------------------
//
// clang -O3 -Rpass=loop-vectorize -Rpass-analysis=loop-vectorize

#include <stdio.h>
#include <stdint.h>

double v_1 = -902.30847021;
double v_2 = -902.30847021;

int main()
{

#pragma clang loop vectorize_width(2) unroll(disable)
  for (int i = 0; i < 16; ++i) {
    v_1 = v_1 * 430.33975544;
  }

#pragma clang loop unroll(disable)
  for (int i = 0; i < 16; ++i) {
    v_2 = v_2 * 430.33975544;
  }

  printf("v_1: %f\n", v_1);
  printf("v_2: %f\n", v_2);
}

//
//------------------------------------------------------------------

we get these remarks:

<source>:11:3: remark: the cost-model indicates that interleaving is not beneficial [-Rpass-analysis=loop-vectorize]
<source>:11:3: remark: vectorized loop (vectorization width: 2, interleaved count: 1) [-Rpass=loop-vectorize]
<source>:17:15: remark: loop not vectorized: cannot prove it is safe to reorder floating-point operations; allow reordering by specifying '#pragma clang loop vectorize(enable)'

and the result:

v_1: -1248356232174473978185211891975727638059679744.000000
v_2: -1248356232174473819728886863447052450971779072.000000

So the second loop isn't vectorized due to unsafe reordering of fp
math. But the first loop is vectorized, even though the optimization
isn't safe to apply. This is also reflected in the fact that we get
different results for v_1 and v_2.

Is this correct behavior? Should the pragma result in vectorization here?

Note that we get vectorization even with "vectorize_width(3)". So
despite the fact that LV ignores the bad vectorization factor, it
considers vectorization to be "forced". (I also wonder if "forced" is
bad terminology here, if the pragma is supposed to be a hint.)

Regards,
Björn Pettersson
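PS. The difference between v_1 and v_2 is what one would expect from
the reduction being reassociated: with a vectorization width of 2 the
multiply reduction is, conceptually, split into two independent partial
products (one per vector lane) that are combined once after the loop,
so the roundings happen in a different order than in the scalar
dependency chain. A rough sketch of the two evaluation orders (just an
illustration of the reassociation, not the exact code the vectorizer
emits):

#include <stdio.h>

#define C 430.33975544

int main(void)
{
  /* Scalar loop: one long dependency chain, 16 sequential roundings. */
  double scalar = -902.30847021;
  for (int i = 0; i < 16; ++i)
    scalar = scalar * C;

  /* Roughly what a width-2 multiply reduction does: two independent
     partial products (one per lane), combined once after the loop.
     Same real-valued result, but rounded differently. */
  double lane0 = -902.30847021;
  double lane1 = 1.0;              /* neutral element for the other lane */
  for (int i = 0; i < 16; i += 2) {
    lane0 = lane0 * C;
    lane1 = lane1 * C;
  }
  double vectorized = lane0 * lane1;

  printf("scalar:     %f\n", scalar);
  printf("vectorized: %f\n", vectorized);
  return 0;
}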
Hal Finkel via llvm-dev
2020-Jun-24 22:26 UTC
[llvm-dev] Loop vectorization and unsafe floating point math
On 6/24/20 10:21 AM, Björn Pettersson A via llvm-dev wrote:
> Hi llvm-dev!
>
> We are doing some fuzz testing using C program generators, and one
> question that came up when generating a program with both floating
> point arithmetic and loop pragmas was: is the loop vectorizer really
> allowed to vectorize a loop when it can't prove that it is safe to
> reorder fp math, even if there is a loop pragma that hints about a
> preferred width?
>
> When reading here
>
>   http://clang.llvm.org/docs/LanguageExtensions.html#extensions-for-loop-hint-optimizations
>
> it says "Loop hints can be specified before any loop and will be
> ignored if the optimization is not safe to apply.".

This is a good question. The statement above was written with memory
dependence checks in mind. In this case, the lack of safety comes from
the floating-point reassociation. Part of the problem here is the
translation of the behavior of the compiler to the language in the
documentation. When we say that the pragma "will be ignored", we don't
literally mean that the compiler necessarily ignores it *statically*;
we mean that the effect of the vectorization might be ignored
*dynamically* in cases where vectorization might be unsafe. We do this,
as you likely know, by multiversioning the loop and using a
memory-dependence check to select, during program execution, which
version to run.

Regarding the effect of reassociation, I don't know of any efficient
way that we might check ahead of time whether the reassociation would
produce a different runtime result from the scalar loop. We're relying
on the user's directive to tell the compiler that the reassociation is
safe. An alternative design would require in the pragma some explicit
acknowledgement of the reduction (e.g., what happens, at least in the
specification, for OpenMP SIMD). We would want a different notation
from the existing vectorize(assume_safety) used to disable the
dependence checks. I'm highly sympathetic to your use case, in part
because I do the same thing, and in part because I also work on
autotuning systems that need the same property. However, in this case,
our systems need to keep track of the presence of reductions. I think
it's reasonable to say that the pragma is working as designed and we
should update the documentation. If there's consensus here to require
some kind of reduction acknowledgement, I'm fine with that too
(although we need to realize that's going to cause significant
regressions for existing users).
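To make the "ignored dynamically" part above concrete, here is a rough
source-level sketch of the multiversioning (the function and names are
made up for illustration; the real transformation happens on the IR,
and the vectorized version uses vector instructions rather than a
scalar loop):

/* What the user writes; the compiler cannot prove that a and b
   do not overlap. */
void add_arrays(double *a, const double *b, int n)
{
  for (int i = 0; i < n; ++i)
    a[i] += b[i];
}

/* Roughly what the vectorizer produces: both versions are kept, and a
   runtime overlap check picks one each time the function executes. */
void add_arrays_multiversioned(double *a, const double *b, int n)
{
  if (a + n <= b || b + n <= a) {
    /* No overlap at runtime: the vector version is safe to run. */
    for (int i = 0; i < n; ++i)   /* stands in for the vector body */
      a[i] += b[i];
  } else {
    /* Possible overlap: fall back to the original scalar loop. */
    for (int i = 0; i < n; ++i)
      a[i] += b[i];
  }
}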
 -Hal

--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
Björn Pettersson A via llvm-dev
2020-Jun-25 12:28 UTC
[llvm-dev] Loop vectorization and unsafe floating point math
> -----Original Message-----
> From: Hal Finkel <hfinkel at anl.gov>
> Sent: 25 June 2020 00:27
> To: Björn Pettersson A <bjorn.a.pettersson at ericsson.com>; llvm-dev <llvm-dev at lists.llvm.org>
> Subject: Re: [llvm-dev] Loop vectorization and unsafe floating point math
>
> On 6/24/20 10:21 AM, Björn Pettersson A via llvm-dev wrote:
> > Hi llvm-dev!
> >
> > We are doing some fuzz testing using C program generators, and one
> > question that came up when generating a program with both floating
> > point arithmetic and loop pragmas was: is the loop vectorizer really
> > allowed to vectorize a loop when it can't prove that it is safe to
> > reorder fp math, even if there is a loop pragma that hints about a
> > preferred width?
> >
> > When reading here
> >
> >   http://clang.llvm.org/docs/LanguageExtensions.html#extensions-for-loop-hint-optimizations
> >
> > it says "Loop hints can be specified before any loop and will be
> > ignored if the optimization is not safe to apply.".
>
> This is a good question. The statement above was written with memory
> dependence checks in mind. In this case, the lack of safety comes from
> the floating-point reassociation. Part of the problem here is the
> translation of the behavior of the compiler to the language in the
> documentation. When we say that the pragma "will be ignored", we don't
> literally mean that the compiler necessarily ignores it *statically*;
> we mean that the effect of the vectorization might be ignored
> *dynamically* in cases where vectorization might be unsafe. We do
> this, as you likely know, by multiversioning the loop and using a
> memory-dependence check to select, during program execution, which
> version to run.

Sure, but it won't use a vectorization factor of 543 if that can't be
applied either; it treats vectorize_width(543) as a hint and picks a
different factor if that one can't be applied. So in some sense the
pragma is a hint (and the documentation describes these pragmas as
"loop hints").

> Regarding the effect of reassociation, I don't know of any efficient
> way that we might check ahead of time whether the reassociation would
> produce a different runtime result from the scalar loop. We're relying
> on the user's directive to tell the compiler that the reassociation is
> safe. An alternative design would require in the pragma some explicit
> acknowledgement of the reduction (e.g., what happens, at least in the
> specification, for OpenMP SIMD). We would want a different notation
> from the existing vectorize(assume_safety) used to disable the
> dependence checks. I'm highly sympathetic to your use case, in part
> because I do the same thing, and in part because I also work on
> autotuning systems that need the same property. However, in this case,
> our systems need to keep track of the presence of reductions. I think
> it's reasonable to say that the pragma is working as designed and we
> should update the documentation. If there's consensus here to require
> some kind of reduction acknowledgement, I'm fine with that too
> (although we need to realize that's going to cause significant
> regressions for existing users).

It may be unlikely that someone wants to vectorize a loop with floating
point math without using -ffast-math. But the loop vectorizer does not
auto-vectorize this code unless -ffast-math is used.
So the legality checks are there (maybe pessimistic, but nevertheless
it is checked). The problem I see is that the loop hint pragmas have a
side effect: they effectively turn on -ffast-math for the loop. Either
we need to document that, or one would expect that the whole program
would have to be compiled with -ffast-math to get that behavior.

I did not explicitly mention -O0 in my earlier examples, but doesn't it
feel weird that when compiling a program with vectorization hints, with
-fno-fast-math, I might get different results when executing the
program depending on whether I used -O0 or -O3 when compiling? That is
actually what our test framework was doing (comparing results between
"-O0 -fno-fast-math" and "-O3 -fno-fast-math"), and it ended up with
failures due to loop pragmas being present in the code.

I also noticed that there are some TTI hooks that seem to be somewhat
related to this. But since both LoopVectorizeHints::allowReordering()
and LoopVectorizeHints::isPotentiallyUnsafe() are overruled by the
FK_Enabled hint, it doesn't matter what the TTI hooks are saying.
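Going back to the explicit reduction acknowledgement that Hal mentions:
as far as I understand, that would look roughly like the following with
OpenMP SIMD (compiled with something like -fopenmp-simd). This is just
a sketch of the general direction, not a proposal for the clang loop
pragma syntax:

#include <stdio.h>

double v_1 = -902.30847021;

int main(void)
{
  /* The reduction variable and operator are named explicitly, so the
     reordering of the fp math is sanctioned by the user rather than
     inferred (or silently assumed) by the compiler. */
  #pragma omp simd reduction(*:v_1)
  for (int i = 0; i < 16; ++i) {
    v_1 = v_1 * 430.33975544;
  }

  printf("v_1: %f\n", v_1);
  return 0;
}

With that kind of annotation there is no question about whether the
user accepted the reassociation of the reduction.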