thr3ads.net - llvm dev - [llvm-dev] Vectorization with fast-math on irregular ISA sub-sets [Feb 2016]

Martin J. O'Riordan via llvm-dev

2016-Feb-11 11:23 UTC

[llvm-dev] Vectorization with fast-math on irregular ISA sub-sets

Our processor also has some issues regarding the handling of denormals - scalar
and vector - and we ran into a related problem only a few days ago.

The v3.8 compiler has done a lot of good work on optimisations for
floating-point math, but ironically one of them broke our implementation of
'nextafterf'.  The desired code fragment (FP32) is:

  float xAbs = fabsf(x);

Since we know our instruction for this does not handle denormals, and the
algorithm is sensitive to correct denormals, the code was written to avoid this
issue as follows:

  float xAbs = __builtin_astype(__builtin_astype(x, unsigned) & 0x7FFFFFFF,
                                float);
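
For readers whose compiler lacks __builtin_astype, the same sign-bit mask can
be sketched in portable C. This is only an illustration: the helper name
fabs_bits is made up here, and it assumes float is IEEE-754 binary32. A
compiler may of course still recognise this form as an absolute value, which
is exactly the issue described in this message.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: clear the sign bit of a binary32 value using only
   integer operations, so the bits of a denormal input pass through
   unchanged. Assumes float and uint32_t have the same size. */
static float fabs_bits(float x)
{
    uint32_t u;
    memcpy(&u, &x, sizeof u);   /* reinterpret the float's bits as an integer */
    u &= 0x7FFFFFFFu;           /* mask off the sign bit (bit 31) */
    memcpy(&x, &u, sizeof x);   /* reinterpret the result as a float again */
    return x;
}
```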

But the v3.8 FP optimiser now recognises this pattern and replaces it with an
ISD::FABS node and broke our workaround :-)  It's a great optimisation and I
have no problem with its correctness, but I was thinking that perhaps I might
see where I should extend the target information interface to allow a target to
say that it does not support denormals so that this and possibly other
optimisations could be suppressed in a target dependent way.

Overall the new FP32 optimisation patterns appear to have yielded a small but
not insignificant performance advantage over v3.7.1, though it is still early
days for my complete measurements.

	MartinO

-----Original Message-----
From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of Renato
Golin via llvm-dev
Sent: 11 February 2016 10:53
To: Hal Finkel <hfinkel at anl.gov>
Cc: LLVM Dev <llvm-dev at lists.llvm.org>; nd <nd at arm.com>
Subject: Re: [llvm-dev] Vectorization with fast-math on irregular ISA sub-sets

Hal,

I had a read on the ARM ARM about VFP and SIMD FP semantics and my analysis is
that NEON's only problem is the Flush-to-zero behaviour, which is
non-compliant.

NEON deals with NaNs and Infs in the way specified by the standard and should
not cause any concern to us. But we don't seem to have a flag specifically
for denormals, so I think using the UnsafeMath is the safest option for now.

On 11 February 2016 at 01:15, Hal Finkel <hfinkel at anl.gov> wrote:
>   nsz
>   No Signed Zeros - Allow optimizations to treat the sign of a zero
>   argument or result as insignificant.

In both VFP and NEON, zero signs are significant. In NEON, the
flush-to-zero's zero will have the same sign as the input denormal.

>   nnan
>   No NaNs - Allow optimizations to assume the arguments and result are not
>   NaN. Such optimizations are required to retain defined behavior over NaNs,
>   but the value of the result is undefined.

Both VFP and NEON treat NaNs as the standard requires, ie. [ NaN op ? ] = NaN.

>   ninf
>   No Infs - Allow optimizations to assume the arguments and result are not
>   +/-Inf. Such optimizations are required to retain defined behavior over
>   +/-Inf, but the value of the result is undefined.

Same here. Operations with Inf generate Inf or NaNs on both units.

The flush-to-zero behaviour has an effect on both NaNs and Infs, since it
happens before. So a denormal operation with an Inf in VFP will not generate a
NaN, while in NEON it'll be flushed to zero first, thus generating NaNs.
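
A small C model may make the difference concrete. This is a sketch only: the
functions flush_to_zero, mul_ieee and mul_ftz are hypothetical names that
simulate the two behaviours in software and are not how either unit is
implemented. Multiplying a denormal by Inf gives Inf under IEEE rules, but if
the denormal is flushed first the operation becomes 0 * Inf, which is NaN.

```c
#include <math.h>

/* Hypothetical model of a flush-to-zero unit: a denormal input becomes a
   zero that keeps the sign of the input; any other value is unchanged. */
static float flush_to_zero(float x)
{
    if (fpclassify(x) == FP_SUBNORMAL)
        return signbit(x) ? -0.0f : 0.0f;
    return x;
}

/* Multiply the way an IEEE-compliant unit would: denormals take part
   in the operation as ordinary (tiny) values. */
static float mul_ieee(float a, float b)
{
    return a * b;
}

/* Multiply the way a unit that flushes denormal inputs first would. */
static float mul_ftz(float a, float b)
{
    return flush_to_zero(a) * flush_to_zero(b);
}
```

With a denormal such as 1.0e-40f, mul_ieee(d, INFINITY) is Inf, whereas
mul_ftz(d, INFINITY) is NaN.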

James, is that a correct assessment?

cheers,
--renato

Renato Golin via llvm-dev

2016-Feb-11 11:49 UTC

[llvm-dev] Vectorization with fast-math on irregular ISA sub-sets

On 11 February 2016 at 11:23, Martin J. O'Riordan
<martin.oriordan at movidius.com> wrote:
> But the v3.8 FP optimiser now recognises this pattern and replaces it with
> an ISD::FABS node and broke our workaround :-) It's a great optimisation
> and I have no problem with its correctness, but I was thinking that perhaps I
> might see where I should extend the target information interface to allow a
> target to say that it does not support denormals so that this and possibly other
> optimisations could be suppressed in a target dependent way.

Hi Martin,

So, I have a patch that right now is a big hammer:
* Targets can have SIMD IEEE compliant or not (instead of fine
grained choosing which part).
* Any FP arithmetic / cast operation with UnsafeAlgebra will trigger
a "potentially unsafe" flag in the vectorizer.
* In the end, if the SIMD unit is not IEEE compliant and there are any
potentially unsafe operations, avoid vectorizing that loop.

I just need to create some more tests to submit.
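
The decision in the last bullet can be sketched roughly as follows. This is
not the patch itself: the structs and the function should_vectorize_loop are
hypothetical stand-ins, simd_is_ieee models what a query like isSIMDIEEE()
would answer, and how an operation earns its "potentially unsafe" mark is
left to the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not LLVM's actual API. */
struct target_info {
    bool simd_is_ieee;        /* is the SIMD unit IEEE compliant? */
};

struct fp_op {
    bool potentially_unsafe;  /* mark set by the vectorizer's analysis */
};

/* Coarse-grained check: a loop is avoided only when the SIMD unit is not
   IEEE compliant AND at least one operation is marked potentially unsafe. */
static bool should_vectorize_loop(const struct target_info *target,
                                  const struct fp_op *ops, size_t n_ops)
{
    if (target->simd_is_ieee)
        return true;                  /* compliant SIMD: nothing to avoid */
    for (size_t i = 0; i < n_ops; ++i)
        if (ops[i].potentially_unsafe)
            return false;             /* non-compliant SIMD + flagged op */
    return true;
}
```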

The problems I can see in your case are:
* Both scalar and vector units have problems with denormals, so my
  isSIMDIEEE() is not enough.
  - To fix this, you can add isVFPIEEE(), but we may find a better solution?
* Your optimisation is basic-block based, not loop based, so we'd
  have to add the same check to SLP.
  - SLP deals with both SIMD and VFP units, so we would need the
    additional flag anyway.
  - This will be my next step.
* Other passes already have access to the TTI, so they can use those
  flags to avoid strength reduction, combine, etc. in those cases.

I don't think we need to create a fine grained solution right now,
since we don't have examples with different behaviour.

Would that work for you?

cheers,
--renato

Martin J. O'Riordan via llvm-dev

2016-Feb-11 12:24 UTC

[llvm-dev] Vectorization with fast-math on irregular ISA sub-sets

This is fine Renato.

I worked around the local issue by using an instruction intrinsic so that the
pattern would be invisible to this optimisation, and my thoughts for raising
this to the TargetTransformInfo level are still not well formed.  I was actually
quite impressed with the new optimisation; it cleverly handled the situation
perfectly.

A coarse grained solution is fine, and it is always possible to handle this in
custom lowering for ISD::FABS which could check a target specific flag to see if
it should do the "safe thing" or the "fast thing".

Thanks for the feedback,

	MartinO
