thr3ads.net - llvm dev - [llvm-dev] Vectorization with fast-math on irregular ISA sub-sets [Feb 2016]


James Molloy via llvm-dev

2016-Feb-15 08:34 UTC

[llvm-dev] Vectorization with fast-math on irregular ISA sub-sets

Hi,

> James, is that a correct assessment?

Yes, it is also my belief that the only way ARMv7 NEON differs from IEEE 754 is
lack of denormal support.

James
> On 11 Feb 2016, at 10:53, Renato Golin <renato.golin at linaro.org> wrote:
> 
> Hal,
> 
> I had a read on the ARM ARM about VFP and SIMD FP semantics and my
> analysis is that NEON's only problem is the Flush-to-zero behaviour,
> which is non-compliant.
> 
> NEON deals with NaNs and Infs in the way specified by the standard and
> should not cause any concern to us. But we don't seem to have a flag
> specifically for denormals, so I think using UnsafeMath is the
> safest option for now.
> 
> On 11 February 2016 at 01:15, Hal Finkel <hfinkel at anl.gov> wrote:
>>  nsz
>>  No Signed Zeros - Allow optimizations to treat the sign of a zero
>>  argument or result as insignificant.
> 
> In both VFP and NEON, zero signs are significant. In NEON, the
> flush-to-zero's zero will have the same sign as the input denormal.
> 
> 
>>  nnan
>>  No NaNs - Allow optimizations to assume the arguments and result are
>>  not NaN. Such optimizations are required to retain defined behavior over
>>  NaNs, but the value of the result is undefined.
> 
> Both VFP and NEON treat NaNs as the standard requires, i.e. [ NaN op ? ] = NaN.
> 
> 
>>  ninf
>>  No Infs - Allow optimizations to assume the arguments and result are
>>  not +/-Inf. Such optimizations are required to retain defined behavior over
>>  +/-Inf, but the value of the result is undefined.
> 
> Same here. Operations with Inf generate Inf or NaNs on both units.
> 
> The flush-to-zero behaviour has an effect on both NaNs and Infs, since
> the flush happens before the operation. So an operation on a denormal and
> an Inf (a multiplication, for example) will not generate a NaN in VFP,
> while in NEON the denormal will be flushed to zero first, thus generating
> a NaN.
> 
> James, is that a correct assessment?
> 
> cheers,
> --renato
>
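
The denormal-times-Inf case described in the quoted message can be sketched in portable C. This is only a model under stated assumptions, not NEON code: `flush_input`, `mul_ieee`, `mul_ftz` and `demo_denormal_times_inf` are hypothetical names, only input flushing is modelled, and the host is assumed to run with subnormal support enabled and without fast-math.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical model of flush-to-zero applied to an input operand:
 * a subnormal becomes a zero carrying the sign of the original value. */
static float flush_input(float x) {
    if (fpclassify(x) == FP_SUBNORMAL)
        return copysignf(0.0f, x);
    return x;
}

/* Multiplication as a unit with subnormal support performs it. */
static float mul_ieee(float a, float b) {
    return a * b;
}

/* Multiplication with both inputs flushed before the operation. */
static float mul_ftz(float a, float b) {
    return flush_input(a) * flush_input(b);
}

/* Walks through the case from the thread: denormal * Inf. */
static void demo_denormal_times_inf(void) {
    const float denormal = 0x1p-149f; /* smallest positive subnormal float */

    /* With subnormal support, the product is Inf. */
    assert(isinf(mul_ieee(denormal, INFINITY)));

    /* With the input flushed first, the product is 0 * Inf, a NaN. */
    assert(isnan(mul_ftz(denormal, INFINITY)));

    /* The flushed zero keeps the sign of the input denormal. */
    assert(signbit(flush_input(-denormal)));
}
```

Calling `demo_denormal_times_inf()` exercises the three claims: Inf without flushing, NaN with flushing, and a sign-preserving zero.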

Stephen Canon via llvm-dev

2016-Feb-15 14:22 UTC



> On Feb 15, 2016, at 3:34 AM, James Molloy via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> Hi,
> 
>> James, is that a correct assessment?
> 
> Yes, it is also my belief that the only way ARMv7 NEON differs from IEEE754
> is lack of denormal support.

- ARMv7 NEON ignores the rounding mode set in bits 23:22 of FPSCR and always
  uses round to nearest.
- ARMv7 NEON ignores the trap enable bits (15:8) in FPSCR and always uses
  default exception handling.

As with denormal support, the issue at hand is not so much that these differ
from IEEE 754 as it is that they differ from the behavior of the scalar (VFP)
arithmetic.

- Steve
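
The two FPSCR fields mentioned above can be pulled out of a raw register value with a few shifts and masks. The following is a minimal sketch that works on a plain `uint32_t` and does not read the real register; the helper names are hypothetical, and the rounding-mode encoding in `rmode_name` is assumed from the ARMv7 architecture manual rather than stated in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Rounding mode field: bits 23:22 of FPSCR, as cited in the message. */
static unsigned fpscr_rmode(uint32_t fpscr) {
    return (fpscr >> 22) & 0x3u;
}

/* Trap enable field: bits 15:8 of FPSCR, as cited in the message. */
static unsigned fpscr_trap_enables(uint32_t fpscr) {
    return (fpscr >> 8) & 0xFFu;
}

/* Assumed encoding of the rounding mode field (check the ARM ARM). */
static const char *rmode_name(unsigned rmode) {
    static const char *const names[4] = {
        "round to nearest",
        "round towards plus infinity",
        "round towards minus infinity",
        "round towards zero",
    };
    assert(rmode < 4);
    return names[rmode];
}

/* Returns 1 when FPSCR requests a rounding mode or trap that, per the
 * message above, ARMv7 NEON ignores, so scalar VFP code and vectorized
 * NEON code may behave differently; returns 0 otherwise. */
static int vfp_and_neon_may_differ(uint32_t fpscr) {
    return fpscr_rmode(fpscr) != 0 || fpscr_trap_enables(fpscr) != 0;
}
```

For example, `vfp_and_neon_may_differ(0)` returns 0 (round to nearest, no traps enabled), while a value with bits 23:22 set to a non-zero rounding mode returns 1.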

Renato Golin via llvm-dev

2016-Feb-21 12:14 UTC



On 15 February 2016 at 14:22, Stephen Canon <scanon at apple.com> wrote:
> - ARMv7 NEON ignores the rounding mode set in bits 23:22 of FPSCR and
> always uses round to nearest.
> - ARMv7 NEON ignores the trap enable bits (15:8) in FPSCR and always uses
> default exception handling.

If I read the manuals correctly, these are not strictly defined in
IEEE 754 to be one way or another, so these don't violate the
standard. The subnormal treatment does.

> As with denormal support, the issue at hand is not so much that these
> differ from IEEE 754 as it is that they differ from the behavior of the
> scalar (VFP) arithmetic.

This is one of the practical consequences, yes, but of no relevance to
this work. Right now, I'm only trying to avoid surprises. If a user
gets different results using -ffast-math, that's expected. Without it, not
so much.

cheers,
--renato
