On Fri, Mar 25, 2016 at 01:23:03PM +0000, Renato Golin via llvm-dev wrote:
> On 25 March 2016 at 04:11, Hal Finkel <hfinkel at anl.gov> wrote:
> > As I understand it, the fundamental property being addressed here is:
> > Are the semantics of scalar FP math the same as vector FP math? TTI
> > seems like a good place to expose that information. If the semantics
> > are indeed different, then the vectorizer would require fast-math
> > flags in order to vectorize FP operations (similarly, gcc's man page
> > says it requires -funsafe-math-optimizations for vectorization unless
> > -mfpu=neon or similar is specified). In this context, this
> > different-semantics query would return true if:
>
> The semantics is indeed different, VFP is IEEE-754 compliant while
> NEON is not. We don't want to stop the compiler from using VFP for FP
> math, but we want to be cautious when using NEON in the same way.
>
>
> > !(isDarwin OR ARMISA >= v8 OR fpMath == NEON)
> >
> > and then we need to teach people to use -mfpu=neon ;)
>
> So, there's the catch. In GCC, -mfpu=neon means to use NEON, which is
> not enabled by default, so the compiler assumes that the user is aware
> that NEON FP is not IEEE compliant. I don't think that's a safe
> assumption, but I also don't want to have a slightly different
> behaviour than GCC gratuitously.
Note that my discussion below relates to the AArch32 behaviour (the ARM
port of GCC, not the AArch64 port of GCC).
I can see why the text in the man page might be misleading, but let me quote
the part I think Hal was referring to here (with added emphasis):
    If the selected floating-point hardware includes the NEON extension
    (e.g. -mfpu=neon), note that floating-point operations are **not**
    generated by GCC's auto-vectorization pass **unless**
    -funsafe-math-optimizations is also specified. This is because
    NEON hardware does not fully implement the IEEE 754 standard for
    floating-point arithmetic (in particular denormal values are treated
    as zero), so the use of NEON instructions may lead to a loss of
    precision.
That is to say, GCC will only auto-vectorize floating-point arithmetic
if both -mfpu=neon AND -funsafe-math-optimizations are given. -mfpu=neon
by itself does not imply that it is OK for GCC to generate non-IEEE
compliant code. The default is safe until explicitly told otherwise.
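
To make the "denormal values are treated as zero" point concrete, here
is a minimal Python sketch. It is only a rough model of flush-to-zero
arithmetic on single-precision values, not a description of how the
NEON hardware is implemented; the helper names (to_f32, flush_to_zero,
sub_ieee, sub_ftz) are made up for this illustration.

```python
import math
import struct

# Smallest normal single-precision value; anything smaller in magnitude
# (and non-zero) is a denormal.
FLT_MIN = 2.0 ** -126


def to_f32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def flush_to_zero(x):
    """Assumed model of flush-to-zero: denormals become a signed zero."""
    if 0.0 < abs(x) < FLT_MIN:
        return math.copysign(0.0, x)
    return x


def sub_ieee(a, b):
    """Single-precision subtraction that keeps denormal results."""
    return to_f32(a - b)


def sub_ftz(a, b):
    """Single-precision subtraction with denormal inputs and results flushed."""
    a, b = flush_to_zero(a), flush_to_zero(b)
    return flush_to_zero(to_f32(a - b))


# Two distinct normal values whose difference is a denormal.
a = 1.5 * FLT_MIN
b = FLT_MIN
print(sub_ieee(a, b) == 2.0 ** -127)  # True: the difference survives
print(sub_ftz(a, b) == 0.0)           # True: the difference is lost
```

In this model a != b and yet a - b == 0.0 under flush-to-zero, which is
the kind of difference a user who has not asked for unsafe math would
not expect to see after vectorization.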
> Clang defaults to -mfpu=neon when we choose -mcpu=cortex-a* or
> -march=armv7a, so our current behaviour is on par with GCC. But I
> think that's a dangerous assumption.
If your current behaviour is to generate unsafe math when -mfpu=neon
is passed, then I agree this is dangerous. Again, this is *NOT* GCC's
behaviour.
> Furthermore, the only alternatives we have at the moment is to either
> use NEON for everything or nothing. It would be good to have an option
> to use NEON for integer arithmetic and VFP for FP if the user requires
> IEEE compliance.
In GCC, this is -mfpu=neon.
> > P.S. Looking at gcc's man page, gcc seems to use -mfpu for ARM and
> > -mfpmath for x86. Do we use -mfpmath for both?
>
> We already support -mfpmath=vfp/neon in Clang, but it's bogus. My
> proposal is to make it count.
>
> The best way I can think of is to let -mfpmath=vfp *disable* only FP
> NEON and -mfpmath=neon *enable* only FP NEON, both orthogonal from
> integer math.
>
> Examples:
>
> Works today:
> -mfpu=soft -> Int (ALU), FP (LIB), no VFP/NEON instructions
> -mfpu=softfp -> Int (ALU), FP (LIB), VFP/NEON instructions allowed
> -mfpu=vfp -> Int (ALU), FP (VFP)
> -mfpu=neon -> Int (NEON), FP (NEON)
>
> Change proposed:
> -mfpmath=neon -mfpu=vfp -> Int (ALU), FP (NEON)
> -mfpmath=vfp -mfpu=neon -> Int (NEON), FP (VFP)
>
> This would be similar enough to GCC, and would allow the small number
> of users that care about IEEE-754 compliance to disable FP NEON on
> demand.
In GCC today:
-mfpu=vfp is the minimum floating-point instruction set supported. The
choice of ABI (-mfloat-abi) is independent of the choice of
floating-point hardware (-mfpu). -mfpu=soft and -mfpu=softfp are
rejected by GCC.
Starting with that:
-mfloat-abi=soft -> Generate library calls for all floating-point
    operations, do not permit Neon operations.
-mfloat-abi=softfp -> Pass floating-point arguments using the soft-float
    ABI (i.e. in core registers). Emit floating-point instructions as
    appropriate.
-mfloat-abi=hard -> Pass floating-point arguments in VFP registers.
    Emit floating-point instructions as appropriate.
Independent of this, we have -mfpu:
-mfpu=neon -> Permit generation of Neon instructions (both integer and
floating point) where allowed by the language specification. Note that
this does not by itself allow the generation of non-IEEE compliant code.
And on top of that, -funsafe-math-optimizations to enable generating Neon
instructions for floating point operations.
For your set of use cases:

  Int (ALU), FP (LIB), no VFP/NEON instructions
    -mfloat-abi=soft

  Int (ALU), FP (LIB), VFP/NEON instructions allowed
    Impossible

  Int (ALU), FP (VFP)
    -mfloat-abi=hard or -mfloat-abi=softfp
    + -mfpu=vfp (or other non-neon FPU)

  Int (NEON), FP (VFP)
    -mfloat-abi=hard or -mfloat-abi=softfp
    + -mfpu=neon (or greater)

  Int (NEON), FP (NEON)
    -mfloat-abi=hard or -mfloat-abi=softfp
    + -mfpu=neon (or greater)
    + -funsafe-math-optimizations (or equivalent)

  Int (ALU), FP (NEON)
    Impossible (as far as I know).
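
The table above can also be read as a small decision function. The
following Python sketch only restates the summary in this mail; it is
not taken from GCC's sources. The function name gcc_arm_units and the
"neon" substring test for "neon (or greater)" are assumptions made for
the illustration.

```python
def gcc_arm_units(float_abi, fpu, unsafe_math=False):
    """Return (integer_unit, fp_unit) for a GCC AArch32 flag combination.

    float_abi   -- value of -mfloat-abi: "soft", "softfp" or "hard"
    fpu         -- value of -mfpu, e.g. "vfp" or "neon"
    unsafe_math -- True if -funsafe-math-optimizations (or equivalent) is given
    """
    if float_abi == "soft":
        # Library calls for all floating-point operations, no Neon.
        return ("ALU", "LIB")
    if float_abi not in ("softfp", "hard"):
        raise ValueError("unknown -mfloat-abi value: %r" % float_abi)

    # Assumption: any -mfpu name containing "neon" counts as
    # "neon (or greater)".
    has_neon = "neon" in fpu
    if not has_neon:
        return ("ALU", "VFP")

    # Neon may be used for integer work; for floating point it is used
    # only once unsafe math has been requested explicitly.
    return ("NEON", "NEON" if unsafe_math else "VFP")


print(gcc_arm_units("soft", "vfp"))                     # ('ALU', 'LIB')
print(gcc_arm_units("hard", "vfp"))                     # ('ALU', 'VFP')
print(gcc_arm_units("softfp", "neon"))                  # ('NEON', 'VFP')
print(gcc_arm_units("hard", "neon", unsafe_math=True))  # ('NEON', 'NEON')
```

No input produces ("ALU", "NEON") or a library-call FP result with
VFP/NEON instructions allowed, which matches the two "Impossible"
entries above.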
Hope this helps,
James