On 25 March 2016 at 04:11, Hal Finkel <hfinkel at anl.gov> wrote:
> As I understand it, the fundamental property being addressed here is: Are
> the semantics of scalar FP math the same as vector FP math? TTI seems like
> a good place to expose that information. If the semantics are indeed
> different, then the vectorizer would require fast-math flags in order to
> vectorize FP operations (similarly, gcc's man page says it requires
> -funsafe-math-optimizations for vectorization unless -mfpu=neon or similar
> is specified). In this context, this different-semantics query would return
> true if:

The semantics are indeed different: VFP is IEEE-754 compliant while
NEON is not. We don't want to stop the compiler from using VFP for FP
math, but we want to be cautious when using NEON in the same way.

> !(isDarwin OR ARMISA >= v8 OR fpMath == NEON)
>
> and then we need to teach people to use -mfpu=neon ;)

So, there's the catch. In GCC, -mfpu=neon means to use NEON, which is
not enabled by default, so the compiler assumes that the user is aware
that NEON FP is not IEEE compliant. I don't think that's a safe
assumption, but I also don't want to have a slightly different
behaviour than GCC gratuitously.

Clang defaults to -mfpu=neon when we choose -mcpu=cortex-a* or
-march=armv7a, so our current behaviour is on par with GCC. But I
think that's a dangerous assumption.

Furthermore, the only alternatives we have at the moment are to either
use NEON for everything or nothing. It would be good to have an option
to use NEON for integer arithmetic and VFP for FP if the user requires
IEEE compliance.

> P.S. Looking at gcc's man page, gcc seems to use -mfpu for ARM and
> -mfpmath for x86. Do we use -mfpmath for both?

We already support -mfpmath=vfp/neon in Clang, but it's bogus. My
proposal is to make it count.

The best way I can think of is to let -mfpmath=vfp *disable* only FP
NEON and -mfpmath=neon *enable* only FP NEON, both orthogonal to
integer math.

Examples:

Works today:
  -mfpu=soft   -> Int (ALU), FP (LIB), no VFP/NEON instructions
  -mfpu=softfp -> Int (ALU), FP (LIB), VFP/NEON instructions allowed
  -mfpu=vfp    -> Int (ALU), FP (VFP)
  -mfpu=neon   -> Int (NEON), FP (NEON)

Change proposed:
  -mfpmath=neon -mfpu=vfp -> Int (ALU), FP (NEON)
  -mfpmath=vfp -mfpu=neon -> Int (NEON), FP (VFP)

This would be similar enough to GCC, and would allow the small number
of users that care about IEEE-754 compliance to disable FP NEON on
demand.

cheers,
--renato
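
The "Change proposed" rows in the message above can be read as one rule: -mfpu picks the integer unit and -mfpmath, when given, overrides only the FP unit. A minimal Python sketch of that reading follows. The function name `proposed_units` and the string encoding of the flags are hypothetical, and the sketch models the proposal as written in the email, not what Clang actually does.

```python
def proposed_units(mfpu, mfpmath=None):
    """Return (integer_unit, fp_unit) under the proposal in the email.

    Hypothetical model: only the "vfp" and "neon" values of -mfpu are
    covered, because those are the only ones the proposal combines with
    -mfpmath.
    """
    if mfpu not in ("vfp", "neon"):
        raise ValueError("unsupported -mfpu value: %r" % (mfpu,))
    if mfpmath not in (None, "vfp", "neon"):
        raise ValueError("unsupported -mfpmath value: %r" % (mfpmath,))

    # Integer math follows -mfpu alone.
    int_unit = "NEON" if mfpu == "neon" else "ALU"
    # FP math follows -mfpmath when present, otherwise falls back to -mfpu.
    fp_unit = (mfpmath or mfpu).upper()
    return int_unit, fp_unit
```

With this reading, `proposed_units("neon", "vfp")` gives the IEEE-friendly combination the email asks for: NEON for integers, VFP for floating point.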
----- Original Message -----
> From: "Renato Golin" <renato.golin at linaro.org>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "LLVM Dev" <llvm-dev at lists.llvm.org>, "James Molloy" <James.Molloy at arm.com>
> Sent: Friday, March 25, 2016 8:23:03 AM
> Subject: Re: NEON FP flags
>
> On 25 March 2016 at 04:11, Hal Finkel <hfinkel at anl.gov> wrote:
> > As I understand it, the fundamental property being addressed here
> > is: Are the semantics of scalar FP math the same as vector FP
> > math? TTI seems like a good place to expose that information. If
> > the semantics are indeed different, then the vectorizer would
> > require fast-math flags in order to vectorize FP operations
> > (similarly, gcc's man page says it requires
> > -funsafe-math-optimizations for vectorization unless -mfpu=neon or
> > similar is specified). In this context, this different-semantics
> > query would return true if:
>
> The semantics are indeed different: VFP is IEEE-754 compliant while
> NEON is not. We don't want to stop the compiler from using VFP for FP
> math, but we want to be cautious when using NEON in the same way.
>
> > !(isDarwin OR ARMISA >= v8 OR fpMath == NEON)
> >
> > and then we need to teach people to use -mfpu=neon ;)
>
> So, there's the catch. In GCC, -mfpu=neon means to use NEON, which is
> not enabled by default, so the compiler assumes that the user is
> aware that NEON FP is not IEEE compliant. I don't think that's a safe
> assumption, but I also don't want to have a slightly different
> behaviour than GCC gratuitously.
>
> Clang defaults to -mfpu=neon when we choose -mcpu=cortex-a* or
> -march=armv7a, so our current behaviour is on par with GCC. But I
> think that's a dangerous assumption.
>
> Furthermore, the only alternatives we have at the moment are to either
> use NEON for everything or nothing. It would be good to have an
> option to use NEON for integer arithmetic and VFP for FP if the user
> requires IEEE compliance.
>
> > P.S. Looking at gcc's man page, gcc seems to use -mfpu for ARM and
> > -mfpmath for x86. Do we use -mfpmath for both?
>
> We already support -mfpmath=vfp/neon in Clang, but it's bogus. My
> proposal is to make it count.
>
> The best way I can think of is to let -mfpmath=vfp *disable* only FP
> NEON and -mfpmath=neon *enable* only FP NEON, both orthogonal to
> integer math.
>
> Examples:
>
> Works today:
>   -mfpu=soft   -> Int (ALU), FP (LIB), no VFP/NEON instructions
>   -mfpu=softfp -> Int (ALU), FP (LIB), VFP/NEON instructions allowed
>   -mfpu=vfp    -> Int (ALU), FP (VFP)
>   -mfpu=neon   -> Int (NEON), FP (NEON)
>
> Change proposed:
>   -mfpmath=neon -mfpu=vfp -> Int (ALU), FP (NEON)
>   -mfpmath=vfp -mfpu=neon -> Int (NEON), FP (VFP)
>
> This would be similar enough to GCC, and would allow the small number
> of users that care about IEEE-754 compliance to disable FP NEON on
> demand.

I think this seems reasonable, although it is somewhat unfortunate, in
terms of naming, that "-mfpu" affects non-FP operations too. However, I
think we're stuck because of what GCC decided to do.

Thanks again,
Hal

> cheers,
> --renato

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
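
The different-semantics query quoted in this message, `!(isDarwin OR ARMISA >= v8 OR fpMath == NEON)`, and the rule that the vectorizer then needs fast-math flags, can be written out as a short Python sketch. The function and parameter names are hypothetical stand-ins for whatever a real TTI hook would use; nothing here is an existing LLVM interface.

```python
def fp_semantics_differ(is_darwin, arm_isa_version, fp_math):
    """Hypothetical model of the query from the thread.

    True means vector FP math is assumed NOT to behave like scalar FP
    math on this target/configuration.
    """
    return not (is_darwin or arm_isa_version >= 8 or fp_math == "neon")


def may_vectorize_fp(is_darwin, arm_isa_version, fp_math, fast_math):
    """FP vectorization is allowed if semantics match, or if the user
    opted in to relaxed semantics via fast-math flags."""
    if fp_semantics_differ(is_darwin, arm_isa_version, fp_math):
        return fast_math
    return True
```

For example, an ARMv7 non-Darwin target with `fp_math="vfp"` would only vectorize FP operations when `fast_math` is true, while an ARMv8 target would vectorize them unconditionally under this model.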
On Fri, Mar 25, 2016 at 01:23:03PM +0000, Renato Golin via llvm-dev wrote:
> On 25 March 2016 at 04:11, Hal Finkel <hfinkel at anl.gov> wrote:
> > As I understand it, the fundamental property being addressed here is: Are
> > the semantics of scalar FP math the same as vector FP math? TTI seems like
> > a good place to expose that information. If the semantics are indeed
> > different, then the vectorizer would require fast-math flags in order to
> > vectorize FP operations (similarly, gcc's man page says it requires
> > -funsafe-math-optimizations for vectorization unless -mfpu=neon or similar
> > is specified). In this context, this different-semantics query would return
> > true if:
>
> The semantics are indeed different: VFP is IEEE-754 compliant while
> NEON is not. We don't want to stop the compiler from using VFP for FP
> math, but we want to be cautious when using NEON in the same way.
>
> > !(isDarwin OR ARMISA >= v8 OR fpMath == NEON)
> >
> > and then we need to teach people to use -mfpu=neon ;)
>
> So, there's the catch. In GCC, -mfpu=neon means to use NEON, which is
> not enabled by default, so the compiler assumes that the user is aware
> that NEON FP is not IEEE compliant. I don't think that's a safe
> assumption, but I also don't want to have a slightly different
> behaviour than GCC gratuitously.

Note that my discussion below relates to the AArch32 behaviour (the ARM
port of GCC, not the AArch64 port of GCC).

I can see why the text in the man page might be misleading, but let me
quote the part I think Hal was referring to here (with added emphasis):

    If the selected floating-point hardware includes the NEON extension
    (e.g. -mfpu=neon), note that floating-point operations are **not**
    generated by GCC's auto-vectorization pass **unless**
    -funsafe-math-optimizations is also specified. This is because NEON
    hardware does not fully implement the IEEE 754 standard for
    floating-point arithmetic (in particular denormal values are treated
    as zero), so the use of NEON instructions may lead to a loss of
    precision.

That is to say, GCC will only auto-vectorize floating-point arithmetic
if both -mfpu=neon AND -funsafe-math-optimizations are given. -mfpu=neon
by itself does not imply that it is OK for GCC to generate non-IEEE
compliant code. The default is safe until explicitly told otherwise.

> Clang defaults to -mfpu=neon when we choose -mcpu=cortex-a* or
> -march=armv7a, so our current behaviour is on par with GCC. But I
> think that's a dangerous assumption.

If your current behaviour is to generate unsafe math when -mfpu=neon is
passed, then I agree this is dangerous. Again, this is *NOT* GCC's
behaviour.

> Furthermore, the only alternatives we have at the moment are to either
> use NEON for everything or nothing. It would be good to have an option
> to use NEON for integer arithmetic and VFP for FP if the user requires
> IEEE compliance.

In GCC, this is -mfpu=neon.

> > P.S. Looking at gcc's man page, gcc seems to use -mfpu for ARM and -mfpmath
> > for x86. Do we use -mfpmath for both?
>
> We already support -mfpmath=vfp/neon in Clang, but it's bogus. My
> proposal is to make it count.
>
> The best way I can think of is to let -mfpmath=vfp *disable* only FP
> NEON and -mfpmath=neon *enable* only FP NEON, both orthogonal to
> integer math.
>
> Examples:
>
> Works today:
>   -mfpu=soft   -> Int (ALU), FP (LIB), no VFP/NEON instructions
>   -mfpu=softfp -> Int (ALU), FP (LIB), VFP/NEON instructions allowed
>   -mfpu=vfp    -> Int (ALU), FP (VFP)
>   -mfpu=neon   -> Int (NEON), FP (NEON)
>
> Change proposed:
>   -mfpmath=neon -mfpu=vfp -> Int (ALU), FP (NEON)
>   -mfpmath=vfp -mfpu=neon -> Int (NEON), FP (VFP)
>
> This would be similar enough to GCC, and would allow the small number
> of users that care about IEEE-754 compliance to disable FP NEON on
> demand.

In GCC today:

-mfpu=vfp is the minimum floating-point instruction set supported; the
choice of which ABI you use (-mfloat-abi) is independent of the choice
of floating-point hardware that exists. -mfpu=soft and -mfpu=softfp are
rejected by GCC.

Starting with that:

  -mfloat-abi=soft   -> Generate library calls for all floating-point
                        operations, do not permit Neon operations.
  -mfloat-abi=softfp -> Pass floating-point arguments using the softfloat
                        ABI (i.e. in core registers). Emit floating-point
                        instructions as appropriate.
  -mfloat-abi=hard   -> Pass floating-point arguments in VFP registers.
                        Emit floating-point instructions as appropriate.

Independent of this, we have -mfpu:

  -mfpu=neon -> Permit generation of Neon instructions (both integer and
                floating point) where allowed by the language
                specification. Note that this does not by itself allow
                the generation of non-IEEE compliant code.

And on top of that, -funsafe-math-optimizations to enable generating
Neon instructions for floating-point operations.

For your set of use cases:

  Int (ALU), FP (LIB), no VFP/NEON instructions
    -mfloat-abi=soft

  Int (ALU), FP (LIB), VFP/NEON instructions allowed
    Impossible

  Int (ALU), FP (VFP)
    -mfloat-abi=hard or -mfloat-abi=softfp
    + -mfpu=vfp (or other non-neon FPU)

  Int (NEON), FP (VFP)
    -mfloat-abi=hard or -mfloat-abi=softfp
    + -mfpu=neon (or greater)

  Int (NEON), FP (NEON)
    -mfloat-abi=hard or -mfloat-abi=softfp
    + -mfpu=neon (or greater)
    + -funsafe-math-optimizations (or equivalent)

  Int (ALU), FP (NEON)
    Impossible (as far as I know).

Hope this helps,
James
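
The use-case list in James's message can be condensed into a small lookup, sketched here in Python. This is only a restatement of the table in the email for AArch32 GCC: the function name `gcc_units` is hypothetical, "vfp" stands in for any non-NEON FPU, "neon" for NEON or greater, and nothing here queries a real compiler.

```python
def gcc_units(float_abi, fpu, unsafe_math=False):
    """Return (integer_unit, fp_unit) as summarized in the email.

    Hypothetical model of the three independent GCC switches:
    -mfloat-abi, -mfpu and -funsafe-math-optimizations.
    """
    if float_abi not in ("soft", "softfp", "hard"):
        raise ValueError("unknown -mfloat-abi value: %r" % (float_abi,))
    if fpu not in ("vfp", "neon"):
        raise ValueError("unknown -mfpu value: %r" % (fpu,))

    # -mfloat-abi=soft: library calls for all FP, no NEON at all.
    if float_abi == "soft":
        return ("ALU", "LIB")

    # -mfpu=neon permits NEON for integer work.
    int_unit = "NEON" if fpu == "neon" else "ALU"
    # FP only moves to NEON when unsafe math is also requested.
    fp_unit = "NEON" if (fpu == "neon" and unsafe_math) else "VFP"
    return (int_unit, fp_unit)
```

Note that no argument combination yields `("ALU", "NEON")`, which matches the "Impossible (as far as I know)" entry in the list.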
On 29 March 2016 at 11:09, James Greenhalgh <james.greenhalgh at arm.com> wrote:
> That is to say, GCC will only auto-vectorize floating-point arithmetic
> if both -mfpu=neon AND -funsafe-math-optimizations are given. -mfpu=neon
> by itself does not imply that it is OK for GCC to generate non-IEEE
> compliant code. The default is safe until explicitly told otherwise.

Right, that was what I originally thought from Hal's bug report, but
recent emails on the thread confused me.

I think this is the right behaviour, and I'm glad GCC does it, so we
can follow the correct approach from the start.

>> Furthermore, the only alternatives we have at the moment are to either
>> use NEON for everything or nothing. It would be good to have an option
>> to use NEON for integer arithmetic and VFP for FP if the user requires
>> IEEE compliance.
>
> In GCC, this is -mfpu=neon.

This makes my life *so* much easier! :)

> In GCC today:
>
> -mfpu=vfp is the minimum floating-point instruction set supported, the
> choice of which ABI you use (-mfloat-abi) is independent from the choice
> of floating-point hardware that exists. -mfpu=soft and -mfpu=softfp are
> rejected by GCC.

Yes, I mixed up -mfpu with -mfloat-abi, my bad.

> For your set of use cases:
>
> Int (ALU), FP (LIB), no VFP/NEON instructions
>   -mfloat-abi=soft
>
> Int (ALU), FP (LIB), VFP/NEON instructions allowed
>   Impossible

I meant this as -mfloat-abi=softfp. Now I see that my representation
mixed up the int/fp concepts. Ignore this.

> Int (ALU), FP (VFP)
>   -mfloat-abi=hard or -mfloat-abi=softfp
>   + -mfpu=vfp (or other non-neon FPU)
>
> Int (NEON), FP (VFP)
>   -mfloat-abi=hard or -mfloat-abi=softfp
>   + -mfpu=neon (or greater)

Excellent! This means I only need to make the -fsubnormal flags count,
and all will be the same.

This was my first approach, but Hal convinced me that we may want a
specific flag that is included by the fast/unsafe maths flags. See below.

> Int (NEON), FP (NEON)
>   -mfloat-abi=hard or -mfloat-abi=softfp
>   + -mfpu=neon (or greater)
>   + -funsafe-math-optimizations (or equivalent)

Do you have one specifically for subnormals? -funsafe-math is a bit of
a big hammer and will enable other (potentially unwanted) behaviour
from the vectorizer.

However, -ffast-math / unsafe-math should include subnormal support.

> Int (ALU), FP (NEON)
>   Impossible (as far as I know).

Irrelevant, as far as I care. :)

cheers,
--renato
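
The subnormal (denormal) issue discussed in this thread, "denormal values are treated as zero", can be illustrated with a small Python model of single-precision subtraction with and without flush-to-zero. This is a sketch under stated assumptions: it emulates binary32 by rounding Python doubles through `struct`, all helper names are hypothetical, and it does not execute any VFP or NEON instruction.

```python
import math
import struct

FLT_MIN_NORMAL = 2.0 ** -126  # smallest positive normal binary32 value


def to_f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def flush(x):
    """Replace a binary32 subnormal with a zero of the same sign."""
    if x != 0.0 and abs(x) < FLT_MIN_NORMAL:
        return math.copysign(0.0, x)
    return x


def sub_ieee(a, b):
    """Subtraction with gradual underflow (subnormals kept)."""
    return to_f32(to_f32(a) - to_f32(b))


def sub_ftz(a, b):
    """Subtraction with subnormal inputs and results flushed to zero."""
    a, b = flush(to_f32(a)), flush(to_f32(b))
    return flush(to_f32(a - b))


# Two distinct normal values whose difference is subnormal.
a = 1.5 * FLT_MIN_NORMAL
b = FLT_MIN_NORMAL
print(sub_ieee(a, b) == 2.0 ** -127)  # gradual underflow keeps the difference
print(sub_ftz(a, b) == 0.0)           # flush-to-zero loses it
```

Under the flush-to-zero model, `a != b` holds while `a - b == 0.0`, which is the kind of result that code relying on IEEE-754 gradual underflow does not expect.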