thr3ads.net - llvm dev - [LLVMdev] NEON vector instructions and the fast math IR flags [Jun 2013]

Tobias Grosser

2013-Jun-07 01:35 UTC

[LLVMdev] NEON vector instructions and the fast math IR flags

Hi,

I was recently looking into the translation of LLVM-IR vector instructions
to ARM NEON assembly. Specifically, when this is legal to do and when we
need to be careful.

I attached a very simple test case:

define <4 x float> @fooP(<4 x float> %A, <4 x float> %B)
{
%C = fmul <4 x float> %A, %B
ret <4 x float> %C
}

If fooP is compiled with "llc -march=arm -mattr=+vfp3,+neon", LLVM happily
uses ARM NEON instructions to implement the vector multiply. This is
obviously the fastest code that we can generate, but on the other hand we
lose precision compared to non-NEON code (NEON flushes denormals to zero).
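
To make the precision loss concrete, here is a minimal C++ sketch that models flush-to-zero on the host. It does not execute NEON instructions; the helpers flushToZero and mulFlushToZero are hypothetical names introduced only for illustration, and the model assumes the host itself evaluates float arithmetic with subnormals enabled.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: model flush-to-zero by replacing any subnormal
// (denormal) value with a zero of the same sign.
float flushToZero(float x) {
    return std::fpclassify(x) == FP_SUBNORMAL ? std::copysign(0.0f, x) : x;
}

// Hypothetical model of a multiply that flushes subnormal inputs and
// subnormal results, in contrast to an IEEE 754 multiply that keeps them.
float mulFlushToZero(float a, float b) {
    return flushToZero(flushToZero(a) * flushToZero(b));
}

// Plain IEEE 754 multiply for comparison (assuming the host does not
// itself run in a flush-to-zero mode).
float mulIEEE(float a, float b) {
    return a * b;
}
```

With a = std::numeric_limits<float>::min() (the smallest normal float) and b = 0.5f, the exact product is a subnormal number: mulIEEE keeps it, while mulFlushToZero returns zero.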

As LLVM now has support for IR-level fast-math flags [1], I am wondering
whether it would make sense to only create NEON instructions if the relevant
fast-math flags are set at the IR level?

The reason behind my question is that at the moment the only way to get
IEEE 754 floating point operations on ARM is to fully disable NEON.
However, NEON can be safely used for integer computations as well as for
LLVM-IR instructions with the appropriate fast math flags. The attached
test case contains an example of a floating point operation that requires
IEEE 754 compliance, a floating point operation that does not require IEEE
754 as well as an integer computation. It is a perfect mixed use case,
where we really do not want to globally disable NEON.

I understand that some users do not require 754 compliant floating point
behavior (clang on darwin?), which means they would probably not need this
change. However, it should also not hurt them performance-wise as such
users would probably set the relevant global fast-math flags to reduce the
precision requirements, such that NEON instructions would be chosen anyway.

I am very interested in opinions on the general topic as well as how to
actually implement this in the ARM target.

All the best,
Tobias

[1] http://llvm.org/docs/LangRef.html#fast-math-flags
-------------- next part --------------
A non-text attachment was scrubbed...
Name: neon-floating-point-precision.ll
Type: application/octet-stream
Size: 1188 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130606/f5262e8e/attachment.obj>

Owen Anderson

2013-Jun-07 06:05 UTC

On Jun 6, 2013, at 8:35 PM, Tobias Grosser <grosser at google.com> wrote:
> I understand that some users do not require 754 compliant floating point
> behavior (clang on darwin?), which means they would probably not need this
> change. However, it should also not hurt them performance-wise as such users
> would probably set the relevant global fast-math flags to reduce the precision
> requirements, such that NEON instructions would be chosen anyway.

Darwin uses NEON for floating point, but does *not* (and should not) globally
enable fast math flags.  Use of NEON for FP needs to remain achievable without
globally setting the fast math flags.  Fast math may reasonably imply
NEON, but the opposite direction is not accurate.

That said, I don't think anyone would object to making VFP codegen available
under non-Darwin triples.  It's just a matter of making it happen.

-Owen

Renato Golin

2013-Jun-07 06:58 UTC

On 7 June 2013 07:05, Owen Anderson <resistor at mac.com> wrote:
> Darwin uses NEON for floating point, but does *not* (and should not)
> globally enable fast math flags.  Use of NEON for FP needs to remain
> achievable without globally setting the fast math flags.  Fast math may
> reasonably imply NEON, but the opposite direction is not accurate.
>
> That said, I don't think anyone would object to making VFP codegen
> available under non-Darwin triples.  It's just a matter of making it
> happen.

Hi Owen,

ARMSubtarget::resetSubtargetFeatures(StringRef CPU, StringRef FS) has a
check to see if the target is Darwin or if UnsafeMath is enabled in order to
set UseNEONForSinglePrecisionFP, but only for A5 and A8, where this was a
problem. Maybe I was too conservative in my fix.

Tobi,

The -march=arm option would default to ARMv4, while -mattr=+neon would force
NEON, but I'm not sure the CPU would default to A8. You could end up with a
weird combination of ARM7TDMI+NEON.

There are two things to know at this point:

1. When execution gets to resetSubtargetFeatures, which CPU has it
detected for your arguments? You may also have to look at ARM.td to see if
the detected CPU has the feature "FeatureNEONForFP" in its description.

2. If the CPU is correct (Cortex-A*), and it's neither A5 nor A8, do we
still want to generate single-precision float on NEON for non-Darwin targets
with safe math? I don't think so. Possibly, that condition should be extended
to ignore the CPU you're using and *only* emit NEON SP-FP when either Darwin
or UnsafeMath is on.
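
As a rough illustration of the condition in point 2, here is a minimal standalone C++ sketch. It is not the actual ARMSubtarget code: the struct TargetSketch, its fields, and the function useNEONForSinglePrecisionFP are hypothetical names, and the hasNEON guard is an added assumption.

```cpp
#include <string>

// Hypothetical, heavily simplified stand-in for the state that the real
// subtarget code consults; the field names are assumptions.
struct TargetSketch {
    std::string cpu;   // e.g. "cortex-a8" or "cortex-a9"; deliberately unused below
    bool hasNEON;      // assumption: NEON must be available at all
    bool isDarwin;     // target triple is a Darwin triple
    bool unsafeMath;   // unsafe (fast) math was requested
};

// Sketch of the proposed rule: ignore the CPU and only use NEON for
// single-precision FP when the target is Darwin or unsafe math is on.
bool useNEONForSinglePrecisionFP(const TargetSketch &t) {
    return t.hasNEON && (t.isDarwin || t.unsafeMath);
}
```

Under this rule, a non-Darwin Cortex-A9 target with safe math would not use NEON for single-precision FP, while Darwin targets would keep using it without needing fast math, which matches Owen's requirement.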

cheers,
--renato