Tobias Grosser
2013-Jun-07 01:35 UTC
[LLVMdev] NEON vector instructions and the fast math IR flags
Hi,

I was recently looking into the translation of LLVM-IR vector instructions to ARM NEON assembly, specifically when this is legal to do and when we need to be careful.

I attached a very simple test case:

    define <4 x float> @fooP(<4 x float> %A, <4 x float> %B) {
      %C = fmul <4 x float> %A, %B
      ret <4 x float> %C
    }

If fooP is compiled with "llc -march=arm -mattr=+vfp3,+neon", LLVM happily uses ARM NEON instructions to implement the vector multiply. This is obviously the fastest code that we can generate, but on the other hand we lose precision compared to non-NEON code (NEON flushes denormals to zero).

As LLVM now has support for IR-level fast-math flags [1], I am wondering if it would make sense to create NEON instructions only if the relevant fast-math flags are set at the IR level.

The reason behind my question is that at the moment the only way to get IEEE 754 floating point operations on ARM is to fully disable NEON. However, NEON can be safely used for integer computations as well as for LLVM-IR instructions with the appropriate fast-math flags. The attached test case contains a floating point operation that requires IEEE 754 compliance, a floating point operation that does not require IEEE 754, and an integer computation. It is a perfect mixed use case, where we really do not want to globally disable NEON.

I understand that some users do not require 754-compliant floating point behavior (clang on darwin?), which means they would probably not need this change. However, it should also not hurt them performance-wise, as such users would probably set the relevant global fast-math flags to reduce the precision requirements, such that NEON instructions would be chosen anyway.

I am very interested in opinions on the general topic as well as on how to actually implement this in the ARM target.

All the best,
Tobias

[1] http://llvm.org/docs/LangRef.html#fast-math-flags

Attachment: neon-floating-point-precision.ll <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130606/f5262e8e/attachment.obj>
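To make the precision concern in the message above concrete, here is a minimal C++ sketch that models flush-to-zero in software. It assumes a host where ordinary float arithmetic keeps denormals (the usual default); the names flushToZero, mulIEEE and mulFlushed are hypothetical helpers for illustration and are not part of LLVM or any ARM API.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: models flush-to-zero by replacing any denormal
// (subnormal) value with a zero of the same sign.
float flushToZero(float x) {
    return std::fpclassify(x) == FP_SUBNORMAL ? std::copysign(0.0f, x) : x;
}

// Multiply with IEEE 754 gradual underflow, as the host normally does.
float mulIEEE(float a, float b) {
    return a * b;
}

// Multiply with denormal inputs and results flushed to zero: a software
// model of the NEON behaviour described in the message, not real NEON code.
float mulFlushed(float a, float b) {
    return flushToZero(flushToZero(a) * flushToZero(b));
}
```

With a = std::numeric_limits<float>::min() (the smallest normal float) and b = 0.5f, mulIEEE returns a small nonzero denormal while mulFlushed returns zero, which is the kind of difference an IEEE 754 user would care about.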
Owen Anderson
2013-Jun-07 06:05 UTC
[LLVMdev] NEON vector instructions and the fast math IR flags
On Jun 6, 2013, at 8:35 PM, Tobias Grosser <grosser at google.com> wrote:

> I understand that some users do not require 754 compliant floating point
> behavior (clang on darwin?), which means they would probably not need
> this change. However, it should also not hurt them performance-wise as
> such users would probably set the relevant global fast-math flags to
> reduce the precision requirements, such that NEON instructions would be
> chosen anyway.

Darwin uses NEON for floating point, but does *not* (and should not) globally enable fast math flags. Use of NEON for FP needs to remain achievable without globally setting the fast math flags. Fast math may reasonably imply NEON, but the opposite direction is not accurate.

That said, I don't think anyone would object to making VFP codegen available under non-Darwin triples. It's just a matter of making it happen.

-Owen
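One way to read the two positions so far (per-instruction fast-math flags, plus platforms such as Darwin that accept NEON FP without fast math) is as a per-operation predicate. The following C++ sketch is only an illustration under that reading: VectorOpSketch, mayUseNEON and the platformAcceptsNonIEEE parameter are hypothetical names, and the thread does not say which fast-math flags would count as "relevant".

```cpp
// Hypothetical description of a single vector operation; not an LLVM type.
struct VectorOpSketch {
    bool isFloatingPoint;          // false for integer operations
    bool hasRelevantFastMathFlags; // assumption: the flags that relax IEEE 754
};

// Returns true if, under the assumptions above, NEON could be used for 'op'.
// platformAcceptsNonIEEE models a platform default such as Darwin's, which
// uses NEON for FP without enabling fast math.
bool mayUseNEON(const VectorOpSketch &op, bool platformAcceptsNonIEEE) {
    if (!op.isFloatingPoint)
        return true; // integer computations are unaffected by flush-to-zero
    return op.hasRelevantFastMathFlags || platformAcceptsNonIEEE;
}
```

Note that fast math implies NEON here, but NEON does not imply fast math: the platform default is a separate input.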
Renato Golin
2013-Jun-07 06:58 UTC
[LLVMdev] NEON vector instructions and the fast math IR flags
On 7 June 2013 07:05, Owen Anderson <resistor at mac.com> wrote:

> Darwin uses NEON for floating point, but does *not* (and should not)
> globally enable fast math flags. Use of NEON for FP needs to remain
> achievable without globally setting the fast math flags. Fast math may
> reasonably imply NEON, but the opposite direction is not accurate.
>
> That said, I don't think anyone would object to making VFP codegen
> available under non-Darwin triples. It's just a matter of making it happen.

Hi Owen,

ARMSubtarget::resetSubtargetFeatures(StringRef CPU, StringRef FS) has a check to see if the target is Darwin or if UnsafeMath is enabled in order to set UseNEONForSinglePrecisionFP, but only for A5 and A8, where this was a problem. Maybe I was too conservative in my fix.

Tobi,

The -march=arm option would default to ARMv4, while -mattr=+neon would force NEON, but I'm not sure it would default to A8, which would give a weird combination of ARM7TDMI+NEON.

There are two things to know at this point:

1. When the execution gets to resetSubtargetFeatures, which CPU has it detected for your arguments? You may also have to look at ARM.td to see if the CPU that was detected has the feature "FeatureNEONForFP" in its description.

2. If the CPU is correct (Cortex-A*), and it's neither A5 nor A8, do we still want to generate single-precision float on NEON when non-Darwin and safe math? I don't think so.

Possibly, that condition should be extended to ignore the CPU you're using and *only* emit NEON SP-FP when either Darwin or UnsafeMath is on.

cheers,
--renato
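The check described in this message, and the suggested extension, can be sketched roughly as follows. This is a standalone C++ illustration, not the actual ARMSubtarget code: SubtargetSketch and both functions are hypothetical, the CPU name strings are assumed spellings, and the first function models only when this particular check would set the flag (a CPU could still get the behaviour another way, for example through "FeatureNEONForFP" in ARM.td).

```cpp
#include <string>

// Hypothetical stand-in for the subtarget state discussed in the thread;
// it is not the real ARMSubtarget class.
struct SubtargetSketch {
    std::string cpu;   // assumed spelling, e.g. "cortex-a8"
    bool isDarwin;     // target triple is Darwin
    bool unsafeFPMath; // unsafe math was requested
};

// The check as described: set the flag only for A5/A8, and only when the
// target is Darwin or unsafe math is enabled.
bool setsNEONForSPFPAsDescribed(const SubtargetSketch &st) {
    bool affectedCPU = st.cpu == "cortex-a5" || st.cpu == "cortex-a8";
    return affectedCPU && (st.isDarwin || st.unsafeFPMath);
}

// The suggested extension: ignore the CPU and allow NEON single-precision
// FP only when the target is Darwin or unsafe math is enabled.
bool allowsNEONForSPFPAsSuggested(const SubtargetSketch &st) {
    return st.isDarwin || st.unsafeFPMath;
}
```

Under the suggested rule, a non-Darwin Cortex-A9 build without unsafe math would not use NEON for single-precision FP, while a Darwin build would, regardless of CPU.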