Renato Golin
2013-Jun-07 06:58 UTC
[LLVMdev] NEON vector instructions and the fast math IR flags
On 7 June 2013 07:05, Owen Anderson <resistor at mac.com> wrote:
> Darwin uses NEON for floating point, but does *not* (and should not)
> globally enable fast math flags. Use of NEON for FP needs to remain
> achievable without globally setting the fast math flags. Fast math may
> reasonably imply NEON, but the opposite direction is not accurate.
>
> That said, I don't think anyone would object to making VFP codegen
> available under non-Darwin triples. It's just a matter of making it happen.

Hi Owen,

ARMSubtarget::resetSubtargetFeatures(StringRef CPU, StringRef FS) has a check to see whether the target is Darwin or UnsafeMath is enabled in order to set UseNEONForSinglePrecisionFP, but only for A5 and A8, where this was a problem. Maybe I was too conservative in my fix.

Tobi,

The march=arm option would default to ARMv4, while mattr=+neon would force NEON, but I'm not sure it would default to A8, which would be a weird combination of ARM7TDMI+NEON.

There are two things to know at this point:

1. When execution reaches resetSubtargetFeatures, which CPU has been detected for your arguments? You may also have to look at ARM.td to see whether the detected CPU has the feature "FeatureNEONForFP" in its description.

2. If the CPU is correct (Cortex-A*) and it's neither A5 nor A8, do we still want to generate single-precision float on NEON for non-Darwin targets with safe math? I don't think so. Possibly that condition should be extended to ignore the CPU you're using and *only* emit NEON SP-FP when either Darwin or UnsafeMath is on.

cheers,
--renato
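To make the two conditions in Renato's message concrete, here is a minimal C++ sketch contrasting the check as he describes it with one reading of his proposed change. The struct, field and function names are hypothetical and this is not the LLVM source; it assumes that FeatureNEONForFP (or an explicit +neonfp) enables NEON SP-FP on its own, as the pointer to ARM.td suggests.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified inputs to the decision made in
// ARMSubtarget::resetSubtargetFeatures(). Illustrative names only.
struct SubtargetInputs {
  std::string CPU;       // detected CPU name, e.g. "cortex-a8"
  bool CPUHasNEONForFP;  // assumed: CPU in ARM.td lists FeatureNEONForFP
  bool IsDarwin;         // target triple is Darwin
  bool UnsafeFPMath;     // unsafe (fast) FP math enabled
};

// Current behaviour as described in the thread: the CPU feature can enable
// NEON SP-FP by itself; otherwise only Cortex-A5/A8 on Darwin or with
// unsafe math.
bool useNEONForSPFPCurrent(const SubtargetInputs &In) {
  bool IsA5OrA8 = In.CPU == "cortex-a5" || In.CPU == "cortex-a8";
  return In.CPUHasNEONForFP || (IsA5OrA8 && (In.IsDarwin || In.UnsafeFPMath));
}

// One reading of the proposal: ignore the CPU entirely and emit NEON SP-FP
// only on Darwin or when unsafe math is on.
bool useNEONForSPFPProposed(const SubtargetInputs &In) {
  return In.IsDarwin || In.UnsafeFPMath;
}
```

Under this sketch, a Cortex-A9 whose description carries the NEON-for-FP feature, on a non-Darwin target with safe math, gets NEON SP-FP from the current check but not from the proposed one, which is exactly the case point 2 asks about.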
Tobias Grosser
2013-Jun-07 07:48 UTC
[LLVMdev] NEON vector instructions and the fast math IR flags
On 06/06/2013 11:58 PM, Renato Golin wrote:
> On 7 June 2013 07:05, Owen Anderson <resistor at mac.com> wrote:

Hi Owen, hi Renato,

thanks for your replies.

>> Darwin uses NEON for floating point, but does *not* (and should not)
>> globally enable fast math flags. Use of NEON for FP needs to remain
>> achievable without globally setting the fast math flags. Fast math may
>> reasonably imply NEON, but the opposite direction is not accurate.

Good point. Fast math is probably too strong a requirement. I need to look into the ways in which NEON does not comply with IEEE 754. For now, the only difference I see is that it may flush denormals to zero.

>> That said, I don't think anyone would object to making VFP codegen
>> available under non-Darwin triples. It's just a matter of making it happen.

I see.

> Tobi,
>
> The march=arm option would default to ARMv4, while mattr=+neon would force
> NEON, but I'm not sure it would default to A8, which would be a weird
> combination of ARM7TDMI+NEON.
>
> There are two things to know at this point:
>
> 1. When the execution gets to resetSubtargetFeatures, what CPU has it
> detected for your arguments. You may also have to look at ARM.td to see if
> the CPU that it got detected has in its description the feature
> "FeatureNEONForFP".
>
> 2. If the CPU is correct (Cortex-A*), and it's neither A5 nor A8, do we
> still want to generate single-precision float on NEON when non-Darwin and
> safe math? I don't think so. Possibly, that condition should be extended to
> ignore the CPU you're using and *only* emit NEON SP-FP when either Darwin
> or UnsafeMath are on.

Renato:

When to set which subtarget feature is a policy decision on which I honestly have no opinion for clang. The best option is probably to mirror the gcc behavior on Linux targets. My current goal is to understand the implications of certain features and to make sure a tool using the LLVM back-ends can actually implement any policy it likes.
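The denormal issue can be illustrated without ARM hardware. The C++ sketch below is a software emulation, not a model of the NEON pipeline: it contrasts an IEEE 754 single-precision multiply (with gradual underflow) against a flush-to-zero multiply that replaces subnormal inputs and results by a signed zero, which is the behaviour Tobias describes for NEON. It assumes the host FPU is not itself running in flush-to-zero mode.

```cpp
#include <cassert>
#include <cmath>

// Replace a subnormal value by a zero of the same sign; leave others alone.
float flushSubnormal(float X) {
  return std::fpclassify(X) == FP_SUBNORMAL ? std::copysign(0.0f, X) : X;
}

// IEEE 754 multiply with gradual underflow (what fooP() asks for).
float mulIEEE(float A, float B) { return A * B; }

// Emulated flush-to-zero multiply: subnormal inputs and outputs become zero.
float mulFTZ(float A, float B) {
  return flushSubnormal(flushSubnormal(A) * flushSubnormal(B));
}
```

For example, 1e-20f * 1e-20f is about 1e-40, which lies below the smallest normal float (about 1.18e-38). mulIEEE keeps it as a nonzero subnormal, while mulFTZ returns 0.0f; for operands whose product stays in the normal range the two functions agree.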
I just looked again at the +neonfp flag. Compiling with and without the +neonfp flag seems to affect only scalar types in the attached test case. If, e.g., the LLVM vectorizer introduces vector instructions at the LLVM-IR level, floating-point vectors still yield NEON assembly even when compiled with "-mattr=+neon,-neonfp". Is this expected?

Cheers,
Tobias

-------------- next part --------------
; RUN: llc -march=arm -mattr=+vfp3,+neon < %s | FileCheck %s

; fooP() performs a vector floating-point multiplication with a full-precision
; requirement. Even if we allow NEON with -mattr=+neon, NEON should not be used
; to implement this function, as it does not comply with the full-precision
; requirements (NEON e.g. flushes denormals to zero, which reduces precision).
define <4 x float> @fooP(<4 x float> %A, <4 x float> %B) {
  %C = fmul <4 x float> %A, %B
; CHECK: fooP
; CHECK: vmul.f32 s
; CHECK: vmul.f32 s
; CHECK: vmul.f32 s
; CHECK: vmul.f32 s
  ret <4 x float> %C
}

; fooR() performs a vector floating-point multiplication with relaxed precision
; requirements. In this case the precision loss introduced by NEON is acceptable
; and we should generate NEON instructions.
define <4 x float> @fooR(<4 x float> %A, <4 x float> %B) {
  %C = fmul fast <4 x float> %A, %B
; CHECK: fooR
; CHECK: vmul.f32 q
  ret <4 x float> %C
}

; bar() performs a vector integer multiplication. On an ARM NEON device, this
; code should always be executed as vector code.
define <4 x i32> @bar(<4 x i32> %A, <4 x i32> %B) {
  %C = mul <4 x i32> %A, %B
; CHECK: bar
; CHECK: vmul.i32 q
  ret <4 x i32> %C
}

define float @fooS(float %A, float %B) {
  %C = fmul fast float %A, %B
; CHECK: fooS
; CHECK: vmul.f32 q
  ret float %C
}
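The expectations encoded in the test case (fooP split into scalar VFP multiplies, fooR and bar as single q-register NEON multiplies) can be summarised as a small decision function. The C++ sketch below is hypothetical: the names are invented, and it describes the policy the thread asks for, including the "integer NEON but VFP for floats" use of -mattr=+neon,-neonfp, not what llc actually did at the time.

```cpp
#include <cassert>

// Hypothetical subtarget features relevant to the test case.
struct Features {
  bool NEON;    // -mattr=+neon
  bool NEONFP;  // -mattr=+neonfp (NEON allowed for FP)
};

// Hypothetical description of a <4 x T> multiply in the IR.
struct VectorMul {
  bool IsFloat;      // fmul <4 x float> vs mul <4 x i32>
  bool HasFastFlag;  // "fmul fast" in the IR
};

// true:  emit one q-register NEON instruction (vmul.f32 q / vmul.i32 q).
// false: split into per-lane scalar instructions (vmul.f32 s for floats).
bool mayUseNEONVector(const VectorMul &Op, const Features &F) {
  if (!F.NEON)
    return false;  // no NEON unit available at all
  if (!Op.IsFloat)
    return true;   // bar(): integer multiplies have no denormal issue
  // Float vectors: NEON flushes denormals, so require an explicit opt-in,
  // either target-wide (+neonfp) or per instruction (fast flag).
  return F.NEONFP || Op.HasFastFlag;
}
```

With the RUN line's features (+neon without +neonfp), this returns false for fooP and true for fooR and bar, matching the CHECK lines; adding +neonfp would allow NEON for fooP as well.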
David Tweed
2013-Jun-07 08:01 UTC
[LLVMdev] NEON vector instructions and the fast math IR flags
>> Darwin uses NEON for floating point, but does *not* (and should not)
>> globally enable fast math flags. Use of NEON for FP needs to remain
>> achievable without globally setting the fast math flags. Fast math may
>> reasonably imply NEON, but the opposite direction is not accurate.

| Good point. Fast math is probably too strong a requirement. I need to
| look into the ways in which NEON does not comply with IEEE 754. For now,
| the only difference I see is that it may flush denormals to zero.

Yes, I've gone on record before as saying that fast-math enables far too many different things for it to be "the canonical switch" for just about any transformation. Rather, it should be what I think it is in gcc: effectively a short-cut for invoking several individual math-option flags.

[snip]

| I just looked again at the +neonfp flag. Compiling with and without
| the +neonfp flag seems to affect only scalar types in the attached test
| case. If, e.g., the LLVM vectorizer introduces vector instructions at the
| LLVM-IR level, floating-point vectors still yield NEON assembly even when
| compiled with "-mattr=+neon,-neonfp". Is this expected?

I'm virtually certain that's a problem, since there are codebases out there which use that to effectively specify "integer NEON but use VFP for floats". If the vectorizer is producing NEON floating point from scalar code in the presence of that flag, then it's a (minor) issue waiting to happen.

Cheers,
Dave
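Dave's point that fast-math should be shorthand for several individual options, rather than a single switch that transformations test, can be sketched as a bitmask. In the hypothetical C++ below, the flag names are invented for illustration and are not LLVM's or GCC's actual options; "fast math" is simply the union of all relaxations, and a transformation such as using NEON for floats asks only for the one relaxation it depends on.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bitmask of individual FP relaxations (illustrative names).
enum FPRelax : std::uint32_t {
  NoNaNs         = 1u << 0,
  NoInfs         = 1u << 1,
  NoSignedZeros  = 1u << 2,
  AllowRecip     = 1u << 3,
  FlushDenormals = 1u << 4,
};

// "Fast math" as the union of every individual relaxation.
constexpr std::uint32_t FastMath =
    NoNaNs | NoInfs | NoSignedZeros | AllowRecip | FlushDenormals;

// Each transformation checks only the relaxation it actually needs.
constexpr bool allows(std::uint32_t Flags, FPRelax Needed) {
  return (Flags & Needed) != 0;
}

// Example policy: NEON single-precision FP only requires that flushing
// denormals be acceptable, not the whole fast-math bundle.
constexpr bool mayUseNEONForFloat(std::uint32_t Flags) {
  return allows(Flags, FlushDenormals);
}
```

Here full fast-math still permits NEON, but so does enabling only the denormal relaxation, while flags such as no-NaNs alone do not; this mirrors Owen's remark that fast math may imply NEON without NEON requiring fast math.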
Renato Golin
2013-Jun-07 08:14 UTC
[LLVMdev] NEON vector instructions and the fast math IR flags
On 7 June 2013 08:48, Tobias Grosser <tobias at grosser.es> wrote:
> When to set which subtarget feature is a policy decision on which I honestly
> have no opinion for clang. The best option is probably to mirror the gcc
> behavior on Linux targets.

Not really, since GCC has no special behaviour for Darwin, AFAIK.

My change will only generate SP-FP on NEON for A5 and A8, and only if the target is Darwin or UnsafeMath is on, which seems not to be the case for you, so I don't think the problem is in that area.

It's possible that some passes are not consulting that flag when generating NEON SP-FP. If that's true, this is definitely a bug. When I changed that for VMUL.f32, it worked (i.e. it generated a VFP instruction), but your code might not be taking the same path.

> I just looked again at the +neonfp flag. Compiling with and without the +neonfp
> flag seems to affect only scalar types in the attached test case. If, e.g.,
> the LLVM vectorizer introduces vector instructions at the LLVM-IR level,
> floating-point vectors still yield NEON assembly even when compiled with
> "-mattr=+neon,-neonfp". Is this expected?

No, vectorizers should honour FP contracts. This is probably a bug, too.

Please file both bugs on Bugzilla, attaching the relevant IR to each one and a way to reproduce, and I'll have a look at them.

cheers,
--renato