Tobias Grosser
2013-Jun-07 20:35 UTC
[LLVMdev] NEON vector instructions and the fast math IR flags
On 06/07/2013 06:49 AM, Arnold Schwaighofer wrote:
>
> On Jun 7, 2013, at 3:14 AM, Renato Golin <renato.golin at linaro.org> wrote:
>
>> On 7 June 2013 08:48, Tobias Grosser <tobias at grosser.es> wrote:
>> When to set which subtarget feature is a policy decision, where I honestly don't have any opinion on for clang. The best is probably to mirror the gcc behavior on linux targets.
>>
>> Not really, since GCC has no special behaviour for Darwin, AFAIK.
>>
>> My change will only generate SP-FP on NEON for A5 and A8 and only if it's Darwin or UnsafeMath is on, which seems not to be the case for you, so I don't think the problem is in that area. It's possible that some passes are not consulting that flag when generating NEON SP-FP. If that's true, this is definitely a bug.
>>
>> When I changed that, for VMUL.f32, it worked (ie. generated VFP instruction), but it might not be taking the same path your code is.
>>
>> I just looked again at the +neonfp flag. Compiling with and without +neonfp flag seems to only affect scalar types in the attached test case. If e.g. the LLVM vectorizer introduces vector instructions on LLVM-IR level, floating point vectors still yield NEON assembly even if compiled with "-mattr=+neon,-neonfp". Is this expected?
>>
>> No, vectorizers should honour FP contracts. This is probably a bug, too.
>>
>> Please file both bugs on bugzilla, attaching the relevant IR to each one and a way to reproduce, and I'll have a look at them.
>
> It is not the vectorizer that is the issue, it is the ARM backend that currently translates vectorized floating point IR to NEON instructions (it should scalarize it if desired to do so - i.e. if people care about denormals). To fix this issue one would have to fix the backend: i.e not declare v4f32 et al as legal (under a flag).
> As to making this predicated on fast math flags on operations (something like no-denormals - I don’t think we have that in the IR yet - we only have no nan, no infinite, no signed zeros, etc) I believe this would be a lot harder because I suspect you would have to custom lower all the operations.

Thanks for that explanation. I think it illustrates the situation well.

For programs that have mixed precision requirements for floating point operations we probably need to do this according to the fast math flags. Until we get there, a good first step would probably be to provide a global option similar to -enable-no-infs-fp-math that specifies if denormals should be allowed or not. This would allow the user to specify the precision requirements, without the need to alter the feature flags of a specific piece of hardware.

Tobi
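Arnold's explanation (keep v4f32 legal and emit NEON, or scalarize it) and the proposed global option can be illustrated with a host-side C emulation. This is a minimal sketch, not ARM code, under these assumptions: `ftz` models NEON's flush-to-zero behaviour in software, `add_v4f32_scalarized` stands in for scalarized VFP code that preserves denormals, and `allow_denormal_flush` is a hypothetical stand-in for the proposed option. None of these names exist in LLVM.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical global option, in the spirit of -enable-no-infs-fp-math:
 * true means the user accepts that denormals may be flushed to zero. */
static bool allow_denormal_flush = false;

/* Software model of flush-to-zero: a subnormal becomes a zero of the same sign. */
static float ftz(float x) {
    if (fpclassify(x) == FP_SUBNORMAL)
        return signbit(x) ? -0.0f : 0.0f;
    return x;
}

/* NEON-like v4f32 add: subnormal inputs and results are flushed to zero. */
static void add_v4f32_flush(const float a[4], const float b[4], float out[4]) {
    for (int i = 0; i < 4; ++i)
        out[i] = ftz(ftz(a[i]) + ftz(b[i]));
}

/* Scalarized v4f32 add: four scalar IEEE-754 adds with gradual underflow,
 * standing in for the VFP path when denormals must be preserved. */
static void add_v4f32_scalarized(const float a[4], const float b[4], float out[4]) {
    for (int i = 0; i < 4; ++i)
        out[i] = a[i] + b[i];
}

/* The lowering decision the hypothetical global option would control. */
static void add_v4f32(const float a[4], const float b[4], float out[4]) {
    assert(a != NULL && b != NULL && out != NULL);
    if (allow_denormal_flush)
        add_v4f32_flush(a, b, out);
    else
        add_v4f32_scalarized(a, b, out);
}
```

For example, with tiny = 0x1p-130f (below FLT_MIN, which is 0x1p-126f), the lane tiny + tiny yields the subnormal 0x1p-129f on the scalarized path but 0.0f on the flushing path, while lanes holding only normal values, such as 1.0f + 2.0f, agree on both paths.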
David Tweed
2013-Jun-10 08:56 UTC
[LLVMdev] NEON vector instructions and the fast math IR flags
| For programs that have mixed precision requirements for floating point
| operations we probably need to do this according to the fast math flags.
| Until we get there, a good first step would probably be to provide a
| global option similar to -enable-no-infs-fp-math that specifies if
| denormals should be allowed or not. This would allow the user to specify
| the precision requirements, without the need to alter the feature
| flags of a specific piece of hardware.

Hi, sorry for coming in late on this.

Firstly, I think what you mean is "if denormals should be required to be preserved or not". (Apart from anything else, it's possible to move data between standard CPU, SIMD CPU and GPU, so even if one part of the system flushes denormals to zero when they occur, they can show up in other parts.) Clearly this implies that you can't use NEON instructions, since they are specified not to preserve denormals.

Secondly, I think it would be helpful to at least try to map out which "optimizations" are going to be viewed by a per-instruction IR flag, just to get a clearer idea of whether the global stuff is the right model. (Amongst other things, I'm interested in DSLs, where the likelihood of knowing something about the "ideal requirements" for operations that will be transformed into LLVM IR is higher than for manually written C/Fortran.)

Cheers,
Dave
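The distinction between allowing denormals and requiring them to be preserved matters because gradual underflow guarantees that, for finite floats, x != y implies x - y != 0. The sketch below is host-side C; `ftz`, `sub_flush` and `recip_of_difference` are illustrative names, and `sub_flush` only emulates a unit that flushes denormals to zero. It shows a guarded division that is safe while denormals are preserved and divides by zero once the difference is flushed.

```c
#include <assert.h>
#include <math.h>

/* Software model of flush-to-zero: a subnormal becomes a zero of the same sign. */
static float ftz(float x) {
    if (fpclassify(x) == FP_SUBNORMAL)
        return signbit(x) ? -0.0f : 0.0f;
    return x;
}

/* Subtraction with IEEE-754 gradual underflow: subnormal results are kept. */
static float sub_preserve(float x, float y) {
    return x - y;
}

/* Emulated subtraction on a unit that flushes subnormal inputs and results. */
static float sub_flush(float x, float y) {
    return ftz(ftz(x) - ftz(y));
}

/* Reciprocal of x - y, guarded by x != y. For finite x and y, gradual
 * underflow guarantees x - y != 0 here; flushing to zero does not. */
static float recip_of_difference(float (*sub)(float, float), float x, float y) {
    assert(x != y); /* the caller's guard */
    return 1.0f / sub(x, y);
}
```

With x = 0x1.8p-126f (1.5 * FLT_MIN) and y = FLT_MIN, the exact difference 0x1p-127f is subnormal: `recip_of_difference(sub_preserve, x, y)` returns 0x1p127f, while `recip_of_difference(sub_flush, x, y)` returns +inf even though the guard passed.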
Tobias Grosser
2013-Jun-10 15:06 UTC
[LLVMdev] NEON vector instructions and the fast math IR flags
On 06/10/2013 01:56 AM, David Tweed wrote:
> | For programs that have mixed precision requirements for floating point
> | operations we probably need to do this according to the fast math flags.
> | Until we get there, a good first step would probably be to provide a
> | global option similar to -enable-no-infs-fp-math that specifies if
> | denormals should be allowed or not. This would allow the user to specify
> | the precision requirements, without the need to alter the feature
> | flags of a specific piece of hardware.
>
> Hi, sorry for coming in late on this. Firstly, I think what you mean is "if denormals should be required to be preserved or not". (Apart from anything else, it's possible to move data between standard CPU, SIMD CPU and GPU, so even if one part of the system flushes denormals to zero when they occur, they can show up in other parts.) Clearly this implies that you can't use NEON instructions, since they are specified not to preserve denormals.

True.

> Secondly, I think it would be helpful to at least try to map out which "optimizations" are going to be viewed by a per-instruction IR flag, just to get a clearer idea of whether the global stuff is the right model. (Amongst other things, I'm interested in DSLs, where the likelihood of knowing something about the "ideal requirements" for operations that will be transformed into LLVM IR is higher than for manually written C/Fortran.)

Sorry, I did not get this sentence. Would you mind rephrasing it?

At the moment I am mainly concerned with the code generation aspect. Optimizations on LLVM-IR can already reason per instruction about several floating point precision flags. Doing this during code generation is apparently difficult, as we would have to decide per instruction whether we can legally lower it to NEON or not.

Tobi
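The per-instruction decision described above can be sketched as a toy model in C. This is a hypothetical illustration, not LLVM's instruction-selection API: the `nnan`, `ninf` and `nsz` fields mirror existing LLVM IR fast-math flags, while `no_denormals` (a flag the IR does not have, as noted earlier in the thread) and `global_allow_denormal_flush` are assumptions made for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not LLVM's API. nnan/ninf/nsz mirror LLVM IR
 * fast-math flags; no_denormals is a flag LLVM IR does not have. */
typedef struct {
    bool nnan, ninf, nsz;
    bool no_denormals; /* hypothetical */
} FastMathFlags;

/* A single-precision floating-point operation as seen by instruction selection. */
typedef struct {
    unsigned lanes; /* 1 = scalar f32, 4 = v4f32 */
    FastMathFlags fmf;
} FPOp;

typedef enum { LOWER_NEON, LOWER_VFP, LOWER_VFP_SCALARIZED } Lowering;

/* Hypothetical global option (the proposed analogue of
 * -enable-no-infs-fp-math); true means denormals may be flushed. */
static bool global_allow_denormal_flush = false;

static Lowering choose_lowering(const FPOp *op) {
    assert(op != NULL && op->lanes >= 1);
    /* nnan/ninf/nsz say nothing about denormals, so on their own they never
     * justify NEON, which flushes denormals to zero. */
    if (global_allow_denormal_flush || op->fmf.no_denormals)
        return LOWER_NEON;
    return op->lanes > 1 ? LOWER_VFP_SCALARIZED : LOWER_VFP;
}

/* Counts how many ops in a function get a given lowering. With per-op
 * flags the answer can be mixed, so v4f32 cannot simply be declared
 * legal or illegal for the whole function. */
static size_t count_lowering(const FPOp *ops, size_t n, Lowering kind) {
    size_t count = 0;
    for (size_t i = 0; i < n; ++i)
        if (choose_lowering(&ops[i]) == kind)
            ++count;
    return count;
}
```

In a function mixing a v4f32 add that carries only nnan/ninf/nsz, a v4f32 add carrying the hypothetical `no_denormals` flag, and an unflagged scalar add, only one op would be lowered to NEON unless the global option is set. That mixture is what makes a single type-based legality decision insufficient.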