Martin J. O'Riordan via llvm-dev
2016-Feb-11 11:23 UTC
[llvm-dev] Vectorization with fast-math on irregular ISA sub-sets
Our processor also has some issues with the handling of denormals, both scalar and vector, and we ran into a related problem only a few days ago.

The v3.8 compiler has done a lot of good work on optimisations for floating-point math, but ironically one of them broke our implementation of 'nextafterf'. The desired code fragment (FP32) is:

    float xAbs = fabsf(x);

Since we know our instruction for this does not handle denormals, and the algorithm is sensitive to correct denormals, the code was written to avoid the issue as follows:

    float xAbs = __builtin_astype(__builtin_astype(x, unsigned) & 0x7FFFFFFF, float);

But the v3.8 FP optimiser now recognises this pattern and replaces it with an ISD::FABS node, which broke our workaround :-)

It's a great optimisation and I have no problem with its correctness, but I was thinking that I might look at where to extend the target information interface so that a target can say it does not support denormals, and this and possibly other optimisations could then be suppressed in a target-dependent way.

Overall the new FP32 optimisation patterns appear to have yielded a small but not insignificant performance advantage over v3.7.1, though it is still early days for my complete measurements.

MartinO

-----Original Message-----
From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of Renato Golin via llvm-dev
Sent: 11 February 2016 10:53
To: Hal Finkel <hfinkel at anl.gov>
Cc: LLVM Dev <llvm-dev at lists.llvm.org>; nd <nd at arm.com>
Subject: Re: [llvm-dev] Vectorization with fast-math on irregular ISA sub-sets

Hal,

I had a read of the ARM ARM on VFP and SIMD FP semantics, and my analysis is that NEON's only problem is the flush-to-zero behaviour, which is non-compliant. NEON deals with NaNs and Infs in the way specified by the standard and should not cause us any concern.

But we don't seem to have a flag specifically for denormals, so I think using UnsafeMath is the safest option for now.

On 11 February 2016 at 01:15, Hal Finkel <hfinkel at anl.gov> wrote:
> nsz
> No Signed Zeros - Allow optimizations to treat the sign of a zero argument or result as insignificant.

In both VFP and NEON, zero signs are significant. In NEON, the zero produced by flush-to-zero has the same sign as the input denormal.

> nnan
> No NaNs - Allow optimizations to assume the arguments and result are not NaN. Such optimizations are required to retain defined behavior over NaNs, but the value of the result is undefined.

Both VFP and NEON treat NaNs as the standard requires, i.e. [ NaN op ? ] = NaN.

> ninf
> No Infs - Allow optimizations to assume the arguments and result are not +/-Inf. Such optimizations are required to retain defined behavior over +/-Inf, but the value of the result is undefined.

Same here. Operations with Inf generate Inf or NaNs on both units.

The flush-to-zero behaviour has an effect on both NaNs and Infs, since it happens before the operation. So a denormal operation with an Inf in VFP will not generate a NaN, while in NEON the denormal is flushed to zero first, thus generating a NaN.

James, is that a correct assessment?

cheers,
--renato
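As a rough illustration of the flush-to-zero point above, here is a minimal C++ sketch that models the two behaviours in software. The names flushToZero, mulIEEE and mulFTZ are hypothetical, and multiplication is picked as the example operation (the message does not name one). This is not how either ARM unit is implemented; it only models the described semantics, and it assumes a host that handles denormals in the default IEEE way.

```cpp
#include <cmath>

// Hypothetical model of a flush-to-zero (FTZ) input stage: a subnormal
// input becomes a zero of the same sign, anything else passes through.
inline float flushToZero(float x) {
    return std::fpclassify(x) == FP_SUBNORMAL ? std::copysign(0.0f, x) : x;
}

// IEEE-style multiply (the VFP case in the message): operands are used as-is.
inline float mulIEEE(float a, float b) {
    return a * b;
}

// FTZ-style multiply (the NEON case in the message): operands are flushed
// before the operation, so denormal * Inf turns into 0 * Inf.
inline float mulFTZ(float a, float b) {
    return flushToZero(a) * flushToZero(b);
}
```

With a denormal such as 1e-40f and an infinity, mulIEEE yields an infinity while mulFTZ yields a NaN, and flushToZero(-1e-40f) keeps the negative sign on its zero.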
Renato Golin via llvm-dev
2016-Feb-11 11:49 UTC
[llvm-dev] Vectorization with fast-math on irregular ISA sub-sets
On 11 February 2016 at 11:23, Martin J. O'Riordan <martin.oriordan at movidius.com> wrote:
> But the v3.8 FP optimiser now recognises this pattern and replaces it with an ISD::FABS node and broke our workaround :-) It's a great optimisation and I have no problem with its correctness, but I was thinking that perhaps I might see where I should extend the target information interface to allow a target to say that it does not support denormals so that this and possibly other optimisations could be suppressed in a target dependent way.

Hi Martin,

So, I have a patch that right now is a big hammer:

* Targets can have SIMD IEEE compliant or not (instead of fine grained choosing which part).
* Any FP arithmetic / cast operation with UnsafeAlgebra will trigger a "potentially unsafe" flag in the vectorizer.
* In the end, if the SIMD unit is not IEEE compliant and there are any potentially unsafe operations, avoid that loop.

I just need to create some more tests to submit.

The problems I can see in your case are:

* Both scalar and vector units have problems with denormals, so my isSIMDIEEE() is not enough.
  - To fix this, you can add isVFPIEEE(), but we may find a better solution?
* Your optimisation is basic-block based, not loop based, so we'd have to add the same check to SLP.
  - SLP deals with both SIMD and VFP units, so we would need the additional flag anyway.
  - This will be my next step.
* Other passes already have access to the TTI, so they can use those flags to avoid strength reduction, combine, etc. in those cases.

I don't think we need to create a fine grained solution right now, since we don't have examples with different behaviour.

Would that work for you?

cheers,
--renato
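The "big hammer" rule described in this message could be sketched roughly as below. Everything here is a hypothetical stand-in written in plain C++ (Op, Target, isPotentiallyUnsafe and shouldVectorizeLoop are not LLVM types or functions); only the name isSIMDIEEE() comes from the message. The sketch also assumes one particular reading of the rule: an FP arithmetic or cast operation that has not been granted fast-math permission is the "potentially unsafe" case, because it expects IEEE behaviour. The message is terse on this point, so treat that reading as an assumption.

```cpp
#include <vector>

// Hypothetical operation categories; integer ops are never a concern here.
enum class OpKind { Integer, FPArithmetic, FPCast };

struct Op {
    OpKind kind;
    bool allowsUnsafeAlgebra; // assumed: fast-math permission on this op
};

// Hypothetical target description; models the proposed isSIMDIEEE() query.
struct Target {
    bool simdIsIEEE;
};

// Assumed reading: an FP op without fast-math permission needs IEEE semantics.
inline bool isPotentiallyUnsafe(const Op &op) {
    bool isFP = op.kind == OpKind::FPArithmetic || op.kind == OpKind::FPCast;
    return isFP && !op.allowsUnsafeAlgebra;
}

// Avoid the loop only when the SIMD unit is not IEEE compliant and the
// loop body contains at least one potentially unsafe operation.
inline bool shouldVectorizeLoop(const Target &target, const std::vector<Op> &body) {
    bool potentiallyUnsafe = false;
    for (const Op &op : body)
        potentiallyUnsafe = potentiallyUnsafe || isPotentiallyUnsafe(op);
    return target.simdIsIEEE || !potentiallyUnsafe;
}
```

Under these assumptions a loop of strict FP operations is rejected on a non-IEEE SIMD unit, while the same loop on an IEEE-compliant unit, or a loop whose FP operations all permit unsafe algebra, is still vectorized. Extending the sketch to Martin's case would mean adding a second flag for the scalar unit, along the lines of the isVFPIEEE() idea.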
Martin J. O'Riordan via llvm-dev
2016-Feb-11 12:24 UTC
[llvm-dev] Vectorization with fast-math on irregular ISA sub-sets
This is fine Renato. I worked around the local issue by using an instruction intrinsic so that the pattern would be invisible to this optimisation, and my thoughts on raising this to the TargetTransformInfo level are still not well formed. I was actually quite impressed with the new optimisation; it handled the situation perfectly.

A coarse grained solution is fine, and it is always possible to handle this in custom lowering for ISD::FABS, which could check a target specific flag to see if it should do the "safe thing" or the "fast thing".

Thanks for the feedback,

MartinO
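The "safe thing" versus "fast thing" choice can be pictured with a small host-side C++ sketch. It is only a model of the decision, not LLVM lowering code: TargetFlags, fabsHandlesDenormals, absViaMask and absForTarget are hypothetical names, and the mask-based path is a portable rewrite of the __builtin_astype workaround quoted earlier in the thread.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "FP32 expected");

// "Safe thing": clear the sign bit through the integer view of the value,
// so no floating-point instruction ever touches a denormal input.
inline float absViaMask(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    bits &= 0x7FFFFFFFu;
    std::memcpy(&x, &bits, sizeof x);
    return x;
}

// Hypothetical target-specific flag, as suggested for custom lowering.
struct TargetFlags {
    bool fabsHandlesDenormals;
};

// Pick the "fast thing" (native fabs) only when the target says it is safe.
inline float absForTarget(float x, const TargetFlags &target) {
    return target.fabsHandlesDenormals ? std::fabs(x) : absViaMask(x);
}
```

Both paths agree on ordinary values, and absViaMask(-1e-40f) returns the positive denormal unchanged. As the thread itself shows, an optimiser may recognise the mask pattern and turn it back into a native absolute-value operation, so on a real target the safe path would have to be hidden from the optimiser, for example behind an intrinsic.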