Chris Tetreault via llvm-dev
2021-Sep-10 19:39 UTC
[llvm-dev] [cfe-dev] Should isnan be optimized out in fast-math mode?
The problem is that math code is often templated, so `template <typename T> MyMatrixT<T> safeMul(const MyMatrixT<T> & lhs …` is going to be in a header.

Regardless, my position isn’t “there is no NaN”. My position is “you cannot count on operations on NaN working”. Just like sometimes you can dereference a pointer after it is freed, but you should not count on this working. If the compiler I’m using emits a call to a library function instead of providing a macro, and this results in isnan actually computing whether x is NaN, then so be it. But if the compiler provides a macro that evaluates to false under fast-math, then the two loops in safeMul can be optimized. Either way, as a developer, I know that I turned on fast-math, and I write code accordingly.

I think working around these sorts of issues is something that C and C++ developers are used to. Behaviors that are “inconsistent” between compilers are something we accept because we know they come with improved performance. In this case, the fix is easy, so I don’t think this corner case is worth supporting. Especially when the fix is also just one line:

```
#define myIsNan(x) (reinterpret_cast<uint32_t>(x) == THE_BIT_PATTERN_OF_MY_SENTINEL_NAN)
```

I would probably call the macro something else like `shouldProcessElement`.

Thanks,
Chris Tetreault

From: Serge Pavlov <sepavloff at gmail.com>
Sent: Friday, September 10, 2021 11:26 AM
To: Chris Tetreault <ctetreau at quicinc.com>
Cc: Richard Smith <richard at metafoo.co.uk>; llvm-dev at lists.llvm.org; cfe-dev at lists.llvm.org
Subject: Re: [llvm-dev] [cfe-dev] Should isnan be optimized out in fast-math mode?

It should not be done in headers, of course. Redefining this macro in a source file that is compiled with -ffinite-math-only is free from the described drawbacks.

Besides, the macro `isnan` is defined by libc, not by the compiler, and IIRC it is defined as a macro to allow such manipulations. The influence of libc on the behavior of `isnan` under -ffinite-math-only is also an argument against "there are no NaNs". It causes inconsistency in the behavior. Libc can provide its own implementation, which does not rely on the compiler's `__builtin_isnan`, and user code that uses `isnan` would work. But at some point the configuration script changes, or libc changes the macro, and your code works wrong, as happened after commit 767eadd78 in the llvm libcxx project. Keeping `isnan` would make changes in libc less harmful.

Thanks,
--Serge
Serge Pavlov via llvm-dev
2021-Sep-11 04:20 UTC
[llvm-dev] [cfe-dev] Should isnan be optimized out in fast-math mode?
On Sat, Sep 11, 2021 at 2:39 AM Chris Tetreault <ctetreau at quicinc.com> wrote:

> The problem is that math code is often templated, so `template <typename T> MyMatrixT<T> safeMul(const MyMatrixT<T> & lhs …` is going to be in a header.

No problem, the user can write:

```
#ifdef __FAST_MATH__
#undef isnan
#define isnan(x) false
#endif
```

and put it somewhere in the headers.

> Regardless, my position isn’t “there is no NaN”. My position is “you cannot count on operations on NaN working”.

Exactly. Attempts to express the conditions of -ffast-math as restrictions on types are not fruitful. I think that is the reason why the GCC documentation does not use the simple and clear "there is no NaN" but prefers more complicated wording about arithmetic.

> I think working around these sorts of issues is something that C and C++ developers are used to. These sorts of “inconsistent” between compilers behaviors is something we accept because we know it comes with improved performance. In this case, the fix is easy, so I don’t think this corner case is worth supporting. Especially when the fix is also just one line:
> ```
> #define myIsNan(x) (reinterpret_cast<uint32_t>(x) == THE_BIT_PATTERN_OF_MY_SENTINEL_NAN)
> ```

It won't work this way. If `x == 5.0`, then `reinterpret_cast<uint32_t>(x) == 5`. What you need there is a bitcast. Standard C does not have one. To emulate it, a reinterpret_cast of memory can be used: `*reinterpret_cast<int *>(&x)`. Another way is to use a union. Both of these solutions require memory operations, which is not good for performance, especially on GPU and ML cores. Of course, a smart compiler can eliminate the memory operation, but it is not required to, since that is only an optimization. Moving a value between the float and integer pipelines may also incur a performance penalty.

At the same time, this check can often be done with a single instruction.

Thanks,
--Serge
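The difference between a value conversion and a bitcast discussed above can be sketched as follows. This is an illustration under stated assumptions, not code from either poster: it uses `std::memcpy` as the bitcast (a third option alongside the pointer cast and the union mentioned above), assumes a 32-bit IEEE-754 `float`, and the names `floatBits`, `makeSentinelNan`, `isSentinelNan`, and the payload in `kSentinelNanBits` are all hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical sentinel: a quiet NaN with an arbitrary payload, chosen
// only for this sketch.
constexpr std::uint32_t kSentinelNanBits = 0x7FC00001u;

// Bitcast: copy the object representation of the float into an integer.
// Assumes float is IEEE-754 binary32.
inline std::uint32_t floatBits(float x) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "sketch assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    return bits;
}

// Build the sentinel value from its bit pattern (the reverse bitcast).
inline float makeSentinelNan() {
    float f;
    std::memcpy(&f, &kSentinelNanBits, sizeof f);
    return f;
}

// Compares bit patterns with an integer comparison, so no floating-point
// comparison on a NaN is involved.
inline bool isSentinelNan(float x) {
    return floatBits(x) == kSentinelNanBits;
}
```

For `x = 5.0f`, a value conversion such as `static_cast<std::uint32_t>(x)` gives 5, while `floatBits(x)` gives the encoding `0x40A00000`. Whether the `memcpy` is lowered to a register move or goes through memory depends on the compiler and target, which is the performance concern raised above.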