Serge Pavlov via llvm-dev
2021-Sep-13 06:02 UTC
[llvm-dev] [cfe-dev] Should isnan be optimized out in fast-math mode?
I was also wrong about reinterpret_cast, sorry.
`reinterpret_cast<uint32_t>(float)` is an invalid construct. The working
construct is `reinterpret_cast<uint32_t&>(x)`. It however possesses the
same drawback: it requires `x` to be in memory.

Thanks,
--Serge

On Sat, Sep 11, 2021 at 11:20 AM Serge Pavlov <sepavloff at gmail.com> wrote:

> On Sat, Sep 11, 2021 at 2:39 AM Chris Tetreault <ctetreau at quicinc.com>
> wrote:
>
>> The problem is that math code is often templated, so `template <typename
>> T> MyMatrixT<T> safeMul(const MyMatrixT<T> & lhs …` is going to be in a
>> header.
>
> No problem, the user can write:
>
> ```
> #ifdef __FAST_MATH__
> #undef isnan
> #define isnan(x) false
> #endif
> ```
>
> and put it somewhere in the headers.
>
> On Sat, Sep 11, 2021 at 2:39 AM Chris Tetreault <ctetreau at quicinc.com>
> wrote:
>
>> Regardless, my position isn’t “there is no NaN”. My position is “you
>> cannot count on operations on NaN working”.
>
> Exactly. Attempts to express the conditions of -ffast-math as
> restrictions on types are not fruitful. I think that is the reason the
> GCC documentation does not use the simple and clear "there is no NaN"
> but prefers more complicated wording about arithmetic.
>
> On Sat, Sep 11, 2021 at 2:39 AM Chris Tetreault <ctetreau at quicinc.com>
> wrote:
>
>> I think working around these sorts of issues is something that C and
>> C++ developers are used to. These sorts of “inconsistent” between-compiler
>> behaviors are something we accept because we know they come with improved
>> performance. In this case, the fix is easy, so I don’t think this corner
>> case is worth supporting. Especially when the fix is also just one line:
>>
>> ```
>> #define myIsNan(x) (reinterpret_cast<uint32_t>(x) ==
>> THE_BIT_PATTERN_OF_MY_SENTINEL_NAN)
>> ```
>
> It won't work that way. If `x == 5.0`, then
> `reinterpret_cast<uint32_t>(x) == 5`. What you need there is a bitcast.
> Standard C does not have one. To emulate it, a reinterpret_cast of
> memory can be used: `*reinterpret_cast<int *>(&x)`. Another way is to
> use a union. Both of these solutions require operations on memory, which
> is not good for performance, especially on GPUs and ML cores. Of course,
> a smart compiler can eliminate the memory operation, but it does not
> have to do so in every case, as it is only an optimization. Moving a
> value between the float and integer pipelines may also incur a
> performance penalty. At the same time, this check can often be done with
> a single instruction.
>
> Thanks,
> --Serge
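A minimal sketch of the bit-pattern check being discussed, written with
memcpy so it stays well defined and is not folded away under -ffast-math;
the helper name `isnan_bits` and the single-precision masks below are
illustrative assumptions, not taken from the thread:

```
#include <cstdint>
#include <cstring>

// NaN iff the exponent field is all ones and the mantissa is non-zero.
// After masking off the sign bit, every NaN pattern compares greater
// than the +infinity pattern 0x7f800000.
inline bool isnan_bits(float x) {
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof bits);  // bitcast via memory; no FP compare emitted
  return (bits & 0x7fffffffu) > 0x7f800000u;
}
```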
James Y Knight via llvm-dev
2021-Sep-13 11:59 UTC
[llvm-dev] [cfe-dev] Should isnan be optimized out in fast-math mode?
On Mon, Sep 13, 2021, 2:02 AM Serge Pavlov via cfe-dev <
cfe-dev at lists.llvm.org> wrote:

> The working construct is `reinterpret_cast<uint32_t&>(x)`. It however
> possesses the same drawback: it requires `x` to be in memory.

We're getting rather far afield of the thread topic here, but... that is
UB; don't do that. Instead, always memcpy, e.g.

    uint32_t y;
    memcpy(&y, &flo, sizeof(uint32_t));

Or use a wrapper like std::bit_cast or absl::bit_cast (
https://github.com/abseil/abseil-cpp/blob/cfbf5bf948a2656bda7ddab59d3bcb29595c144c/absl/base/casts.h#L106
). This has effectively no runtime overhead: the compiler is extremely
good at deleting calls to memcpy when it has a constant, smallish size.

And remember that *every* local variable starts out in memory. Only
through optimization are the memory location and the loads/stores for
every access eliminated.
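For completeness, a sketch of the same check written through std::bit_cast
(C++20), along the lines James suggests; the helper name and bit masks are
again illustrative assumptions:

```
#include <bit>      // std::bit_cast (C++20)
#include <cstdint>

// Same integer-domain NaN test as the memcpy version; std::bit_cast
// typically lowers to a plain register move, so no load/store is needed.
inline bool isnan_bits(float x) {
  const std::uint32_t bits = std::bit_cast<std::uint32_t>(x);
  return (bits & 0x7fffffffu) > 0x7f800000u;
}
```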