thr3ads.net - llvm dev - [llvm-dev] [cfe-dev] Should isnan be optimized out in fast-math mode? [Sep 2021]

Serge Pavlov via llvm-dev

2021-Sep-11 04:20 UTC

[llvm-dev] [cfe-dev] Should isnan be optimized out in fast-math mode?

On Sat, Sep 11, 2021 at 2:39 AM Chris Tetreault <ctetreau at quicinc.com>
wrote:
> The problem is that math code is often templated, so `template <typename
> T>  MyMatrixT<T> safeMul(const MyMatrixT<T> & lhs …` is
> going to be in a header.
>
No problem, the user can write:
```
#ifdef __FAST_MATH__
#undef isnan
#define isnan(x) false
#endif
```
and put it somewhere in the headers.
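
A variation on the snippet above, offered only as a sketch: instead of redefining the standard name `isnan`, a project header could route its checks through its own wrapper, so that code written as `std::isnan(x)` is left alone. The namespace `mylib` and the function `is_nan` are hypothetical names invented for this illustration; `__FAST_MATH__` is the same macro the snippet above tests.

```cpp
#include <cmath>

namespace mylib {  // hypothetical project namespace

// Project-level NaN check. When the translation unit is built with
// -ffast-math (so __FAST_MATH__ is defined), the check is compiled out
// and always reports "not NaN"; otherwise it defers to std::isnan.
template <typename T>
inline bool is_nan(T x) {
#ifdef __FAST_MATH__
    (void)x;  // unused in this configuration
    return false;
#else
    return std::isnan(x);
#endif
}

}  // namespace mylib
```

Templated math code in a header would then call `mylib::is_nan(value)` and pick up whichever behavior matches the flags of the including translation unit.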

On Sat, Sep 11, 2021 at 2:39 AM Chris Tetreault <ctetreau at quicinc.com>
wrote:
> Regardless, my position isn’t “there is no NaN”. My position is “you
> cannot count on operations on NaN working”.

Exactly. Attempts to express the condition of -ffast-math as restrictions
on types are not fruitful. I think it is the reason why GCC documentation
does not use simple and clear "there is no NaN" but prefers more
complicated wording about arithmetic.

On Sat, Sep 11, 2021 at 2:39 AM Chris Tetreault <ctetreau at quicinc.com>
wrote:
> I think working around these sorts of issues is something that C and C++
> developers are used to. These sorts of “inconsistent” between compilers
> behaviors is something we accept because we know it comes with improved
> performance. In this case, the fix is easy, so I don’t think this corner
> case is worth supporting. Especially when the fix is also just one line:
> ```
> #define myIsNan(x) (reinterpret_cast<uint32_t>(x) == THE_BIT_PATTERN_OF_MY_SENTINEL_NAN)
> ```

It won't work this way. If `x == 5.0`, then
`reinterpret_cast<uint32_t>(x) == 5`. What you need there is a bitcast.
Standard C does not have one. To emulate it, a reinterpret_cast of memory
can be used: `*reinterpret_cast<int *>(&x)`. Another way is to use a
union.
Both of these solutions require memory operations, which is bad for
performance, especially on GPU and ML cores. Of course, a smart compiler
can eliminate the memory operation, but it is not required to, since this
is only an optimization. Moving a value between the float and integer
pipelines may also incur a performance penalty. At the same time, this check
can often be done with a single instruction.
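
To make the distinction between a value conversion and a bit copy concrete, here is a minimal sketch of a bit-pattern check. It uses `std::memcpy` to copy the bytes, which is a third option besides the pointer cast and the union mentioned above; it still goes through memory on paper, and a compiler may or may not reduce it to a register move. The helper names (`float_bits`, `float_from_bits`, `is_nan_bits`, `is_sentinel`) are invented for this illustration, and the code assumes `float` is a 32-bit IEEE-754 value.

```cpp
#include <cstdint>
#include <cstring>

// Copy the object representation of a float into a 32-bit integer.
// Assumption: float is IEEE-754 binary32.
inline std::uint32_t float_bits(float x) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "this sketch assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    return bits;
}

// Inverse of float_bits: build a float from a raw 32-bit pattern.
inline float float_from_bits(std::uint32_t bits) {
    float x;
    std::memcpy(&x, &bits, sizeof x);
    return x;
}

// A binary32 value is NaN when the exponent field is all ones and the
// mantissa is non-zero. Masking off the sign bit turns that into a
// single unsigned comparison against the bit pattern of infinity.
inline bool is_nan_bits(float x) {
    return (float_bits(x) & 0x7FFFFFFFu) > 0x7F800000u;
}

// Compare against one specific NaN payload chosen by the caller.
inline bool is_sentinel(float x, std::uint32_t sentinel_bits) {
    return float_bits(x) == sentinel_bits;
}
```

With these helpers, `float_bits(5.0f)` yields `0x40A00000`, whereas a value conversion such as `static_cast<std::uint32_t>(5.0f)` yields `5`.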

Thanks,
--Serge

Serge Pavlov via llvm-dev

2021-Sep-13 06:02 UTC


I was also wrong about reinterpret_cast, sorry.
`reinterpret_cast<uint32_t>(float)` is an invalid construct. The working
construct is `reinterpret_cast<uint32_t&>(x)`. However, it has the same
drawback: it requires `x` to be in memory.

Thanks,
--Serge



Chris Tetreault via llvm-dev

2021-Sep-13 16:45 UTC


Honestly, we can do this until the end of time. I think we both agree that for
either scheme, there exist workarounds. The question is which workarounds are
more palatable, which is a matter of opinion. I think we’ve come to an impasse,
so let me just state that my opinion on the question “Should isnan be optimized
out in fast-math mode?” is “Yes”, which is what you asked to get in your
original message. I think that the implementation of fast-math will be cleaner
if we don’t special case a bunch of random constructs in order to do what the
user meant instead of what they said. I think fast-math is a notorious footgun,
and any attempts to mitigate this will only reduce the effectiveness of the
tool, while not really improving the user experience.

As a user, if I read that:

```
if (isnan(x)) {
```

… is guaranteed to work, and I read that fast-math enables the compiler to
reason about constructs like `x + 0` being equal to `x`, then I’m going to be
very confused when:

```
if (isnan(x + 0)) {
```

… does not also work. I’m going to open a bug and complain, and the slide down
the slippery slope will continue. You and I understand the difference, and the
technical reason why `isnan(x)` is supported but `isnan(x + 0)` isn’t, but Joe
Coder is just trying to figure out why he’s got NaN in his matrices despite his
careful NaN handling code. Joe is not a compiler expert, and on the face of it,
it seems like a silly limitation. This will never end until fast-math is gutted.
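
The two cases can be put side by side in a small sketch. The function names are invented for this illustration. Under default floating-point flags both functions agree for every input; under `-ffast-math` a compiler may assume that the result of `x + 0.0f` is not NaN and fold the second check to `false`, and whether the first check should survive is the question debated in this thread.

```cpp
#include <cmath>

// Checks the value exactly as it was passed in, with no arithmetic
// between the load and the test.
inline bool is_nan_direct(float x) {
    return std::isnan(x);
}

// Checks the result of an arithmetic operation on the value. This is
// the form that fast-math assumptions about arithmetic results apply to.
inline bool is_nan_after_add(float x) {
    return std::isnan(x + 0.0f);
}
```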

Thanks,
   Chris Tetreault
