thr3ads.net - llvm dev - [llvm-dev] [cfe-dev] Should isnan be optimized out in fast-math mode? [Sep 2021]

If this information is useful, please help other people find it:
Share via:

Serge Pavlov via llvm-dev

2021-Sep-13 13:49 UTC

[llvm-dev] [cfe-dev] Should isnan be optimized out in fast-math mode?

Let's weigh the alternatives.

We are discussing two approaches for handling `isnan` and similar functions
in -ffinite-math-only mode:
1. "Old" behavior: "with -ffinite-math-only you are telling that
there are
no NaNs", so `isnan` may be optimized to `false`.
2. "New" behavior: with -ffinite-math-only you are telling that the
operands of arithmetic operations are not NaNs but otherwise NaN may be
used. As `isnan` is not an arithmetic operation, it should be preserved.

Advantages of the "old" behavior are:
- " it’s intuitively clear".
- It is close to the GCC current behavior.

Advantages of the "new" behavior are:
- `isnan` is still available to the user, which allows, for instance,
validation of working data or selection between fast and slow path.
- NaN is available and may be used, for instance, as sentinel.
- Consistency between compiler and library implementations, both would
behave similarly.
- In most real cases the "old" behavior can be easily obtained by
redefinition of `isnan`.
- It is free from issues like "what returns
numeric_limits<float>::has_quite_NaN()?".

It is unlikely that "old" behavior gives noticeable performance gain.
Anyway, `isnan` may be redefined to `false` if it actually does.

Intuitive clarity of the "old" way is questionable for users, because
it is
not clear why functions like `isnan` silently disappeared or what body
should have specializations of `numeric_limit` methods.

There are cases when checking for NaN is needed even in -ffinite-math-only
mode. To make it, users have to use workarounds like doing integer
arithmetic on float values, which reduce clarity of code, make it
unportable and slower.

Are there any other advantages/disadvantages of these approaches?

Thanks,
--Serge

On Mon, Sep 13, 2021 at 7:00 PM James Y Knight <jyknight at google.com>
wrote:
> On Mon, Sep 13, 2021, 2:02 AM Serge Pavlov via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
>> The working construct is `reinterpret_cast<uint32_t&>(x)`. It
however
>> possesses the same drawback, it requires `x` be in memory.
>>
>
> We're getting rather far afield of the thread topic here, but .. that
is
> UB, don't do that.
>
> Instead, always memcpy, e.g.
> uint32_t y;
> memcpy(&y, &flo, sizeof(uint32_t));
>
> Or use a wrapper like std::bit_cast or absl::bit_cast (
>
https://github.com/abseil/abseil-cpp/blob/cfbf5bf948a2656bda7ddab59d3bcb29595c144c/absl/base/casts.h#L106
> ).
>
> This has effectively no runtime overhead, the compiler is extremely good
> at deleting calls to memcpy when it has a constant smallish size. And
> remember that *every* local variable started out in memory. Only through
> optimizations does the memory location and the loads/stores for every
> access get eliminated.
>
>>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20210913/5fe23580/attachment.html>

Krzysztof Parzyszek via llvm-dev

2021-Sep-13 14:03 UTC

head link

[llvm-dev] [cfe-dev] Should isnan be optimized out in fast-math mode?

If the compiler provides “isnan”, the user can’t redefine it. 
Redefining/undefining any function or a macro provided by a compiler is UB.

The “old” behavior can be tuned with #pragmas to restore the functionality of
NaNs where needed.
The “old” behavior doesn’t have a problem with “has_nan”---it returns “true”. 
What other issues are there?

--
Krzysztof Parzyszek  kparzysz at quicinc.com<mailto:kparzysz at
quicinc.com>   AI tools development

From: cfe-dev <cfe-dev-bounces at lists.llvm.org> On Behalf Of Serge
Pavlov via cfe-dev
Sent: Monday, September 13, 2021 8:50 AM
To: James Y Knight <jyknight at google.com>
Cc: llvm-dev <llvm-dev at lists.llvm.org>; Clang Dev <cfe-dev at
lists.llvm.org>
Subject: Re: [cfe-dev] [llvm-dev] Should isnan be optimized out in fast-math
mode?


WARNING: This email originated from outside of Qualcomm. Please be wary of any
links or attachments, and do not enable macros.
Let's weigh the alternatives.

We are discussing two approaches for handling `isnan` and similar functions in
-ffinite-math-only mode:
1. "Old" behavior: "with -ffinite-math-only you are telling that
there are no NaNs", so `isnan` may be optimized to `false`.
2. "New" behavior: with -ffinite-math-only you are telling that the
operands of arithmetic operations are not NaNs but otherwise NaN may be used. As
`isnan` is not an arithmetic operation, it should be preserved.

Advantages of the "old" behavior are:
- " it’s intuitively clear".
- It is close to the GCC current behavior.

Advantages of the "new" behavior are:
- `isnan` is still available to the user, which allows, for instance, validation
of working data or selection between fast and slow path.
- NaN is available and may be used, for instance, as sentinel.
- Consistency between compiler and library implementations, both would behave
similarly.
- In most real cases the "old" behavior can be easily obtained by
redefinition of `isnan`.
- It is free from issues like "what returns
numeric_limits<float>::has_quite_NaN()?".

It is unlikely that "old" behavior gives noticeable performance gain.
Anyway, `isnan` may be redefined to `false` if it actually does.

Intuitive clarity of the "old" way is questionable for users, because
it is not clear why functions like `isnan` silently disappeared or what body
should have specializations of `numeric_limit` methods.

There are cases when checking for NaN is needed even in -ffinite-math-only mode.
To make it, users have to use workarounds like doing integer arithmetic on float
values, which reduce clarity of code, make it unportable and slower.

Are there any other advantages/disadvantages of these approaches?

Thanks,
--Serge


On Mon, Sep 13, 2021 at 7:00 PM James Y Knight <jyknight at
google.com<mailto:jyknight at google.com>> wrote:
On Mon, Sep 13, 2021, 2:02 AM Serge Pavlov via cfe-dev <cfe-dev at
lists.llvm.org<mailto:cfe-dev at lists.llvm.org>> wrote:
The working construct is `reinterpret_cast<uint32_t&>(x)`. It however
possesses the same drawback, it requires `x` be in memory.

We're getting rather far afield of the thread topic here, but .. that is UB,
don't do that.

Instead, always memcpy, e.g.
uint32_t y;
memcpy(&y, &flo, sizeof(uint32_t));

Or use a wrapper like std::bit_cast or absl::bit_cast
(https://github.com/abseil/abseil-cpp/blob/cfbf5bf948a2656bda7ddab59d3bcb29595c144c/absl/base/casts.h#L106).

This has effectively no runtime overhead, the compiler is extremely good at
deleting calls to memcpy when it has a constant smallish size. And remember that
every local variable started out in memory. Only through optimizations does the
memory location and the loads/stores for every access get eliminated.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20210913/7af39540/attachment.html>

llvm dev - Sep 2021 - [cfe-dev] Should isnan be optimized out in fast-math mode?

[llvm-dev] [cfe-dev] Should isnan be optimized out in fast-math mode?

[llvm-dev] [cfe-dev] Should isnan be optimized out in fast-math mode?