thr3ads.net - llvm dev - [llvm-dev] [cfe-dev] Should isnan be optimized out in fast-math mode? [Sep 2021]

If this information is useful, please help other people find it:
Share via:

Mehdi AMINI via llvm-dev

2021-Sep-17 16:16 UTC

[llvm-dev] [cfe-dev] Should isnan be optimized out in fast-math mode?

On Thu, Sep 16, 2021 at 11:19 PM Serge Pavlov <sepavloff at gmail.com>
wrote:
> On Fri, Sep 17, 2021 at 10:53 AM Mehdi AMINI <joker.eph at gmail.com>
wrote:
>
>>
>>
>> On Thu, Sep 16, 2021 at 8:23 PM Serge Pavlov via cfe-dev <
>> cfe-dev at lists.llvm.org> wrote:
>>
>>> On Fri, Sep 17, 2021 at 3:11 AM Chris Tetreault <ctetreau at
quicinc.com>
>>> wrote:
>>>
>>>> The difference there is that doing pointer arithmetic on null
pointers
>>>> doesn't *usually* work, unless you turn on -ffast-pointers.
>>>>
>>>> It seems to me that  most confusion related to -ffast-math is
likely
>>>> caused by people who are transitioning to using it. I have some
codebase,
>>>> and I turn on fast math, and then a few months down the road I
notice a
>>>> strangeness that I did not catch during the initial transition
period. If
>>>> you're writing new code with fast-math, you don't do
things like try to use
>>>> NaN as a sentinel value in a TU with fast math turned on. This
is the sort
>>>> of thing you catch when you try to transition an existing
codebase. Forgive
>>>> me for the uncharitable interpretation, but it's much
easier to ask the
>>>> compiler to change to accommodate your use case than it is to
refactor your
>>>> code.
>>>>
>>>
>>> It is a common way to explain problems with -ffinite-math-only by
user
>>> ignorance. However user misunderstandings and complaints may
indicate a
>>> flaw in compiler implementation, which I believe we have in this
case.
>>>
>>> Using NaN as sentinels is a natural way when you cannot spend extra
>>> memory for keeping flags for each item, spend extra cycles to read
that
>>> flag and do not want to pollute cache. It does not depend on
reading
>>> documentation or writing the code from scratch. It is simply the
best
>>> solution for storing data. If performance of the data processing is
>>> critical, -ffast-math is a good solution. This is a fairly
legitimate use
>>> case. The fact that the compiler does not allow it is a compiler
drawback.
>>>
>>>
>>>> To me, I think Mehdi had the best solution: The algorithm that
is the
>>>> bottleneck, and experiences the huge speedup using fast-math
should be
>>>> separated into its own source file. This source file, and only
this source
>>>> file should be compiled with fast-math. The outer driver loop
should not be
>>>> compiled with fast math. This solution is clean, (probably)
easy, and
>>>> doesn't require a change in the compiler.
>>>
>>>
>>> It is a workaround, it works in some cases but does not in others.
ML
>>> kernel often is a single translation unit, there may be no such
thing as
>>> linker for that processor. At the same time it is computation
intensive and
>>> using fast-math in it may be very profitable.
>>>
>>
>> Switching mode in a single TU seems valuable, but could this be handled
>> with pragmas or function attributes instead?
>>
>
> GCC allows it by using `#pragma GCC optimize()`, but clang does not
> support it. No suitable function attribute exists for that.
>
Right, I know that clang does not support it, but it could :)
So since we're looking at what provides the best user-experience: isn't
that it? Shouldn't we look into providing this level of granularity?
(whether function-level or finer grain)


>
>>
>>
>>>
>>>
>>>> Changing the compiler is hard, affects everybody who uses the
compiler,
>>>> and creates inconsistency in behavior between clang and gcc
(and msvc with
>>>> /fp:fast), and clang and old versions of clang.
>>>
>>>
>>> ICC and MSVC do not remove `isnan` in fast math mode. If `isnan` is
>>> implemented in libc, it is also a real check. Actually removing
`isnan`
>>> creates inconsistency.
>>>
>>>
>>>> The behavior of fast-math with respect to NaN is consistent
across the
>>>> mainstream c/c++ compilers: no promises are made, users should
not assume
>>>> that they can use it for anything. Changing it now would create
a major
>>>> portability issue for user codebases, which in and of itself is
a very
>>>> strong reason to not make this change.
>>>>
>>>
>>> Removing `isnan` is only an optimization, it does not intend to
change
>>> semantics. So it cannot create portability issues. Quite the
contrary, it
>>> helps portability by making behavior consistent between compilers
and libc
>>> implementations. The only possible issue is performance loss, this
is
>>> discussed above, it is an unlikely case. Anyway, if such loss
exists and it
>>> is absolutely intolerable for a user, a hack with redefinition of
`isnan`
>>> restores the previous code generation.
>>>
>>>
>>>> If the behavior is confusing to users, that's because
it's poorly
>>>> explained.
>>>
>>>
>>> What is confusing? That the explicitly written call to a function
is not
>>> removed? According to user feedback it is the silent removal of
`isnan`
>>> that confuses users.
>>>
>>> Honestly, I think the docs are pretty clear, but "It's
clear, you just
>>>> need to learn to read" is never an acceptable answer so it
could certainly
>>>> be improved. This is the only thing that needs to be fixed in
my opinion.
>>>>
>>>
>>> The documentation says about -ffinite-math-only:
>>>
>>>  "Allow optimizations for floating-point arithmetic that
assume that
>>> arguments and results are not NaNs or +-Infs."
>>>
>>> Is it clear whether `isnan` is arithmetic or not?
>>>
>>
>> If the result of a floating point arithmetic is fed directly to
>> `isnan()`, are we allowed to eliminate the computation and fold the
check
>> to none? (seems like it according to the sentence you're quoting).
>> Are we back to `isnan(x+0.0)` can be folded but not `isnan(x)`?
>>
>
> Initially there was such intention, but during the discussion it became
> clear that it is not profitable. For `isinf` there is a realistic use case
> when removing the call `isinf(a+b)` is an issue. Arthur O'Dwyer
> demonstrated an example for `isnan`. It is harder to provide guarantees
> about the result, than about the arguments.
>
>
>> --
>> Mehdi
>>
>>
>>
>>
>>>
>>>
>>>> Thanks,
>>>>    Chris Tetreault
>>>>
>>>> -----Original Message-----
>>>> From: Michael Kruse <cfe-dev at meinersbur.de>
>>>> Sent: Thursday, September 16, 2021 12:29 PM
>>>> To: Chris Tetreault <ctetreau at quicinc.com>
>>>> Cc: Serge Pavlov <sepavloff at gmail.com>; llvm-dev at
lists.llvm.org;
>>>> cfe-dev at lists.llvm.org
>>>> Subject: Re: [cfe-dev] [llvm-dev] Should isnan be optimized out
in
>>>> fast-math mode?
>>>>
>>>> WARNING: This email originated from outside of Qualcomm. Please
be wary
>>>> of any links or attachments, and do not enable macros.
>>>>
>>>> Am Mo., 13. Sept. 2021 um 11:46 Uhr schrieb Chris Tetreault via
cfe-dev
>>>> <cfe-dev at lists.llvm.org>:
>>>> > As a user, if I read that:
>>>> >
>>>> >
>>>> >
>>>> > ```
>>>> >
>>>> > if (isnan(x)) {
>>>> >
>>>> > ```
>>>> >
>>>> >
>>>> >
>>>> > … is guaranteed to work, and I read that fast-math enables
the
>>>> compiler to reason about constructs like `x + 0` being equal to
`x`, then
>>>> I’m going to be very confused when:
>>>> >
>>>> >
>>>> >
>>>> > ```
>>>> >
>>>> > if (isnan(x + 0)) {
>>>> >
>>>> > ```
>>>> >
>>>> >
>>>> >
>>>> > … does not also work. I’m going to open a bug and
complain, and the
>>>> slide down the slippery slope will continue. You and I
understand the
>>>> difference, and the technical reason why `isnan(x)` is
supported but
>>>> `isnan(x + 0)` isn’t, but Joe Coder just trying to figure out
why he’s got
>>>> NaN in his matrices despite his careful NaN handling code. Joe
is not a
>>>> compiler expert, and on the face of it, it seems like a silly
limitation.
>>>> This will never end until fast-math is gutted.
>>>>
>>>> C/C++ already has cases like this. Pointer arithmetic on null
pointers
>>>> is undefined behaviour, even if adding[1,2]/subtracting[3]
zero. I don't
>>>> think it is too far fetched to expect from users to know that
an operation
>>>> is undefined behaviour even if one of the operands is zero.
>>>>
>>>> Michael
>>>>
>>>> [1]
>>>>
https://github.com/llvm/llvm-project/blob/main/clang/test/Sema/pointer-addition.c
>>>> [2]
>>>>
https://github.com/llvm/llvm-project/blob/main/compiler-rt/test/ubsan/TestCases/Pointer/nullptr-and-nonzero-offset-constants.cpp
>>>> [3]
>>>>
https://github.com/llvm/llvm-project/blob/main/clang/test/Sema/pointer-subtraction.c
>>>>
>>>>
>>>> Michael
>>>>
>>> _______________________________________________
>>> cfe-dev mailing list
>>> cfe-dev at lists.llvm.org
>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>
>>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20210917/afd6adcd/attachment.html>

Chris Tetreault via llvm-dev

2021-Sep-17 17:34 UTC

head link

[llvm-dev] [cfe-dev] Should isnan be optimized out in fast-math mode?

For the record, I think having a pragma that allows you to control fast-math
behavior is fine. This sort of thing is far from unprecedented, and is far more
palatable to me than just having random special cases. It’s also far more useful
as I can do something like:

   #pragma clang fast-math push
   #pragma clang fast-math-on
   // bunch of hairy floating point math here
   #pragma clang fast-math-pop

… and easily isolate fast math to the smallest area it can be. This is morally
equivalent to putting the scary math in it’s own TU, but less annoying.

The pragma should probably have a push/pop, but otherwise I don’t really care
what color the bike shed is.

Thanks,
   Chris Tetreault

From: Mehdi AMINI <joker.eph at gmail.com>
Sent: Friday, September 17, 2021 9:17 AM
To: Serge Pavlov <sepavloff at gmail.com>
Cc: Chris Tetreault <ctetreau at quicinc.com>; llvm-dev at lists.llvm.org;
cfe-dev at lists.llvm.org
Subject: Re: [cfe-dev] [llvm-dev] Should isnan be optimized out in fast-math
mode?

WARNING: This email originated from outside of Qualcomm. Please be wary of any
links or attachments, and do not enable macros.

On Thu, Sep 16, 2021 at 11:19 PM Serge Pavlov <sepavloff at
gmail.com<mailto:sepavloff at gmail.com>> wrote:
On Fri, Sep 17, 2021 at 10:53 AM Mehdi AMINI <joker.eph at
gmail.com<mailto:joker.eph at gmail.com>> wrote:

On Thu, Sep 16, 2021 at 8:23 PM Serge Pavlov via cfe-dev <cfe-dev at
lists.llvm.org<mailto:cfe-dev at lists.llvm.org>> wrote:
On Fri, Sep 17, 2021 at 3:11 AM Chris Tetreault <ctetreau at
quicinc.com<mailto:ctetreau at quicinc.com>> wrote:
The difference there is that doing pointer arithmetic on null pointers
doesn't *usually* work, unless you turn on -ffast-pointers.

It seems to me that  most confusion related to -ffast-math is likely caused by
people who are transitioning to using it. I have some codebase, and I turn on
fast math, and then a few months down the road I notice a strangeness that I did
not catch during the initial transition period. If you're writing new code
with fast-math, you don't do things like try to use NaN as a sentinel value
in a TU with fast math turned on. This is the sort of thing you catch when you
try to transition an existing codebase. Forgive me for the uncharitable
interpretation, but it's much easier to ask the compiler to change to
accommodate your use case than it is to refactor your code.

It is a common way to explain problems with -ffinite-math-only by user
ignorance. However user misunderstandings and complaints may indicate a flaw in
compiler implementation, which I believe we have in this case.

Using NaN as sentinels is a natural way when you cannot spend extra memory for
keeping flags for each item, spend extra cycles to read that flag and do not
want to pollute cache. It does not depend on reading documentation or writing
the code from scratch. It is simply the best solution for storing data. If
performance of the data processing is critical, -ffast-math is a good solution.
This is a fairly legitimate use case. The fact that the compiler does not allow
it is a compiler drawback.

To me, I think Mehdi had the best solution: The algorithm that is the
bottleneck, and experiences the huge speedup using fast-math should be separated
into its own source file. This source file, and only this source file should be
compiled with fast-math. The outer driver loop should not be compiled with fast
math. This solution is clean, (probably) easy, and doesn't require a change
in the compiler.

It is a workaround, it works in some cases but does not in others. ML kernel
often is a single translation unit, there may be no such thing as linker for
that processor. At the same time it is computation intensive and using fast-math
in it may be very profitable.

Switching mode in a single TU seems valuable, but could this be handled with
pragmas or function attributes instead?

GCC allows it by using `#pragma GCC optimize()`, but clang does not support it.
No suitable function attribute exists for that.

Right, I know that clang does not support it, but it could :)
So since we're looking at what provides the best user-experience: isn't
that it? Shouldn't we look into providing this level of granularity?
(whether function-level or finer grain)

Changing the compiler is hard, affects everybody who uses the compiler, and
creates inconsistency in behavior between clang and gcc (and msvc with
/fp:fast), and clang and old versions of clang.

ICC and MSVC do not remove `isnan` in fast math mode. If `isnan` is implemented
in libc, it is also a real check. Actually removing `isnan` creates
inconsistency.

The behavior of fast-math with respect to NaN is consistent across the
mainstream c/c++ compilers: no promises are made, users should not assume that
they can use it for anything. Changing it now would create a major portability
issue for user codebases, which in and of itself is a very strong reason to not
make this change.

Removing `isnan` is only an optimization, it does not intend to change
semantics. So it cannot create portability issues. Quite the contrary, it helps
portability by making behavior consistent between compilers and libc
implementations. The only possible issue is performance loss, this is discussed
above, it is an unlikely case. Anyway, if such loss exists and it is absolutely
intolerable for a user, a hack with redefinition of `isnan` restores the
previous code generation.

If the behavior is confusing to users, that's because it's poorly
explained.

What is confusing? That the explicitly written call to a function is not
removed? According to user feedback it is the silent removal of `isnan` that
confuses users.

Honestly, I think the docs are pretty clear, but "It's clear, you just
need to learn to read" is never an acceptable answer so it could certainly
be improved. This is the only thing that needs to be fixed in my opinion.

The documentation says about -ffinite-math-only:

 "Allow optimizations for floating-point arithmetic that assume that
arguments and results are not NaNs or +-Infs."

Is it clear whether `isnan` is arithmetic or not?

If the result of a floating point arithmetic is fed directly to `isnan()`, are
we allowed to eliminate the computation and fold the check to none? (seems like
it according to the sentence you're quoting).
Are we back to `isnan(x+0.0)` can be folded but not `isnan(x)`?

Initially there was such intention, but during the discussion it became clear
that it is not profitable. For `isinf` there is a realistic use case when
removing the call `isinf(a+b)` is an issue. Arthur O'Dwyer demonstrated an
example for `isnan`. It is harder to provide guarantees about the result, than
about the arguments.

--
Mehdi

Thanks,
   Chris Tetreault

-----Original Message-----
From: Michael Kruse <cfe-dev at meinersbur.de<mailto:cfe-dev at
meinersbur.de>>
Sent: Thursday, September 16, 2021 12:29 PM
To: Chris Tetreault <ctetreau at quicinc.com<mailto:ctetreau at
quicinc.com>>
Cc: Serge Pavlov <sepavloff at gmail.com<mailto:sepavloff at
gmail.com>>; llvm-dev at lists.llvm.org<mailto:llvm-dev at
lists.llvm.org>; cfe-dev at lists.llvm.org<mailto:cfe-dev at
lists.llvm.org>
Subject: Re: [cfe-dev] [llvm-dev] Should isnan be optimized out in fast-math
mode?

WARNING: This email originated from outside of Qualcomm. Please be wary of any
links or attachments, and do not enable macros.

Am Mo., 13. Sept. 2021 um 11:46 Uhr schrieb Chris Tetreault via cfe-dev
<cfe-dev at lists.llvm.org<mailto:cfe-dev at
lists.llvm.org>>:> As a user, if I read that:
>
>
>
> ```
>
> if (isnan(x)) {
>
> ```
>
>
>
> … is guaranteed to work, and I read that fast-math enables the compiler to
reason about constructs like `x + 0` being equal to `x`, then I’m going to be
very confused when:
>
>
>
> ```
>
> if (isnan(x + 0)) {
>
> ```
>
>
>
> … does not also work. I’m going to open a bug and complain, and the slide
down the slippery slope will continue. You and I understand the difference, and
the technical reason why `isnan(x)` is supported but `isnan(x + 0)` isn’t, but
Joe Coder just trying to figure out why he’s got NaN in his matrices despite his
careful NaN handling code. Joe is not a compiler expert, and on the face of it,
it seems like a silly limitation. This will never end until fast-math is gutted.
C/C++ already has cases like this. Pointer arithmetic on null pointers is
undefined behaviour, even if adding[1,2]/subtracting[3] zero. I don't think
it is too far fetched to expect from users to know that an operation is
undefined behaviour even if one of the operands is zero.

Michael

[1]
https://github.com/llvm/llvm-project/blob/main/clang/test/Sema/pointer-addition.c
[2]
https://github.com/llvm/llvm-project/blob/main/compiler-rt/test/ubsan/TestCases/Pointer/nullptr-and-nonzero-offset-constants.cpp
[3]
https://github.com/llvm/llvm-project/blob/main/clang/test/Sema/pointer-subtraction.c

Michael
_______________________________________________
cfe-dev mailing list
cfe-dev at lists.llvm.org<mailto:cfe-dev at lists.llvm.org>
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20210917/a6bf801e/attachment-0001.html>

Serge Pavlov via llvm-dev

2021-Sep-20 08:22 UTC

head link

[llvm-dev] [cfe-dev] Should isnan be optimized out in fast-math mode?

On Fri, Sep 17, 2021 at 11:17 PM Mehdi AMINI <joker.eph at gmail.com>
wrote:
> On Thu, Sep 16, 2021 at 11:19 PM Serge Pavlov <sepavloff at
gmail.com> wrote:
>
>> On Fri, Sep 17, 2021 at 10:53 AM Mehdi AMINI <joker.eph at
gmail.com> wrote:
>>
>>> On Thu, Sep 16, 2021 at 8:23 PM Serge Pavlov via cfe-dev <
>>> cfe-dev at lists.llvm.org> wrote:
>>>
>>>> On Fri, Sep 17, 2021 at 3:11 AM Chris Tetreault <ctetreau at
quicinc.com>
>>>> wrote:
>>>>
>>>>> The difference there is that doing pointer arithmetic on
null pointers
>>>>> doesn't *usually* work, unless you turn on
-ffast-pointers.
>>>>>
>>>>> It seems to me that  most confusion related to -ffast-math
is likely
>>>>> caused by people who are transitioning to using it. I have
some codebase,
>>>>> and I turn on fast math, and then a few months down the
road I notice a
>>>>> strangeness that I did not catch during the initial
transition period. If
>>>>> you're writing new code with fast-math, you don't
do things like try to use
>>>>> NaN as a sentinel value in a TU with fast math turned on.
This is the sort
>>>>> of thing you catch when you try to transition an existing
codebase. Forgive
>>>>> me for the uncharitable interpretation, but it's much
easier to ask the
>>>>> compiler to change to accommodate your use case than it is
to refactor your
>>>>> code.
>>>>>
>>>>
>>>> It is a common way to explain problems with -ffinite-math-only
by user
>>>> ignorance. However user misunderstandings and complaints may
indicate a
>>>> flaw in compiler implementation, which I believe we have in
this case.
>>>>
>>>> Using NaN as sentinels is a natural way when you cannot spend
extra
>>>> memory for keeping flags for each item, spend extra cycles to
read that
>>>> flag and do not want to pollute cache. It does not depend on
reading
>>>> documentation or writing the code from scratch. It is simply
the best
>>>> solution for storing data. If performance of the data
processing is
>>>> critical, -ffast-math is a good solution. This is a fairly
legitimate use
>>>> case. The fact that the compiler does not allow it is a
compiler drawback.
>>>>
>>>>
>>>>> To me, I think Mehdi had the best solution: The algorithm
that is the
>>>>> bottleneck, and experiences the huge speedup using
fast-math should be
>>>>> separated into its own source file. This source file, and
only this source
>>>>> file should be compiled with fast-math. The outer driver
loop should not be
>>>>> compiled with fast math. This solution is clean, (probably)
easy, and
>>>>> doesn't require a change in the compiler.
>>>>
>>>>
>>>> It is a workaround, it works in some cases but does not in
others. ML
>>>> kernel often is a single translation unit, there may be no such
thing as
>>>> linker for that processor. At the same time it is computation
intensive and
>>>> using fast-math in it may be very profitable.
>>>>
>>>
>>> Switching mode in a single TU seems valuable, but could this be
handled
>>> with pragmas or function attributes instead?
>>>
>>
>> GCC allows it by using `#pragma GCC optimize()`, but clang does not
>> support it. No suitable function attribute exists for that.
>>
>
> Right, I know that clang does not support it, but it could :)
> So since we're looking at what provides the best user-experience:
isn't
> that it? Shouldn't we look into providing this level of granularity?
> (whether function-level or finer grain)
>
It could mitigate the problem if it were implemented. A user who needs to
handle NaNs in -ffinite-math-only compilation and writes the code from
scratch could use this facility to get things working. I also think such
pragma, implemented with enough degree of flexibility, could be useful
irrespective of this topic.

However, in general it does not solve the problem. The most important issue
which remains unaddressed is inconsistency of the implementation.

The handling of `isnan` in -ffinite-math-only by clang is not consistent
because:
- It differs from what other compilers do. Namely MSVC and Intel compiler
do not throw away `isnan` in this mode: https://godbolt.org/z/qTaz47qhP.
- It depends on optimization options. With -O2 the check is removed but
with -O0 remains: https://godbolt.org/z/cjYePv7s7. Other options also can
affect the behavior, for example with `-ffp-model=strict` the check is
generated irrespective of the optimization mode (see the same link).
- It is inconsistent with libc implementations. If `isnan` is provided by
libc, it is a real check, but the compiler may drop it.
It would not be an issue if `isnan` removal were just an optimization. It
however changes semantics in the presence of NaNs, so such removal can
break user code.

In the typical use case a user puts a call to `isnan` to ensure no
operations on NaNs occur. The call can also be present in some header that
implements some functionality for the general case. It may work because
`isnan` is provided by libc. Later on when configuration changes or libc is
updated the code may be broken, because implementation of `isnan` changes,
as it happened after https://reviews.llvm.org/D69806.

If clang kept calls to `isnan`, it would be consistent with ICC and MSVC
and with all libc implementations. The behavior would be different from
gcc, but clang would be on the winning side, because the number of programs
that work with clang would be larger.

Also if we agree that NaNs can appear in the code compiled with
-ffinite-math-only, there must be a way to check if a number is a NaN.

Thanks,
--Serge
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20210920/c22b38c6/attachment.html>

llvm dev - Sep 2021 - [cfe-dev] Should isnan be optimized out in fast-math mode?

[llvm-dev] [cfe-dev] Should isnan be optimized out in fast-math mode?

[llvm-dev] [cfe-dev] Should isnan be optimized out in fast-math mode?

[llvm-dev] [cfe-dev] Should isnan be optimized out in fast-math mode?