thr3ads.net - llvm dev - [llvm-dev] [cfe-dev] Should isnan be optimized out in fast-math mode? [Sep 2021]

If this information is useful, please help other people find it:
Share via:

Serge Pavlov via llvm-dev

2021-Sep-17 03:23 UTC

[llvm-dev] [cfe-dev] Should isnan be optimized out in fast-math mode?

On Fri, Sep 17, 2021 at 3:11 AM Chris Tetreault <ctetreau at quicinc.com>
wrote:
> The difference there is that doing pointer arithmetic on null pointers
> doesn't *usually* work, unless you turn on -ffast-pointers.
>
> It seems to me that  most confusion related to -ffast-math is likely
> caused by people who are transitioning to using it. I have some codebase,
> and I turn on fast math, and then a few months down the road I notice a
> strangeness that I did not catch during the initial transition period. If
> you're writing new code with fast-math, you don't do things like
try to use
> NaN as a sentinel value in a TU with fast math turned on. This is the sort
> of thing you catch when you try to transition an existing codebase. Forgive
> me for the uncharitable interpretation, but it's much easier to ask the
> compiler to change to accommodate your use case than it is to refactor your
> code.
>
It is a common way to explain problems with -ffinite-math-only by user
ignorance. However user misunderstandings and complaints may indicate a
flaw in compiler implementation, which I believe we have in this case.

Using NaN as sentinels is a natural way when you cannot spend extra memory
for keeping flags for each item, spend extra cycles to read that flag and
do not want to pollute cache. It does not depend on reading documentation
or writing the code from scratch. It is simply the best solution for
storing data. If performance of the data processing is critical,
-ffast-math is a good solution. This is a fairly legitimate use case. The
fact that the compiler does not allow it is a compiler drawback.

> To me, I think Mehdi had the best solution: The algorithm that is the
> bottleneck, and experiences the huge speedup using fast-math should be
> separated into its own source file. This source file, and only this source
> file should be compiled with fast-math. The outer driver loop should not be
> compiled with fast math. This solution is clean, (probably) easy, and
> doesn't require a change in the compiler.

It is a workaround, it works in some cases but does not in others. ML
kernel often is a single translation unit, there may be no such thing as
linker for that processor. At the same time it is computation intensive and
using fast-math in it may be very profitable.

> Changing the compiler is hard, affects everybody who uses the compiler,
> and creates inconsistency in behavior between clang and gcc (and msvc with
> /fp:fast), and clang and old versions of clang.

ICC and MSVC do not remove `isnan` in fast math mode. If `isnan` is
implemented in libc, it is also a real check. Actually removing `isnan`
creates inconsistency.

> The behavior of fast-math with respect to NaN is consistent across the
> mainstream c/c++ compilers: no promises are made, users should not assume
> that they can use it for anything. Changing it now would create a major
> portability issue for user codebases, which in and of itself is a very
> strong reason to not make this change.
>
Removing `isnan` is only an optimization, it does not intend to change
semantics. So it cannot create portability issues. Quite the contrary, it
helps portability by making behavior consistent between compilers and libc
implementations. The only possible issue is performance loss, this is
discussed above, it is an unlikely case. Anyway, if such loss exists and it
is absolutely intolerable for a user, a hack with redefinition of `isnan`
restores the previous code generation.

> If the behavior is confusing to users, that's because it's poorly
> explained.

What is confusing? That the explicitly written call to a function is not
removed? According to user feedback it is the silent removal of `isnan`
that confuses users.

Honestly, I think the docs are pretty clear, but "It's clear, you just
need> to learn to read" is never an acceptable answer so it could certainly
be
> improved. This is the only thing that needs to be fixed in my opinion.
>
The documentation says about -ffinite-math-only:

 "Allow optimizations for floating-point arithmetic that assume that
arguments and results are not NaNs or +-Infs."

Is it clear whether `isnan` is arithmetic or not?

> Thanks,
>    Chris Tetreault
>
> -----Original Message-----
> From: Michael Kruse <cfe-dev at meinersbur.de>
> Sent: Thursday, September 16, 2021 12:29 PM
> To: Chris Tetreault <ctetreau at quicinc.com>
> Cc: Serge Pavlov <sepavloff at gmail.com>; llvm-dev at
lists.llvm.org;
> cfe-dev at lists.llvm.org
> Subject: Re: [cfe-dev] [llvm-dev] Should isnan be optimized out in
> fast-math mode?
>
> WARNING: This email originated from outside of Qualcomm. Please be wary of
> any links or attachments, and do not enable macros.
>
> Am Mo., 13. Sept. 2021 um 11:46 Uhr schrieb Chris Tetreault via cfe-dev
<
> cfe-dev at lists.llvm.org>:
> > As a user, if I read that:
> >
> >
> >
> > ```
> >
> > if (isnan(x)) {
> >
> > ```
> >
> >
> >
> > … is guaranteed to work, and I read that fast-math enables the
compiler
> to reason about constructs like `x + 0` being equal to `x`, then I’m going
> to be very confused when:
> >
> >
> >
> > ```
> >
> > if (isnan(x + 0)) {
> >
> > ```
> >
> >
> >
> > … does not also work. I’m going to open a bug and complain, and the
> slide down the slippery slope will continue. You and I understand the
> difference, and the technical reason why `isnan(x)` is supported but
> `isnan(x + 0)` isn’t, but Joe Coder just trying to figure out why he’s got
> NaN in his matrices despite his careful NaN handling code. Joe is not a
> compiler expert, and on the face of it, it seems like a silly limitation.
> This will never end until fast-math is gutted.
>
> C/C++ already has cases like this. Pointer arithmetic on null pointers is
> undefined behaviour, even if adding[1,2]/subtracting[3] zero. I don't
think
> it is too far fetched to expect from users to know that an operation is
> undefined behaviour even if one of the operands is zero.
>
> Michael
>
> [1]
>
https://github.com/llvm/llvm-project/blob/main/clang/test/Sema/pointer-addition.c
> [2]
>
https://github.com/llvm/llvm-project/blob/main/compiler-rt/test/ubsan/TestCases/Pointer/nullptr-and-nonzero-offset-constants.cpp
> [3]
>
https://github.com/llvm/llvm-project/blob/main/clang/test/Sema/pointer-subtraction.c
>
>
> Michael
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20210917/3b581019/attachment.html>

Mehdi AMINI via llvm-dev

2021-Sep-17 03:53 UTC

head link

[llvm-dev] [cfe-dev] Should isnan be optimized out in fast-math mode?

On Thu, Sep 16, 2021 at 8:23 PM Serge Pavlov via cfe-dev <
cfe-dev at lists.llvm.org> wrote:
> On Fri, Sep 17, 2021 at 3:11 AM Chris Tetreault <ctetreau at
quicinc.com>
> wrote:
>
>> The difference there is that doing pointer arithmetic on null pointers
>> doesn't *usually* work, unless you turn on -ffast-pointers.
>>
>> It seems to me that  most confusion related to -ffast-math is likely
>> caused by people who are transitioning to using it. I have some
codebase,
>> and I turn on fast math, and then a few months down the road I notice a
>> strangeness that I did not catch during the initial transition period.
If
>> you're writing new code with fast-math, you don't do things
like try to use
>> NaN as a sentinel value in a TU with fast math turned on. This is the
sort
>> of thing you catch when you try to transition an existing codebase.
Forgive
>> me for the uncharitable interpretation, but it's much easier to ask
the
>> compiler to change to accommodate your use case than it is to refactor
your
>> code.
>>
>
> It is a common way to explain problems with -ffinite-math-only by user
> ignorance. However user misunderstandings and complaints may indicate a
> flaw in compiler implementation, which I believe we have in this case.
>
> Using NaN as sentinels is a natural way when you cannot spend extra memory
> for keeping flags for each item, spend extra cycles to read that flag and
> do not want to pollute cache. It does not depend on reading documentation
> or writing the code from scratch. It is simply the best solution for
> storing data. If performance of the data processing is critical,
> -ffast-math is a good solution. This is a fairly legitimate use case. The
> fact that the compiler does not allow it is a compiler drawback.
>
>
>> To me, I think Mehdi had the best solution: The algorithm that is the
>> bottleneck, and experiences the huge speedup using fast-math should be
>> separated into its own source file. This source file, and only this
source
>> file should be compiled with fast-math. The outer driver loop should
not be
>> compiled with fast math. This solution is clean, (probably) easy, and
>> doesn't require a change in the compiler.
>
>
> It is a workaround, it works in some cases but does not in others. ML
> kernel often is a single translation unit, there may be no such thing as
> linker for that processor. At the same time it is computation intensive and
> using fast-math in it may be very profitable.
>
Switching mode in a single TU seems valuable, but could this be handled
with pragmas or function attributes instead?

>
>
>> Changing the compiler is hard, affects everybody who uses the compiler,
>> and creates inconsistency in behavior between clang and gcc (and msvc
with
>> /fp:fast), and clang and old versions of clang.
>
>
> ICC and MSVC do not remove `isnan` in fast math mode. If `isnan` is
> implemented in libc, it is also a real check. Actually removing `isnan`
> creates inconsistency.
>
>
>> The behavior of fast-math with respect to NaN is consistent across the
>> mainstream c/c++ compilers: no promises are made, users should not
assume
>> that they can use it for anything. Changing it now would create a major
>> portability issue for user codebases, which in and of itself is a very
>> strong reason to not make this change.
>>
>
> Removing `isnan` is only an optimization, it does not intend to change
> semantics. So it cannot create portability issues. Quite the contrary, it
> helps portability by making behavior consistent between compilers and libc
> implementations. The only possible issue is performance loss, this is
> discussed above, it is an unlikely case. Anyway, if such loss exists and it
> is absolutely intolerable for a user, a hack with redefinition of `isnan`
> restores the previous code generation.
>
>
>> If the behavior is confusing to users, that's because it's
poorly
>> explained.
>
>
> What is confusing? That the explicitly written call to a function is not
> removed? According to user feedback it is the silent removal of `isnan`
> that confuses users.
>
> Honestly, I think the docs are pretty clear, but "It's clear, you
just
>> need to learn to read" is never an acceptable answer so it could
certainly
>> be improved. This is the only thing that needs to be fixed in my
opinion.
>>
>
> The documentation says about -ffinite-math-only:
>
>  "Allow optimizations for floating-point arithmetic that assume that
> arguments and results are not NaNs or +-Infs."
>
> Is it clear whether `isnan` is arithmetic or not?
>
If the result of a floating point arithmetic is fed directly to `isnan()`,
are we allowed to eliminate the computation and fold the check to none?
(seems like it according to the sentence you're quoting).
Are we back to `isnan(x+0.0)` can be folded but not `isnan(x)`?

-- 
Mehdi



>
>
>> Thanks,
>>    Chris Tetreault
>>
>> -----Original Message-----
>> From: Michael Kruse <cfe-dev at meinersbur.de>
>> Sent: Thursday, September 16, 2021 12:29 PM
>> To: Chris Tetreault <ctetreau at quicinc.com>
>> Cc: Serge Pavlov <sepavloff at gmail.com>; llvm-dev at
lists.llvm.org;
>> cfe-dev at lists.llvm.org
>> Subject: Re: [cfe-dev] [llvm-dev] Should isnan be optimized out in
>> fast-math mode?
>>
>> WARNING: This email originated from outside of Qualcomm. Please be wary
>> of any links or attachments, and do not enable macros.
>>
>> Am Mo., 13. Sept. 2021 um 11:46 Uhr schrieb Chris Tetreault via cfe-dev
<
>> cfe-dev at lists.llvm.org>:
>> > As a user, if I read that:
>> >
>> >
>> >
>> > ```
>> >
>> > if (isnan(x)) {
>> >
>> > ```
>> >
>> >
>> >
>> > … is guaranteed to work, and I read that fast-math enables the
compiler
>> to reason about constructs like `x + 0` being equal to `x`, then I’m
going
>> to be very confused when:
>> >
>> >
>> >
>> > ```
>> >
>> > if (isnan(x + 0)) {
>> >
>> > ```
>> >
>> >
>> >
>> > … does not also work. I’m going to open a bug and complain, and
the
>> slide down the slippery slope will continue. You and I understand the
>> difference, and the technical reason why `isnan(x)` is supported but
>> `isnan(x + 0)` isn’t, but Joe Coder just trying to figure out why he’s
got
>> NaN in his matrices despite his careful NaN handling code. Joe is not a
>> compiler expert, and on the face of it, it seems like a silly
limitation.
>> This will never end until fast-math is gutted.
>>
>> C/C++ already has cases like this. Pointer arithmetic on null pointers
is
>> undefined behaviour, even if adding[1,2]/subtracting[3] zero. I
don't think
>> it is too far fetched to expect from users to know that an operation is
>> undefined behaviour even if one of the operands is zero.
>>
>> Michael
>>
>> [1]
>>
https://github.com/llvm/llvm-project/blob/main/clang/test/Sema/pointer-addition.c
>> [2]
>>
https://github.com/llvm/llvm-project/blob/main/compiler-rt/test/ubsan/TestCases/Pointer/nullptr-and-nonzero-offset-constants.cpp
>> [3]
>>
https://github.com/llvm/llvm-project/blob/main/clang/test/Sema/pointer-subtraction.c
>>
>>
>> Michael
>>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20210916/d77c0bc4/attachment.html>

Chris Tetreault via llvm-dev

2021-Sep-17 17:05 UTC

head link

[llvm-dev] [cfe-dev] Should isnan be optimized out in fast-math mode?

> It is a common way to explain problems with -ffinite-math-only by user
ignorance. However user misunderstandings and complaints may indicate a flaw in
compiler implementation, which I believe we have in this case.
The *intent* of -ffast-math is to do absolutely everything possible to get the
fastest floating point math possible. In other words, to pretend that floating
point values are actually real numbers. In real math, there is no negative 0, or
NaN, and infinity is not something you do algebra with. If you try to divide by
0, you just get points off your midterm; there’s no “behavior” that needs to be
defined.
> Using NaN as sentinels is a natural way when you cannot spend extra memory
for keeping flags for each item, spend extra cycles to read that flag and do not
want to pollute cache. It does not depend on reading documentation or writing
the code from scratch. It is simply the best solution for storing data. If
performance of the data processing is critical, -ffast-math is a good solution.
This is a fairly legitimate use case. The fact that the compiler does not allow
it is a compiler drawback.
The use case is legitimate, I agree, and the compiler does allow it. In
fancyAlgo.cpp, have your input validation code that checks for NaN. in
fancyAlgoFast.cpp, have your ML kernel, compiled with fast-math. If you really
*must* have them all be in the same TU, then compile with -fhonor-nans.
> It is a workaround, it works in some cases but does not in others. ML
kernel often is a single translation unit, there may be no such thing as linker
for that processor. At the same time it is computation intensive and using
fast-math in it may be very profitable.
It's a workaround for a well defined behavior that you opted into. If having
multiple TU’s is not an option, then you can do some of the other solutions
presented or compile with -fhonor-nans.
> ICC and MSVC do not remove `isnan` in fast math mode. If `isnan` is
implemented in libc, it is also a real check. Actually removing `isnan` creates
inconsistency.
I don’t know about ICC, but this is a quote from the docs for MSVC: “Special
values (NaN, +infinity, -infinity, -0.0) may not be propagated or behave
strictly according to the IEEE-754 standard”. My interpretation of this is the
same as that for clang: no guarantees are provided. Maybe it works today, but
that doesn’t mean it will work tomorrow.
> Removing `isnan` is only an optimization, it does not intend to change
semantics. So it cannot create portability issues. Quite the contrary, it helps
portability by making behavior consistent between compilers and libc
implementations. The only possible issue is performance loss, this is discussed
above, it is an unlikely case. Anyway, if such loss exists and it is absolutely
intolerable for a user, a hack with redefinition of `isnan` restores the
previous code generation.
It creates a portability issue if we change the compiler to guarantee that
isnan(x) works, and then users rely on it working. The resulting code is non
portable to other compilers, and to old versions of clang.
> The documentation says about -ffinite-math-only:
 > "Allow optimizations for floating-point arithmetic that assume that
arguments and results are not NaNs or +-Infs."
> Is it clear whether `isnan` is arithmetic or not?
The docs for -ffinite-math-only on https://clang.llvm.org/docs/UsersManual.html
actually say “Allow floating-point optimizations that assume arguments and
results are not NaNs or +-Inf”. Notice that it says nothing about arithmetic.
I’m not sure where you got your quote, but it’s not from the docs for the
currently available release of upstream clang.

Thanks,
   Chris Tetreault

From: Serge Pavlov <sepavloff at gmail.com>
Sent: Thursday, September 16, 2021 8:23 PM
To: Chris Tetreault <ctetreau at quicinc.com>
Cc: Michael Kruse <cfe-dev at meinersbur.de>; llvm-dev at lists.llvm.org;
cfe-dev at lists.llvm.org
Subject: Re: [cfe-dev] [llvm-dev] Should isnan be optimized out in fast-math
mode?

WARNING: This email originated from outside of Qualcomm. Please be wary of any
links or attachments, and do not enable macros.
On Fri, Sep 17, 2021 at 3:11 AM Chris Tetreault <ctetreau at
quicinc.com<mailto:ctetreau at quicinc.com>> wrote:
The difference there is that doing pointer arithmetic on null pointers
doesn't *usually* work, unless you turn on -ffast-pointers.

It seems to me that  most confusion related to -ffast-math is likely caused by
people who are transitioning to using it. I have some codebase, and I turn on
fast math, and then a few months down the road I notice a strangeness that I did
not catch during the initial transition period. If you're writing new code
with fast-math, you don't do things like try to use NaN as a sentinel value
in a TU with fast math turned on. This is the sort of thing you catch when you
try to transition an existing codebase. Forgive me for the uncharitable
interpretation, but it's much easier to ask the compiler to change to
accommodate your use case than it is to refactor your code.

It is a common way to explain problems with -ffinite-math-only by user
ignorance. However user misunderstandings and complaints may indicate a flaw in
compiler implementation, which I believe we have in this case.

Using NaN as sentinels is a natural way when you cannot spend extra memory for
keeping flags for each item, spend extra cycles to read that flag and do not
want to pollute cache. It does not depend on reading documentation or writing
the code from scratch. It is simply the best solution for storing data. If
performance of the data processing is critical, -ffast-math is a good solution.
This is a fairly legitimate use case. The fact that the compiler does not allow
it is a compiler drawback.

To me, I think Mehdi had the best solution: The algorithm that is the
bottleneck, and experiences the huge speedup using fast-math should be separated
into its own source file. This source file, and only this source file should be
compiled with fast-math. The outer driver loop should not be compiled with fast
math. This solution is clean, (probably) easy, and doesn't require a change
in the compiler.

It is a workaround, it works in some cases but does not in others. ML kernel
often is a single translation unit, there may be no such thing as linker for
that processor. At the same time it is computation intensive and using fast-math
in it may be very profitable.

Changing the compiler is hard, affects everybody who uses the compiler, and
creates inconsistency in behavior between clang and gcc (and msvc with
/fp:fast), and clang and old versions of clang.

ICC and MSVC do not remove `isnan` in fast math mode. If `isnan` is implemented
in libc, it is also a real check. Actually removing `isnan` creates
inconsistency.

The behavior of fast-math with respect to NaN is consistent across the
mainstream c/c++ compilers: no promises are made, users should not assume that
they can use it for anything. Changing it now would create a major portability
issue for user codebases, which in and of itself is a very strong reason to not
make this change.

Removing `isnan` is only an optimization, it does not intend to change
semantics. So it cannot create portability issues. Quite the contrary, it helps
portability by making behavior consistent between compilers and libc
implementations. The only possible issue is performance loss, this is discussed
above, it is an unlikely case. Anyway, if such loss exists and it is absolutely
intolerable for a user, a hack with redefinition of `isnan` restores the
previous code generation.

If the behavior is confusing to users, that's because it's poorly
explained.

What is confusing? That the explicitly written call to a function is not
removed? According to user feedback it is the silent removal of `isnan` that
confuses users.

Honestly, I think the docs are pretty clear, but "It's clear, you just
need to learn to read" is never an acceptable answer so it could certainly
be improved. This is the only thing that needs to be fixed in my opinion.

The documentation says about -ffinite-math-only:

 "Allow optimizations for floating-point arithmetic that assume that
arguments and results are not NaNs or +-Infs."

Is it clear whether `isnan` is arithmetic or not?

Thanks,
   Chris Tetreault

-----Original Message-----
From: Michael Kruse <cfe-dev at meinersbur.de<mailto:cfe-dev at
meinersbur.de>>
Sent: Thursday, September 16, 2021 12:29 PM
To: Chris Tetreault <ctetreau at quicinc.com<mailto:ctetreau at
quicinc.com>>
Cc: Serge Pavlov <sepavloff at gmail.com<mailto:sepavloff at
gmail.com>>; llvm-dev at lists.llvm.org<mailto:llvm-dev at
lists.llvm.org>; cfe-dev at lists.llvm.org<mailto:cfe-dev at
lists.llvm.org>
Subject: Re: [cfe-dev] [llvm-dev] Should isnan be optimized out in fast-math
mode?

WARNING: This email originated from outside of Qualcomm. Please be wary of any
links or attachments, and do not enable macros.

Am Mo., 13. Sept. 2021 um 11:46 Uhr schrieb Chris Tetreault via cfe-dev
<cfe-dev at lists.llvm.org<mailto:cfe-dev at
lists.llvm.org>>:> As a user, if I read that:
>
>
>
> ```
>
> if (isnan(x)) {
>
> ```
>
>
>
> … is guaranteed to work, and I read that fast-math enables the compiler to
reason about constructs like `x + 0` being equal to `x`, then I’m going to be
very confused when:
>
>
>
> ```
>
> if (isnan(x + 0)) {
>
> ```
>
>
>
> … does not also work. I’m going to open a bug and complain, and the slide
down the slippery slope will continue. You and I understand the difference, and
the technical reason why `isnan(x)` is supported but `isnan(x + 0)` isn’t, but
Joe Coder just trying to figure out why he’s got NaN in his matrices despite his
careful NaN handling code. Joe is not a compiler expert, and on the face of it,
it seems like a silly limitation. This will never end until fast-math is gutted.
C/C++ already has cases like this. Pointer arithmetic on null pointers is
undefined behaviour, even if adding[1,2]/subtracting[3] zero. I don't think
it is too far fetched to expect from users to know that an operation is
undefined behaviour even if one of the operands is zero.

Michael

[1]
https://github.com/llvm/llvm-project/blob/main/clang/test/Sema/pointer-addition.c
[2]
https://github.com/llvm/llvm-project/blob/main/compiler-rt/test/ubsan/TestCases/Pointer/nullptr-and-nonzero-offset-constants.cpp
[3]
https://github.com/llvm/llvm-project/blob/main/clang/test/Sema/pointer-subtraction.c

Michael
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20210917/c57ea015/attachment.html>

llvm dev - Sep 2021 - [cfe-dev] Should isnan be optimized out in fast-math mode?

[llvm-dev] [cfe-dev] Should isnan be optimized out in fast-math mode?

[llvm-dev] [cfe-dev] Should isnan be optimized out in fast-math mode?

[llvm-dev] [cfe-dev] Should isnan be optimized out in fast-math mode?