thr3ads.net - llvm dev - [llvm-dev] [cfe-dev] Should isnan be optimized out in fast-math mode? [Sep 2021]

Chris Tetreault via llvm-dev

2021-Sep-20 16:39 UTC

[llvm-dev] [cfe-dev] Should isnan be optimized out in fast-math mode?

You’re confusing implementation details (you have a Godbolt link that shows that
MSVC just happens to not remove the isnan call) with documented behavior (I
provided a link to the MSVC docs that shows that no promises are made with
respect to NaN). The fact is that no compiler (Maybe ICC does, I don’t know, I
haven’t checked. I bet their docs say something similar to MSVC, clang, and GCC
though.) guarantees that isnan(x) will not be optimized out with fast-math
enabled. There is no inconsistency: all the compilers document that they are
free to optimize as if there were no NaNs, and they then do whatever is best for
their implementation. If you think this is inconsistent, then let me tell you
about that time I dereferenced a null pointer and it didn’t segfault.

Now, many people have suggested in this thread that a pragma be added. I
personally fully support this proposal. I think it’s a very clean solution, and
any non-trivial portable codebase probably already has a library of preprocessor
macros that abstract this sort of thing. Do you have a concrete reason why a
pragma is unsuitable?
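
As a rough illustration of the "library of preprocessor macros" idea, the sketch below wraps a region in compiler-specific pragmas. The macro names STRICT_FP_BEGIN and STRICT_FP_END and the function is_missing are hypothetical. Only the GCC branch maps onto an existing facility (`#pragma GCC push_options`, `#pragma GCC optimize`, `#pragma GCC pop_options`); every other compiler gets a no-op here, which matches the thread's observation that clang does not support this pragma. Whether the region really keeps NaN checks alive in a fast-math build is an assumption that has to be verified on the compiler in use.

```cpp
#include <cassert>
#include <limits>

// STRICT_FP_BEGIN / STRICT_FP_END are hypothetical names for illustration.
// Only the GCC branch uses a real facility; other compilers get a no-op.
#if defined(__GNUC__) && !defined(__clang__)
#  define STRICT_FP_BEGIN \
      _Pragma("GCC push_options") \
      _Pragma("GCC optimize(\"no-finite-math-only\")")
#  define STRICT_FP_END _Pragma("GCC pop_options")
#else
#  define STRICT_FP_BEGIN
#  define STRICT_FP_END
#endif

STRICT_FP_BEGIN
// The comparison is written in the function body so that it is compiled
// under the options of this region rather than those of a library header.
bool is_missing(double x) { return x != x; }
STRICT_FP_END

// Self-check that holds when built without fast-math flags.
void check_is_missing() {
    assert(is_missing(std::numeric_limits<double>::quiet_NaN()));
    assert(!is_missing(1.5));
}
```

GCC's own documentation cautions that per-function optimization overrides are intended mainly for debugging, so this is a sketch of the shape such macros could take rather than a recommended production pattern.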

From: Serge Pavlov <sepavloff at gmail.com>
Sent: Monday, September 20, 2021 1:23 AM
To: Mehdi AMINI <joker.eph at gmail.com>
Cc: Chris Tetreault <ctetreau at quicinc.com>; llvm-dev at lists.llvm.org;
cfe-dev at lists.llvm.org
Subject: Re: [cfe-dev] [llvm-dev] Should isnan be optimized out in fast-math
mode?


On Fri, Sep 17, 2021 at 11:17 PM Mehdi AMINI <joker.eph at gmail.com> wrote:
On Thu, Sep 16, 2021 at 11:19 PM Serge Pavlov <sepavloff at gmail.com> wrote:
On Fri, Sep 17, 2021 at 10:53 AM Mehdi AMINI <joker.eph at gmail.com> wrote:
On Thu, Sep 16, 2021 at 8:23 PM Serge Pavlov via cfe-dev <cfe-dev at lists.llvm.org> wrote:
On Fri, Sep 17, 2021 at 3:11 AM Chris Tetreault <ctetreau at quicinc.com> wrote:
The difference there is that doing pointer arithmetic on null pointers
doesn't *usually* work, unless you turn on -ffast-pointers.

It seems to me that most confusion related to -ffast-math is likely caused by
people who are transitioning to using it. I have some codebase, and I turn on
fast math, and then a few months down the road I notice a strangeness that I did
not catch during the initial transition period. If you're writing new code
with fast-math, you don't do things like try to use NaN as a sentinel value
in a TU with fast math turned on. This is the sort of thing you catch when you
try to transition an existing codebase. Forgive me for the uncharitable
interpretation, but it's much easier to ask the compiler to change to
accommodate your use case than it is to refactor your code.

It is common to explain problems with -ffinite-math-only by user ignorance.
However, user misunderstandings and complaints may indicate a flaw in the
compiler implementation, which I believe we have in this case.

Using NaNs as sentinels is a natural approach when you cannot afford extra
memory for a per-item flag, cannot spend extra cycles reading that flag, and do
not want to pollute the cache. It does not depend on reading documentation or
writing the code from scratch. It is simply the best solution for storing data.
If performance of the data processing is critical, -ffast-math is a good
solution. This is a fairly legitimate use case. The fact that the compiler does
not allow it is a compiler drawback.
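
A minimal sketch of the sentinel pattern described above, assuming a plain array of samples in which a quiet NaN marks a missing entry. The names kMissing and mean_of_present are hypothetical. Built without fast-math flags, the `std::isnan` test behaves as written; it is exactly this kind of check that, according to the thread, may be dropped under -ffinite-math-only.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical sentinel: a quiet NaN marks "no measurement" in the array,
// so no separate flag array is needed.
const double kMissing = std::numeric_limits<double>::quiet_NaN();

// Mean of the entries that are present; returns kMissing if none are.
double mean_of_present(const std::vector<double>& samples) {
    double sum = 0.0;
    std::size_t count = 0;
    for (double s : samples) {
        if (std::isnan(s)) continue;  // the check under discussion
        sum += s;
        ++count;
    }
    return count ? sum / static_cast<double>(count) : kMissing;
}

// Self-check that holds when built without fast-math flags.
void check_mean_of_present() {
    assert(mean_of_present({1.0, kMissing, 3.0}) == 2.0);
    assert(std::isnan(mean_of_present({kMissing, kMissing})));
}
```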


To me, I think Mehdi had the best solution: The algorithm that is the
bottleneck, and experiences the huge speedup using fast-math should be separated
into its own source file. This source file, and only this source file should be
compiled with fast-math. The outer driver loop should not be compiled with fast
math. This solution is clean, (probably) easy, and doesn't require a change
in the compiler.

It is a workaround; it works in some cases but not in others. An ML kernel is
often a single translation unit, and there may be no such thing as a linker for
that processor. At the same time it is computation-intensive, and using
fast-math in it may be very profitable.

Switching mode in a single TU seems valuable, but could this be handled with
pragmas or function attributes instead?

GCC allows it by using `#pragma GCC optimize()`, but clang does not support it.
No suitable function attribute exists for that.

Right, I know that clang does not support it, but it could :)
So since we're looking at what provides the best user-experience: isn't
that it? Shouldn't we look into providing this level of granularity?
(whether function-level or finer grain)

It could mitigate the problem if it were implemented. A user who needs to handle
NaNs in -ffinite-math-only compilation and writes the code from scratch could
use this facility to get things working. I also think such a pragma, implemented
with a sufficient degree of flexibility, could be useful irrespective of this
topic.

However, in general it does not solve the problem. The most important issue
that remains unaddressed is the inconsistency of the implementation.

The handling of `isnan` in -ffinite-math-only by clang is not consistent
because:
- It differs from what other compilers do. Namely, MSVC and the Intel compiler
  do not throw away `isnan` in this mode: https://godbolt.org/z/qTaz47qhP.
- It depends on optimization options. With -O2 the check is removed but with
  -O0 it remains: https://godbolt.org/z/cjYePv7s7. Other options can also
  affect the behavior; for example, with `-ffp-model=strict` the check is
  generated irrespective of the optimization mode (see the same link).
- It is inconsistent with libc implementations. If `isnan` is provided by libc,
  it is a real check, but the compiler may drop it.

It would not be an issue if `isnan` removal were just an optimization. However,
it changes semantics in the presence of NaNs, so such removal can break
user code.

In the typical use case, a user puts in a call to `isnan` to ensure that no
operations on NaNs occur. The call can also be present in some header that
implements some functionality for the general case. It may work because `isnan`
is provided by libc. Later on, when the configuration changes or libc is
updated, the code may break because the implementation of `isnan` changes, as
happened after https://reviews.llvm.org/D69806.

If clang kept calls to `isnan`, it would be consistent with ICC and MSVC and
with all libc implementations. The behavior would be different from gcc, but
clang would be on the winning side, because the number of programs that work
with clang would be larger.

Also if we agree that NaNs can appear in the code compiled with
-ffinite-math-only, there must be a way to check if a number is a NaN.
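
One commonly used way to test for NaN without relying on floating-point semantics is to inspect the bit pattern with integer operations. The sketch below assumes that `double` is IEEE-754 binary64; the name is_nan_bits is hypothetical. The comparison is done on integers, which the no-NaN assumption does not describe, but nothing in this thread establishes that any compiler documents this as a guarantee, so treat it as a workaround to be verified rather than as a promise.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Assumption about the target: double is IEEE-754 binary64.
static_assert(std::numeric_limits<double>::is_iec559 && sizeof(double) == 8,
              "this sketch assumes IEEE-754 binary64 doubles");

// NaN in binary64: all 11 exponent bits set and a non-zero mantissa.
bool is_nan_bits(double x) {
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);  // well-defined way to read the bits
    const std::uint64_t exponent = bits & 0x7FF0000000000000ULL;
    const std::uint64_t mantissa = bits & 0x000FFFFFFFFFFFFFULL;
    return exponent == 0x7FF0000000000000ULL && mantissa != 0;
}

// Self-check that holds when built without fast-math flags.
void check_is_nan_bits() {
    assert(is_nan_bits(std::numeric_limits<double>::quiet_NaN()));
    assert(!is_nan_bits(std::numeric_limits<double>::infinity()));
    assert(!is_nan_bits(0.0));
}
```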

Thanks,
--Serge

Serge Pavlov via llvm-dev

2021-Sep-20 17:09 UTC

[llvm-dev] [cfe-dev] Should isnan be optimized out in fast-math mode?

MSVC documentation says: “Special values (NaN, +infinity, -infinity, -0.0)
may not be propagated or behave strictly according to the IEEE-754
standard”. Such an exclusion is necessary to apply transformations that are
valid for real numbers only, like `x * 0 -> 0`. NaNs propagate through
arithmetic operations from input to output: in most operations, if an operand
is NaN, the result is also NaN. `isnan` has nothing to do with NaN
propagation; it just makes the check. The documentation does not provide
justification for removal of `isnan`.
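
To make the `x * 0 -> 0` point concrete, here is a small sketch of what the multiplication yields under strict IEEE-754 semantics, that is, when built without fast-math flags. The function name times_zero is hypothetical. The NaN and infinity cases are the reason the rewrite to a constant 0 is only valid once special values are excluded.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Plain multiplication by zero; under strict IEEE-754 semantics the result
// depends on the operand, so it cannot be replaced by the constant 0.
double times_zero(double x) { return x * 0.0; }

// Self-check that holds when built without fast-math flags.
void check_times_zero() {
    assert(times_zero(2.0) == 0.0);
    assert(std::isnan(times_zero(std::numeric_limits<double>::quiet_NaN())));
    assert(std::isnan(times_zero(std::numeric_limits<double>::infinity())));
    assert(std::signbit(times_zero(-2.0)));  // the result is -0.0, not +0.0
}
```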

> all the compilers document that they are free to optimize as if there were
> no NaNs, and they then do whatever is best for their implementation.

Exactly. Leaving `isnan` in the code makes compiler behavior more
consistent and convenient for users. Clang also can go this way.

> Do you have a concrete reason why a pragma is unsuitable?


I described the concerns in the reply to Mehdi Amini's message.

Thanks,
--Serge



Arthur O'Dwyer via llvm-dev

2021-Sep-20 17:13 UTC

[llvm-dev] [cfe-dev] Should isnan be optimized out in fast-math mode?

On Mon, Sep 20, 2021 at 12:40 PM Chris Tetreault via cfe-dev <
cfe-dev at lists.llvm.org> wrote:
> You’re confusing implementation details (you have a Godbolt link that
> shows that MSVC just happens to not remove the isnan call) with documented
> behavior (I provided a link to the MSVC docs that shows that no promises
> are made with respect to NaN). The fact is that no compiler (Maybe ICC
> does, I don’t know, I haven’t checked. I bet their docs say something
> similar to MSVC, clang, and GCC though.) guarantees that isnan(x) will not
> be optimized out with fast-math enabled. There is no inconsistency: all the
> compilers document that they are free to optimize as if there were no NaNs,
> and they then do whatever is best for their implementation. If you think
> this is inconsistent, then let me tell you about that time I dereferenced a
> null pointer and it didn’t segfault.
>
+1.

> Now, many people have suggested in this thread that a pragma be added. I
> personally fully support this proposal. I think it’s a very clean solution,
> and any non-trivial portable codebase probably already has a library of
> preprocessor macros that abstract this sort of thing. Do you have a
> concrete reason why a pragma is unsuitable?
>
I think that there are two questions in this thread.
- How should fast-math mode actually behave? [Maybe we're settled on the
  "NANs are SNANs and signaling operations produce unspecified values" model.
  Gee I hope so.]
- Should switching into/out-of fast-math mode be controlled only by
  a TU-level command line option, or should there also be a pragma for it?

(Btw, multiply these questions by the number of different modes we support;
I've consciously been trying to phrase everything in terms of NANs, but
Serge likes to talk about -ffinite-math-only, where not just NANs but also
INF and -INF are verboten. And then there's the -fno-signed-zeros option
<https://gcc.gnu.org/wiki/FloatingPointMath>, which *does not forbid* -0.0,
but does permit it to be treated as a-zero-value-of-unspecified-sign. I
think -ffast-math probably also forbids subnormals... but maybe it just
treats them as either-their-actual-value-or-zero-of-the-appropriate-sign.)
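
A small sketch of why the sign of zero is observable at all under strict IEEE-754 semantics, which is the information a mode like -fno-signed-zeros allows the compiler to disregard. The helper names equal_value and sign_of are hypothetical; the checks hold when built without fast-math flags.

```cpp
#include <cassert>
#include <cmath>

// +0.0 and -0.0 compare equal as values...
bool equal_value(double a, double b) { return a == b; }

// ...but carry different sign bits, which copysign makes visible.
double sign_of(double x) { return std::copysign(1.0, x); }

// Self-check that holds when built without fast-math flags.
void check_signed_zero() {
    assert(equal_value(0.0, -0.0));
    assert(sign_of(0.0) == 1.0);
    assert(sign_of(-0.0) == -1.0);
}
```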

Anyway, should there be a pragma in addition to the TU-level command line
option?:

There must be a command-line option, anyway — I mean, it already exists
(-ffast-math, etc). Pragmas are basically *about* taking some command-line
decision and allowing the decision to be made more granularly. Look at
`#pragma GCC diagnostic ignored "-Wfoo"`, for example; it's expressed in
terms of the command-line option. So if Clang were to support something like
    #pragma GCC optimize("ffast-math")  // cf. #pragma GCC optimize("O2")
that would still be expressed in terms of the command-line option, and
hopefully both the option and the pragma would end up setting the same
internal bits.

However, pragmas are hard to get right. Consider:

    #include <cmath>  // HUGE_VAL
    bool unoptimized(double x) { return (x + 1) > x; }
    #pragma GCC optimize("ffast-math")
    bool optimized(double x) { return unoptimized(x+1); }
    #pragma GCC optimize("fno-fast-math")
    int main() {
        return optimized(HUGE_VAL);
    }

The compiler would have to think about what it means to inline
`unoptimized` into `optimized`.  The arithmetic in `optimized` produces
INF, but then it's passed to `unoptimized`, which is not marked as
fast-math, so I guess the compiler can't optimize `(x+1) > x` into `true`
in that context?  It's *at least* confusing and subtle for the compiler
vendor to get right; and possibly philosophically confusing as well.
Alternatively, you could forbid inlining between functions with different
optimization levels... but that's *clearly* a terrible idea, right?
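
For reference, this is what the comparison evaluates to under strict IEEE-754 semantics, which is what the non-fast-math function is entitled to. It is a sketch built without fast-math flags, and the name grows_by_one is hypothetical. A fast-math compiler that folds `(x + 1) > x` to `true` would disagree with the first two checks.

```cpp
#include <cassert>
#include <cmath>

// Same expression as in the example above, compiled with default flags.
bool grows_by_one(double x) { return (x + 1) > x; }

// Self-check that holds when built without fast-math flags.
void check_grows_by_one() {
    assert(!grows_by_one(HUGE_VAL));  // INF + 1 is INF, and INF > INF is false
    assert(!grows_by_one(1e300));     // the added 1 is lost to rounding
    assert(grows_by_one(1.0));        // 2.0 > 1.0
}
```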

And of course some programmer is going to try something dumb like

    #pragma GCC optimize("ffast-math")
    #define REAL_ISNAN(x) std::isnan(x)
    #pragma GCC optimize("fno-fast-math")

which "of course" won't work, but who's going to explain it to them?

Not to mention, if the pragma is active at the top of the TU where some
template or implicitly defaulted special member is defined, but then it's
not active at the point where the template is instantiated or the special
member is implicitly defined... what the heck happens in *that* case? and
who's going to write the StackOverflow answer about it?

Basically, the translation unit is the *natural* unit of... hmm...
translation. There's very little return-on-investment involved in trying to
circumvent that.

–Arthur