Mehdi AMINI via llvm-dev
2021-Sep-17 16:16 UTC
[llvm-dev] [cfe-dev] Should isnan be optimized out in fast-math mode?
On Thu, Sep 16, 2021 at 11:19 PM Serge Pavlov <sepavloff at gmail.com> wrote:> On Fri, Sep 17, 2021 at 10:53 AM Mehdi AMINI <joker.eph at gmail.com> wrote: > >> >> >> On Thu, Sep 16, 2021 at 8:23 PM Serge Pavlov via cfe-dev < >> cfe-dev at lists.llvm.org> wrote: >> >>> On Fri, Sep 17, 2021 at 3:11 AM Chris Tetreault <ctetreau at quicinc.com> >>> wrote: >>> >>>> The difference there is that doing pointer arithmetic on null pointers >>>> doesn't *usually* work, unless you turn on -ffast-pointers. >>>> >>>> It seems to me that most confusion related to -ffast-math is likely >>>> caused by people who are transitioning to using it. I have some codebase, >>>> and I turn on fast math, and then a few months down the road I notice a >>>> strangeness that I did not catch during the initial transition period. If >>>> you're writing new code with fast-math, you don't do things like try to use >>>> NaN as a sentinel value in a TU with fast math turned on. This is the sort >>>> of thing you catch when you try to transition an existing codebase. Forgive >>>> me for the uncharitable interpretation, but it's much easier to ask the >>>> compiler to change to accommodate your use case than it is to refactor your >>>> code. >>>> >>> >>> It is a common way to explain problems with -ffinite-math-only by user >>> ignorance. However user misunderstandings and complaints may indicate a >>> flaw in compiler implementation, which I believe we have in this case. >>> >>> Using NaN as sentinels is a natural way when you cannot spend extra >>> memory for keeping flags for each item, spend extra cycles to read that >>> flag and do not want to pollute cache. It does not depend on reading >>> documentation or writing the code from scratch. It is simply the best >>> solution for storing data. If performance of the data processing is >>> critical, -ffast-math is a good solution. This is a fairly legitimate use >>> case. The fact that the compiler does not allow it is a compiler drawback. >>> >>> >>>> To me, I think Mehdi had the best solution: The algorithm that is the >>>> bottleneck, and experiences the huge speedup using fast-math should be >>>> separated into its own source file. This source file, and only this source >>>> file should be compiled with fast-math. The outer driver loop should not be >>>> compiled with fast math. This solution is clean, (probably) easy, and >>>> doesn't require a change in the compiler. >>> >>> >>> It is a workaround, it works in some cases but does not in others. ML >>> kernel often is a single translation unit, there may be no such thing as >>> linker for that processor. At the same time it is computation intensive and >>> using fast-math in it may be very profitable. >>> >> >> Switching mode in a single TU seems valuable, but could this be handled >> with pragmas or function attributes instead? >> > > GCC allows it by using `#pragma GCC optimize()`, but clang does not > support it. No suitable function attribute exists for that. >Right, I know that clang does not support it, but it could :) So since we're looking at what provides the best user-experience: isn't that it? Shouldn't we look into providing this level of granularity? (whether function-level or finer grain)> >> >> >>> >>> >>>> Changing the compiler is hard, affects everybody who uses the compiler, >>>> and creates inconsistency in behavior between clang and gcc (and msvc with >>>> /fp:fast), and clang and old versions of clang. >>> >>> >>> ICC and MSVC do not remove `isnan` in fast math mode. If `isnan` is >>> implemented in libc, it is also a real check. Actually removing `isnan` >>> creates inconsistency. >>> >>> >>>> The behavior of fast-math with respect to NaN is consistent across the >>>> mainstream c/c++ compilers: no promises are made, users should not assume >>>> that they can use it for anything. Changing it now would create a major >>>> portability issue for user codebases, which in and of itself is a very >>>> strong reason to not make this change. >>>> >>> >>> Removing `isnan` is only an optimization, it does not intend to change >>> semantics. So it cannot create portability issues. Quite the contrary, it >>> helps portability by making behavior consistent between compilers and libc >>> implementations. The only possible issue is performance loss, this is >>> discussed above, it is an unlikely case. Anyway, if such loss exists and it >>> is absolutely intolerable for a user, a hack with redefinition of `isnan` >>> restores the previous code generation. >>> >>> >>>> If the behavior is confusing to users, that's because it's poorly >>>> explained. >>> >>> >>> What is confusing? That the explicitly written call to a function is not >>> removed? According to user feedback it is the silent removal of `isnan` >>> that confuses users. >>> >>> Honestly, I think the docs are pretty clear, but "It's clear, you just >>>> need to learn to read" is never an acceptable answer so it could certainly >>>> be improved. This is the only thing that needs to be fixed in my opinion. >>>> >>> >>> The documentation says about -ffinite-math-only: >>> >>> "Allow optimizations for floating-point arithmetic that assume that >>> arguments and results are not NaNs or +-Infs." >>> >>> Is it clear whether `isnan` is arithmetic or not? >>> >> >> If the result of a floating point arithmetic is fed directly to >> `isnan()`, are we allowed to eliminate the computation and fold the check >> to none? (seems like it according to the sentence you're quoting). >> Are we back to `isnan(x+0.0)` can be folded but not `isnan(x)`? >> > > Initially there was such intention, but during the discussion it became > clear that it is not profitable. For `isinf` there is a realistic use case > when removing the call `isinf(a+b)` is an issue. Arthur O'Dwyer > demonstrated an example for `isnan`. It is harder to provide guarantees > about the result, than about the arguments. >> >> -- >> Mehdi >> >> >> >> >>> >>> >>>> Thanks, >>>> Chris Tetreault >>>> >>>> -----Original Message----- >>>> From: Michael Kruse <cfe-dev at meinersbur.de> >>>> Sent: Thursday, September 16, 2021 12:29 PM >>>> To: Chris Tetreault <ctetreau at quicinc.com> >>>> Cc: Serge Pavlov <sepavloff at gmail.com>; llvm-dev at lists.llvm.org; >>>> cfe-dev at lists.llvm.org >>>> Subject: Re: [cfe-dev] [llvm-dev] Should isnan be optimized out in >>>> fast-math mode? >>>> >>>> WARNING: This email originated from outside of Qualcomm. Please be wary >>>> of any links or attachments, and do not enable macros. >>>> >>>> Am Mo., 13. Sept. 2021 um 11:46 Uhr schrieb Chris Tetreault via cfe-dev >>>> <cfe-dev at lists.llvm.org>: >>>> > As a user, if I read that: >>>> > >>>> > >>>> > >>>> > ``` >>>> > >>>> > if (isnan(x)) { >>>> > >>>> > ``` >>>> > >>>> > >>>> > >>>> > … is guaranteed to work, and I read that fast-math enables the >>>> compiler to reason about constructs like `x + 0` being equal to `x`, then >>>> I’m going to be very confused when: >>>> > >>>> > >>>> > >>>> > ``` >>>> > >>>> > if (isnan(x + 0)) { >>>> > >>>> > ``` >>>> > >>>> > >>>> > >>>> > … does not also work. I’m going to open a bug and complain, and the >>>> slide down the slippery slope will continue. You and I understand the >>>> difference, and the technical reason why `isnan(x)` is supported but >>>> `isnan(x + 0)` isn’t, but Joe Coder just trying to figure out why he’s got >>>> NaN in his matrices despite his careful NaN handling code. Joe is not a >>>> compiler expert, and on the face of it, it seems like a silly limitation. >>>> This will never end until fast-math is gutted. >>>> >>>> C/C++ already has cases like this. Pointer arithmetic on null pointers >>>> is undefined behaviour, even if adding[1,2]/subtracting[3] zero. I don't >>>> think it is too far fetched to expect from users to know that an operation >>>> is undefined behaviour even if one of the operands is zero. >>>> >>>> Michael >>>> >>>> [1] >>>> https://github.com/llvm/llvm-project/blob/main/clang/test/Sema/pointer-addition.c >>>> [2] >>>> https://github.com/llvm/llvm-project/blob/main/compiler-rt/test/ubsan/TestCases/Pointer/nullptr-and-nonzero-offset-constants.cpp >>>> [3] >>>> https://github.com/llvm/llvm-project/blob/main/clang/test/Sema/pointer-subtraction.c >>>> >>>> >>>> Michael >>>> >>> _______________________________________________ >>> cfe-dev mailing list >>> cfe-dev at lists.llvm.org >>> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev >>> >>-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210917/afd6adcd/attachment.html>
Chris Tetreault via llvm-dev
2021-Sep-17 17:34 UTC
[llvm-dev] [cfe-dev] Should isnan be optimized out in fast-math mode?
For the record, I think having a pragma that allows you to control fast-math behavior is fine. This sort of thing is far from unprecedented, and is far more palatable to me than just having random special cases. It’s also far more useful as I can do something like: #pragma clang fast-math push #pragma clang fast-math-on // bunch of hairy floating point math here #pragma clang fast-math-pop … and easily isolate fast math to the smallest area it can be. This is morally equivalent to putting the scary math in it’s own TU, but less annoying. The pragma should probably have a push/pop, but otherwise I don’t really care what color the bike shed is. Thanks, Chris Tetreault From: Mehdi AMINI <joker.eph at gmail.com> Sent: Friday, September 17, 2021 9:17 AM To: Serge Pavlov <sepavloff at gmail.com> Cc: Chris Tetreault <ctetreau at quicinc.com>; llvm-dev at lists.llvm.org; cfe-dev at lists.llvm.org Subject: Re: [cfe-dev] [llvm-dev] Should isnan be optimized out in fast-math mode? WARNING: This email originated from outside of Qualcomm. Please be wary of any links or attachments, and do not enable macros. On Thu, Sep 16, 2021 at 11:19 PM Serge Pavlov <sepavloff at gmail.com<mailto:sepavloff at gmail.com>> wrote: On Fri, Sep 17, 2021 at 10:53 AM Mehdi AMINI <joker.eph at gmail.com<mailto:joker.eph at gmail.com>> wrote: On Thu, Sep 16, 2021 at 8:23 PM Serge Pavlov via cfe-dev <cfe-dev at lists.llvm.org<mailto:cfe-dev at lists.llvm.org>> wrote: On Fri, Sep 17, 2021 at 3:11 AM Chris Tetreault <ctetreau at quicinc.com<mailto:ctetreau at quicinc.com>> wrote: The difference there is that doing pointer arithmetic on null pointers doesn't *usually* work, unless you turn on -ffast-pointers. It seems to me that most confusion related to -ffast-math is likely caused by people who are transitioning to using it. I have some codebase, and I turn on fast math, and then a few months down the road I notice a strangeness that I did not catch during the initial transition period. If you're writing new code with fast-math, you don't do things like try to use NaN as a sentinel value in a TU with fast math turned on. This is the sort of thing you catch when you try to transition an existing codebase. Forgive me for the uncharitable interpretation, but it's much easier to ask the compiler to change to accommodate your use case than it is to refactor your code. It is a common way to explain problems with -ffinite-math-only by user ignorance. However user misunderstandings and complaints may indicate a flaw in compiler implementation, which I believe we have in this case. Using NaN as sentinels is a natural way when you cannot spend extra memory for keeping flags for each item, spend extra cycles to read that flag and do not want to pollute cache. It does not depend on reading documentation or writing the code from scratch. It is simply the best solution for storing data. If performance of the data processing is critical, -ffast-math is a good solution. This is a fairly legitimate use case. The fact that the compiler does not allow it is a compiler drawback. To me, I think Mehdi had the best solution: The algorithm that is the bottleneck, and experiences the huge speedup using fast-math should be separated into its own source file. This source file, and only this source file should be compiled with fast-math. The outer driver loop should not be compiled with fast math. This solution is clean, (probably) easy, and doesn't require a change in the compiler. It is a workaround, it works in some cases but does not in others. ML kernel often is a single translation unit, there may be no such thing as linker for that processor. At the same time it is computation intensive and using fast-math in it may be very profitable. Switching mode in a single TU seems valuable, but could this be handled with pragmas or function attributes instead? GCC allows it by using `#pragma GCC optimize()`, but clang does not support it. No suitable function attribute exists for that. Right, I know that clang does not support it, but it could :) So since we're looking at what provides the best user-experience: isn't that it? Shouldn't we look into providing this level of granularity? (whether function-level or finer grain) Changing the compiler is hard, affects everybody who uses the compiler, and creates inconsistency in behavior between clang and gcc (and msvc with /fp:fast), and clang and old versions of clang. ICC and MSVC do not remove `isnan` in fast math mode. If `isnan` is implemented in libc, it is also a real check. Actually removing `isnan` creates inconsistency. The behavior of fast-math with respect to NaN is consistent across the mainstream c/c++ compilers: no promises are made, users should not assume that they can use it for anything. Changing it now would create a major portability issue for user codebases, which in and of itself is a very strong reason to not make this change. Removing `isnan` is only an optimization, it does not intend to change semantics. So it cannot create portability issues. Quite the contrary, it helps portability by making behavior consistent between compilers and libc implementations. The only possible issue is performance loss, this is discussed above, it is an unlikely case. Anyway, if such loss exists and it is absolutely intolerable for a user, a hack with redefinition of `isnan` restores the previous code generation. If the behavior is confusing to users, that's because it's poorly explained. What is confusing? That the explicitly written call to a function is not removed? According to user feedback it is the silent removal of `isnan` that confuses users. Honestly, I think the docs are pretty clear, but "It's clear, you just need to learn to read" is never an acceptable answer so it could certainly be improved. This is the only thing that needs to be fixed in my opinion. The documentation says about -ffinite-math-only: "Allow optimizations for floating-point arithmetic that assume that arguments and results are not NaNs or +-Infs." Is it clear whether `isnan` is arithmetic or not? If the result of a floating point arithmetic is fed directly to `isnan()`, are we allowed to eliminate the computation and fold the check to none? (seems like it according to the sentence you're quoting). Are we back to `isnan(x+0.0)` can be folded but not `isnan(x)`? Initially there was such intention, but during the discussion it became clear that it is not profitable. For `isinf` there is a realistic use case when removing the call `isinf(a+b)` is an issue. Arthur O'Dwyer demonstrated an example for `isnan`. It is harder to provide guarantees about the result, than about the arguments. -- Mehdi Thanks, Chris Tetreault -----Original Message----- From: Michael Kruse <cfe-dev at meinersbur.de<mailto:cfe-dev at meinersbur.de>> Sent: Thursday, September 16, 2021 12:29 PM To: Chris Tetreault <ctetreau at quicinc.com<mailto:ctetreau at quicinc.com>> Cc: Serge Pavlov <sepavloff at gmail.com<mailto:sepavloff at gmail.com>>; llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>; cfe-dev at lists.llvm.org<mailto:cfe-dev at lists.llvm.org> Subject: Re: [cfe-dev] [llvm-dev] Should isnan be optimized out in fast-math mode? WARNING: This email originated from outside of Qualcomm. Please be wary of any links or attachments, and do not enable macros. Am Mo., 13. Sept. 2021 um 11:46 Uhr schrieb Chris Tetreault via cfe-dev <cfe-dev at lists.llvm.org<mailto:cfe-dev at lists.llvm.org>>:> As a user, if I read that: > > > > ``` > > if (isnan(x)) { > > ``` > > > > … is guaranteed to work, and I read that fast-math enables the compiler to reason about constructs like `x + 0` being equal to `x`, then I’m going to be very confused when: > > > > ``` > > if (isnan(x + 0)) { > > ``` > > > > … does not also work. I’m going to open a bug and complain, and the slide down the slippery slope will continue. You and I understand the difference, and the technical reason why `isnan(x)` is supported but `isnan(x + 0)` isn’t, but Joe Coder just trying to figure out why he’s got NaN in his matrices despite his careful NaN handling code. Joe is not a compiler expert, and on the face of it, it seems like a silly limitation. This will never end until fast-math is gutted.C/C++ already has cases like this. Pointer arithmetic on null pointers is undefined behaviour, even if adding[1,2]/subtracting[3] zero. I don't think it is too far fetched to expect from users to know that an operation is undefined behaviour even if one of the operands is zero. Michael [1] https://github.com/llvm/llvm-project/blob/main/clang/test/Sema/pointer-addition.c [2] https://github.com/llvm/llvm-project/blob/main/compiler-rt/test/ubsan/TestCases/Pointer/nullptr-and-nonzero-offset-constants.cpp [3] https://github.com/llvm/llvm-project/blob/main/clang/test/Sema/pointer-subtraction.c Michael _______________________________________________ cfe-dev mailing list cfe-dev at lists.llvm.org<mailto:cfe-dev at lists.llvm.org> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210917/a6bf801e/attachment-0001.html>
Serge Pavlov via llvm-dev
2021-Sep-20 08:22 UTC
[llvm-dev] [cfe-dev] Should isnan be optimized out in fast-math mode?
On Fri, Sep 17, 2021 at 11:17 PM Mehdi AMINI <joker.eph at gmail.com> wrote:> On Thu, Sep 16, 2021 at 11:19 PM Serge Pavlov <sepavloff at gmail.com> wrote: > >> On Fri, Sep 17, 2021 at 10:53 AM Mehdi AMINI <joker.eph at gmail.com> wrote: >> >>> On Thu, Sep 16, 2021 at 8:23 PM Serge Pavlov via cfe-dev < >>> cfe-dev at lists.llvm.org> wrote: >>> >>>> On Fri, Sep 17, 2021 at 3:11 AM Chris Tetreault <ctetreau at quicinc.com> >>>> wrote: >>>> >>>>> The difference there is that doing pointer arithmetic on null pointers >>>>> doesn't *usually* work, unless you turn on -ffast-pointers. >>>>> >>>>> It seems to me that most confusion related to -ffast-math is likely >>>>> caused by people who are transitioning to using it. I have some codebase, >>>>> and I turn on fast math, and then a few months down the road I notice a >>>>> strangeness that I did not catch during the initial transition period. If >>>>> you're writing new code with fast-math, you don't do things like try to use >>>>> NaN as a sentinel value in a TU with fast math turned on. This is the sort >>>>> of thing you catch when you try to transition an existing codebase. Forgive >>>>> me for the uncharitable interpretation, but it's much easier to ask the >>>>> compiler to change to accommodate your use case than it is to refactor your >>>>> code. >>>>> >>>> >>>> It is a common way to explain problems with -ffinite-math-only by user >>>> ignorance. However user misunderstandings and complaints may indicate a >>>> flaw in compiler implementation, which I believe we have in this case. >>>> >>>> Using NaN as sentinels is a natural way when you cannot spend extra >>>> memory for keeping flags for each item, spend extra cycles to read that >>>> flag and do not want to pollute cache. It does not depend on reading >>>> documentation or writing the code from scratch. It is simply the best >>>> solution for storing data. If performance of the data processing is >>>> critical, -ffast-math is a good solution. This is a fairly legitimate use >>>> case. The fact that the compiler does not allow it is a compiler drawback. >>>> >>>> >>>>> To me, I think Mehdi had the best solution: The algorithm that is the >>>>> bottleneck, and experiences the huge speedup using fast-math should be >>>>> separated into its own source file. This source file, and only this source >>>>> file should be compiled with fast-math. The outer driver loop should not be >>>>> compiled with fast math. This solution is clean, (probably) easy, and >>>>> doesn't require a change in the compiler. >>>> >>>> >>>> It is a workaround, it works in some cases but does not in others. ML >>>> kernel often is a single translation unit, there may be no such thing as >>>> linker for that processor. At the same time it is computation intensive and >>>> using fast-math in it may be very profitable. >>>> >>> >>> Switching mode in a single TU seems valuable, but could this be handled >>> with pragmas or function attributes instead? >>> >> >> GCC allows it by using `#pragma GCC optimize()`, but clang does not >> support it. No suitable function attribute exists for that. >> > > Right, I know that clang does not support it, but it could :) > So since we're looking at what provides the best user-experience: isn't > that it? Shouldn't we look into providing this level of granularity? > (whether function-level or finer grain) >It could mitigate the problem if it were implemented. A user who needs to handle NaNs in -ffinite-math-only compilation and writes the code from scratch could use this facility to get things working. I also think such pragma, implemented with enough degree of flexibility, could be useful irrespective of this topic. However, in general it does not solve the problem. The most important issue which remains unaddressed is inconsistency of the implementation. The handling of `isnan` in -ffinite-math-only by clang is not consistent because: - It differs from what other compilers do. Namely MSVC and Intel compiler do not throw away `isnan` in this mode: https://godbolt.org/z/qTaz47qhP. - It depends on optimization options. With -O2 the check is removed but with -O0 remains: https://godbolt.org/z/cjYePv7s7. Other options also can affect the behavior, for example with `-ffp-model=strict` the check is generated irrespective of the optimization mode (see the same link). - It is inconsistent with libc implementations. If `isnan` is provided by libc, it is a real check, but the compiler may drop it. It would not be an issue if `isnan` removal were just an optimization. It however changes semantics in the presence of NaNs, so such removal can break user code. In the typical use case a user puts a call to `isnan` to ensure no operations on NaNs occur. The call can also be present in some header that implements some functionality for the general case. It may work because `isnan` is provided by libc. Later on when configuration changes or libc is updated the code may be broken, because implementation of `isnan` changes, as it happened after https://reviews.llvm.org/D69806. If clang kept calls to `isnan`, it would be consistent with ICC and MSVC and with all libc implementations. The behavior would be different from gcc, but clang would be on the winning side, because the number of programs that work with clang would be larger. Also if we agree that NaNs can appear in the code compiled with -ffinite-math-only, there must be a way to check if a number is a NaN. Thanks, --Serge -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210920/c22b38c6/attachment.html>