thr3ads.net - llvm dev - [llvm-dev] [cfe-dev] Do we need intrinsics for floating-point classification functions? [Sep 2021]

Roman Lebedev via llvm-dev

2021-Sep-21 20:19 UTC

Given just how contentious all this has been so far,
I would strongly suggest that it go through review.

Roman

On Tue, Sep 21, 2021 at 8:39 PM Serge Pavlov via cfe-dev
<cfe-dev at lists.llvm.org> wrote:
> Hi all,
>
> If nobody objects, in a couple of days I will put back the `llvm.isnan`
> implementation, excluding the changes for fast-math mode.
>
> Thanks,
> --Serge
>
>
> On Fri, Sep 3, 2021 at 1:46 PM Serge Pavlov <sepavloff at gmail.com> wrote:
>>
>> Thank you for summarizing.
>>
>> I would prefer Option 2: a separate tool for each task. It seems to me
>> that a specialized function is easier to implement than a universal one.
>> As the implementation of llvm.isnan demonstrated, there may be various
>> issues in implementing such functions for different targets. It is easier
>> to provide optimized versions of small functions, which are used more
>> frequently than `fpclassify`. Besides, `fpclassify` may be used directly,
>> and a user may choose arbitrary constants, which complicates code
>> generation. It seems more acceptable to tolerate some inefficiency in
>> `fpclassify` than in the basic classification intrinsics. And yes, we
>> should consider `signbit` along with them.
>>
>> Another consideration favors a dedicated intrinsic over an ordinary
>> function: the compiler can do some optimizations when it knows the
>> semantics of the function. In https://reviews.llvm.org/D104854 this
>> intrinsic was optimized out if its argument was produced by an operation
>> with the 'nnan' flag. We could also think about an optimization that
>> finds the basic blocks guarded by `llvm.isnan` and assigns the 'nnan'
>> flag in them. It could make code faster in many practical cases, and
>> make the notorious -ffast-math less attractive.
>>
>> Thanks,
>> --Serge
>>
>>
>> On Fri, Sep 3, 2021 at 12:22 PM James Y Knight <jyknight at google.com> wrote:
>>>
>>> As you say, we don't strictly need a new intrinsic -- we can emit the
>>> code to do the correct integer-based bit-checking in the frontend -- as
>>> was done before. Yet, that is not ideal. The rationale for having an
>>> intrinsic instead of the frontend generating integer bit manipulation
>>> seems good, IMO.
>>>
>>> So the question that arises: which new intrinsic(s) do we want to add?
>>> I list 3 options below. I think Option 1 should be discarded from
>>> consideration, and I'm not sure which of 2 or 3 is best. I'm leaning
>>> towards 3 -- it seems like it may be simpler -- although I'm not certain.
>>>
>>> Option 1: we could theoretically address the problem with an
>>> llvm.experimental.constrained.superquiet_fcmp, which would be the same
>>> as fcmp, except it would not raise an FP exception even for an sNaN.
>>> This would be a straightforward substitution for fcmp in Clang's
>>> existing codegen, and it should work.
>>>
>>> However, I don't think this would be a good idea, because it's a poor
>>> match for hardware. The "superquiet_fcmp" operation doesn't map to CPU
>>> functionality in any ISA I'm aware of. Generally, the implementation
>>> would have to filter out sNaNs before using the hardware FP-compare
>>> instruction. And how do you detect an sNaN without raising an FP
>>> exception? Fall back to integer ops. But by doing that, you've done work
>>> almost equivalent to what you needed for the actual classification
>>> function that you were trying to implement in the first place. Not
>>> really useful.
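The integer fallback described above can be sketched in C for IEEE-754 binary32. This is a minimal illustration, not code from the thread: the `f32_*` helper names are hypothetical, and it assumes the IEEE 754-2008 recommended NaN encoding (top fraction bit set means quiet), which x86 and AArch64 follow; legacy MIPS uses the opposite convention.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "binary32 float assumed");

/* Copy the float's bits into an integer; no FP instruction is involved. */
static uint32_t f32_bits(float x) {
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}

/* Hypothetical helper: a signaling NaN has an all-ones exponent, a nonzero
   fraction, and the quiet bit (fraction MSB, 0x00400000) clear.
   Assumes the IEEE 754-2008 recommended NaN encoding. */
static int f32_issignaling_bits(uint32_t u) {
    uint32_t mag = u & 0x7fffffffu;        /* ignore the sign bit */
    return mag > 0x7f800000u               /* NaN: exp all ones, frac != 0 */
        && (mag & 0x00400000u) == 0;       /* quiet bit clear */
}

static int f32_issignaling(float x) {
    return f32_issignaling_bits(f32_bits(x));
}
```

Every step is an integer mask or integer compare, which illustrates the point above: a "superquiet" compare would need roughly this much work before ever reaching the hardware compare.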
>>>
>>> Option 2: We could add a whole family of classification intrinsics. I
>>> think that would be:
>>>
>>> llvm.isnan
>>> llvm.issignaling
>>> llvm.isinf
>>> llvm.isfinite
>>> llvm.isnormal
>>> llvm.issubnormal
>>> llvm.iszero
>>> (Note: some of these are missing corresponding __builtin_is* builtins in
>>> Clang at the moment -- we have no __builtin_issignaling,
>>> __builtin_issubnormal, or __builtin_iszero. We probably ought to.)
>>>
>>> We don't necessarily need an intrinsic for fpclassify if we have the
>>> above intrinsics, since it can be built with llvm.isnan, llvm.isinf,
>>> llvm.iszero, and llvm.isnormal.
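To make the composition concrete, here is a minimal C sketch (not from the thread) that builds an fpclassify-style function out of four bit-level predicates standing in for llvm.isnan, llvm.isinf, llvm.iszero and llvm.isnormal. The `f32_*` names are hypothetical; the return values are the standard `FP_*` macros from <math.h>, and a binary32 `float` is assumed.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "binary32 float assumed");

/* Bits of |x|: the sign bit is cleared, no FP operation is performed. */
static uint32_t f32_abs_bits(float x) {
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    return u & 0x7fffffffu;
}

static int f32_isnan(float x)  { return f32_abs_bits(x) > 0x7f800000u; }
static int f32_isinf(float x)  { return f32_abs_bits(x) == 0x7f800000u; }
static int f32_iszero(float x) { return f32_abs_bits(x) == 0; }

static int f32_isnormal(float x) {
    uint32_t e = f32_abs_bits(x) & 0x7f800000u;   /* exponent field */
    return e != 0 && e != 0x7f800000u;
}

/* fpclassify from the four predicates: whatever is not NaN, infinite,
   zero or normal must be subnormal. */
static int f32_fpclassify(float x) {
    if (f32_isnan(x))    return FP_NAN;
    if (f32_isinf(x))    return FP_INFINITE;
    if (f32_iszero(x))   return FP_ZERO;
    if (f32_isnormal(x)) return FP_NORMAL;
    return FP_SUBNORMAL;
}
```

The subnormal case needs no predicate of its own, which matches the observation that four of the Option 2 intrinsics suffice.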
>>>
>>> Option 3: Add only an fpclassify intrinsic.
>>>
>>> That is, something like:
>>>    i32 @llvm.fpclassify.i32.f32(i32 %if_snan, i32 %if_qnan,
>>>        i32 %if_infinite, i32 %if_normal, i32 %if_subnormal,
>>>        i32 %if_zero, float %value)
>>> which classifies the given value, returning the value of the argument
>>> corresponding to its category. We can require the %if_* args to be
>>> constant integers, if we like, for simplicity of implementation.
>>>
>>> Thus, Clang would codegen __builtin_isnan etc. like:
>>>   %isnan = call i1 @llvm.fpclassify.i1.f32(i1 1, i1 1, i1 0, i1 0,
>>>                                            i1 0, i1 0, float %value)
>>> And for fpclassify, we might generate something like:
>>>   %ret = call i32 @llvm.fpclassify.i32.f32(i32 0, i32 0, i32 1, i32 4,
>>>                                            i32 3, i32 2, float %value)
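The intended semantics of the proposed intrinsic can be modeled in C. This is a hypothetical sketch: `model_fpclassify` and `f32_from_bits` are illustrative names, the argument order follows the signature above, and the NaN quiet-bit convention is assumed to be the IEEE 754-2008 one.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "binary32 float assumed");

/* Build a float from a raw bit pattern, e.g. to create a signaling NaN. */
static float f32_from_bits(uint32_t u) {
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Hypothetical C model of llvm.fpclassify.i32.f32: classify `value` using
   integer operations only and return the if_* argument for its class. */
static int32_t model_fpclassify(int32_t if_snan, int32_t if_qnan,
                                int32_t if_infinite, int32_t if_normal,
                                int32_t if_subnormal, int32_t if_zero,
                                float value) {
    uint32_t u;
    memcpy(&u, &value, sizeof u);
    uint32_t exp  = u & 0x7f800000u;    /* exponent field */
    uint32_t frac = u & 0x007fffffu;    /* fraction field */

    if (exp == 0x7f800000u) {           /* infinity or NaN */
        if (frac == 0)
            return if_infinite;
        return (frac & 0x00400000u) ? if_qnan : if_snan;  /* quiet bit */
    }
    if (exp == 0)                       /* zero or subnormal */
        return frac == 0 ? if_zero : if_subnormal;
    return if_normal;
}
```

With arguments (1, 1, 0, 0, 0, 0) the model behaves like isnan. With (0, 0, 1, 4, 3, 2) it returns 0, 1, 4, 3 and 2 for NaN, infinite, normal, subnormal and zero, which are glibc's values of FP_NAN, FP_INFINITE, FP_NORMAL, FP_SUBNORMAL and FP_ZERO; that is presumably what the second call above targets.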
>>>
>>> On most architectures, we'd expand this intrinsic into appropriate
>>> integer operations (skipping the parts of the classification that are
>>> irrelevant for the given constant arguments), since there are no
>>> corresponding hardware instructions available for float classification.
>>> Or, for non-strictfp functions, we could continue to expand into an
>>> fcmp-based set of tests... although, looking at the asm we currently
>>> generate, the integer versions may well be faster, except for isnan.
>>>
>>> On SystemZ/s390x, this intrinsic would translate almost directly into
>>> the "test data class" instruction -- so long as the %if_* arguments are
>>> all 0/1. That's kinda nice. ("Test data class" takes an FP value and a
>>> bitmask, and returns true if the mask bit in the position corresponding
>>> to the classification of the FP value is set.)
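When every %if_* argument is 0 or 1, the arguments set to 1 simply form a class mask, which is why the mapping onto a mask-based instruction is so direct. A minimal C sketch of that idea follows; the `CLS_*` bit assignment and the function names are made up for illustration and do not reproduce the actual SystemZ instruction encoding.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "binary32 float assumed");

/* Hypothetical class bits (not the real SystemZ layout). */
enum {
    CLS_SNAN      = 1 << 0,
    CLS_QNAN      = 1 << 1,
    CLS_INFINITE  = 1 << 2,
    CLS_NORMAL    = 1 << 3,
    CLS_SUBNORMAL = 1 << 4,
    CLS_ZERO      = 1 << 5,
};

/* Map a float to exactly one class bit using integer operations only. */
static unsigned f32_class_bit(float x) {
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    uint32_t exp  = u & 0x7f800000u;
    uint32_t frac = u & 0x007fffffu;
    if (exp == 0x7f800000u)
        return frac == 0 ? CLS_INFINITE
             : (frac & 0x00400000u) ? CLS_QNAN : CLS_SNAN;
    if (exp == 0)
        return frac == 0 ? CLS_ZERO : CLS_SUBNORMAL;
    return CLS_NORMAL;
}

/* "Test data class" model: true if x's class bit is selected by mask. */
static int test_data_class(float x, unsigned mask) {
    return (f32_class_bit(x) & mask) != 0;
}
```

For example, an isnan check becomes `test_data_class(x, CLS_SNAN | CLS_QNAN)`, and isfinite becomes `test_data_class(x, CLS_NORMAL | CLS_SUBNORMAL | CLS_ZERO)`.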
>>>
>>>
>>> Separately, we have the signbit operation. I think that's the only other
>>> operation related to this RFC that needs to be addressed. Currently,
>>> Clang always generates integer bit-ops for __builtin_signbit in the
>>> frontend. This is arguably OK as is. Yet, completing the set of IR FP
>>> classification intrinsics seems like a good idea. So, we could also
>>> (along with any of the above options) add:
>>>   i1 @llvm.signbit.f32(float %value)
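For reference, the integer bit-op approach to signbit amounts to reading bit 31 of a binary32 value. A minimal C sketch with a hypothetical helper name; it also shows why a comparison is no substitute: -0.0 has its sign bit set, yet it does not compare less than zero.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "binary32 float assumed");

/* Hypothetical helper: the sign is bit 31 of a binary32 value. This works
   for zeros, infinities and NaNs alike without any FP operation. */
static int f32_signbit(float x) {
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    return (int)(u >> 31);
}
```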
>>>
>>>
>>> On Thu, Sep 2, 2021 at 8:33 AM Serge Pavlov via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> Some time ago a new intrinsic, `llvm.isnan`, was introduced. It was
>>>> intended to represent the IEEE-754 operation `isNaN` as well as the
>>>> family of C library functions `isnan*`. Then a concern was raised (see
>>>> https://reviews.llvm.org/D104854) that this functionality should be
>>>> removed. Discussion in the subsequent RFC
>>>> (https://lists.llvm.org/pipermail/llvm-dev/2021-August/152257.html)
>>>> reached consensus that such an intrinsic is necessary. Nevertheless, the
>>>> patches related to the new intrinsic were reverted. I have to restart the
>>>> discussion in the hope of convincing the community that this intrinsic
>>>> and other classification functions are necessary.
>>>>
>>>> There are two main reasons why this intrinsic is necessary:
>>>> 1. It allows a correct implementation of `isnan` when strict
>>>>    floating-point semantics are in effect.
>>>> 2. It allows preserving the check in -ffast-math compilation.
>>>>
>>>> To facilitate the discussion, let's concentrate on the first problem.
>>>>
>>>> Previously, the frontend builtin `__builtin_isnan` was converted into
>>>> `fcmp uno` during IR generation in Clang codegen. This solution is not
>>>> suitable if FP exceptions are not ignored, because compare instructions
>>>> raise exceptions if an argument is a signaling NaN. Both IEEE-754
>>>> (5.7.2) and the C standard
>>>> (http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2596.pdf, F.3p6)
>>>> demand that this function not raise floating-point exceptions. There was
>>>> no target-independent IR construct that could represent `isnan`.
>>>>
>>>> This drawback was significant enough that some attempts to alleviate it
>>>> were undertaken. In https://reviews.llvm.org/D95948 `isnan` was
>>>> implemented using integer operations in strictfp functions. However,
>>>> that is not suitable for targets where a more efficient way exists, such
>>>> as a dedicated instruction. Another solution was implemented in
>>>> https://reviews.llvm.org/D96568, where a hook
>>>> `clang::TargetCodeGenInfo::testFPKind` was introduced, which injects
>>>> target-specific code into IR. Such a solution makes IR more
>>>> target-dependent and prevents some IR-level optimizations.
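A minimal C sketch of the integer-only idea for binary32 (an illustration, not the code from D95948): a value is a NaN exactly when its magnitude bits exceed those of infinity. The helper names are hypothetical. Because no floating-point compare executes, nothing here can raise FE_INVALID for a signaling NaN, assuming the calling convention moves floats without converting them (true for SSE registers on x86-64). The drawback mentioned above also shows: this fixed instruction sequence cannot use a dedicated classification instruction where a target has one.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "binary32 float assumed");

/* Build a float from raw bits, e.g. to create a signaling NaN for testing. */
static float f32_from_bits(uint32_t u) {
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Hypothetical isnan using integer operations only: clear the sign bit and
   compare the magnitude with the bit pattern of +infinity (0x7f800000).
   Anything larger has an all-ones exponent and a nonzero fraction. */
static int f32_isnan_int(float x) {
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    return (u & 0x7fffffffu) > 0x7f800000u;
}
```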
>>>>
>>>> To have a solution suitable for all cases, a new intrinsic function
>>>> `llvm.isnan` was introduced (https://reviews.llvm.org/D104854). It
>>>> protects the check from undesirable optimizations and preserves it until
>>>> instruction selection, where it can be lowered in the way that is optimal
>>>> for a particular target.
>>>>
>>>> Other classification functions also need their own intrinsics. In
>>>> strictfp mode even a check for zero (`iszero`) cannot be made by
>>>> comparing a value against zero: if the value is a signaling NaN, an FP
>>>> exception would be raised. James Y Knight, in the previous discussion
>>>> (https://lists.llvm.org/pipermail/llvm-dev/2021-August/152282.html),
>>>> listed such "non-computational" functions, which should not signal when
>>>> given an sNaN argument.
>>>>
>>>> It looks like a new intrinsic is the only consistent and target-agnostic
>>>> way to implement these checks in all environments, including the case
>>>> when FP exceptions are not ignored.
>>>>
>>>> Any feedback is welcome.
>>>
>>>
>>>
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

Serge Pavlov via llvm-dev

2021-Sep-22 04:17 UTC

The review was done in https://reviews.llvm.org/D104854. The design was
discussed there, in
https://lists.llvm.org/pipermail/llvm-dev/2021-August/152257.html and in
this thread. No alternatives were proposed and no disagreement was
expressed.

If somebody has concerns or objections, they can express them here or in
https://reviews.llvm.org/D104854 if they are about the implementation.

Thanks,
--Serge


On Wed, Sep 22, 2021 at 3:19 AM Roman Lebedev <lebedev.ri at gmail.com> wrote:
> Given just how contentious all this has been so far,
> I would strongly suggest that it go through review.
>
> Roman
