thr3ads.net - llvm dev - [llvm-dev] FENV_ACCESS and floating point LibFunc calls [May 2017]

If this information is useful, please help other people find it:
Share via:

Sanjay Patel via llvm-dev

2017-May-11 23:36 UTC

[llvm-dev] FENV_ACCESS and floating point LibFunc calls

Thanks, Andy. I'm not sure how to solve that or my case given the DAG's
basic-block limit. Probably CodeGenPrepare or SelectionDAGBuilder...or we
wait until after isel and try to split it up in a machine instruction pass.

I filed my example here:
https://bugs.llvm.org/show_bug.cgi?id=33013

Feel free to comment there and/or open a new bug for the FP_TO_UINT case.

On Thu, May 11, 2017 at 4:20 PM, Kaylor, Andrew <andrew.kaylor at
intel.com>
wrote:
> Hi Sanjay,
>
>
>
> The issue that Michael is talking about is that in
SelectionDAGLegalize::ExpandNode()
> we’re doing this:
>
>
>
>   case ISD::FP_TO_UINT: {
>
>     SDValue True, False;
>
>     EVT VT =  Node->getOperand(0).getValueType();
>
>     EVT NVT = Node->getValueType(0);
>
>     APFloat apf(DAG.EVTToAPFloatSemantics(VT),
>
>                 APInt::getNullValue(VT.getSizeInBits()));
>
>     APInt x = APInt::getSignMask(NVT.getSizeInBits());
>
>     (void)apf.convertFromAPInt(x, false, APFloat::rmNearestTiesToEven);
>
>     Tmp1 = DAG.getConstantFP(apf, dl, VT);
>
>     Tmp2 = DAG.getSetCC(dl, getSetCCResultType(VT),
>
>                         Node->getOperand(0),
>
>                         Tmp1, ISD::SETLT);
>
>     True = DAG.getNode(ISD::FP_TO_SINT, dl, NVT, Node->getOperand(0));
>
>     // TODO: Should any fast-math-flags be set for the FSUB?
>
>     False = DAG.getNode(ISD::FP_TO_SINT, dl, NVT,
>
>                         DAG.getNode(ISD::FSUB, dl, VT,
>
>                                     Node->getOperand(0), Tmp1));
>
>     False = DAG.getNode(ISD::XOR, dl, NVT, False,
>
>                         DAG.getConstant(x, dl, NVT));
>
>     Tmp1 = DAG.getSelect(dl, NVT, Tmp2, True, False);
>
>     Results.push_back(Tmp1);
>
>     break;
>
>   }
>
>
>
> This results in code that speculatively executes the FSUB and then uses a
> cmov instruction (based on whether or not the original value was less than
> LLONG_MAX) to select between the direct FP_TO_SINT result and the result of
> the FSUB.  If we assume that for most data sets a significant majority of
> values being converted this way will be less than LLONG_MAX then the
> processor should be able to predict a branch here pretty reliably and so
> branching code would be faster than code that does a speculative FSUB
> followed by a cmov.
>
>
>
> The question is, how do we need to change this to get the back end to
> generate a branch instead of a cmov.  I’m guessing that there are
> circumstances in which the target-specific backend would make that decision
> on its own and sink the FSUB, but for the X86 backend that doesn’t seem to
> be happening in this case.  My inclination was to just rewrite the
> ExpandNode() code above to explicitly create a branch, but I haven’t tried
> it yet.
>
>
>
> As a side effect, which is what originally drew Michael’s attention to
> this conversion, the INEXACT floating point flag is set even for exact
> conversions as a result of the speculative FSUB.  That’s how this got
> related to FENV_ACCESS support.
>
>
>
> I don’t know if there has been a bug opened for this issue.  There was an
> earlier discussion here: https://groups.google.com/
> forum/#!topic/llvm-dev/Tl9SD-edUJI
>
>
>
> -Andy
>
>
>
>
>
> *From:* Sanjay Patel [mailto:spatel at rotateright.com]
> *Sent:* Thursday, May 11, 2017 2:06 PM
> *To:* Kaylor, Andrew <andrew.kaylor at intel.com>
> *Cc:* Michael Clark <michaeljclark at mac.com>; llvm-dev <
> llvm-dev at lists.llvm.org>
> *Subject:* Re: [llvm-dev] FENV_ACCESS and floating point LibFunc calls
>
>
>
> Sounds like the select lowering issue is definitely separate from the FENV
> work.
>
> Is there a bug report with a C or IR example? You want to generate compare
> and branch instead of a cmov for something like this?
>
>
> int foo(float x) {
>   if (x < 42.0f)
>     return x;
>   return 12;
> }
>
> define i32 @foo(float %x) {
>   %cmp = fcmp olt float %x, 4.200000e+01
>   %conv = fptosi float %x to i32
>   %ret = select i1 %cmp, i32 %conv, i32 12
>   ret i32 %ret
> }
>
> $ clang -O2 cmovfp.c -S -o -
>     movss    LCPI0_0(%rip), %xmm1    ## xmm1 = mem[0],zero,zero,zero
>     ucomiss    %xmm0, %xmm1
>     cvttss2si    %xmm0, %ecx
>     movl    $12, %eax
>     cmoval    %ecx, %eax
>     retq
>
>
>
> On Thu, May 11, 2017 at 1:28 PM, Kaylor, Andrew via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> Hi Michael,
>
>
>
> To be honest I haven’t started working on FP to integer conversions for
> FENV_ACCESS yet.
>
>
>
> To some extent I would consider the issue that you found independent of
> what I’m doing to constrain FP behavior.  That is, I think we ought to make
> the change you’re asking for even apart from the FENV_ACCESS work.  When I
> get to the conversions for FENV_ACCESS support it may require some
> additional constraints, but I think if the branching conversion is usually
> faster (and it looks like it will be) then that should be the default
> behavior.
>
>
>
> I’ll try to look into that.  I’d offer to give you advice on putting
> together a patch, but I’m still learning my way around the ISel code
> myself.  I think I know enough to figure out what to do but not enough to
> tell someone else how to do it without a bunch of wrong turns.
>
>
>
> -Andy
>
>
>
>
>
> *From:* Michael Clark [mailto:michaeljclark at mac.com]
> *Sent:* Wednesday, May 10, 2017 7:59 PM
> *To:* Kaylor, Andrew <andrew.kaylor at intel.com>
> *Cc:* llvm-dev <llvm-dev at lists.llvm.org>
> *Subject:* FENV_ACCESS and floating point LibFunc calls
>
>
>
> Hi Andy,
>
>
>
> I’m interested to try out your patches…
>
>
>
> I understand the scope of FENV_ACCESS is relatively wide, however I’m
> still curious if you managed to figure out how to prevent the
> SelectionDAGLegalize::ExpandNode() FP_TO_UINT lowering of the FPToUI
> intrinsic from producing the predicate logic that incorrectly sets the
> floating point accrued exceptions due to unconditional execution of the
> ISD::FSUB node attached to the SELECT node. It’s a little above my head to
> try to solve this issue with my current understanding of LLVM but I could
> give it a try. I’d need some guidance as to how the lowering of SELECT can
> be controlled. i.e. where LLVM decides whether and how to lower a select
> node as a branch vs predicate logic.
>
>
>
> I’d almost forgotten that we microbenchmarked this and found the branching
> version is faster with regular input (< LLONG_MAX).
>
>
>
> - https://godbolt.org/g/ytgk7l
>
> All, Where does LLVM decide to lower select as predicate logic vs branches
> and how does the cost model work? I’m curious about a tuning flag to
> generate branches instead of computing both values and using conditional
> moves…
>
>
>
> Best,
>
> Michael.
>
>
>
> On 11 May 2017, at 11:41 AM, via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
>
>
>
> Hi all,
>
> Background
> I've been working on adding the necessary support to LLVM for clang to
be
> able to support the STDC FENV_ACCESS pragma, which basically allows users
> to modify rounding mode at runtime and depend on the value of
> floating-point status flags or to unmask floating point exceptions without
> unexpected side effects.  I've committed an initial patch (r293226)
that
> adds constrained intrinsics for the basic FP operations, and I have a patch
> up for review now (https://reviews.llvm.org/D32319) that adds constrained
> versions of a number of libm-like FP intrinsics.
>
> Current problem
> Now I'm trying to make sure I have a good solution for the way in which
> the optimizer handles recognized calls to libm functions (sqrt, pow, cos,
> sin, etc.).  Basically, I need to prevent all passes from making any
> modifications to these calls that would make assumptions about rounding
> mode or improperly affect the FP status flags (either suppressing flags
> that should be present or setting flags that should not be set).  For
> instance, there are circumstances in which the optimizer will constant fold
> a call to one of these functions if the value of the arguments are known at
> compile time, but this constant folding generally assumes the default
> rounding mode and if the library call would have set a status flag, I need
> the flag to be set.
>
> Question
> My question is, can/should I just rely on the front end setting the
> "nobuiltin" attribute for the call site in any location where the
FP
> behavior needs to be restricted?
>
> Ideally, I would like to be able to conditionally enable optimizations
> like constant folding if I am able to prove that the rounding mode, though
> dynamic, is known for the callsite at compile time (the constrained
> intrinsics have a mechanism to enable this), but at the moment I am more
> concerned about correctness and would be willing to sacrifice optimizations
> to get correct behavior.  Long term, I was thinking that maybe I could do
> something like attach metadata to indicate rounding mode and exception
> behavior when they were known.
>
> Is there a better way to do this?
>
> Thanks,
> Andy
>
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170511/c42a61a1/attachment.html>

Michael Clark via llvm-dev

2017-May-12 01:30 UTC

head link

[llvm-dev] FENV_ACCESS and floating point LibFunc calls

Hi Sanjay,

I can’t comment on the bug as I don’t have an LLVM bugzilla account. I’ve
emailed bugs-admin to gain access to bugzilla so I can comment on the bug.

I note that on your bug that you have stated that the branch is faster than the
conditional move. Faster code is a side effect of the fix in this particular
case. The real issue is as Andrew mentioned, it that the predicate logic; the
ISD:FSUB in particular is causing FE_INEXACT accrued exception state for exact
conversions.

Here is some code that shows the error and has a workaround by forcing the
compiler to create a branch by using inline asm for the conversion.

	https://godbolt.org/g/huIgK1

The bug affects both float to u64 and double to u64 conversions.

Expected output:

$ clang  fcvt-bug.cc 
$ ./a.out 
1 exact
1 inexact
1 exact
1 inexact
1 exact
1 inexact
1 exact
1 inexact

Erroneous output:

$ clang -DDISABLE_WORKAROUND fcvt-bug.cc 
$ ./a.out 
1 exact
1 inexact
1 inexact
1 inexact
1 exact
1 inexact
1 inexact
1 inexact

Michael.
> On 12 May 2017, at 11:36 AM, Sanjay Patel <spatel at rotateright.com>
wrote:
> 
> Thanks, Andy. I'm not sure how to solve that or my case given the
DAG's basic-block limit. Probably CodeGenPrepare or SelectionDAGBuilder...or
we wait until after isel and try to split it up in a machine instruction pass.
> 
> I filed my example here:
> https://bugs.llvm.org/show_bug.cgi?id=33013
<https://bugs.llvm.org/show_bug.cgi?id=33013>
> 
> Feel free to comment there and/or open a new bug for the FP_TO_UINT case.
> 
> On Thu, May 11, 2017 at 4:20 PM, Kaylor, Andrew <andrew.kaylor at
intel.com <mailto:andrew.kaylor at intel.com>> wrote:
> Hi Sanjay,
> 
>  
> 
> The issue that Michael is talking about is that in
SelectionDAGLegalize::ExpandNode() we’re doing this:
> 
>  
> 
>   case ISD::FP_TO_UINT: {
> 
>     SDValue True, False;
> 
>     EVT VT =  Node->getOperand(0).getValueType();
> 
>     EVT NVT = Node->getValueType(0);
> 
>     APFloat apf(DAG.EVTToAPFloatSemantics(VT),
> 
>                 APInt::getNullValue(VT.getSizeInBits()));
> 
>     APInt x = APInt::getSignMask(NVT.getSizeInBits());
> 
>     (void)apf.convertFromAPInt(x, false, APFloat::rmNearestTiesToEven);
> 
>     Tmp1 = DAG.getConstantFP(apf, dl, VT);
> 
>     Tmp2 = DAG.getSetCC(dl, getSetCCResultType(VT),
> 
>                         Node->getOperand(0),
> 
>                         Tmp1, ISD::SETLT);
> 
>     True = DAG.getNode(ISD::FP_TO_SINT, dl, NVT, Node->getOperand(0));
> 
>     // TODO: Should any fast-math-flags be set for the FSUB?
> 
>     False = DAG.getNode(ISD::FP_TO_SINT, dl, NVT,
> 
>                         DAG.getNode(ISD::FSUB, dl, VT,
> 
>                                     Node->getOperand(0), Tmp1));
> 
>     False = DAG.getNode(ISD::XOR, dl, NVT, False,
> 
>                         DAG.getConstant(x, dl, NVT));
> 
>     Tmp1 = DAG.getSelect(dl, NVT, Tmp2, True, False);
> 
>     Results.push_back(Tmp1);
> 
>     break;
> 
>   }
> 
>  
> 
> This results in code that speculatively executes the FSUB and then uses a
cmov instruction (based on whether or not the original value was less than
LLONG_MAX) to select between the direct FP_TO_SINT result and the result of the
FSUB.  If we assume that for most data sets a significant majority of values
being converted this way will be less than LLONG_MAX then the processor should
be able to predict a branch here pretty reliably and so branching code would be
faster than code that does a speculative FSUB followed by a cmov. <>
>  
> 
> The question is, how do we need to change this to get the back end to
generate a branch instead of a cmov.  I’m guessing that there are circumstances
in which the target-specific backend would make that decision on its own and
sink the FSUB, but for the X86 backend that doesn’t seem to be happening in this
case.  My inclination was to just rewrite the ExpandNode() code above to
explicitly create a branch, but I haven’t tried it yet.
> 
>  
> 
> As a side effect, which is what originally drew Michael’s attention to this
conversion, the INEXACT floating point flag is set even for exact conversions as
a result of the speculative FSUB.  That’s how this got related to FENV_ACCESS
support.
> 
>  
> 
> I don’t know if there has been a bug opened for this issue.  There was an
earlier discussion here:
https://groups.google.com/forum/#!topic/llvm-dev/Tl9SD-edUJI
<https://groups.google.com/forum/#!topic/llvm-dev/Tl9SD-edUJI>
>  
> 
> -Andy
> 
>  
> 
>  
> 
>  <>From: Sanjay Patel [mailto:spatel at rotateright.com
<mailto:spatel at rotateright.com>]
> Sent: Thursday, May 11, 2017 2:06 PM
> To: Kaylor, Andrew <andrew.kaylor at intel.com <mailto:andrew.kaylor
at intel.com>>
> Cc: Michael Clark <michaeljclark at mac.com <mailto:michaeljclark at
mac.com>>; llvm-dev <llvm-dev at lists.llvm.org <mailto:llvm-dev at
lists.llvm.org>>
> Subject: Re: [llvm-dev] FENV_ACCESS and floating point LibFunc calls
> 
>  
> 
> Sounds like the select lowering issue is definitely separate from the FENV
work.
> 
> Is there a bug report with a C or IR example? You want to generate compare
and branch instead of a cmov for something like this?
> 
> 
> int foo(float x) {
>   if (x < 42.0f)
>     return x;
>   return 12;
> }
> 
> define i32 @foo(float %x) {
>   %cmp = fcmp olt float %x, 4.200000e+01
>   %conv = fptosi float %x to i32
>   %ret = select i1 %cmp, i32 %conv, i32 12
>   ret i32 %ret
> }
> 
> $ clang -O2 cmovfp.c -S -o -
>     movss    LCPI0_0(%rip), %xmm1    ## xmm1 = mem[0],zero,zero,zero
>     ucomiss    %xmm0, %xmm1
>     cvttss2si    %xmm0, %ecx
>     movl    $12, %eax
>     cmoval    %ecx, %eax
>     retq
> 
> 
>  
> 
> On Thu, May 11, 2017 at 1:28 PM, Kaylor, Andrew via llvm-dev <llvm-dev
at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote:
> 
> Hi Michael,
> 
>  
> 
> To be honest I haven’t started working on FP to integer conversions for
FENV_ACCESS yet.
> 
>  
> 
> To some extent I would consider the issue that you found independent of
what I’m doing to constrain FP behavior.  That is, I think we ought to make the
change you’re asking for even apart from the FENV_ACCESS work.  When I get to
the conversions for FENV_ACCESS support it may require some additional
constraints, but I think if the branching conversion is usually faster (and it
looks like it will be) then that should be the default behavior.
> 
>  
> 
> I’ll try to look into that.  I’d offer to give you advice on putting
together a patch, but I’m still learning my way around the ISel code myself.  I
think I know enough to figure out what to do but not enough to tell someone else
how to do it without a bunch of wrong turns.
> 
>  
> 
> -Andy
> 
>  
> 
>   <>
>  <>From: Michael Clark [mailto:michaeljclark at mac.com
<mailto:michaeljclark at mac.com>]
> Sent: Wednesday, May 10, 2017 7:59 PM
> To: Kaylor, Andrew <andrew.kaylor at intel.com <mailto:andrew.kaylor
at intel.com>>
> Cc: llvm-dev <llvm-dev at lists.llvm.org <mailto:llvm-dev at
lists.llvm.org>>
> Subject: FENV_ACCESS and floating point LibFunc calls
> 
>  
> 
> Hi Andy,
> 
>  
> 
> I’m interested to try out your patches…
> 
>  
> 
> I understand the scope of FENV_ACCESS is relatively wide, however I’m still
curious if you managed to figure out how to prevent the
SelectionDAGLegalize::ExpandNode() FP_TO_UINT lowering of the FPToUI intrinsic
from producing the predicate logic that incorrectly sets the floating point
accrued exceptions due to unconditional execution of the ISD::FSUB node attached
to the SELECT node. It’s a little above my head to try to solve this issue with
my current understanding of LLVM but I could give it a try. I’d need some
guidance as to how the lowering of SELECT can be controlled. i.e. where LLVM
decides whether and how to lower a select node as a branch vs predicate logic.
> 
>  
> 
> I’d almost forgotten that we microbenchmarked this and found the branching
version is faster with regular input (< LLONG_MAX).
> 
>  
> 
> - https://godbolt.org/g/ytgk7l <https://godbolt.org/g/ytgk7l>
> All, Where does LLVM decide to lower select as predicate logic vs branches
and how does the cost model work? I’m curious about a tuning flag to generate
branches instead of computing both values and using conditional moves…
> 
>  
> 
> Best,
> 
> Michael.
> 
>  
> 
> On 11 May 2017, at 11:41 AM, via llvm-dev <llvm-dev at lists.llvm.org
<mailto:llvm-dev at lists.llvm.org>> wrote:
> 
>  
> 
> Hi all,
> 
> Background
> I've been working on adding the necessary support to LLVM for clang to
be able to support the STDC FENV_ACCESS pragma, which basically allows users to
modify rounding mode at runtime and depend on the value of floating-point status
flags or to unmask floating point exceptions without unexpected side effects. 
I've committed an initial patch (r293226) that adds constrained intrinsics
for the basic FP operations, and I have a patch up for review now
(https://reviews.llvm.org/D32319 <https://reviews.llvm.org/D32319>) that
adds constrained versions of a number of libm-like FP intrinsics.
> 
> Current problem
> Now I'm trying to make sure I have a good solution for the way in which
the optimizer handles recognized calls to libm functions (sqrt, pow, cos, sin,
etc.).  Basically, I need to prevent all passes from making any modifications to
these calls that would make assumptions about rounding mode or improperly affect
the FP status flags (either suppressing flags that should be present or setting
flags that should not be set).  For instance, there are circumstances in which
the optimizer will constant fold a call to one of these functions if the value
of the arguments are known at compile time, but this constant folding generally
assumes the default rounding mode and if the library call would have set a
status flag, I need the flag to be set.
> 
> Question
> My question is, can/should I just rely on the front end setting the
"nobuiltin" attribute for the call site in any location where the FP
behavior needs to be restricted?
> 
> Ideally, I would like to be able to conditionally enable optimizations like
constant folding if I am able to prove that the rounding mode, though dynamic,
is known for the callsite at compile time (the constrained intrinsics have a
mechanism to enable this), but at the moment I am more concerned about
correctness and would be willing to sacrifice optimizations to get correct
behavior.  Long term, I was thinking that maybe I could do something like attach
metadata to indicate rounding mode and exception behavior when they were known.
> 
> Is there a better way to do this?
> 
> Thanks,
> Andy
> 
>  
> 
> 
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
<http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>
>  
> 
> 
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170512/f7e422a7/attachment.html>

Tim Northover via llvm-dev

2017-May-12 01:48 UTC

head link

[llvm-dev] FENV_ACCESS and floating point LibFunc calls

On 11 May 2017 at 18:30, Michael Clark via llvm-dev
<llvm-dev at lists.llvm.org> wrote:> I note that on your bug that you have stated that the branch is faster than
> the conditional move. Faster code is a side effect of the fix in this
> particular case.
On the contrary: the faster code is pretty much the only reason this
can happen before the rest of the FENV support lands.

It's been said before, but I'll reiterate: LLVM IR does not model the
FENV on its instructions. CodeGen and other passes are free to
de-conditionalize exceptions, remove them, or add spurious ones just
for the giggles. What LLVM does now is not incorrect.

Tim.

Sanjay Patel via llvm-dev

2017-May-12 01:51 UTC

head link

[llvm-dev] FENV_ACCESS and floating point LibFunc calls

Hi Michael,

My main concern is performance in default mode, so I see the FENV fix as
the side effect. :)

I looked at this very briefly before leaving my dev machine for the day. I
think we're going to need a cmov conversion pass that happens very late
(maybe this already exists but just isn't firing for the cases I looked
at). Trying to convert these any sooner will likely be too harmful to the
DAG. PowerPC has this functionality, so we should already have a model to
share or steal from.

On Thu, May 11, 2017 at 7:30 PM Michael Clark <michaeljclark at mac.com>
wrote:
> Hi Sanjay,
>
> I can’t comment on the bug as I don’t have an LLVM bugzilla account. I’ve
> emailed bugs-admin to gain access to bugzilla so I can comment on the bug.
>
> I note that on your bug that you have stated that the branch is faster
> than the conditional move. Faster code is a side effect of the fix in this
> particular case. The real issue is as Andrew mentioned, it that the
> predicate logic; the ISD:FSUB in particular is causing FE_INEXACT accrued
> exception state for exact conversions.
>
> Here is some code that shows the error and has a workaround by forcing the
> compiler to create a branch by using inline asm for the conversion.
>
> https://godbolt.org/g/huIgK1
>
> The bug affects both float to u64 and double to u64 conversions.
>
> Expected output:
>
> $ clang  fcvt-bug.cc
> $ ./a.out
> 1 exact
> 1 inexact
> 1 exact
> 1 inexact
> 1 exact
> 1 inexact
> 1 exact
> 1 inexact
>
>
> Erroneous output:
>
> $ clang -DDISABLE_WORKAROUND fcvt-bug.cc
> $ ./a.out
> 1 exact
> 1 inexact
> 1 inexact
> 1 inexact
> 1 exact
> 1 inexact
> 1 inexact
> 1 inexact
>
>
> Michael.
>
> On 12 May 2017, at 11:36 AM, Sanjay Patel <spatel at rotateright.com>
wrote:
>
> Thanks, Andy. I'm not sure how to solve that or my case given the
DAG's
> basic-block limit. Probably CodeGenPrepare or SelectionDAGBuilder...or we
> wait until after isel and try to split it up in a machine instruction pass.
>
> I filed my example here:
> https://bugs.llvm.org/show_bug.cgi?id=33013
>
> Feel free to comment there and/or open a new bug for the FP_TO_UINT case.
>
> On Thu, May 11, 2017 at 4:20 PM, Kaylor, Andrew <andrew.kaylor at
intel.com>
> wrote:
>
>> Hi Sanjay,
>>
>>
>>
>> The issue that Michael is talking about is that in
>> SelectionDAGLegalize::ExpandNode() we’re doing this:
>>
>>
>>
>>   case ISD::FP_TO_UINT: {
>>
>>     SDValue True, False;
>>
>>     EVT VT =  Node->getOperand(0).getValueType();
>>
>>     EVT NVT = Node->getValueType(0);
>>
>>     APFloat apf(DAG.EVTToAPFloatSemantics(VT),
>>
>>                 APInt::getNullValue(VT.getSizeInBits()));
>>
>>     APInt x = APInt::getSignMask(NVT.getSizeInBits());
>>
>>     (void)apf.convertFromAPInt(x, false, APFloat::rmNearestTiesToEven);
>>
>>     Tmp1 = DAG.getConstantFP(apf, dl, VT);
>>
>>     Tmp2 = DAG.getSetCC(dl, getSetCCResultType(VT),
>>
>>                         Node->getOperand(0),
>>
>>                         Tmp1, ISD::SETLT);
>>
>>     True = DAG.getNode(ISD::FP_TO_SINT, dl, NVT,
Node->getOperand(0));
>>
>>     // TODO: Should any fast-math-flags be set for the FSUB?
>>
>>     False = DAG.getNode(ISD::FP_TO_SINT, dl, NVT,
>>
>>                         DAG.getNode(ISD::FSUB, dl, VT,
>>
>>                                     Node->getOperand(0), Tmp1));
>>
>>     False = DAG.getNode(ISD::XOR, dl, NVT, False,
>>
>>                         DAG.getConstant(x, dl, NVT));
>>
>>     Tmp1 = DAG.getSelect(dl, NVT, Tmp2, True, False);
>>
>>     Results.push_back(Tmp1);
>>
>>     break;
>>
>>   }
>>
>>
>>
>> This results in code that speculatively executes the FSUB and then uses
a
>> cmov instruction (based on whether or not the original value was less
than
>> LLONG_MAX) to select between the direct FP_TO_SINT result and the
result of
>> the FSUB.  If we assume that for most data sets a significant majority
of
>> values being converted this way will be less than LLONG_MAX then the
>> processor should be able to predict a branch here pretty reliably and
so
>> branching code would be faster than code that does a speculative FSUB
>> followed by a cmov.
>>
>>
>>
>> The question is, how do we need to change this to get the back end to
>> generate a branch instead of a cmov.  I’m guessing that there are
>> circumstances in which the target-specific backend would make that
decision
>> on its own and sink the FSUB, but for the X86 backend that doesn’t seem
to
>> be happening in this case.  My inclination was to just rewrite the
>> ExpandNode() code above to explicitly create a branch, but I haven’t
tried
>> it yet.
>>
>>
>>
>> As a side effect, which is what originally drew Michael’s attention to
>> this conversion, the INEXACT floating point flag is set even for exact
>> conversions as a result of the speculative FSUB.  That’s how this got
>> related to FENV_ACCESS support.
>>
>>
>>
>> I don’t know if there has been a bug opened for this issue.  There was
an
>> earlier discussion here:
>> https://groups.google.com/forum/#!topic/llvm-dev/Tl9SD-edUJI
>>
>>
>>
>> -Andy
>>
>>
>>
>>
>>
>> *From:* Sanjay Patel [mailto:spatel at rotateright.com]
>> *Sent:* Thursday, May 11, 2017 2:06 PM
>> *To:* Kaylor, Andrew <andrew.kaylor at intel.com>
>> *Cc:* Michael Clark <michaeljclark at mac.com>; llvm-dev <
>> llvm-dev at lists.llvm.org>
>> *Subject:* Re: [llvm-dev] FENV_ACCESS and floating point LibFunc calls
>>
>>
>>
>> Sounds like the select lowering issue is definitely separate from the
>> FENV work.
>>
>> Is there a bug report with a C or IR example? You want to generate
>> compare and branch instead of a cmov for something like this?
>>
>>
>> int foo(float x) {
>>   if (x < 42.0f)
>>     return x;
>>   return 12;
>> }
>>
>> define i32 @foo(float %x) {
>>   %cmp = fcmp olt float %x, 4.200000e+01
>>   %conv = fptosi float %x to i32
>>   %ret = select i1 %cmp, i32 %conv, i32 12
>>   ret i32 %ret
>> }
>>
>> $ clang -O2 cmovfp.c -S -o -
>>     movss    LCPI0_0(%rip), %xmm1    ## xmm1 = mem[0],zero,zero,zero
>>     ucomiss    %xmm0, %xmm1
>>     cvttss2si    %xmm0, %ecx
>>     movl    $12, %eax
>>     cmoval    %ecx, %eax
>>     retq
>>
>>
>>
>> On Thu, May 11, 2017 at 1:28 PM, Kaylor, Andrew via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>> Hi Michael,
>>
>>
>>
>> To be honest I haven’t started working on FP to integer conversions for
>> FENV_ACCESS yet.
>>
>>
>>
>> To some extent I would consider the issue that you found independent of
>> what I’m doing to constrain FP behavior.  That is, I think we ought to
make
>> the change you’re asking for even apart from the FENV_ACCESS work. 
When I
>> get to the conversions for FENV_ACCESS support it may require some
>> additional constraints, but I think if the branching conversion is
usually
>> faster (and it looks like it will be) then that should be the default
>> behavior.
>>
>>
>>
>> I’ll try to look into that.  I’d offer to give you advice on putting
>> together a patch, but I’m still learning my way around the ISel code
>> myself.  I think I know enough to figure out what to do but not enough
to
>> tell someone else how to do it without a bunch of wrong turns.
>>
>>
>>
>> -Andy
>>
>>
>>
>>
>>
>> *From:* Michael Clark [mailto:michaeljclark at mac.com]
>> *Sent:* Wednesday, May 10, 2017 7:59 PM
>> *To:* Kaylor, Andrew <andrew.kaylor at intel.com>
>> *Cc:* llvm-dev <llvm-dev at lists.llvm.org>
>> *Subject:* FENV_ACCESS and floating point LibFunc calls
>>
>>
>>
>> Hi Andy,
>>
>>
>>
>> I’m interested to try out your patches…
>>
>>
>>
>> I understand the scope of FENV_ACCESS is relatively wide, however I’m
>> still curious if you managed to figure out how to prevent the
>> SelectionDAGLegalize::ExpandNode() FP_TO_UINT lowering of the FPToUI
>> intrinsic from producing the predicate logic that incorrectly sets the
>> floating point accrued exceptions due to unconditional execution of the
>> ISD::FSUB node attached to the SELECT node. It’s a little above my head
to
>> try to solve this issue with my current understanding of LLVM but I
could
>> give it a try. I’d need some guidance as to how the lowering of SELECT
can
>> be controlled. i.e. where LLVM decides whether and how to lower a
select
>> node as a branch vs predicate logic.
>>
>>
>>
>> I’d almost forgotten that we microbenchmarked this and found the
>> branching version is faster with regular input (< LLONG_MAX).
>>
>>
>>
>> - https://godbolt.org/g/ytgk7l
>>
>> All, Where does LLVM decide to lower select as predicate logic vs
>> branches and how does the cost model work? I’m curious about a tuning
flag
>> to generate branches instead of computing both values and using
conditional
>> moves…
>>
>>
>>
>> Best,
>>
>> Michael.
>>
>>
>>
>> On 11 May 2017, at 11:41 AM, via llvm-dev <llvm-dev at
lists.llvm.org>
>> wrote:
>>
>>
>>
>> Hi all,
>>
>> Background
>> I've been working on adding the necessary support to LLVM for clang
to be
>> able to support the STDC FENV_ACCESS pragma, which basically allows
users
>> to modify rounding mode at runtime and depend on the value of
>> floating-point status flags or to unmask floating point exceptions
without
>> unexpected side effects.  I've committed an initial patch (r293226)
that
>> adds constrained intrinsics for the basic FP operations, and I have a
patch
>> up for review now (https://reviews.llvm.org/D32319) that adds
>> constrained versions of a number of libm-like FP intrinsics.
>>
>> Current problem
>> Now I'm trying to make sure I have a good solution for the way in
which
>> the optimizer handles recognized calls to libm functions (sqrt, pow,
cos,
>> sin, etc.).  Basically, I need to prevent all passes from making any
>> modifications to these calls that would make assumptions about rounding
>> mode or improperly affect the FP status flags (either suppressing flags
>> that should be present or setting flags that should not be set).  For
>> instance, there are circumstances in which the optimizer will constant
fold
>> a call to one of these functions if the value of the arguments are
known at
>> compile time, but this constant folding generally assumes the default
>> rounding mode and if the library call would have set a status flag, I
need
>> the flag to be set.
>>
>> Question
>> My question is, can/should I just rely on the front end setting the
>> "nobuiltin" attribute for the call site in any location where
the FP
>> behavior needs to be restricted?
>>
>> Ideally, I would like to be able to conditionally enable optimizations
>> like constant folding if I am able to prove that the rounding mode,
though
>> dynamic, is known for the callsite at compile time (the constrained
>> intrinsics have a mechanism to enable this), but at the moment I am
more
>> concerned about correctness and would be willing to sacrifice
optimizations
>> to get correct behavior.  Long term, I was thinking that maybe I could
do
>> something like attach metadata to indicate rounding mode and exception
>> behavior when they were known.
>>
>> Is there a better way to do this?
>>
>> Thanks,
>> Andy
>>
>>
>>
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>>
>>
>
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170512/ce5f6bc1/attachment.html>

Apparently Analagous Threads

Search for more maybe matching threads

llvm dev - May 2017 - FENV_ACCESS and floating point LibFunc calls

[llvm-dev] FENV_ACCESS and floating point LibFunc calls

[llvm-dev] FENV_ACCESS and floating point LibFunc calls

[llvm-dev] FENV_ACCESS and floating point LibFunc calls

[llvm-dev] FENV_ACCESS and floating point LibFunc calls

Apparently Analagous Threads