Michael Clark via llvm-dev
2017-May-11 02:58 UTC
[llvm-dev] FENV_ACCESS and floating point LibFunc calls
Hi Andy, I’m interested to try out your patches… I understand the scope of FENV_ACCESS is relatively wide, however I’m still curious if you managed to figure out how to prevent the SelectionDAGLegalize::ExpandNode() FP_TO_UINT lowering of the FPToUI intrinsic from producing the predicate logic that incorrectly sets the floating point accrued exceptions due to unconditional execution of the ISD::FSUB node attached to the SELECT node. It’s a little above my head to try to solve this issue with my current understanding of LLVM but I could give it a try. I’d need some guidance as to how the lowering of SELECT can be controlled. i.e. where LLVM decides whether and how to lower a select node as a branch vs predicate logic. I’d almost forgotten that we microbenchmarked this and found the branching version is faster with regular input (< LLONG_MAX). - https://godbolt.org/g/ytgk7l All, Where does LLVM decide to lower select as predicate logic vs branches and how does the cost model work? I’m curious about a tuning flag to generate branches instead of computing both values and using conditional moves… Best, Michael.> On 11 May 2017, at 11:41 AM, via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > Hi all, > > Background > I've been working on adding the necessary support to LLVM for clang to be able to support the STDC FENV_ACCESS pragma, which basically allows users to modify rounding mode at runtime and depend on the value of floating-point status flags or to unmask floating point exceptions without unexpected side effects. I've committed an initial patch (r293226) that adds constrained intrinsics for the basic FP operations, and I have a patch up for review now (https://reviews.llvm.org/D32319 <https://reviews.llvm.org/D32319>) that adds constrained versions of a number of libm-like FP intrinsics. > > Current problem > Now I'm trying to make sure I have a good solution for the way in which the optimizer handles recognized calls to libm functions (sqrt, pow, cos, sin, etc.). Basically, I need to prevent all passes from making any modifications to these calls that would make assumptions about rounding mode or improperly affect the FP status flags (either suppressing flags that should be present or setting flags that should not be set). For instance, there are circumstances in which the optimizer will constant fold a call to one of these functions if the value of the arguments are known at compile time, but this constant folding generally assumes the default rounding mode and if the library call would have set a status flag, I need the flag to be set. > > Question > My question is, can/should I just rely on the front end setting the "nobuiltin" attribute for the call site in any location where the FP behavior needs to be restricted? > > Ideally, I would like to be able to conditionally enable optimizations like constant folding if I am able to prove that the rounding mode, though dynamic, is known for the callsite at compile time (the constrained intrinsics have a mechanism to enable this), but at the moment I am more concerned about correctness and would be willing to sacrifice optimizations to get correct behavior. Long term, I was thinking that maybe I could do something like attach metadata to indicate rounding mode and exception behavior when they were known. > > Is there a better way to do this? > > Thanks, > Andy-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170511/dfea1c42/attachment.html>
Kaylor, Andrew via llvm-dev
2017-May-11 19:28 UTC
[llvm-dev] FENV_ACCESS and floating point LibFunc calls
Hi Michael, To be honest I haven’t started working on FP to integer conversions for FENV_ACCESS yet. To some extent I would consider the issue that you found independent of what I’m doing to constrain FP behavior. That is, I think we ought to make the change you’re asking for even apart from the FENV_ACCESS work. When I get to the conversions for FENV_ACCESS support it may require some additional constraints, but I think if the branching conversion is usually faster (and it looks like it will be) then that should be the default behavior. I’ll try to look into that. I’d offer to give you advice on putting together a patch, but I’m still learning my way around the ISel code myself. I think I know enough to figure out what to do but not enough to tell someone else how to do it without a bunch of wrong turns. -Andy From: Michael Clark [mailto:michaeljclark at mac.com] Sent: Wednesday, May 10, 2017 7:59 PM To: Kaylor, Andrew <andrew.kaylor at intel.com> Cc: llvm-dev <llvm-dev at lists.llvm.org> Subject: FENV_ACCESS and floating point LibFunc calls Hi Andy, I’m interested to try out your patches… I understand the scope of FENV_ACCESS is relatively wide, however I’m still curious if you managed to figure out how to prevent the SelectionDAGLegalize::ExpandNode() FP_TO_UINT lowering of the FPToUI intrinsic from producing the predicate logic that incorrectly sets the floating point accrued exceptions due to unconditional execution of the ISD::FSUB node attached to the SELECT node. It’s a little above my head to try to solve this issue with my current understanding of LLVM but I could give it a try. I’d need some guidance as to how the lowering of SELECT can be controlled. i.e. where LLVM decides whether and how to lower a select node as a branch vs predicate logic. I’d almost forgotten that we microbenchmarked this and found the branching version is faster with regular input (< LLONG_MAX). - https://godbolt.org/g/ytgk7l All, Where does LLVM decide to lower select as predicate logic vs branches and how does the cost model work? I’m curious about a tuning flag to generate branches instead of computing both values and using conditional moves… Best, Michael. On 11 May 2017, at 11:41 AM, via llvm-dev <llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>> wrote: Hi all, Background I've been working on adding the necessary support to LLVM for clang to be able to support the STDC FENV_ACCESS pragma, which basically allows users to modify rounding mode at runtime and depend on the value of floating-point status flags or to unmask floating point exceptions without unexpected side effects. I've committed an initial patch (r293226) that adds constrained intrinsics for the basic FP operations, and I have a patch up for review now (https://reviews.llvm.org/D32319) that adds constrained versions of a number of libm-like FP intrinsics. Current problem Now I'm trying to make sure I have a good solution for the way in which the optimizer handles recognized calls to libm functions (sqrt, pow, cos, sin, etc.). Basically, I need to prevent all passes from making any modifications to these calls that would make assumptions about rounding mode or improperly affect the FP status flags (either suppressing flags that should be present or setting flags that should not be set). For instance, there are circumstances in which the optimizer will constant fold a call to one of these functions if the value of the arguments are known at compile time, but this constant folding generally assumes the default rounding mode and if the library call would have set a status flag, I need the flag to be set. Question My question is, can/should I just rely on the front end setting the "nobuiltin" attribute for the call site in any location where the FP behavior needs to be restricted? Ideally, I would like to be able to conditionally enable optimizations like constant folding if I am able to prove that the rounding mode, though dynamic, is known for the callsite at compile time (the constrained intrinsics have a mechanism to enable this), but at the moment I am more concerned about correctness and would be willing to sacrifice optimizations to get correct behavior. Long term, I was thinking that maybe I could do something like attach metadata to indicate rounding mode and exception behavior when they were known. Is there a better way to do this? Thanks, Andy -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170511/2df80e0b/attachment.html>
Sanjay Patel via llvm-dev
2017-May-11 21:06 UTC
[llvm-dev] FENV_ACCESS and floating point LibFunc calls
Sounds like the select lowering issue is definitely separate from the FENV work. Is there a bug report with a C or IR example? You want to generate compare and branch instead of a cmov for something like this? int foo(float x) { if (x < 42.0f) return x; return 12; } define i32 @foo(float %x) { %cmp = fcmp olt float %x, 4.200000e+01 %conv = fptosi float %x to i32 %ret = select i1 %cmp, i32 %conv, i32 12 ret i32 %ret } $ clang -O2 cmovfp.c -S -o - movss LCPI0_0(%rip), %xmm1 ## xmm1 = mem[0],zero,zero,zero ucomiss %xmm0, %xmm1 cvttss2si %xmm0, %ecx movl $12, %eax cmoval %ecx, %eax retq On Thu, May 11, 2017 at 1:28 PM, Kaylor, Andrew via llvm-dev < llvm-dev at lists.llvm.org> wrote:> Hi Michael, > > > > To be honest I haven’t started working on FP to integer conversions for > FENV_ACCESS yet. > > > > To some extent I would consider the issue that you found independent of > what I’m doing to constrain FP behavior. That is, I think we ought to make > the change you’re asking for even apart from the FENV_ACCESS work. When I > get to the conversions for FENV_ACCESS support it may require some > additional constraints, but I think if the branching conversion is usually > faster (and it looks like it will be) then that should be the default > behavior. > > > > I’ll try to look into that. I’d offer to give you advice on putting > together a patch, but I’m still learning my way around the ISel code > myself. I think I know enough to figure out what to do but not enough to > tell someone else how to do it without a bunch of wrong turns. > > > > -Andy > > > > > > *From:* Michael Clark [mailto:michaeljclark at mac.com] > *Sent:* Wednesday, May 10, 2017 7:59 PM > *To:* Kaylor, Andrew <andrew.kaylor at intel.com> > *Cc:* llvm-dev <llvm-dev at lists.llvm.org> > *Subject:* FENV_ACCESS and floating point LibFunc calls > > > > Hi Andy, > > > > I’m interested to try out your patches… > > > > I understand the scope of FENV_ACCESS is relatively wide, however I’m > still curious if you managed to figure out how to prevent the > SelectionDAGLegalize::ExpandNode() FP_TO_UINT lowering of the FPToUI > intrinsic from producing the predicate logic that incorrectly sets the > floating point accrued exceptions due to unconditional execution of the > ISD::FSUB node attached to the SELECT node. It’s a little above my head to > try to solve this issue with my current understanding of LLVM but I could > give it a try. I’d need some guidance as to how the lowering of SELECT can > be controlled. i.e. where LLVM decides whether and how to lower a select > node as a branch vs predicate logic. > > > > I’d almost forgotten that we microbenchmarked this and found the branching > version is faster with regular input (< LLONG_MAX). > > > > - https://godbolt.org/g/ytgk7l > > All, Where does LLVM decide to lower select as predicate logic vs branches > and how does the cost model work? I’m curious about a tuning flag to > generate branches instead of computing both values and using conditional > moves… > > > > Best, > > Michael. > > > > On 11 May 2017, at 11:41 AM, via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > > > Hi all, > > Background > I've been working on adding the necessary support to LLVM for clang to be > able to support the STDC FENV_ACCESS pragma, which basically allows users > to modify rounding mode at runtime and depend on the value of > floating-point status flags or to unmask floating point exceptions without > unexpected side effects. I've committed an initial patch (r293226) that > adds constrained intrinsics for the basic FP operations, and I have a patch > up for review now (https://reviews.llvm.org/D32319) that adds constrained > versions of a number of libm-like FP intrinsics. > > Current problem > Now I'm trying to make sure I have a good solution for the way in which > the optimizer handles recognized calls to libm functions (sqrt, pow, cos, > sin, etc.). Basically, I need to prevent all passes from making any > modifications to these calls that would make assumptions about rounding > mode or improperly affect the FP status flags (either suppressing flags > that should be present or setting flags that should not be set). For > instance, there are circumstances in which the optimizer will constant fold > a call to one of these functions if the value of the arguments are known at > compile time, but this constant folding generally assumes the default > rounding mode and if the library call would have set a status flag, I need > the flag to be set. > > Question > My question is, can/should I just rely on the front end setting the > "nobuiltin" attribute for the call site in any location where the FP > behavior needs to be restricted? > > Ideally, I would like to be able to conditionally enable optimizations > like constant folding if I am able to prove that the rounding mode, though > dynamic, is known for the callsite at compile time (the constrained > intrinsics have a mechanism to enable this), but at the moment I am more > concerned about correctness and would be willing to sacrifice optimizations > to get correct behavior. Long term, I was thinking that maybe I could do > something like attach metadata to indicate rounding mode and exception > behavior when they were known. > > Is there a better way to do this? > > Thanks, > Andy > > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170511/d85ac0c5/attachment.html>