thr3ads.net - llvm dev - [llvm-dev] [cfe-dev] FE_INEXACT being set for an exact conversion from float to unsigned long long [Apr 2017]

If this information is useful, please help other people find it:
Share via:

Michael Clark via llvm-dev

2017-Apr-19 17:01 UTC

[llvm-dev] [cfe-dev] FE_INEXACT being set for an exact conversion from float to unsigned long long

Changing the list from cfe-dev to llvm-dev
> On 20 Apr 2017, at 4:52 AM, Michael Clark <michaeljclark at mac.com>
wrote:
> 
> I’m getting close. I think it may be an issue with an individual intrinsic.
I’m looking for the X86 lowering of Instruction::FPToUI.
> 
> I found a comment around the rationale for using a conditional move versus
a branch. I believe the predicate logic using a conditional move is causing
INEXACT to be set from the other side of the predicate as the lowered x86_64
code executes both conversions whereas GCC uses a branch. That seems to be the
difference.
> 
> I can’t find FPToUI in llvm/lib/Target/X86 so I’m trying to figure out what
the cast gets renamed to in the target layer so I can find where the sequence is
emitted.
> 
> 
> $ more llvm/lib/Target/X86//README-X86-64.txt
> …
> Are we better off using branches instead of cmove to implement FP to
> unsigned i64?
> 
> _conv:
>         ucomiss LC0(%rip), %xmm0
>         cvttss2siq      %xmm0, %rdx
>         jb      L3
>         subss   LC0(%rip), %xmm0
>         movabsq $-9223372036854775808, %rax
>         cvttss2siq      %xmm0, %rdx
>         xorq    %rax, %rdx
> L3:
>         movq    %rdx, %rax
>         ret
> 
> instead of
> 
> _conv:
>         movss LCPI1_0(%rip), %xmm1
>         cvttss2siq %xmm0, %rcx
>         movaps %xmm0, %xmm2
>         subss %xmm1, %xmm2
>         cvttss2siq %xmm2, %rax
>         movabsq $-9223372036854775808, %rdx
>         xorq %rdx, %rax
>         ucomiss %xmm1, %xmm0
>         cmovb %rcx, %rax
>         ret
> 
> 
>> On 19 Apr 2017, at 2:10 PM, Michael Clark <michaeljclark at mac.com
<mailto:michaeljclark at mac.com>> wrote:
>> 
>> 
>>> On 19 Apr 2017, at 1:14 PM, Tim Northover <t.p.northover at
gmail.com <mailto:t.p.northover at gmail.com>> wrote:
>>> 
>>> On 18 April 2017 at 15:54, Michael Clark via cfe-dev
>>> <cfe-dev at lists.llvm.org <mailto:cfe-dev at
lists.llvm.org>> wrote:
>>>> The only way towards completing a milestone is via fixing a
number of small issues along
>>>> the way…
>>> 
>>> I believe there's more to it than that. None of LLVM's
optimizations
>>> are aware of this extra side-channel of information (with possible
>>> exceptions like avoiding speculating fdiv because of unavoidable
>>> exceptions).
>>> 
>>> From what I remember, the real proposal is to replace all
>>> floating-point IR with intrinsics when FENV_ACCESS is on, which the
>>> optimizers by default won't have a clue about and will treat
>>> conservatively (essentially like they're modifying external
memory).
>>> 
>>> So be careful with drawing conclusions from small snippets;
you're
>>> probably not seeing the full range of LLVM's behaviour.
>> 
>> 
>> Yes. I’m sure.
>> 
>> It reproduces with just the cast on its own:
https://godbolt.org/g/myUoL2 <https://godbolt.org/g/myUoL2>
>> 
>> It appears to be in the LLVM lowering of the fptoui intrinsic so it
must MC layer optimisations.
>> 
>> ; Function Attrs: noinline nounwind uwtable
>> define i64 @_Z7fcvt_luf(float %f) #0 {
>>   %1 = alloca float, align 4
>>   store float %f, float* %1, align 4
>>   %2 = load float, float* %1, align 4
>>   %3 = fptoui float %2 to i64
>>   ret i64 %3
>> }
>> 
>> GCC performs a comparison with ucomiss and branches whereas Clang
computes both forms and predicates the result using a conditional move. One of
the conversions obviously is setting the INEXACT MXCSR flag.
>> 
>> Clang lowering (inexact set when result is exact):
>> 
>> fcvt_lu(float):
>>         movss   xmm1, dword ptr [rip + .LCPI1_0] # xmm1 =
mem[0],zero,zero,zero
>>         movaps  xmm2, xmm0
>>         subss   xmm2, xmm1
>>         cvttss2si       rax, xmm2
>>         movabs  rcx, -9223372036854775808
>>         xor     rcx, rax
>>         cvttss2si       rax, xmm0
>>         ucomiss xmm0, xmm1
>>         cmovae  rax, rcx
>>         ret
>> 
>> GCC lowering (sets flags correctly):
>> 
>> fcvt_lu(float):
>>         ucomiss xmm0, DWORD PTR .LC0[rip]
>>         jnb     .L4
>>         cvttss2si       rax, xmm0
>>         ret
>> .L4:
>>         subss   xmm0, DWORD PTR .LC0[rip]
>>         movabs  rdx, -9223372036854775808
>>         cvttss2si       rax, xmm0
>>         xor     rax, rdx
>>         ret
> 
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170420/259f8a03/attachment-0001.html>

Michael Clark via llvm-dev

2017-Apr-19 17:06 UTC

head link

[llvm-dev] FE_INEXACT being set for an exact conversion from float to i64

Confirmed it is in the target layer in LLVM.

Here is the test case: https://godbolt.org/g/kApSxe

$ g++ -O3 -lm fcvt.cc 
$ ./a.out 
1 exact
1 inexact
1 exact
1 inexact

$ clang++ -O3 -lm fcvt.cc 
$ ./a.out 
1 exact
1 inexact
1 inexact
1 inexact

$ cat fcvt.cc
#include <cstdio>
#include <cstdint>
#include <cmath>
#include <limits>
#include <fenv.h>

typedef signed int         s32;
typedef unsigned int       u32;
typedef signed long long   s64;
typedef unsigned long long u64;

__attribute__ ((noinline)) s32 fcvt_wu(float f) { return s32(u32(f)); }
__attribute__ ((noinline)) s64 fcvt_lu(float f) { return s64(u64(f)); }

void test_fcvt_wu(float a)
{
	feclearexcept(FE_ALL_EXCEPT);
	printf("%d ", fcvt_wu(a));
	printf("%s\n", fetestexcept(FE_INEXACT) ? "inexact" :
"exact");
}

void test_fcvt_lu(float a)
{       
	feclearexcept(FE_ALL_EXCEPT);
	printf("%lld ", fcvt_lu(a));
	printf("%s\n", fetestexcept(FE_INEXACT) ? "inexact" :
"exact");
}

int main()
{
	fesetround(FE_TONEAREST);

	test_fcvt_wu(1.0f);
	test_fcvt_wu(1.1f);
	test_fcvt_lu(1.0f);
	test_fcvt_lu(1.1f);
}

> On 20 Apr 2017, at 5:01 AM, Michael Clark <michaeljclark at mac.com>
wrote:
> 
> Changing the list from cfe-dev to llvm-dev
> 
>> On 20 Apr 2017, at 4:52 AM, Michael Clark <michaeljclark at mac.com
<mailto:michaeljclark at mac.com>> wrote:
>> 
>> I’m getting close. I think it may be an issue with an individual
intrinsic. I’m looking for the X86 lowering of Instruction::FPToUI.
>> 
>> I found a comment around the rationale for using a conditional move
versus a branch. I believe the predicate logic using a conditional move is
causing INEXACT to be set from the other side of the predicate as the lowered
x86_64 code executes both conversions whereas GCC uses a branch. That seems to
be the difference.
>> 
>> I can’t find FPToUI in llvm/lib/Target/X86 so I’m trying to figure out
what the cast gets renamed to in the target layer so I can find where the
sequence is emitted.
>> 
>> 
>> $ more llvm/lib/Target/X86//README-X86-64.txt
>> …
>> Are we better off using branches instead of cmove to implement FP to
>> unsigned i64?
>> 
>> _conv:
>>         ucomiss LC0(%rip), %xmm0
>>         cvttss2siq      %xmm0, %rdx
>>         jb      L3
>>         subss   LC0(%rip), %xmm0
>>         movabsq $-9223372036854775808, %rax
>>         cvttss2siq      %xmm0, %rdx
>>         xorq    %rax, %rdx
>> L3:
>>         movq    %rdx, %rax
>>         ret
>> 
>> instead of
>> 
>> _conv:
>>         movss LCPI1_0(%rip), %xmm1
>>         cvttss2siq %xmm0, %rcx
>>         movaps %xmm0, %xmm2
>>         subss %xmm1, %xmm2
>>         cvttss2siq %xmm2, %rax
>>         movabsq $-9223372036854775808, %rdx
>>         xorq %rdx, %rax
>>         ucomiss %xmm1, %xmm0
>>         cmovb %rcx, %rax
>>         ret
>> 
>> 
>>> On 19 Apr 2017, at 2:10 PM, Michael Clark <michaeljclark at
mac.com <mailto:michaeljclark at mac.com>> wrote:
>>> 
>>> 
>>>> On 19 Apr 2017, at 1:14 PM, Tim Northover <t.p.northover at
gmail.com <mailto:t.p.northover at gmail.com>> wrote:
>>>> 
>>>> On 18 April 2017 at 15:54, Michael Clark via cfe-dev
>>>> <cfe-dev at lists.llvm.org <mailto:cfe-dev at
lists.llvm.org>> wrote:
>>>>> The only way towards completing a milestone is via fixing a
number of small issues along
>>>>> the way…
>>>> 
>>>> I believe there's more to it than that. None of LLVM's
optimizations
>>>> are aware of this extra side-channel of information (with
possible
>>>> exceptions like avoiding speculating fdiv because of
unavoidable
>>>> exceptions).
>>>> 
>>>> From what I remember, the real proposal is to replace all
>>>> floating-point IR with intrinsics when FENV_ACCESS is on, which
the
>>>> optimizers by default won't have a clue about and will
treat
>>>> conservatively (essentially like they're modifying external
memory).
>>>> 
>>>> So be careful with drawing conclusions from small snippets;
you're
>>>> probably not seeing the full range of LLVM's behaviour.
>>> 
>>> 
>>> Yes. I’m sure.
>>> 
>>> It reproduces with just the cast on its own:
https://godbolt.org/g/myUoL2 <https://godbolt.org/g/myUoL2>
>>> 
>>> It appears to be in the LLVM lowering of the fptoui intrinsic so it
must MC layer optimisations.
>>> 
>>> ; Function Attrs: noinline nounwind uwtable
>>> define i64 @_Z7fcvt_luf(float %f) #0 {
>>>   %1 = alloca float, align 4
>>>   store float %f, float* %1, align 4
>>>   %2 = load float, float* %1, align 4
>>>   %3 = fptoui float %2 to i64
>>>   ret i64 %3
>>> }
>>> 
>>> GCC performs a comparison with ucomiss and branches whereas Clang
computes both forms and predicates the result using a conditional move. One of
the conversions obviously is setting the INEXACT MXCSR flag.
>>> 
>>> Clang lowering (inexact set when result is exact):
>>> 
>>> fcvt_lu(float):
>>>         movss   xmm1, dword ptr [rip + .LCPI1_0] # xmm1 =
mem[0],zero,zero,zero
>>>         movaps  xmm2, xmm0
>>>         subss   xmm2, xmm1
>>>         cvttss2si       rax, xmm2
>>>         movabs  rcx, -9223372036854775808
>>>         xor     rcx, rax
>>>         cvttss2si       rax, xmm0
>>>         ucomiss xmm0, xmm1
>>>         cmovae  rax, rcx
>>>         ret
>>> 
>>> GCC lowering (sets flags correctly):
>>> 
>>> fcvt_lu(float):
>>>         ucomiss xmm0, DWORD PTR .LC0[rip]
>>>         jnb     .L4
>>>         cvttss2si       rax, xmm0
>>>         ret
>>> .L4:
>>>         subss   xmm0, DWORD PTR .LC0[rip]
>>>         movabs  rdx, -9223372036854775808
>>>         cvttss2si       rax, xmm0
>>>         xor     rax, rdx
>>>         ret
>> 
> 
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170420/7dbbb67d/attachment.html>

Flamedoge via llvm-dev

2017-Apr-19 17:14 UTC

head link

[llvm-dev] [cfe-dev] FE_INEXACT being set for an exact conversion from float to unsigned long long

> Are we better off using branches instead of cmove to implement FP tounsigned i64?

This seems like it was done for perf reason (mispredict).
Conditional-to-cmov transformation should keep from introducing additional
observable side-effects, and it's clear that whatever did this did not
account for floating point exception.

On Wed, Apr 19, 2017 at 10:01 AM, Michael Clark via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Changing the list from cfe-dev to llvm-dev
>
> On 20 Apr 2017, at 4:52 AM, Michael Clark <michaeljclark at mac.com>
wrote:
>
> I’m getting close. I think it may be an issue with an individual
> intrinsic. I’m looking for the X86 lowering of Instruction::FPToUI.
>
> I found a comment around the rationale for using a conditional move versus
> a branch. I believe the predicate logic using a conditional move is causing
> INEXACT to be set from the other side of the predicate as the lowered
> x86_64 code executes both conversions whereas GCC uses a branch. That seems
> to be the difference.
>
> I can’t find FPToUI in llvm/lib/Target/X86 so I’m trying to figure out
> what the cast gets renamed to in the target layer so I can find where the
> sequence is emitted.
>
>
> $ more llvm/lib/Target/X86//README-X86-64.txt
> …
> Are we better off using branches instead of cmove to implement FP to
> unsigned i64?
>
> _conv:
>         ucomiss LC0(%rip), %xmm0
>         cvttss2siq      %xmm0, %rdx
>         jb      L3
>         subss   LC0(%rip), %xmm0
>         movabsq $-9223372036854775808, %rax
>         cvttss2siq      %xmm0, %rdx
>         xorq    %rax, %rdx
> L3:
>         movq    %rdx, %rax
>         ret
>
> instead of
>
> _conv:
>         movss LCPI1_0(%rip), %xmm1
>         cvttss2siq %xmm0, %rcx
>         movaps %xmm0, %xmm2
>         subss %xmm1, %xmm2
>         cvttss2siq %xmm2, %rax
>         movabsq $-9223372036854775808, %rdx
>         xorq %rdx, %rax
>         ucomiss %xmm1, %xmm0
>         cmovb %rcx, %rax
>         ret
>
>
> On 19 Apr 2017, at 2:10 PM, Michael Clark <michaeljclark at mac.com>
wrote:
>
>
> On 19 Apr 2017, at 1:14 PM, Tim Northover <t.p.northover at
gmail.com> wrote:
>
> On 18 April 2017 at 15:54, Michael Clark via cfe-dev
> <cfe-dev at lists.llvm.org> wrote:
>
> The only way towards completing a milestone is via fixing a number of
> small issues along
> the way…
>
>
> I believe there's more to it than that. None of LLVM's
optimizations
> are aware of this extra side-channel of information (with possible
> exceptions like avoiding speculating fdiv because of unavoidable
> exceptions).
>
> From what I remember, the real proposal is to replace all
> floating-point IR with intrinsics when FENV_ACCESS is on, which the
> optimizers by default won't have a clue about and will treat
> conservatively (essentially like they're modifying external memory).
>
> So be careful with drawing conclusions from small snippets; you're
> probably not seeing the full range of LLVM's behaviour.
>
>
>
> Yes. I’m sure.
>
> It reproduces with just the cast on its own: https://godbolt.org/g/myUoL2
>
> It appears to be in the LLVM lowering of the fptoui intrinsic so it must
> MC layer optimisations.
>
> ; Function Attrs: noinline nounwind uwtable
> define i64 @_Z7fcvt_luf(float %f) #0 {
>   %1 = alloca float, align 4
>   store float %f, float* %1, align 4
>   %2 = load float, float* %1, align 4
>   %3 = fptoui float %2 to i64
>   ret i64 %3
> }
>
>
> GCC performs a comparison with ucomiss and branches whereas Clang computes
> both forms and predicates the result using a conditional move. One of the
> conversions obviously is setting the INEXACT MXCSR flag.
>
> Clang lowering (inexact set when result is exact):
>
> fcvt_lu(float):
>         movss   xmm1, dword ptr [rip + .LCPI1_0] # xmm1 >
mem[0],zero,zero,zero
>         movaps  xmm2, xmm0
>         subss   xmm2, xmm1
>         cvttss2si       rax, xmm2
>         movabs  rcx, -9223372036854775808
>         xor     rcx, rax
>         cvttss2si       rax, xmm0
>         ucomiss xmm0, xmm1
>         cmovae  rax, rcx
>         ret
>
>
> GCC lowering (sets flags correctly):
>
> fcvt_lu(float):
>         ucomiss xmm0, DWORD PTR .LC0[rip]
>         jnb     .L4
>         cvttss2si       rax, xmm0
>         ret
> .L4:
>         subss   xmm0, DWORD PTR .LC0[rip]
>         movabs  rdx, -9223372036854775808
>         cvttss2si       rax, xmm0
>         xor     rax, rdx
>         ret
>
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170419/72c5480c/attachment.html>

Kaylor, Andrew via llvm-dev

2017-Apr-20 20:50 UTC

head link

[llvm-dev] [cfe-dev] FE_INEXACT being set for an exact conversion from float to unsigned long long

> This seems like it was done for perf reason (mispredict).
Conditional-to-cmov transformation should keep
> from introducing additional observable side-effects, and it's clear
that whatever did this did not account
> for floating point exception.
That’s a very reasonable statement, but I’m not sure it corresponds to the way
we have typically approached this sort of problem.

In particular, there has been a de facto standard practice (and it has even been
openly stated and agreed upon once or twice) of assuming default rounding mode
and ignoring FP exceptions in the default (and currently only) optimization path
for FP-related instructions.  That is, clang/LLVM didn’t just not support
FENV_ACCESS by indifference but rather we have made conscious decisions to allow
transformations that violate the needs of FENV_ACCESS when doing so can improve
the performance of generated code.  Basically, we more or less pretend that
floating point status bits don’t exist (at least before you get to the
target-specific backend).

You’ll find that the X86 backend doesn’t even model MXCSR at the moment.  I
tried to add it recently and it kind of blew up before I had even modeled it for
anything other than LDMXCSR and STMCXSR.  We may want to address that at some
point, but right now it just isn’t there.

When we discussed how FENV_ACCESS support should be implemented, Chandler
proposed that when restricted modes (whether FENV_ACCESS or any other front
end-specific analogous behavior) were not being used the optimizer should be
able to behave as described above and that nothing done to support restricted FP
behavior should complicate or restrict the default optimizer behavior.  This was
met with general agreement at the time.

I mention all this as prologue to saying that while we should do something to
get FPToUI lowered without incorrectly setting FP exception status bits, it
isn’t necessarily what we want as the default behavior.

I would also like to add that I personally am very pleased that you discovered
this issue and have gotten as far as you have in the analysis of the problem. 
I’m in the process of adding constrained versions of various FP intrinsics (I
have a patch ready to be sent out today) and what I’ve done up to now has been
to simply translate the constrained operations into their traditional
representations somewhere in the ISel process.  I was aware that something would
need to be done in the codegen space to continue protecting these operations,
but I was kind of hoping that the actual instruction selection would be
reasonably safe.  FWIW, FPToUI is not one of the parts my pending patch
addresses.

-Andy

From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of
Flamedoge via llvm-dev
Sent: Wednesday, April 19, 2017 10:14 AM
To: Michael Clark <michaeljclark at mac.com>
Cc: llvm-dev <llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] [cfe-dev] FE_INEXACT being set for an exact conversion
from float to unsigned long long
> Are we better off using branches instead of cmove to implement FP tounsigned i64?

This seems like it was done for perf reason (mispredict). Conditional-to-cmov
transformation should keep from introducing additional observable side-effects,
and it's clear that whatever did this did not account for floating point
exception.

On Wed, Apr 19, 2017 at 10:01 AM, Michael Clark via llvm-dev <llvm-dev at
lists.llvm.org<mailto:llvm-dev at lists.llvm.org>> wrote:
Changing the list from cfe-dev to llvm-dev

On 20 Apr 2017, at 4:52 AM, Michael Clark <michaeljclark at
mac.com<mailto:michaeljclark at mac.com>> wrote:

I’m getting close. I think it may be an issue with an individual intrinsic. I’m
looking for the X86 lowering of Instruction::FPToUI.

I found a comment around the rationale for using a conditional move versus a
branch. I believe the predicate logic using a conditional move is causing
INEXACT to be set from the other side of the predicate as the lowered x86_64
code executes both conversions whereas GCC uses a branch. That seems to be the
difference.

I can’t find FPToUI in llvm/lib/Target/X86 so I’m trying to figure out what the
cast gets renamed to in the target layer so I can find where the sequence is
emitted.

$ more llvm/lib/Target/X86//README-X86-64.txt
…
Are we better off using branches instead of cmove to implement FP to
unsigned i64?

_conv:
        ucomiss LC0(%rip), %xmm0
        cvttss2siq      %xmm0, %rdx
        jb      L3
        subss   LC0(%rip), %xmm0
        movabsq $-9223372036854775808, %rax
        cvttss2siq      %xmm0, %rdx
        xorq    %rax, %rdx
L3:
        movq    %rdx, %rax
        ret

instead of

_conv:
        movss LCPI1_0(%rip), %xmm1
        cvttss2siq %xmm0, %rcx
        movaps %xmm0, %xmm2
        subss %xmm1, %xmm2
        cvttss2siq %xmm2, %rax
        movabsq $-9223372036854775808, %rdx
        xorq %rdx, %rax
        ucomiss %xmm1, %xmm0
        cmovb %rcx, %rax
        ret

On 19 Apr 2017, at 2:10 PM, Michael Clark <michaeljclark at
mac.com<mailto:michaeljclark at mac.com>> wrote:

On 19 Apr 2017, at 1:14 PM, Tim Northover <t.p.northover at
gmail.com<mailto:t.p.northover at gmail.com>> wrote:

On 18 April 2017 at 15:54, Michael Clark via cfe-dev
<cfe-dev at lists.llvm.org<mailto:cfe-dev at lists.llvm.org>> wrote:

The only way towards completing a milestone is via fixing a number of small
issues along
the way…

I believe there's more to it than that. None of LLVM's optimizations
are aware of this extra side-channel of information (with possible
exceptions like avoiding speculating fdiv because of unavoidable
exceptions).

From what I remember, the real proposal is to replace all
floating-point IR with intrinsics when FENV_ACCESS is on, which the
optimizers by default won't have a clue about and will treat
conservatively (essentially like they're modifying external memory).

So be careful with drawing conclusions from small snippets; you're
probably not seeing the full range of LLVM's behaviour.

Yes. I’m sure.

It reproduces with just the cast on its own: https://godbolt.org/g/myUoL2

It appears to be in the LLVM lowering of the fptoui intrinsic so it must MC
layer optimisations.

; Function Attrs: noinline nounwind uwtable
define i64 @_Z7fcvt_luf(float %f) #0 {
  %1 = alloca float, align 4
  store float %f, float* %1, align 4
  %2 = load float, float* %1, align 4
  %3 = fptoui float %2 to i64
  ret i64 %3
}

GCC performs a comparison with ucomiss and branches whereas Clang computes both
forms and predicates the result using a conditional move. One of the conversions
obviously is setting the INEXACT MXCSR flag.

Clang lowering (inexact set when result is exact):

fcvt_lu(float):
        movss   xmm1, dword ptr [rip + .LCPI1_0] # xmm1 = mem[0],zero,zero,zero
        movaps  xmm2, xmm0
        subss   xmm2, xmm1
        cvttss2si       rax, xmm2
        movabs  rcx, -9223372036854775808
        xor     rcx, rax
        cvttss2si       rax, xmm0
        ucomiss xmm0, xmm1
        cmovae  rax, rcx
        ret

GCC lowering (sets flags correctly):

fcvt_lu(float):
        ucomiss xmm0, DWORD PTR .LC0[rip]
        jnb     .L4
        cvttss2si       rax, xmm0
        ret
.L4:
        subss   xmm0, DWORD PTR .LC0[rip]
        movabs  rdx, -9223372036854775808
        cvttss2si       rax, xmm0
        xor     rax, rdx
        ret

_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170420/dced8ccf/attachment.html>

Reasonably Related Threads

Search for more seemingly similar threads

llvm dev - Apr 2017 - [cfe-dev] FE_INEXACT being set for an exact conversion from float to unsigned long long

[llvm-dev] [cfe-dev] FE_INEXACT being set for an exact conversion from float to unsigned long long

[llvm-dev] FE_INEXACT being set for an exact conversion from float to i64

[llvm-dev] [cfe-dev] FE_INEXACT being set for an exact conversion from float to unsigned long long

[llvm-dev] [cfe-dev] FE_INEXACT being set for an exact conversion from float to unsigned long long

Reasonably Related Threads