Dmitry Borisenkov
2014-Oct-07 17:50 UTC
[LLVMdev] Stange behavior in fp arithmetics on x86 (bug possibly)
Hello everyone.

I'm not an expert in LLVM, x86, or the IEEE standard for floating-point numbers, so any of the following assumptions may be wrong. If so, I would be grateful if you could explain what goes wrong. But if my guesses are correct, we possibly have a bug in FP arithmetic on x86.

I have the following IR:

    @g = constant i64 1

    define i32 @main() {
      %gval = load i64* @g
      %gvalfp = bitcast i64 %gval to double
      %fmul = fmul double %gvalfp, -5.000000e-01
      %fcmp = fcmp ueq double %fmul, -0.000000e+00
      %ret = select i1 %fcmp, i32 1, i32 0
      ret i32 %ret
    }

I expected the minimal positive denormalized double times -0.5 to be equal to -0.0, so the correct exit code is 1. llvm-3.4.2 on an x86 Linux target produced the following assembly:

        .file   "fpfail.ll"
        .section .rodata.cst8,"aM",@progbits,8
        .align  8
    .LCPI0_0:
        .quad   -4620693217682128896    # double -0.5
    .LCPI0_1:
        .quad   -9223372036854775808    # double -0
        .text
        .globl  main
        .align  16, 0x90
        .type   main,@function
    main:                               # @main
        .cfi_startproc
    # BB#0:
        vmovsd  g, %xmm0
        vmulsd  .LCPI0_0, %xmm0, %xmm0
        vucomisd .LCPI0_1, %xmm0
        sete    %al
        movzbl  %al, %eax
        ret
    .Ltmp0:
        .size   main, .Ltmp0-main
        .cfi_endproc

        .type   g,@object               # @g
        .section .rodata,"a",@progbits
        .globl  g
        .align  8
    g:
        .quad   1                       # 0x1
        .size   g, 8

        .section ".note.GNU-stack","",@progbits

Running

    ./llc -march=x86 fpfail.ll; g++ fpfail.s; ./a.out; echo $?

returns 1 as expected. But llvm-3.5 (on the same target) lowers the same IR using x87 floating-point instructions in the following way:
        .text
        .file   "fpfail.ll"
        .section .rodata.cst4,"aM",@progbits,4
        .align  4
    .LCPI0_0:
        .long   3204448256              # float -0.5
        .text
        .globl  main
        .align  16, 0x90
        .type   main,@function
    main:                               # @main
        .cfi_startproc
    # BB#0:
        fldl    g
        fmuls   .LCPI0_0
        fldz
        fchs
        fxch    %st(1)
        fucompp
        fnstsw  %ax
                                        # kill: AX<def> AX<kill> EAX<def>
                                        # kill: AH<def> AH<kill> EAX<kill>
        sahf
        sete    %al
        movzbl  %al, %eax
        retl
    .Ltmp0:
        .size   main, .Ltmp0-main
        .cfi_endproc

        .type   g,@object               # @g
        .section .rodata,"a",@progbits
        .globl  g
        .align  8
    g:
        .quad   1                       # 0x1
        .size   g, 8

        .section ".note.GNU-stack","",@progbits

First, it doesn't assemble with g++ (4.8):

    fpfail.s:26: Error: invalid instruction suffix for `ret'

I downloaded the Intel manual and found no mention of a retl instruction, so I manually replaced it with ret and reassembled:

    g++ fpfail.s; ./a.out; echo $?

The exit code is 0. This is correct for Intel 80-bit floats but wrong for doubles. Am I doing something wrong, or is this actually a bug, or even worse, correct behavior?

--
Kind regards, Dmitry Borisenkov
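The constants and the expected double-precision result in the message above can be checked with a short Python sketch. It is only an illustration, not part of the original report: the helper names `i64_to_double` and `u32_to_float` are made up here, and it assumes the host evaluates Python float multiplication in IEEE 754 double precision with round-to-nearest-even (the usual case on SSE2-capable machines).

```python
import math
import struct


def i64_to_double(bits: int) -> float:
    """Reinterpret a signed 64-bit integer as an IEEE 754 double (what the IR's bitcast does)."""
    return struct.unpack("<d", struct.pack("<q", bits))[0]


def u32_to_float(bits: int) -> float:
    """Reinterpret an unsigned 32-bit integer as an IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# Constant-pool entries copied from the two assembly listings.
half_3_4 = i64_to_double(-4620693217682128896)  # .LCPI0_0 in the 3.4.2 output
neg_zero = i64_to_double(-9223372036854775808)  # .LCPI0_1 in the 3.4.2 output
half_3_5 = u32_to_float(3204448256)             # .LCPI0_0 in the 3.5 output

# @g holds the integer 1, which bitcasts to the smallest positive subnormal double.
g = i64_to_double(1)

# The exact product is -2**-1075, halfway between -0.0 and -2**-1074.
# Assumption: the multiplication is rounded to double with ties-to-even.
product = g * -0.5

print(half_3_4, neg_zero, half_3_5)
print(g == math.ldexp(1.0, -1074))
print(product == neg_zero, math.copysign(1.0, product))
```

Under the stated assumption the product compares equal to -0.0 and carries a negative sign, which corresponds to the exit code 1 seen with the SSE/AVX code path.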
Tim Northover
2014-Oct-07 18:26 UTC
[LLVMdev] Stange behavior in fp arithmetics on x86 (bug possibly)
Hi Dmitry,

On 7 October 2014 10:50, Dmitry Borisenkov <d.borisenkov at samsung.com> wrote:
> fpfail.s:26: Error: invalid instruction suffix for `ret'
>
> I downloaded Intel manual and haven't found any mention of retl instruction,

"retl" is the AT&T syntax for the normal "ret" instruction in the Intel manual, which makes it mostly undocumented.

> The exit code is 0. This is correct for Intel 80-bit floats but wrong for
> doubles. What am I do wrong or this is actually a bug or even worse -
> correct behavior?

I think the default CPU used by llc was changed between 3.4 and 3.5. Before, we defaulted to the host's CPU (from memory), but now we pick a lowest common denominator "generic", which doesn't support SSE.

When the IR comes from Clang, I believe we define the "FLT_EVAL_METHOD" macro to be 2 in this case (see C99 5.2.4.2.2), which signals that operations are performed at "long double" precision and the outcome you see is permitted.

So I *think* this is OK, unless I'm misunderstanding one of the specs involved.

Cheers.

Tim.
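The difference between rounding the product to double and keeping it at x87 extended precision can be modelled with exact rationals. This is a rough sketch, not a description of what llc or the hardware does internally: the names `fits`, `exit_code_double`, and `exit_code_extended` are hypothetical, overflow is ignored, and the extended-precision path is modelled as exact arithmetic, which is only valid when the exact product is representable in the 80-bit format (the code checks this).

```python
import math
from fractions import Fraction

# (significand bits, binary weight of the lowest representable bit)
DOUBLE = (53, -1074)
X87_EXTENDED = (64, -16445)


def fits(value: Fraction, fmt) -> bool:
    """True if a dyadic rational is exactly representable in fmt (overflow ignored)."""
    if value == 0:
        return True
    mant_bits, min_exp = fmt
    num, den = abs(value.numerator), value.denominator
    assert den & (den - 1) == 0, "expected a power-of-two denominator"
    exp = -(den.bit_length() - 1)
    while num % 2 == 0:  # strip trailing zero bits from the significand
        num //= 2
        exp += 1
    return num.bit_length() <= mant_bits and exp >= min_exp


def exit_code_double(a: float, b: float, rhs: float) -> int:
    """Model of double evaluation: the product is rounded to double, then compared."""
    return 1 if a * b == rhs else 0


def exit_code_extended(a: float, b: float, rhs: float) -> int:
    """Model of extended evaluation: the product is compared without rounding to double."""
    exact = Fraction(a) * Fraction(b)
    if not fits(exact, X87_EXTENDED):
        raise ValueError("product is not exact in extended precision; model does not apply")
    return 1 if exact == Fraction(rhs) else 0


g = math.ldexp(1.0, -1074)  # same value as bitcasting i64 1 to double
print(exit_code_double(g, -0.5, -0.0))
print(exit_code_extended(g, -0.5, -0.0))
print(fits(Fraction(g) * Fraction(-0.5), DOUBLE))
```

In this model the two paths disagree because -2**-1075 is below the double subnormal range but well inside the extended format's range, which matches the exit codes 1 and 0 reported in the thread.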
Joerg Sonnenberger
2014-Oct-07 18:45 UTC
[LLVMdev] Stange behavior in fp arithmetics on x86 (bug possibly)
On Tue, Oct 07, 2014 at 09:50:37PM +0400, Dmitry Borisenkov wrote:
> I'm not an expert neither in llvm nor in x86 nor in IEEE standard for
> floating point numbers, thus any of my following assumptions maybe wrong. If
> so, I will be grateful if you clarify me what's goes wrong. But if my
> guesses are correct we possibly have a bug in fp arithmetics on x86.

Are you targeting the same backend? i386 (32-bit mode) uses FPU registers for argument passing and return values, x86_64 / amd64 (64-bit mode) uses SSE registers for float/double values and FPU registers for long double. The error on retl makes me think the second example is compiled for i386, while the first example looks more like x86_64.

Joerg
Dmitry Borisenkov
2014-Oct-08 08:36 UTC
[LLVMdev] Stange behavior in fp arithmetics on x86 (bug possibly)
Hi Joerg,

Both of the examples were compiled with

    ./llc -march=x86 -O3 fpfail.ll

(i386). I've double-checked it.

Kind regards,
Dmitry Borisenkov
Stephen Checkoway
2014-Oct-10 06:48 UTC
[LLVMdev] Stange behavior in fp arithmetics on x86 (bug possibly)
On Oct 7, 2014, at 2:26 PM, Tim Northover <t.p.northover at gmail.com> wrote:
> Hi Dmitry,
>
> On 7 October 2014 10:50, Dmitry Borisenkov <d.borisenkov at samsung.com> wrote:
>> fpfail.s:26: Error: invalid instruction suffix for `ret'
>>
>> I downloaded Intel manual and haven't found any mention of retl instruction,
>
> "retl" is the AT&T syntax for the normal "ret" instruction in the
> Intel manual, which makes it mostly undocumented.

Are you sure about that? I don't recall ever seeing retl before. A while back a reference for AT&T was mentioned and, as I recall, this was the best anyone had <http://docs.oracle.com/cd/E19253-01/817-5477/817-5477.pdf>. It contains no mention of retl.

This seems to be the commit that added support for it <http://lists.cs.uiuc.edu/pipermail/llvm-branch-commits/2010-May/003229.html>.

I'm not sure I understand the distinction between retl/retq. x86 has four return instructions (cribbing from the Intel manual):

    C3       RET         Near return
    CB       RET         Far return
    C2 iw    RET imm16   Near return + pop imm16 bytes
    CA iw    RET imm16   Far return + pop imm16 bytes

(And I think that's been true since the 8086.)

Distinguishing between near and far (e.g., ret vs. lret in AT&T or retn vs. retf with some other assemblers) makes sense, but what would an l or q suffix denote?

But more to the point, even if there's a good reason to accept retl/retq as input, is there any reason to emit it ever?

--
Stephen Checkoway
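The four encodings listed above can be told apart from the opcode byte alone, which a small decoder makes concrete. This is a minimal sketch for illustration: the table `RET_FORMS` and the function `decode_ret` are hypothetical names, and the decoder ignores prefixes and operand-size considerations entirely.

```python
# Opcode byte -> (kind of return, whether a 16-bit immediate follows),
# transcribed from the four encodings listed in the message above.
RET_FORMS = {
    0xC3: ("near", False),
    0xCB: ("far", False),
    0xC2: ("near", True),
    0xCA: ("far", True),
}


def decode_ret(code: bytes):
    """Return (kind, bytes_popped) for a RET encoding, or raise ValueError."""
    if not code or code[0] not in RET_FORMS:
        raise ValueError("not a RET opcode")
    kind, has_imm = RET_FORMS[code[0]]
    if not has_imm:
        return kind, 0
    if len(code) < 3:
        raise ValueError("truncated imm16")
    # x86 immediates are stored little-endian.
    return kind, int.from_bytes(code[1:3], "little")


print(decode_ret(b"\xc3"))
print(decode_ret(b"\xc2\x08\x00"))
print(decode_ret(b"\xca\x10\x00"))
```

Nothing in these encodings depends on a mnemonic suffix such as l or q, which is consistent with the question raised above about what such a suffix would denote.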