Michael Spencer
2011-Apr-14 19:16 UTC
[LLVMdev] [x86 codegen] 3DNow! intrinsics not behaving as expected.
I finally got all of the 3DNow! instruction intrinsics and builtins
into LLVM and Clang, however, while testing them, I've noticed that
they produce incorrect results.

For example:

    typedef float V2f __attribute__((vector_size(8)));

    int main() {
      V2f dest, a = {1.0, 3.0}, b = {10.0, 3.5};
      dest = __builtin_ia32_pfadd(a, b);
      printf("(%f, %f)\n", dest[0], dest[1]);
    }

Should output (11, 6.5). However, it outputs different values
depending on the optimization level. Generally one of them is correct,
and the other is -nan.

I looked at the program using a debugger, and the pfadd instruction is
executed correctly and the MMX register contains the correct values.
The code that prepares the stack for the printf call seems to be
messing it up.

Here's the assembly generated at O3 for the above:

            .file   "intrin.c"
            .text
            .globl  main
            .align  16, 0x90
            .type   main,@function
    main:                                   # @main
    # BB#0:                                 # %entry
            pushl   %ebp
            movl    %esp, %ebp
            subl    $56, %esp
            movl    $1077936128, -12(%ebp)  # imm = 0x40400000
            movl    $1065353216, -16(%ebp)  # imm = 0x3F800000
            movl    $1080033280, -4(%ebp)   # imm = 0x40600000
            movl    $1092616192, -8(%ebp)   # imm = 0x41200000
            movq    -16(%ebp), %mm0
            pfadd   -8(%ebp), %mm0
            movq    %mm0, -24(%ebp)
            flds    -20(%ebp)
            fstpl   12(%esp)
            flds    -24(%ebp)
            fstpl   4(%esp)
            movl    $.L.str, (%esp)
            calll   printf
            xorl    %eax, %eax
            addl    $56, %esp
            popl    %ebp
            ret
    .Ltmp0:
            .size   main, .Ltmp0-main

            .type   .L.str,@object          # @.str
            .section        .rodata.str1.1,"aMS",@progbits,1
    .L.str:
            .asciz  "%f, %f\n"
            .size   .L.str, 8

            .section        ".note.GNU-stack","",@progbits

Attached are my patches to enable support for this. I'd like to be
done with this, because 3DNow! isn't even supported anymore. I was just
adding these to learn tblgen and fill in some of the MSVC intrinsic
headers.

- Michael Spencer

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 3dnow-builtins.patch
Type: application/octet-stream
Size: 23790 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110414/bdbdf8e4/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: clang-3dnow-builtins.patch
Type: application/octet-stream
Size: 8274 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110414/bdbdf8e4/attachment-0001.obj>
Eli Friedman
2011-Apr-14 19:47 UTC
[LLVMdev] [x86 codegen] 3DNow! intrinsics not behaving as expected.
On Thu, Apr 14, 2011 at 12:16 PM, Michael Spencer <bigcheesegs at gmail.com> wrote:
> I finally got all of the 3DNow! instruction intrinsics and builtins
> into LLVM and Clang, however, while testing them, I've noticed that
> they produce incorrect results.
>
> For example:
>
> typedef float V2f __attribute__((vector_size(8)));
>
> int main() {
>   V2f dest, a = {1.0, 3.0}, b = {10.0, 3.5};
>   dest = __builtin_ia32_pfadd(a, b);
>   printf("(%f, %f)\n", dest[0], dest[1]);
> }
>
> Should output (11, 6.5). However, it outputs different values
> depending on the optimization level. Generally one of them is correct,
> and the other is -nan.
>
> I looked at the program using a debugger, and the pfadd instruction is
> executed correctly and the MMX register contains the correct values.
> The code that prepares the stack for the printf call seems to be
> messing it up.

I would call that "user error"; basically, using MMX instructions
messes up the FP stack, and we assume the user is smart enough to make
sure the two don't mix.

-Eli
Chris Lattner
2011-Apr-14 21:37 UTC
[LLVMdev] [x86 codegen] 3DNow! intrinsics not behaving as expected.
On Apr 14, 2011, at 12:47 PM, Eli Friedman wrote:
>> I looked at the program using a debugger, and the pfadd instruction is
>> executed correctly and the MMX register contains the correct values.
>> The code that prepares the stack for the printf call seems to be
>> messing it up.
>
> I would call that "user error"; basically, using MMX instructions
> messes up the FP stack, and we assume the user is smart enough to make
> sure the two don't mix.

More specifically, if you use MMX/3dNow intrinsics, you have to call
"emms" at ABI boundaries.

-Chris
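
For readers who want to see what "call emms at ABI boundaries" can look like in source, here is a minimal sketch. It is not taken from the patches in this thread. It uses the integer MMX intrinsics from <mmintrin.h> instead of __builtin_ia32_pfadd, on the assumption that the same rule applies because 3DNow! instructions work on the same MMX registers. The function names add_pairs and pair_sum_mean are hypothetical, and the plain C fallback is only there so the file also builds on targets without MMX.

```c
#include <stdint.h>
#include <string.h>
#if defined(__MMX__)
#include <mmintrin.h>
#endif

/* Hypothetical helper: lane-wise add of two pairs of 32-bit integers. */
void add_pairs(const int32_t a[2], const int32_t b[2], int32_t out[2])
{
#if defined(__MMX__)
    /* _mm_set_pi32 takes the high lane first, then the low lane. */
    __m64 va  = _mm_set_pi32(a[1], a[0]);
    __m64 vb  = _mm_set_pi32(b[1], b[0]);
    __m64 sum = _mm_add_pi32(va, vb);

    /* Move the result into ordinary memory while still in MMX state. */
    memcpy(out, &sum, sizeof sum);

    /* emms: hand the shared registers back to the x87 FP stack before
       returning to code that may do floating-point work. */
    _mm_empty();
#else
    /* Assumed fallback for targets without MMX. */
    out[0] = a[0] + b[0];
    out[1] = a[1] + b[1];
#endif
}

/* Hypothetical caller: floating-point math after the MMX helper has
   already issued emms, so the two kinds of code do not overlap. */
double pair_sum_mean(const int32_t a[2], const int32_t b[2])
{
    int32_t out[2];
    add_pairs(a, b, out);
    return (out[0] + out[1]) / 2.0;
}
```

Applied to the original test case, the same pattern would mean copying dest[0] and dest[1] out to plain floats and issuing emms before the values are converted for the printf call.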