Hello!

I've just tried generating Win64 code and the result is not that good.

First of all, XMM registers are saved without any reason to do so. Not only does this slow performance, it also leads to random crashes. XMMs are stored to the stack with the MOVAPS instruction, which requires 16-byte alignment, and that is not always the case. lli.exe (built in debug mode) randomly crashes on some simple hello-world-like tests due to misalignment.

The most problematic issue, though, is the lack of 'shadow zone' support in the Win64 ABI. Or maybe I haven't figured out how to turn it on. In Win64 any function can treat 32 bytes of stack (RSP+08h..RSP+28h just after the call instruction) as scratch data. The VC++ compiler stores arguments passed in registers there. In debug builds this doesn't get optimized away.

Consider this C++ code:

#include <stdio.h>

int main ()
{
    for ( int i=0; i<5; i++ )
        printf ( "%d\n", 0 );
    return 0;
}

Compile it to LLVM bytecode with the -O0 flag. Then run a debug build of 64-bit lli.exe (with the -mtriple=x86_64-pc-windows argument). For me it prints 0's forever.

The reason is that printf uses the shadow zone to store its arguments. The second argument goes to the stack at address RSP+10h and overwrites the 'i' variable, always resetting it to zero.

Is anyone aware of the second bug? If I have some time I'll try to fix it myself, but it would be much better if someone could hint where to start.

--
Best Regards
Peter Shugalev
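To illustrate the layout described in the message above, here is a minimal sketch of the Win64 register-argument home area (the 'shadow zone') as seen from the callee. The names `homeSlotOffset` and `overlapsShadowSpace` are hypothetical helpers made up for this note, not LLVM or Windows APIs. The offsets assume RSP is sampled at callee entry, right after the call instruction has pushed the 8-byte return address.

```cpp
#include <cassert>
#include <cstdint>

// All offsets are relative to RSP at callee entry, where [RSP] holds the
// return address pushed by CALL.
constexpr std::uint32_t kReturnAddressSize = 8;
constexpr std::uint32_t kSlotSize = 8;
constexpr std::uint32_t kRegisterArgs = 4;  // RCX, RDX, R8, R9
constexpr std::uint32_t kShadowSize = kRegisterArgs * kSlotSize;  // 32 bytes

// Offset of the home slot for the register argument with the given
// 0-based index. Precondition: argIndex < kRegisterArgs.
constexpr std::uint32_t homeSlotOffset(std::uint32_t argIndex) {
    return kReturnAddressSize + argIndex * kSlotSize;
}

// True if the byte range [offset, offset + size) intersects the 32 bytes
// that the callee is allowed to use as scratch space.
constexpr bool overlapsShadowSpace(std::uint32_t offset, std::uint32_t size) {
    const std::uint32_t begin = kReturnAddressSize;
    const std::uint32_t end = kReturnAddressSize + kShadowSize;
    return offset < end && offset + size > begin;
}
```

Under these assumptions, `homeSlotOffset(1)` is 0x10, which matches the address where the second printf argument lands, and a 4-byte local such as `i` placed at that offset by the caller would overlap the area the callee may overwrite.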
On Thu, Jul 30, 2009 at 5:32 PM, Peter Shugalev <peter at shugalev.com> wrote:
> Hello!
>
> I've just tried generating Win64 code and the result is not that good.
>
> First of all, XMM registers are saved without reason to do so. Not only
> this slows the performance but leads to random crashes too. XMMs are
> stored to the stack with MOVAPS instruction which requires 16-byte
> alignment which is not always the case. lli.exe (built in debug mode)
> randomly crashes on some simple hello-world-alike tests due to misalignment.

http://llvm.org/bugs/show_bug.cgi?id=3739 is about the extra XMM stores; I thought the alignment was working, though...

> Though the most problematic stuff is the lack of 'shadow zone' support
> in Win64 ABI. Or maybe I haven't figured out how to turn this on. In
> Win64 any function can treat 32 bytes of stack (RSP+08h..RSP+28h just
> after the call instruction) as scratch data. VC++ compiler stores
> arguments passed in registers there. In debug builds this doesn't get
> optimized away.

Wow, that's really strange... I'm pretty sure that simply isn't implemented.

-Eli
>> Though the most problematic stuff is the lack of 'shadow zone' support
>> in Win64 ABI. Or maybe I haven't figured out how to turn this on. In
>> Win64 any function can treat 32 bytes of stack (RSP+08h..RSP+28h just
>> after the call instruction) as scratch data. VC++ compiler stores
>> arguments passed in registers there. In debug builds this doesn't get
>> optimized away.
>
> Wow, that's really strange... I'm pretty sure that simply isn't implemented.

Another side effect concerns functions with more than four arguments. They won't work if LLVM and VC++ code is mixed. That's again because of the absence of the 32-byte gap between the stack top and the arguments. E.g.:

int main ()
{
    printf ( "%d %d %d %d\n", 1, 2, 3, 4 );
    return 0;
}

Output:
1 2 3 0

Any ideas on how hard it would be to fix?

--
Best Regards
Peter Shugalev
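The mismatch described above can be sketched with a little offset arithmetic. This is only an illustrative model, not LLVM's actual lowering code: `stackArgOffsetWithShadow` and `stackArgOffsetWithoutShadow` are hypothetical names, and the "without shadow" layout is an assumption drawn from the description in the message (stack arguments placed directly above the return address, with no 32-byte gap). Offsets are relative to RSP at callee entry.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kReturnAddressSize = 8;
constexpr std::uint32_t kSlotSize = 8;
constexpr std::uint32_t kRegisterArgs = 4;  // RCX, RDX, R8, R9

// Where a Win64 callee expects to find a stack-passed argument.
// Every argument, including the four register ones, owns an 8-byte slot,
// so the slot index equals the 0-based argument index.
// Precondition: argIndex >= kRegisterArgs.
constexpr std::uint32_t stackArgOffsetWithShadow(std::uint32_t argIndex) {
    return kReturnAddressSize + argIndex * kSlotSize;
}

// Where a caller that omits the 32-byte home area would place the same
// argument (assumed model of the behaviour reported in the thread).
// Precondition: argIndex >= kRegisterArgs.
constexpr std::uint32_t stackArgOffsetWithoutShadow(std::uint32_t argIndex) {
    return kReturnAddressSize + (argIndex - kRegisterArgs) * kSlotSize;
}
```

In the printf example the format string and the values 1, 2, 3 travel in registers, so the value 4 is the fifth argument (index 4). In this model the callee reads it at RSP+28h, while a caller without the gap would have written it at RSP+08h, which would be consistent with the last number printing as something other than 4.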
Hi Peter,

The attached patch is a workaround for the XMM misalignment issue. Basically it uses the fallback method of saving and restoring registers on the stack, which does work correctly with alignment. If I recall correctly it also doesn't save any registers unnecessarily, but I could be wrong about that.

Anyway, it's a hack, but if all you want for now is to be able to work with Win64 and use SSE, this might offer a solution.

I wasn't aware of the second bug you're describing, but I am also experiencing the one from your latest e-mail about not being able to have more than four arguments. I'm afraid I haven't found any workaround for that yet.

Cheers,

Nicolas

-----Original Message-----
From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Peter Shugalev
Sent: Friday, 31 July 2009 2:32
To: LLVMdev at cs.uiuc.edu
Subject: [LLVMdev] Win64 bugs

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Win64workaround.patch
Type: application/octet-stream
Size: 899 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20090731/840878f1/attachment.obj>
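For readers unfamiliar with the alignment constraint behind the crashes discussed in this thread: MOVAPS with a memory operand faults unless the address is a multiple of 16. The sketch below shows that check and the usual way of rounding a downward-growing stack address to a 16-byte boundary. It is not taken from the attached patch, whose contents are not shown here, and `isAlignedForMovaps` and `alignDown` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// MOVAPS requires its memory operand to be 16-byte aligned.
constexpr std::uintptr_t kXmmAlignment = 16;

// True if an XMM register could be stored to this address with MOVAPS.
constexpr bool isAlignedForMovaps(std::uintptr_t address) {
    return address % kXmmAlignment == 0;
}

// Round an address down to the nearest 16-byte boundary. Rounding down is
// the natural direction for a stack that grows toward lower addresses.
constexpr std::uintptr_t alignDown(std::uintptr_t address) {
    return address & ~(kXmmAlignment - 1);
}
```

A spill slot at an address such as 0x1008 fails the check, and `alignDown(0x1008)` gives 0x1000. A code generator that cannot guarantee alignment could instead use an unaligned store such as MOVUPS, at some cost in speed on older processors.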
On 30-Jul-09, at 8:54 PM, Eli Friedman wrote:
> On Thu, Jul 30, 2009 at 5:32 PM, Peter Shugalev <peter at shugalev.com> wrote:
>> Though the most problematic stuff is the lack of 'shadow zone' support
>> in Win64 ABI. Or maybe I haven't figured out how to turn this on. In
>> Win64 any function can treat 32 bytes of stack (RSP+08h..RSP+28h just
>> after the call instruction) as scratch data. VC++ compiler stores
>> arguments passed in registers there. In debug builds this doesn't get
>> optimized away.
>
> Wow, that's really strange... I'm pretty sure that simply isn't
> implemented.

Anton K has a patch for this that we have been using successfully internally, but he is still working out issues with regard to certain parameter types, in particular MMX/SSE params, which I believe he was going to fix before committing it. I'm sure he'll weigh in on this discussion.

Stefanus

--
Stefanus Du Toit <stefanus.dutoit at rapidmind.com>
RapidMind Inc.
phone: +1 519 885 5455 x116 -- fax: +1 519 885 1463
> The attached patch is a workaround for the XMM misalignment issue. Basically
> it uses the fallback method of saving and restoring registers on the stack,
> which does work correctly with alignment. If I recall correctly it also
> doesn't save any registers unnecessarily, but I could be wrong about that.
>
> Anyway, it's hack, but if all you want for now is to be able to work with
> Win64 and use SSE this might offer a solution.

Thanks a lot! I'll try it when I'm back from vacation.
Hello, Nicolas

> The attached patch is a workaround for the XMM misalignment issue. Basically
> it uses the fallback method of saving and restoring registers on the stack,
> which does work correctly with alignment. If I recall correctly it also
> doesn't save any registers unnecessarily, but I could be wrong about that.

Please don't use this patch, it's completely wrong. The problem is that the prologue / epilogue emission code is not prepared for such a 'fallback' solution and will emit improper stack update code. You can easily hit this problem when other callee-saved registers are spilled as well (not only the high XMM ones).

I have a patch which should complete the Win64 CC support in LLVM (modulo varargs functions). I hope to commit it within the next few days.

--
With best regards, Anton Korobeynikov
Faculty of Mathematics and Mechanics, Saint Petersburg State University