Katsuhiro Ueno
2013-Aug-02 18:26 UTC
[LLVMdev] bug of tail call optimization on x86 target
Dear LLVM developers, I am a developer of SML#, an ML-style functional programming language developed at Tohoku University. Currently we are intending to use LLVM as the backend of our SML# compiler in our upcoming release, and have rewritten our frontend and runtime so that they can cooperate with LLVM. LLVM works extremely fine with our SML# compiler. We are grateful to LLVM community for providing such an excellent software. However, when generating x86 code with enabling tail call optimization, unfortunately I found a bug that destroys callee-save registers. The following is the bug I found and the attached file is the fix of this bug. The following is the LLVM IR code that causes the bug: ; bug.ll target triple = "i686-apple-darwin12.4.0" declare fastcc void @foo(i32, i32, i32, i32, i32, i32) declare i32* @bar(i32*) define fastcc void @hoge(i32 %b) nounwind { %a = alloca i32 store i32 0, i32* %a %d = tail call i32* @bar(i32* %a) nounwind store i32 %b, i32* %d tail call fastcc void @foo(i32 1, i32 2, i32 3, i32 4, i32 5, i32 6) nounwind ret void } LLVM produces the following x86 code against the above source: % llc -tailcallopt -filetype=asm -disable-cfi -o - bug.ll ... _hoge: ## @hoge ## BB#0: subl $16, %esp pushl %esi ;;;; this preserves %esi, but subl $40, %esp movl %ecx, %esi movl $0, 40(%esp) ;;;; this overwrites preserved %esi leal 40(%esp), %eax movl %eax, (%esp) calll _bar movl %esi, (%eax) movl 60(%esp), %eax movl $6, 60(%esp) movl $5, 56(%esp) movl $4, 52(%esp) movl $3, 48(%esp) movl %eax, 44(%esp) movl $1, %ecx movl $2, %edx addl $40, %esp popl %esi ;;;; this always fails to restore %esi jmp _foo ## TAILCALL In the above code, 2nd instruction "pushl %esi" saves a callee-save register %esi to the stack, but 5th instruction "movl $0, 40(%esp)" overwrites the saved %esi. Eventually at the epilogue "popl %esi" fails to restore the previous %esi. This bug is due to wrong computation of stack object indices by the x86 backend. The attached patch indicates the wrong points. Due to integral promotion, explicit conversion to signed integer is needed at those points. Eliminating tail calls (with arbitrary number of arguments) is mandatory for practical functional languages. I would appreciate it if this bug would be fixed in the next release. Best regards, Katsuhiro Ueno -------------- next part -------------- A non-text attachment was scrubbed... Name: x86-tailcallopt-fix.diff Type: application/octet-stream Size: 1790 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130803/b8ef92ae/attachment.obj>
Hi Katsuhiro, Thanks very much for taking the time to dig into this and produce a patch. Is there any chance you could make a couple of tweaks?> This bug is due to wrong computation of stack object indices by the > x86 backend. The attached patch indicates the wrong points. Due to > integral promotion, explicit conversion to signed integer is needed > at those points.I'm not convinced that's the best solution, at least conceptually. SlotSize really is an unsigned quantity, and though it's unlikely we'd like 0x80000000 to be interpreted as positive, rather than negative if it ever does occur. Also, CreateFixedObject seems to take an int64_t I'd suggest: + ISelLowering line 2459: cast FPDiff to int64_t. + ISelLowering line 3327: cast SlotSize to int64_t. + FrameLowering: declare TailCallReturnAddrDelta as int64_t and subtract SlotSize? It would also be really good if you could convert your IR into a test-case (like the others in "test/CodeGen/X86/"). We like to have a test-case for every commit if it's humanly possible. Cheers. Tim.
Katsuhiro Ueno
2013-Aug-03 03:08 UTC
[LLVMdev] bug of tail call optimization on x86 target
Hi Tim, Thank you for your quick response.> I'm not convinced that's the best solution, at least conceptually. > SlotSize really is an unsigned quantity, and though it's unlikely we'd > like 0x80000000 to be interpreted as positive, rather than negative if > it ever does occur.I totally agree with you. I rewrote my fix and made a test case according to your suggestion. All of them are included in the attached file. Thanks and regards, Katsuhiro Ueno -------------- next part -------------- A non-text attachment was scrubbed... Name: x86-tailcallopt-fix-2.diff Type: application/octet-stream Size: 2947 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130803/901297c7/attachment.obj>