Hi all, I've added tail call optimization to x86. This is different from what -tailcallopt does, which forces fastcc function to be tail callable. My changes detect opportunities to do tail call without having to change the ABI. I've looked at the codegen of -tailcallopt and it doesn't look all that good. Running it as a llcbeta option shows it significantly pessimize code in most cases. As far as I can tell only PPC and X86 targets are supporting this option. Does anyone actually using it? I'd prefer to just remove it to clean up the implementation if no one has any objections. Evan
On Feb 5, 2010, at 3:35 PM, Evan Cheng wrote:> Hi all, > > I've added tail call optimization to x86. This is different from what -tailcallopt does, which forces fastcc function to be tail callable. My changes detect opportunities to do tail call without having to change the ABI. > > I've looked at the codegen of -tailcallopt and it doesn't look all that good. Running it as a llcbeta option shows it significantly pessimize code in most cases. > > As far as I can tell only PPC and X86 targets are supporting this option. Does anyone actually using it? I'd prefer to just remove it to clean up the implementation if no one has any objections.Right now, -tailcallopt is the way to get guaranteed tail calls. For projects which need this, it's not an optimization; it's a correctness requirement. Dan
On Feb 5, 2010, at 7:19 PM, Jon Harrop wrote:> On Friday 05 February 2010 23:35:15 Evan Cheng wrote: >> Does anyone actually using it? > > Yes, many LLVM-based projects rely upon TCO to work correctly.Ok, that's all I need to know.> >> I'd prefer to just remove it to clean up the implementation if no one has >> any objections. > > Are you saying that you want to remove LLVM's working TCO and replace it with > something that is faster but broken?No, I'd rather have something that's working and helps performance. Evan> > I think you may have misunderstood what TCO is and why users want it. TCO > allows an unbounded number of calls in tail position to use only a bounded > amount of stack space. Replacing branches within functions with tail calls > generalizes loops and makes them arbitrarily extensible. Consequently, many > language implementations (particularly functional languages) rely upon TCO to > implement loops. So breaking TCO literally means breaking loops for all of > those projects. > > You can imagine the reaction you would get if you went to the Clang guys and > advocated a new "for" loop because it was faster but segfaulted every 10,000 > iterations! > > Moreover, working TCO is one of the features that sets LLVM apart from its > competitors. The JVM doesn't provide TCO. Mono's implementation of TCO is > broken. Microsoft's .NET is one of the only alternatives to provide working > TCO and even it has more restrictions than LLVM does. > > The performance of tail calls on LLVM is a minor concern for me and I would > appreciate it being optimized but certainly not at the cost of correctness. > > -- > Dr Jon Harrop, Flying Frog Consultancy Ltd. > http://www.ffconsultancy.com/?e > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
On Friday 05 February 2010 23:35:15 Evan Cheng wrote:> Does anyone actually using it?Yes, many LLVM-based projects rely upon TCO to work correctly.> I'd prefer to just remove it to clean up the implementation if no one has > any objections.Are you saying that you want to remove LLVM's working TCO and replace it with something that is faster but broken? I think you may have misunderstood what TCO is and why users want it. TCO allows an unbounded number of calls in tail position to use only a bounded amount of stack space. Replacing branches within functions with tail calls generalizes loops and makes them arbitrarily extensible. Consequently, many language implementations (particularly functional languages) rely upon TCO to implement loops. So breaking TCO literally means breaking loops for all of those projects. You can imagine the reaction you would get if you went to the Clang guys and advocated a new "for" loop because it was faster but segfaulted every 10,000 iterations! Moreover, working TCO is one of the features that sets LLVM apart from its competitors. The JVM doesn't provide TCO. Mono's implementation of TCO is broken. Microsoft's .NET is one of the only alternatives to provide working TCO and even it has more restrictions than LLVM does. The performance of tail calls on LLVM is a minor concern for me and I would appreciate it being optimized but certainly not at the cost of correctness. -- Dr Jon Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?e
Evan Cheng wrote:> As far as I can tell only PPC and X86 targets are supporting this option. Does anyone actually using it? I'd prefer to just remove it to clean up the implementation if no one has any objections.Don't know whether that is the same, but my Pure compiler sets llvm::PerformTailCallOpt. Pure needs TCO because it doesn't have any built-in looping constructs. In fact, most functional language implementations will rely on TCO. If you can make this work reliably on all supported platforms without needing special flags, that would be welcome. Otherwise please keep the flag. (That said, last time I tried LLVM 2.7svn, TCO was broken in the JIT, at least on x86_64. It works fine up to LLVM 2.6 for me, though.) Albert -- Dr. Albert Gr"af Dept. of Music-Informatics, University of Mainz, Germany Email: Dr.Graef at t-online.de, ag at muwiinfa.geschichte.uni-mainz.de WWW: http://www.musikinformatik.uni-mainz.de/ag
Thanks for improving this! Since no good deed can go unpunished, could you also update the LangRef and codegen documents to describe the new state of the world? It might also be nice to mention there that tail calls can be needed for correctness in addition to performance and how users can guarantee that a call will be transformed, since both if us have forgotten recently. I can take a stab at the second one if you don't have time. Thanks, Jeffrey On Friday, February 5, 2010, Evan Cheng <evan.cheng at apple.com> wrote:> Hi all, > > I've added tail call optimization to x86. This is different from what -tailcallopt does, which forces fastcc function to be tail callable. My changes detect opportunities to do tail call without having to change the ABI. > > I've looked at the codegen of -tailcallopt and it doesn't look all that good. Running it as a llcbeta option shows it significantly pessimize code in most cases. > > As far as I can tell only PPC and X86 targets are supporting this option. Does anyone actually using it? I'd prefer to just remove it to clean up the implementation if no one has any objections. > > Evan > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >
I am somewhat surprised people are actually using TCO. I had to fixed a number of subtle bugs to get it working and even now I am not too happy with it. My focus was on finding non-ABI changing automatic tail call cases (aka gcc's sibcall). It's now done so I'll leave -tailcallopt alone for now. I'll run -tailcallopt as x86 llcbeta to see if JIT is indeed broken. Evan On Feb 5, 2010, at 7:32 PM, Albert Graef wrote:> Evan Cheng wrote: >> As far as I can tell only PPC and X86 targets are supporting this option. Does anyone actually using it? I'd prefer to just remove it to clean up the implementation if no one has any objections. > > Don't know whether that is the same, but my Pure compiler sets > llvm::PerformTailCallOpt. Pure needs TCO because it doesn't have any > built-in looping constructs. In fact, most functional language > implementations will rely on TCO. If you can make this work reliably on > all supported platforms without needing special flags, that would be > welcome. Otherwise please keep the flag. > > (That said, last time I tried LLVM 2.7svn, TCO was broken in the JIT, at > least on x86_64. It works fine up to LLVM 2.6 for me, though.) > > Albert > > -- > Dr. Albert Gr"af > Dept. of Music-Informatics, University of Mainz, Germany > Email: Dr.Graef at t-online.de, ag at muwiinfa.geschichte.uni-mainz.de > WWW: http://www.musikinformatik.uni-mainz.de/ag
On Feb 5, 2010, at 9:03 PM, Jeffrey Yasskin wrote:> Thanks for improving this! Since no good deed can go unpunished, could > you also update the LangRef and codegen documents to describe the new > state of the world? It might also be nice to mention there that tailI plan to do this part before 2.7.> calls can be needed for correctness in addition to performance and how > users can guarantee that a call will be transformed, since both if us > have forgotten recently. I can take a stab at the second one if you > don't have time.I don't really know enough about how TCO is used by Pure, Haskell, etc. So if you could write this part, I'd appreciate it. Evan> > Thanks, > Jeffrey > > On Friday, February 5, 2010, Evan Cheng <evan.cheng at apple.com> wrote: >> Hi all, >> >> I've added tail call optimization to x86. This is different from what -tailcallopt does, which forces fastcc function to be tail callable. My changes detect opportunities to do tail call without having to change the ABI. >> >> I've looked at the codegen of -tailcallopt and it doesn't look all that good. Running it as a llcbeta option shows it significantly pessimize code in most cases. >> >> As far as I can tell only PPC and X86 targets are supporting this option. Does anyone actually using it? I'd prefer to just remove it to clean up the implementation if no one has any objections. >> >> Evan >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>
On Feb 5, 2010, at 15:35, Evan Cheng wrote:> > I've added tail call optimization to x86. This is different from what -tailcallopt does, which forces fastcc function to be tail callable. My changes detect opportunities to do tail call without having to change the ABI.+1 for opportunistically optimizing tail calls in the "ccc" convention during x86 assembly generation. This is a desirable feature.> Does anyone actually using it? I'd prefer to just remove it to clean up the implementation if no one has any objections.-1 for removing the guaranteed tail call feature of the "fastcc" convention. As others have noted, there are several projects (mine included) that require the "tail" call site qualifier in the IR language. — j h woodyatt <jhw at conjury.org> http://jhw.vox.com/