Mark Seaborn
2013-Aug-02 04:11 UTC
[LLVMdev] Tail calls (TCO) in PNaCL | PNaCl Bitcode reference manual
On 1 August 2013 00:11, Travis Cross <tc at travislists.com> wrote:> On 2013-07-30 22:11, Eli Bendersky wrote: > > we've published an initial version of the PNaCl bitcode reference > > manual online - > > http://www.chromium.org/nativeclient/pnacl/bitcode-abi. The PNaCl > > bitcode is a restricted subset of LLVM IR. > > > > Any comments would be most welcome. > > Hi Eli, > > I appreciate you for opening the process for input and comments. One > question stood out to me while reading the document: > > The document [1] indicates that only the 'ccc' calling convention will > be supported. The LLVM documentation [2] prominently notes that, > "tail calls can only be optimized when [fastcc], the GHC or the HiPE > convention is used."That note in the documentation seems to be incorrect, because LLVM will do tail call optimisations on at least x86 when using the "ccc" calling convention. For example: $ cat tail_call1.c void foo(int arg); void bar(int arg) { foo(arg); } $ clang tail_call1.c -S -o - -O2 ... bar: # @bar ... jmp foo # TAILCALL ... However, LLVM doesn't emit a tail call at -O0. Maybe what the documentation means to say is that tail call optimisations are only guaranteed to be done when using fastcc etc.? Further, I notice that the document includes> "call" but not "tail call" in the list of supported instructions. >That's an omission in the document. We do actually allow the "tail call" instruction in PNaCl. However, we haven't specified any guarantees that a tail call optimisation will happen. I suppose we could specify some guarantees if people want this. Maybe we could say that a "tail call" instruction is guaranteed to be turned into a real tail call if the callee function has the same number of arguments as the caller function or fewer? I think that would work on all the architectures PNaCl targets. Then we would have to change -O0 translation so that the guarantee is provided consistently at all optimisation levels. Cheers, Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130801/1719e45f/attachment.html>
Travis Cross
2013-Aug-02 07:26 UTC
[LLVMdev] Tail calls (TCO) in PNaCL | PNaCl Bitcode reference manual
Hi Mark, On 2013-08-02 04:11, Mark Seaborn wrote:> That note in the documentation seems to be incorrect, because LLVM > will do tail call optimisations on at least x86 when using the "ccc" > calling convention. For example: > > $ cat tail_call1.c > void foo(int arg); > void bar(int arg) { > foo(arg); > } > > $ clang tail_call1.c -S -o - -O2 > ... > bar: # @bar > ... > jmp foo # TAILCALL > ...That's actually a sibling call, which is a much less general form. See: http://llvm.org/releases/3.3/docs/CodeGenerator.html#sibling-call-optimization The above C roughly translates into: declare i32 @callee(i32) define i32 @caller(i32 %a1) { %1 = tail call i32 @callee(i32 %a1) ret i32 %1 } Which, as you note, produces a jmp. However, simply extending the argument list of the callee breaks this on x86-32: declare i32 @callee(i32, i32) define i32 @caller(i32 %a1) { %1 = tail call i32 @callee(i32 %a1, i32 0) ret i32 %1 } That results in a call rather than a jmp. You won't get a jmp back unless you write: declare fastcc i32 @callee(i32, i32) define fastcc i32 @caller(i32 %a1) { %1 = tail call fastcc i32 @callee(i32 %a1, i32 0) ret i32 %1 } (Or use cc10 rather than fastcc.) You can even break the tail call elimination by simply passing a value not in the caller's incoming argument stack (or one in the wrong position), again on x86-32: declare i32 @callee(i32) define i32 @caller(i32 %a1) { %a = add i32 %a1, %a1 %1 = tail call i32 @callee(i32 %a) ret i32 %1 } x86-64 will treat more things as sibling calls because you need 7 arguments before anything reaches the stack. But the same thing will happen on X86-64 at that point. This results in call rather than jmp: declare i32 @callee(i32, i32, i32, i32, i32, i32, i32) define i32 @caller(i32 %a1, i32 %a2, i32 %a3, i32 %a4, i32 %a5, i32 %a6, i32 %a7) { %1 = tail call i32 @callee(i32 %a7, i32 %a6, i32 %a5, i32 %a4, i32 %a3, i32 %a2, i32 %a1) ret i32 %1 }> However, LLVM doesn't emit a tail call at -O0. > > Maybe what the documentation means to say is that tail call > optimisations are only guaranteed to be done when using fastcc etc.?It's a bit ambiguous because the author presumably intended to distinguish tail calls from sibling calls as is the convention in the C world.> Further, I notice that the document includes > "call" but not "tail call" in the list of supported instructions. > > > That's an omission in the document. We do actually allow the "tail > call" instruction in PNaCl.Great.> However, we haven't specified any guarantees that a tail call > optimisation will happen. I suppose we could specify some > guarantees if people want this.I found a thread where Evan Cheng proposed removing -tailcallopt from LLVM. It resulted in a number of language implementers who require guaranteed tail calls coming out of the woodwork to express their displeasure at the idea. It also resulted in some good discussion about why this is necessary and desirable. http://lists.cs.uiuc.edu/pipermail/llvmdev/2010-February/029191.html The ability to guarantee tail call elimination is required for implementing functional languages. The Scheme standard requires all conforming implementations to guarantee that tail calls are eliminated. Microsoft would probably have never tried building F# on the CLR if the CLR could not promise this. Functional languages see tail call elimination as a matter of language semantics rather than as an optimization.> Maybe we could say that a "tail call" instruction is guaranteed to > be turned into a real tail call if the callee function has the same > number of arguments as the caller function or fewer? I think that > would work on all the architectures PNaCl targets.Well, it would certainly be better than not doing it, and perhaps it could be a step toward broader support, but it doesn't really solve the issue as functional languages rely on the semantics of tail calls being eliminated in all possible cases. What you're describing here is more general than sibling calls, as sibling calls have additional constraints such as requiring all arguments passed to the callee on the stack come from the caller's incoming argument stack. We really should find a plan for achieving support for full tail call elimination. Otherwise we'll end up with a rather embarrassing regression as compared to the CLR (and why should Microsoft have all the fun?).> Then we would have to change -O0 translation so that the guarantee > is provided consistently at all optimisation levels.That'd be ideal, yes. When the client runs llc, is it going to run it at a fixed or client determined optimization level or one specified by metadata that follows the code? Best, -- TC
JF Bastien
2013-Aug-02 14:12 UTC
[LLVMdev] Tail calls (TCO) in PNaCL | PNaCl Bitcode reference manual
> That'd be ideal, yes. When the client runs llc, is it going to run it > at a fixed or client determined optimization level or one specified by > metadata that follows the code?You mean for PNaCl? The developer can run LLVM's optimizations on his machine and generate a pexe. The pexe and manifest are then served to users, and this manifest specifies which optimization level to use to translate the pexe to a nexe on user's devices. You can therefore choose between SelectionDAG and FastISel (fast code or fast translation) on the user's device without giving up all performance because the non-target-specific optimizations have been executed already. This is something that we've benchmarked fairly closely and plan to improve. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130802/b90205fb/attachment.html>
Possibly Parallel Threads
- [LLVMdev] Tail calls (TCO) in PNaCL | PNaCl Bitcode reference manual
- [LLVMdev] Tail calls (TCO) in PNaCL | PNaCl Bitcode reference manual
- [LLVMdev] Tail calls (TCO) in PNaCL | PNaCl Bitcode reference manual
- [LLVMdev] Tail calls (TCO) in PNaCL | PNaCl Bitcode reference manual
- [LLVMdev] PNaCl Bitcode reference manual