thr3ads.net - llvm dev - [LLVMdev] Tail calls (TCO) in PNaCL | PNaCl Bitcode reference manual [Aug 2013]

If this information is useful, please help other people find it:
Share via:

Mark Seaborn

2013-Aug-02 04:11 UTC

[LLVMdev] Tail calls (TCO) in PNaCL | PNaCl Bitcode reference manual

On 1 August 2013 00:11, Travis Cross <tc at travislists.com> wrote:
> On 2013-07-30 22:11, Eli Bendersky wrote:
> > we've published an initial version of the PNaCl bitcode reference
> > manual online -
> > http://www.chromium.org/nativeclient/pnacl/bitcode-abi. The PNaCl
> > bitcode is a restricted subset of LLVM IR.
> >
> > Any comments would be most welcome.
>
> Hi Eli,
>
> I appreciate you for opening the process for input and comments.  One
> question stood out to me while reading the document:
>
> The document [1] indicates that only the 'ccc' calling convention
will
> be supported.  The LLVM documentation [2] prominently notes that,
> "tail calls can only be optimized when [fastcc], the GHC or the HiPE
> convention is used."

That note in the documentation seems to be incorrect, because LLVM will do
tail call optimisations on at least x86 when using the "ccc" calling
convention.  For example:

$ cat tail_call1.c
void foo(int arg);
void bar(int arg) {
  foo(arg);
}

$ clang tail_call1.c -S -o - -O2
...
bar:                                    # @bar
...
    jmp    foo                     # TAILCALL
...

However, LLVM doesn't emit a tail call at -O0.

Maybe what the documentation means to say is that tail call optimisations
are only guaranteed to be done when using fastcc etc.?

Further, I notice that the document includes> "call" but not "tail call" in the list of supported
instructions.
>
That's an omission in the document.  We do actually allow the "tail
call"
instruction in PNaCl.

However, we haven't specified any guarantees that a tail call optimisation
will happen.  I suppose we could specify some guarantees if people want
this.  Maybe we could say that a "tail call" instruction is guaranteed
to
be turned into a real tail call if the callee function has the same number
of arguments as the caller function or fewer?  I think that would work on
all the architectures PNaCl targets.

Then we would have to change -O0 translation so that the guarantee is
provided consistently at all optimisation levels.

Cheers,
Mark
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130801/1719e45f/attachment.html>

Travis Cross

2013-Aug-02 07:26 UTC

head link

[LLVMdev] Tail calls (TCO) in PNaCL | PNaCl Bitcode reference manual

Hi Mark,

On 2013-08-02 04:11, Mark Seaborn wrote:> That note in the documentation seems to be incorrect, because LLVM
> will do tail call optimisations on at least x86 when using the
"ccc"
> calling convention.  For example:
> 
> $ cat tail_call1.c
> void foo(int arg);
> void bar(int arg) {
>   foo(arg);
> }
> 
> $ clang tail_call1.c -S -o - -O2
> ...
> bar:                                    # @bar
> ...
>     jmp    foo                     # TAILCALL
> ...
That's actually a sibling call, which is a much less general form.

See:

  http://llvm.org/releases/3.3/docs/CodeGenerator.html#sibling-call-optimization

The above C roughly translates into:

  declare i32 @callee(i32)
  define i32 @caller(i32 %a1) {
    %1 = tail call i32 @callee(i32 %a1)
    ret i32 %1
  }

Which, as you note, produces a jmp.  However, simply extending the
argument list of the callee breaks this on x86-32:

  declare i32 @callee(i32, i32)
  define i32 @caller(i32 %a1) {
    %1 = tail call i32 @callee(i32 %a1, i32 0)
    ret i32 %1
  }

That results in a call rather than a jmp.  You won't get a jmp back
unless you write:

  declare fastcc i32 @callee(i32, i32)
  define fastcc i32 @caller(i32 %a1) {
    %1 = tail call fastcc i32 @callee(i32 %a1, i32 0)
    ret i32 %1
  }

(Or use cc10 rather than fastcc.)

You can even break the tail call elimination by simply passing a value
not in the caller's incoming argument stack (or one in the wrong
position), again on x86-32:

  declare i32 @callee(i32)
  define i32 @caller(i32 %a1) {
    %a = add i32 %a1, %a1
    %1 = tail call i32 @callee(i32 %a)
    ret i32 %1
  }

x86-64 will treat more things as sibling calls because you need 7
arguments before anything reaches the stack.  But the same thing will
happen on X86-64 at that point.  This results in call rather than jmp:

declare i32 @callee(i32, i32, i32, i32, i32, i32, i32)
define i32 @caller(i32 %a1, i32 %a2, i32 %a3, i32 %a4, i32 %a5, i32 %a6, i32
%a7) {
  %1 = tail call i32 @callee(i32 %a7, i32 %a6, i32 %a5, i32 %a4, i32 %a3, i32
%a2, i32 %a1)
  ret i32 %1
}
> However, LLVM doesn't emit a tail call at -O0.
> 
> Maybe what the documentation means to say is that tail call
> optimisations are only guaranteed to be done when using fastcc etc.?
It's a bit ambiguous because the author presumably intended to
distinguish tail calls from sibling calls as is the convention in the
C world.
>     Further, I notice that the document includes
>     "call" but not "tail call" in the list of supported
instructions.
> 
> 
> That's an omission in the document.  We do actually allow the
"tail
> call" instruction in PNaCl.
Great.
> However, we haven't specified any guarantees that a tail call
> optimisation will happen.  I suppose we could specify some
> guarantees if people want this.
I found a thread where Evan Cheng proposed removing -tailcallopt from
LLVM.  It resulted in a number of language implementers who require
guaranteed tail calls coming out of the woodwork to express their
displeasure at the idea.  It also resulted in some good discussion
about why this is necessary and desirable.

  http://lists.cs.uiuc.edu/pipermail/llvmdev/2010-February/029191.html

The ability to guarantee tail call elimination is required for
implementing functional languages.  The Scheme standard requires all
conforming implementations to guarantee that tail calls are
eliminated.

Microsoft would probably have never tried building F# on the CLR if
the CLR could not promise this.

Functional languages see tail call elimination as a matter of language
semantics rather than as an optimization.
> Maybe we could say that a "tail call" instruction is guaranteed
to
> be turned into a real tail call if the callee function has the same
> number of arguments as the caller function or fewer?  I think that
> would work on all the architectures PNaCl targets.
Well, it would certainly be better than not doing it, and perhaps it
could be a step toward broader support, but it doesn't really solve
the issue as functional languages rely on the semantics of tail calls
being eliminated in all possible cases.

What you're describing here is more general than sibling calls, as
sibling calls have additional constraints such as requiring all
arguments passed to the callee on the stack come from the caller's
incoming argument stack.

We really should find a plan for achieving support for full tail call
elimination.  Otherwise we'll end up with a rather embarrassing
regression as compared to the CLR (and why should Microsoft have all
the fun?).
> Then we would have to change -O0 translation so that the guarantee
> is provided consistently at all optimisation levels.
That'd be ideal, yes.  When the client runs llc, is it going to run it
at a fixed or client determined optimization level or one specified by
metadata that follows the code?

Best,

-- TC

JF Bastien

2013-Aug-02 14:12 UTC

head link

[LLVMdev] Tail calls (TCO) in PNaCL | PNaCl Bitcode reference manual

> That'd be ideal, yes.  When the client runs llc, is it going to run it
> at a fixed or client determined optimization level or one specified by
> metadata that follows the code?
You mean for PNaCl? The developer can run LLVM's optimizations on his
machine and generate a pexe. The pexe and manifest are then served to
users, and this manifest specifies which optimization level to use to
translate the pexe to a nexe on user's devices.

You can therefore choose between SelectionDAG and FastISel (fast code or
fast translation) on the user's device without giving up all performance
because the non-target-specific optimizations have been executed already.

This is something that we've benchmarked fairly closely and plan to
improve.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130802/b90205fb/attachment.html>

Possibly Parallel Threads

Search for more apparently analagous threads

llvm dev - Aug 2013 - [LLVMdev] Tail calls (TCO) in PNaCL | PNaCl Bitcode reference manual

[LLVMdev] Tail calls (TCO) in PNaCL | PNaCl Bitcode reference manual

[LLVMdev] Tail calls (TCO) in PNaCL | PNaCl Bitcode reference manual

[LLVMdev] Tail calls (TCO) in PNaCL | PNaCl Bitcode reference manual

Possibly Parallel Threads