A somewhat random question: I'm wondering if there's any kind of trick in LLVM that would allow me to implement closures efficiently.

We can assume that a closure function has a hidden parameter which points to its environment, that is, the values of the variables which were in scope at the point where the closure function was defined. The problem comes when mixing closure and non-closure functions, because closure functions have a different calling protocol. Suppose I have a function pointer. This function pointer can either point to a regular function, which does not take the hidden parameter, or it can point to a closure function, which does. The problem is that the call site cannot support two incompatible calling conventions.

I can think of two solutions to this problem, but both entail considerable loss of efficiency.

The first solution is to have every function, closure or not, take the hidden parameter; for non-closure functions the value of the parameter is simply ignored. The downside is that now every function has to pay the cost of an extra parameter.

The second solution is that when calling via a pointer, we always call with the closure protocol, i.e. we include the hidden parameter. However, when taking the address of a non-closure function, we actually point to a stub function which strips off the hidden parameter before calling the real function. This solution is better in that the extra cost is only paid when calling a non-closure function via a pointer. However, it means that we essentially pay the cost of marshalling the call arguments twice.

-- Talin
On Sun, Dec 28, 2008 at 12:53 AM, Talin <viridia at gmail.com> wrote:
> A somewhat random question: I'm wondering if there's any kind of trick
> in LLVM that would allow me to implement closures efficiently.

There is a third option. Let me explain using C++ terminology. I assume that all escaping variables are in a heap-allocated activation record. Represent a closure as a class which overloads the parenthesis operator. When you construct that class, you give a reference to the activation record as one of the parameters to the constructor, and the class holds that reference as a field. The closure can be rewritten to access escaping variables indirectly through that reference.

Calling a closure is then no different from a virtual method call, and there is no need to change the calling convention, or to mix and match calling conventions.

Nick
On Sunday 28 December 2008 05:53:55 Talin wrote:
> The second solution is that when calling via a pointer, we always call
> with the closure protocol, i.e. we include the hidden parameter.
> However, when taking the address of a non-closure function, we actually
> point to a stub function which strips off the hidden parameter before
> calling the real function. This solution is better in that it means that
> the extra cost is only paid when calling a non-closure function via a
> pointer. However, it means that we essentially pay the cost of
> marshalling the calling arguments twice.

I have been thinking about this myself recently, and there is another specialization that can completely remove the performance hit of functional abstraction in some important cases; in particular, it is made feasible by JIT compilation.

It may be instructive to consider a concrete example such as summing the elements of a float array by passing the add function to the higher-order fold function (using OCaml's float-specific ( +. ) operator):

  let sum xs = Array.fold_left ( +. ) 0.0 xs

Previous-generation languages like OCaml compile this into a completely generic representation where polymorphism is handled at run time. Consequently, even though the polymorphism is buried in the fold function (not visible at all in our "sum" function), it still incurs massive performance degradation.

JIT compilation makes it easy to avoid the cost of polymorphism by generating code for each permutation of type variables that a definition is used with. That immediately removes the cost of polymorphism, because we get a version of fold specialized for float arrays. However, there is still the overhead of the call to "+.", which is obviously gratuitous in this case even if the call is statically optimized from a closure into a function pointer. Inlining the higher-order fold function completely removes this cost as well and recovers the performance of hand-written C code.

JIT compilation makes a lot of interesting optimizations feasible...
-- Dr Jon Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?e
On Sun, Dec 28, 2008 at 13:42, Jon Harrop <jon at ffconsultancy.com> wrote:
> Previous generation languages like OCaml compile this into a completely
> generic representation where polymorphism is handled at run-time.
> Consequently, even though the polymorphism is buried in the fold function
> (not visible at all in our "sum" function) it still incurs massive
> performance degradation.
>
> JIT compilation makes it easy to avoid the cost of polymorphism by
> generating code for each permutation of type variables that a definition
> is used with. That immediately removes the cost of polymorphism because
> we get a version of fold specialized for float arrays.

I'm fairly certain that MLton does this during static compilation.