A somewhat random question: I'm wondering if there's any kind of trick in LLVM that would allow me to implement closures efficiently.

We can assume that a closure function has a hidden parameter which points to its environment, that is, the values of the variables which were in scope at the point where the closure function was defined. The problem comes when mixing closure and non-closure functions, because closure functions have a different calling protocol. Suppose I have a function pointer. This function pointer can either point to a regular function, which does not take the hidden parameter, or it can point to a closure function, which does. The problem is that the call site cannot support two incompatible calling conventions.

I can think of two solutions to this problem, but both entail considerable loss of efficiency.

The first solution is to have every function, closure or not, take the hidden parameter; for non-closure functions the value of the parameter is simply ignored. The downside is that now every function has to pay the cost of an extra parameter.

The second solution is that when calling via a pointer, we always call with the closure protocol, i.e. we include the hidden parameter. However, when taking the address of a non-closure function, we actually point to a stub function which strips off the hidden parameter before calling the real function. This solution is better in that the extra cost is only paid when calling a non-closure function via a pointer. However, it means that we essentially pay the cost of marshalling the call arguments twice.

-- Talin
On Sun, Dec 28, 2008 at 12:53 AM, Talin <viridia at gmail.com> wrote:
> A somewhat random question: I'm wondering if there's any kind of trick
> in LLVM that would allow me to implement closures efficiently.

There is a third option. Let me explain using C++ terminology. I assume that all escaping variables are in a heap-allocated activation record. Represent a closure as a class which overloads the parenthesis operator. When you construct that class, you give a reference to the activation record as one of the parameters to the constructor, and the class holds that reference as a field. The closure can be rewritten to access escaping variables indirectly through that reference.

Calling a closure is then no different from a virtual method call, and there is no need to change the calling convention, or to mix and match calling conventions.

Nick
On Sunday 28 December 2008 05:53:55 Talin wrote:
> The second solution is that when calling via a pointer, we always call
> with the closure protocol, i.e. we include the hidden parameter.
> However, when taking the address of a non-closure function, we actually
> point to a stub function which strips off the hidden parameter before
> calling the real function. This solution is better in that it means that
> the extra cost is only paid when calling a non-closure function via a
> pointer. However, it means that we essentially pay the cost of
> marshalling the calling arguments twice.

I have been thinking about this myself recently, and there is another specialization that can completely remove the performance hit of functional abstraction in some important cases; in particular, it is made feasible by JIT compilation.

It may be instructive to consider a concrete example such as summing the elements of a float array by passing the add function to the higher-order fold function (using OCaml's float-specific ( +. ) operator):

  let sum xs = Array.fold_left ( +. ) 0.0 xs

Previous-generation languages like OCaml compile this into a completely generic representation where polymorphism is handled at run time. Consequently, even though the polymorphism is buried in the fold function (not visible at all in our "sum" function), it still incurs massive performance degradation.

JIT compilation makes it easy to avoid the cost of polymorphism by generating code for each permutation of type variables that a definition is used with. That immediately removes the cost of polymorphism, because we get a version of fold specialized for float arrays. However, there is still the overhead of the call to "+.", which is obviously gratuitous in this case even if the call is statically optimized from a closure into a function pointer. Inlining the higher-order fold function completely removes this cost as well and recovers the performance of hand-written C code.

JIT compilation makes a lot of interesting optimizations feasible...
-- Dr Jon Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?e
On Sun, Dec 28, 2008 at 13:42, Jon Harrop <jon at ffconsultancy.com> wrote:
> Previous generation languages like OCaml compile this into a completely
> generic representation where polymorphism is handled at run-time.
> Consequently, even though the polymorphism is buried in the fold function
> (not visible at all in our "sum" function) it still incurs massive
> performance degradation.
>
> JIT compilation makes it easy to avoid the cost of polymorphism by
> generating code for each permutation of type variables that a definition
> is used with. That immediately removes the cost of polymorphism because
> we get a version of fold specialized for float arrays.

I'm fairly certain that MLton does this during static compilation.