Hello, This is a little off-topic. But I am writing a compiler to llvm ir for a language that admits recursive nested functions and am stuck as to how to translate them. Concretely, I'm trying to lift them all to the topmost level and pass all their free variables explicitly as arguments. To do this, I have to determine all their free variables in their bodies. In particular when I come across a function call, I have to add _its_ free variables to the list of free variables of the enclosing function. You can see how this will not work if we allow recursive functions. Any pointers would be greatly appreciated. Thanks! N
On Jun 25, 2010, at 10:46 AM, Nicolas Ojeda Bar wrote:> Hello, > > This is a little off-topic. But I am writing a compiler to llvm ir > for a language that admits recursive nested functions and am stuck > as to how to translate them. Concretely, I'm trying to lift them all > to the topmost level and pass all their free variables explicitly as > arguments. To do this, I have to determine all their free variables > in their bodies. In particular when I come across a function call, > I have to add _its_ free variables to the list of free variables of > the enclosing function. You can see how this will not work if we > allow recursive functions.Topologically sort the functions defined in a single scope. The free variables of a cluster of mutually-recursive functions is the union of their "direct" free variables, plus the free variables of any functions they call. John.
On Jun 25, 2010, at 1:46 PM, Nicolas Ojeda Bar wrote:> Hello, > > This is a little off-topic. But I am writing a compiler to llvm ir > for a language that admits recursive nested functions and am stuck > as to how to translate them. Concretely, I'm trying to lift them all > to the topmost level and pass all their free variables explicitly as > arguments. To do this, I have to determine all their free variables > in their bodies. In particular when I come across a function call, > I have to add _its_ free variables to the list of free variables of > the enclosing function. You can see how this will not work if we > allow recursive functions.> > Any pointers would be greatly appreciated. > > Thanks! > NIn an earlier life, I built a commercial Pascal to C++ translator, called unimaginatively p2c, that was used by most of the top Mac application developers to port their Object Pascal programs to C++ when Apple forced them during the Motorola 68XXX to PowerPC transition. Since Object Pascal (like most Pascals) permitted nested function (or procedure) definitions and C++ does not, I had to address this very issue. Since we were translating commercial code, and our own database engine code, we needed the resulting C++ to be as close to the original as possible so it could be maintained. We even preserved comment positions and conditional compilation switches when possible. If I recall, I handled the nested procedure/function issue by keeping track of the particular scope for each of the free variables used within inner functions during identifier resolution. I used a linked set of symbol tables and pushed a new symbol table for each new nested procedure. This made it easy to determine where the free variable was defined. If it was defined in an enclosing function, I marked the entry for that symbol to indicate that the symbol was required in a nested function. I then created structs which held the free variables accessed by the nested functions and passed a pointer to the struct into each of the nested functions. I changed all uses of the variables to reference the struct members at the outer level, and then passed in a pointer to the struct into inner scoped functions. A double-nested function would have two struct pointers if it accessed free variables from both of its enclosing functions. I then changed all the accesses to the free variables within the nested procedures/functions to reference the struct members through the pointer. This ended up producing fairly efficient code that was pretty easy to understand when looking at the resulting C++ code. - Curtis
Curtis described a fairly standard approach -- you put all variables used in nested functions into a struct and pass pointer to it to these functions. The only difference is that these structs are usually linked (struct in a nested function has a pointer to struct in outer function), so deep nested functions only need one extra argument. gcc supports nested functions in C (including mutually recursive nested functions, forward declarations need to use "auto" linkage) using this approach, you can look at what it does with llvm-gcc. Eugene On Fri, Jun 25, 2010 at 6:46 PM, Nicolas Ojeda Bar <nojb at math.harvard.edu> wrote:> Hello, > > This is a little off-topic. But I am writing a compiler to llvm ir > for a language that admits recursive nested functions and am stuck > as to how to translate them. Concretely, I'm trying to lift them all > to the topmost level and pass all their free variables explicitly as > arguments. To do this, I have to determine all their free variables > in their bodies. In particular when I come across a function call, > I have to add _its_ free variables to the list of free variables of > the enclosing function. You can see how this will not work if we > allow recursive functions. > > Any pointers would be greatly appreciated. > > Thanks! > N > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >