Okay, we're starting to dig in, and I've hit a question that will no doubt seem strange.

Context: BitC is a polymorphic language. Since it has unboxed value types, our approach to compiling a polymorphic function is to polyinstantiate it -- once for each signature.

The name mangling scheme is both stable and reversible. At the site of the use occurrence, we can fully determine the mangled name of the desired instantiation. Given the mangled name of an instantiation that has not yet been synthesized, we can determine which procedure must be synthesized.

At present, our front end implements a demand-driven instantiator that proceeds from main() until all calls are statically resolved.

It seems to me, however, that this (nearly) duplicates functionality that is probably already present in the LLVM JIT layer.

Here is what I am wondering: is there some way we can "hook" the global symbol resolver so that we can run the polyinstantiator on demand? I *think* this could be done by running the resolver normally, noticing failure on a symbol that we know ought to exist, code generating the instance definition corresponding to the needed symbol, and then re-performing the lookup at global scope.

There are several issues I'm not sure about:

1. Does this make sense?

2. If so, where is the right place to hook the resolver?

3. Are we going to run into trouble arising from the fact that we are likely to end up using the module compiler in a recursive way? That is: the referencing module's compile will (in effect) be paused in its symbol resolution phase while we introduce a new module and compile that. I can imagine implementations where this would wreak serious havoc on the implementation of environments within the LLVM infrastructure.

4. Is there a better/cleaner approach? What other options should I consider?

We can, thankfully, ensure that this recursive instantiation exercise terminates. :-)

shap
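For concreteness, the demand-driven instantiator described above amounts to a worklist over mangled names, roughly as sketched below. This is only an illustration: the FrontEnd and Instance types are invented stand-ins, not the actual BitC front end.

    #include <set>
    #include <string>
    #include <vector>

    // Invented stand-ins for the front end; nothing here is real BitC code.
    struct Instance {
      std::vector<std::string> callees;   // mangled names of call targets
    };
    struct FrontEnd {
      virtual ~FrontEnd() {}
      virtual std::string mangledNameOfMain() = 0;
      virtual Instance synthesize(const std::string &mangledName) = 0;
    };

    // Instantiate main(), then every instantiation its calls demand, and
    // so on until every use occurrence has a definition.
    void instantiateAll(FrontEnd &fe) {
      std::vector<std::string> worklist;
      std::set<std::string> done;
      worklist.push_back(fe.mangledNameOfMain());

      while (!worklist.empty()) {
        std::string name = worklist.back();
        worklist.pop_back();
        if (!done.insert(name).second)
          continue;                        // already instantiated

        // The mangling is reversible, so the name alone identifies which
        // definition to synthesize and at which concrete signature.
        Instance inst = fe.synthesize(name);

        // Each call site in the new instance is statically resolvable, so
        // its callee's mangled name can be computed and queued in turn.
        worklist.insert(worklist.end(),
                        inst.callees.begin(), inst.callees.end());
      }
    }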
"Jonathan S. Shapiro" <shap at eros-os.com> writes: [snip]> 4. Is there a better/cleaner approach? What other options should I > consider?My front-end is very similar to yours in the feature of the multiple instantiations on demand, etc. One thing I learnt about LLVM is that it's philosophy is to be a friendly backend for frontends, but whatever your frontend does internally is not LLMV's business. Answuring your question, I'm not aware of any LLVM facility for what you describe. And if you find one, I would advise against using it. -- Oscar
Hi Jonathan,

In the context of a static compiler, I would recommend that you implement your own “on the side” symbol table in order to track this state and perform on-demand instantiation as required. It is worthwhile to consider the LLVM module to be a passive output sink, not an active object.

The JIT compiler, by contrast, is an active object, cooperating with its environment via the ModuleProvider interface. Overriding materializeFunction should be useful for on-demand instantiation.

— Gordon

On Mar 26, 2008, at 17:07, Jonathan S. Shapiro wrote:

[snip]
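To make the materializeFunction suggestion concrete, a provider along the following lines could drive on-demand instantiation. This is a sketch against the ModuleProvider interface as it stands in this LLVM version; the FrontendInstantiator type is an invented placeholder for the front end's own instantiator, not an LLVM class.

    #include "llvm/Module.h"
    #include "llvm/ModuleProvider.h"
    #include <string>

    // Hypothetical front-end hook: emit the body named by the mangled
    // symbol into M; return false on failure.  Not part of LLVM.
    struct FrontendInstantiator {
      virtual ~FrontendInstantiator() {}
      virtual bool instantiateInto(llvm::Module *M,
                                   const std::string &mangledName) = 0;
    };

    class InstantiatingModuleProvider : public llvm::ModuleProvider {
      FrontendInstantiator &Inst;
    public:
      InstantiatingModuleProvider(llvm::Module *M, FrontendInstantiator &I)
        : Inst(I) { TheModule = M; }

      // The JIT calls this when it needs F's body; returning true
      // signals an error.
      virtual bool materializeFunction(llvm::Function *F,
                                       std::string *ErrInfo = 0) {
        if (!F->isDeclaration())
          return false;                      // body already present
        if (!Inst.instantiateInto(TheModule, F->getName())) {
          if (ErrInfo) *ErrInfo = "polyinstantiation failed";
          return true;
        }
        return false;
      }

      virtual llvm::Module *materializeModule(std::string *ErrInfo = 0) {
        return TheModule;                    // nothing else is lazy here
      }
    };

In principle, the JIT would then be constructed over this provider rather than a plain ExistingModuleProvider, so that the first time it needs a missing body it asks the front end to synthesize one.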
On Thu, 2008-03-27 at 07:44 -0400, Gordon Henriksen wrote:

> In the context of a static compiler, I would recommend that you
> implement your own “on the side” symbol table in order to track this
> state and perform on-demand instantiation as required. It is
> worthwhile to consider the LLVM module to be a passive output sink,
> not an active object.

I think I understand what you are saying, but let me delve into this a little further. My main point is that the technique we are after is generally useful for languages having well-integrated template mechanisms. The approach you advocate may still be the best one, but I'd like to make sure that we are having the same conversation.

Let me illustrate the problem concretely. Consider the BitC primitive definition of lists:

    (defunion (list 'a) (cons 'a (list 'a)) nil)

Modulo surface syntax, this looks on first inspection to be exactly like the corresponding definitions in ML or Haskell. But in ML or Haskell there is only one run-time "expansion" of this type, because all of the possible element types occupy exactly one fundamental unit of storage. This is not true for BitC types. On a 32-bit machine having suitable alignment, the above definition guarantees that variables and bindings of type (list int64) are pointers to boxes whose first element is 64 bits (the int64) and whose second element is 32 bits (the next pointer). That is: the type must be instantiated with different concrete representations for different uses.

Since we had to do this anyway, we use the same technique to resolve type classes, thereby eliminating the need for dictionary pointers. The BitC procedure:

    (define (add-one x) (+ x 1))

has the type:

    (forall (Arith 'a) (fn ('a) 'a))

which (given our instantiation approach) is best imagined as a template-like construct -- albeit one that is fully checked for consistency and whose expansion is known not to fail (barring memory exhaustion within the compiler).

When a client program uses a dynamic library providing these sorts of constructs, we have two choices:

1. Generate them statically at static link time. This works, but it is not robust across dynamic library version changes (which is an endemic problem in languages that support both templates and dynamic linking).

2. Generate them dynamically at load time (or on first call). This is what we want to do. That is: we want to use a continuous compilation strategy, which is precisely what LLVM is supposedly attempting to achieve.

If we adopt the approach that you suggest, we will end up implementing our own "generate on demand" infrastructure that has to operate in collusion with the dynamic loader. We know how to do that, but it is a moderately dicey business. Basically we have to run a pre-resolver before we emit code to an LLVM module, after which LLVM will run a second resolver. I certainly agree that this will work.

But I didn't originally have in mind to view Modules as active things. It was not my intention to "extend" a module in mid-compile. Rather, it was my thought that we could provide the unresolved symbol by emitting and compiling a second module.
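Returning to the representation point above for a moment: concretely, the two instantiations of the cons cell have layouts roughly like the following on a 32-bit target. C++ struct notation is used purely as illustration here; the type and field names are invented, not anything the compiler actually emits.

    #include <stdint.h>

    struct cons_int32 {             // cell behind a (list int32) value
      int32_t car;                  // 32-bit element
      struct cons_int32 *cdr;       // 32-bit pointer to the next cell
    };                              // 8 bytes

    struct cons_int64 {             // cell behind a (list int64) value
      int64_t car;                  // 64-bit element, suitably aligned
      struct cons_int64 *cdr;       // still a 32-bit pointer
    };                              // 12 bytes of payload plus any tail
                                    // padding: a different layout, hence
                                    // a different instantiation

In ML or Haskell both fields would be a single word regardless of the element type, which is why one expansion suffices there.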
Where the LLVM infrastructure currently has something like this:

    if (!(sym = llvm_resolve_global(GlobalScope, symName)))
      some_failure_action();

it would now look something like:

    sym = llvm_resolve_global(GlobalScope, symName);

    if (!sym && frontend_has_symbol_generator
             && frontend_generate_symbol(symName))
      // Note: if frontend_generate_symbol() has succeeded, it will have
      // constructed some LLVM Module and called the LLVM compiler to
      // admit it, with the consequence that GlobalScope will have been
      // updated to contain a binding for the desired symbol.
      sym = llvm_resolve_global(GlobalScope, symName);

    if (!sym)
      some_failure_action();

I don't think that it's any more complicated than that. This is basically the test that our static polyinstantiator runs right now.

Note: I'm still not claiming that this is a good approach in the context of LLVM. I don't have my head wrapped around LLVM enough to have an opinion about that. I simply wanted to make sure that the question was clearly framed.

I do, provisionally, think that this particular hook is consistent with the notion of continuous compilation. It seems (to me) necessary (and perhaps even sufficient) to let the front end participate in the continuousness of continuous compilation.

shap
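For comparison, here is what that same resolve / generate / re-resolve test might look like written on the front-end side against this LLVM version's C++ API, with the instantiation emitted into a second module as described above and then folded in. frontendInstantiate() is an invented placeholder for the BitC instantiator, and Linker::LinkModules is only one plausible way to merge the new module; treat the whole thing as a sketch, not an existing LLVM hook.

    #include "llvm/Module.h"
    #include "llvm/Linker.h"
    #include <string>

    // Hypothetical front-end entry point: build a fresh Module containing
    // the instantiation named by the mangled symbol, or return 0 if the
    // name does not correspond to anything the front end knows about.
    llvm::Module *frontendInstantiate(const std::string &mangledName);

    llvm::Function *resolveOrInstantiate(llvm::Module &Dest,
                                         const std::string &mangledName) {
      llvm::Function *F = Dest.getFunction(mangledName);
      if (F && !F->isDeclaration())
        return F;                          // first lookup succeeded

      // Lookup failed (or found only a declaration): ask the front end to
      // synthesize the instantiation into a second module, then fold that
      // module into the one being compiled.
      if (llvm::Module *Inst = frontendInstantiate(mangledName)) {
        std::string Err;
        bool Failed = llvm::Linker::LinkModules(&Dest, Inst, &Err);
        delete Inst;                       // assuming the caller still owns Src
        if (!Failed)
          return Dest.getFunction(mangledName);   // second lookup
      }
      return 0;                            // genuine failure
    }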
On Wed, 2008-03-26 at 23:48 +0100, Óscar Fuentes wrote:

> "Jonathan S. Shapiro" <shap at eros-os.com> writes:
>
> My front end is very similar to yours in its use of multiple
> instantiations on demand, etc.

Oscar: after you have a chance to read my recent reply to Gordon, would you be kind enough to let me know whether you still believe the situations are similar? Also, do you think that what I am looking for is likely to be generally useful?

> Answering your question: I'm not aware of any LLVM facility for what
> you describe. And if you find one, I would advise against using it.

I understand the argument for separation of concerns, but your last sentence concerns me. Why would you recommend against using this kind of mechanism if it exists?

Jonathan