Okay, we're starting to dig in, and I've hit a question that will no doubt seem strange.

Context: BitC is a polymorphic language. Since it has unboxed value types, our approach to compiling a polymorphic function is to polyinstantiate it -- once for each signature.

The name mangling scheme is both stable and reversible. At the site of the use occurrence, we can fully determine the mangled name of the desired instantiation. Given the mangled name of an instantiation that has not yet been synthesized, we can determine which procedure must be synthesized.

At present, our front end implements a demand-driven instantiator that proceeds from main() until all calls are statically resolved.

It seems to me, however, that this (nearly) duplicates functionality that is probably already present in the LLVM JIT layer.

Here is what I am wondering: is there some way we can "hook" the global symbol resolver so that we can run the polyinstantiator on demand? I *think* this could be done by running the resolver normally, noticing failure on a symbol that we know ought to exist, code generating the instance definition corresponding to the needed symbol, and then re-performing the lookup at global scope.

There are several issues I'm not sure about:

1. Does this make sense?

2. If so, where is the right place to hook the resolver?

3. Are we going to run into trouble arising from the fact that we are likely to end up using the module compiler in a recursive way? That is: the referencing module's compile will (in effect) be paused in its symbol resolution phase while we introduce a new module and compile that. I can imagine implementations where this would wreak serious havoc on the implementation of environments within the LLVM infrastructure.

4. Is there a better/cleaner approach? What other options should I consider?

We can, thankfully, ensure that this recursive instantiation exercise terminates. :-)

shap
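For concreteness, the demand-driven instantiator described above amounts to a worklist over mangled names, roughly as sketched below. This is only an illustration: the FrontEnd and Instance types are invented stand-ins, not the actual BitC front end.

    #include <set>
    #include <string>
    #include <vector>

    // Invented stand-ins for the front end; nothing here is real BitC code.
    struct Instance {
      std::vector<std::string> callees;   // mangled names of call targets
    };
    struct FrontEnd {
      virtual ~FrontEnd() {}
      virtual std::string mangledNameOfMain() = 0;
      virtual Instance synthesize(const std::string &mangledName) = 0;
    };

    // Instantiate main(), then every instantiation its calls demand, and
    // so on until every use occurrence has a definition.
    void instantiateAll(FrontEnd &fe) {
      std::vector<std::string> worklist;
      std::set<std::string> done;
      worklist.push_back(fe.mangledNameOfMain());

      while (!worklist.empty()) {
        std::string name = worklist.back();
        worklist.pop_back();
        if (!done.insert(name).second)
          continue;                        // already instantiated

        // The mangling is reversible, so the name alone identifies which
        // definition to synthesize and at which concrete signature.
        Instance inst = fe.synthesize(name);

        // Each call site in the new instance is statically resolvable, so
        // its callee's mangled name can be computed and queued in turn.
        worklist.insert(worklist.end(),
                        inst.callees.begin(), inst.callees.end());
      }
    }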
"Jonathan S. Shapiro" <shap at eros-os.com> writes: [snip]> 4. Is there a better/cleaner approach? What other options should I > consider?My front-end is very similar to yours in the feature of the multiple instantiations on demand, etc. One thing I learnt about LLVM is that it's philosophy is to be a friendly backend for frontends, but whatever your frontend does internally is not LLMV's business. Answuring your question, I'm not aware of any LLVM facility for what you describe. And if you find one, I would advise against using it. -- Oscar
Hi Jonathan,

In the context of a static compiler, I would recommend that you implement your own “on the side” symbol table in order to track this state and perform on-demand instantiation as required. It is worthwhile to consider the LLVM module to be a passive output sink, not an active object.

The JIT compiler, by contrast, is an active object, cooperating with its environment via the ModuleProvider interface. Overriding materializeFunction should be useful for on-demand instantiation.

— Gordon

On Mar 26, 2008, at 17:07, Jonathan S. Shapiro wrote:

[snip]
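To make the materializeFunction suggestion concrete, a provider along the following lines could drive on-demand instantiation. This is a sketch against the ModuleProvider interface as it stands in this LLVM version; the FrontendInstantiator type is an invented placeholder for the front end's own instantiator, not an LLVM class.

    #include "llvm/Module.h"
    #include "llvm/ModuleProvider.h"
    #include <string>

    // Hypothetical front-end hook: emit the body named by the mangled
    // symbol into M; return false on failure.  Not part of LLVM.
    struct FrontendInstantiator {
      virtual ~FrontendInstantiator() {}
      virtual bool instantiateInto(llvm::Module *M,
                                   const std::string &mangledName) = 0;
    };

    class InstantiatingModuleProvider : public llvm::ModuleProvider {
      FrontendInstantiator &Inst;
    public:
      InstantiatingModuleProvider(llvm::Module *M, FrontendInstantiator &I)
        : Inst(I) { TheModule = M; }

      // The JIT calls this when it needs F's body; returning true
      // signals an error.
      virtual bool materializeFunction(llvm::Function *F,
                                       std::string *ErrInfo = 0) {
        if (!F->isDeclaration())
          return false;                      // body already present
        if (!Inst.instantiateInto(TheModule, F->getName())) {
          if (ErrInfo) *ErrInfo = "polyinstantiation failed";
          return true;
        }
        return false;
      }

      virtual llvm::Module *materializeModule(std::string *ErrInfo = 0) {
        return TheModule;                    // nothing else is lazy here
      }
    };

In principle, the JIT would then be constructed over this provider rather than a plain ExistingModuleProvider, so that the first time it needs a missing body it asks the front end to synthesize one.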
On Thu, 2008-03-27 at 07:44 -0400, Gordon Henriksen wrote:

> In the context of a static compiler, I would recommend that you
> implement your own “on the side” symbol table in order to track this
> state and perform on-demand instantiation as required. It is
> worthwhile to consider the LLVM module to be a passive output sink,
> not an active object.

I think I understand what you are saying, but let me delve into this a little further. My main point is that the technique we are after is generally useful for languages having well-integrated template mechanisms. The approach you advocate may still be the best one, but I'd like to make sure that we are having the same conversation.

Let me illustrate the problem concretely. Consider the BitC primitive definition of lists:

    (defunion (list 'a) (cons 'a (list 'a)) nil)

Modulo surface syntax, this looks on first inspection to be exactly like the corresponding definitions in ML or Haskell. But in ML or Haskell there is only one run-time "expansion" of this type, because all of the possible element types occupy exactly one fundamental unit of storage. This is not true for BitC types. On a 32-bit machine having suitable alignment, the above definition guarantees that variables and bindings of type (list int64) are pointers to boxes whose first element is 64 bits (the int64) and whose second element is 32 bits (the next pointer). That is: the type must be instantiated with different concrete representations for different uses.

Since we had to do this anyway, we use the same technique to resolve type classes, thereby eliminating the need for dictionary pointers. The BitC procedure:

    (define (add-one x) (+ x 1))

has the type:

    (forall (Arith 'a) (fn ('a) 'a))

which (given our instantiation approach) is best imagined as a template-like construct -- albeit one that is fully checked for consistency and whose expansion is known not to fail (barring memory exhaustion within the compiler).

When a client program uses a dynamic library providing these sorts of constructs, we have two choices:

1. Generate them statically at static link time. This works, but it is not robust across dynamic library version changes (which is an endemic problem in languages that support both templates and dynamic linking).

2. Generate them dynamically at load time (or on first call). This is what we want to do. That is: we want to use a continuous compilation strategy, which is precisely what LLVM is supposedly attempting to achieve.

If we adopt the approach that you suggest, we will end up implementing our own "generate on demand" infrastructure that has to operate in collusion with the dynamic loader. We know how to do that, but it is a moderately dicey business. Basically we have to run a pre-resolver before we emit code to an LLVM module, after which LLVM will run a second resolver. I certainly agree that this will work.

But I didn't originally have in mind to view Modules as active things. It was not my intention to "extend" a module in mid-compile. Rather, it was my thought that we could provide the unresolved symbol by emitting and compiling a second module.
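Returning to the representation point above for a moment: concretely, the two instantiations of the cons cell have layouts roughly like the following on a 32-bit target. C++ struct notation is used purely as illustration here; the type and field names are invented, not anything the compiler actually emits.

    #include <stdint.h>

    struct cons_int32 {             // cell behind a (list int32) value
      int32_t car;                  // 32-bit element
      struct cons_int32 *cdr;       // 32-bit pointer to the next cell
    };                              // 8 bytes

    struct cons_int64 {             // cell behind a (list int64) value
      int64_t car;                  // 64-bit element, suitably aligned
      struct cons_int64 *cdr;       // still a 32-bit pointer
    };                              // 12 bytes of payload plus any tail
                                    // padding: a different layout, hence
                                    // a different instantiation

In ML or Haskell both fields would be a single word regardless of the element type, which is why one expansion suffices there.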
Where the LLVM infrastructure currently has something like this:

    if (!(sym = llvm_resolve_global(GlobalScope, symName)))
      some_failure_action();

it would now look something like:

    sym = llvm_resolve_global(GlobalScope, symName);

    if (!sym && frontend_has_symbol_generator
             && frontend_generate_symbol(symName))
      // Note: if frontend_generate_symbol() has succeeded, it will have
      // constructed some LLVM Module and called the LLVM compiler to
      // admit it, with the consequence that GlobalScope will have been
      // updated to contain a binding for the desired symbol.
      sym = llvm_resolve_global(GlobalScope, symName);

    if (!sym)
      some_failure_action();

I don't think that it's any more complicated than that. This is basically the test that our static polyinstantiator runs right now.

Note: I'm still not claiming that this is a good approach in the context of LLVM. I don't have my head wrapped around LLVM enough to have an opinion about that. I simply wanted to make sure that the question was clearly framed.

I do, provisionally, think that this particular hook is consistent with the notion of continuous compilation. It seems (to me) necessary (and perhaps even sufficient) to let the front end participate in the continuousness of continuous compilation.

shap
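For comparison, here is what that same resolve / generate / re-resolve test might look like written on the front-end side against this LLVM version's C++ API, with the instantiation emitted into a second module as described above and then folded in. frontendInstantiate() is an invented placeholder for the BitC instantiator, and Linker::LinkModules is only one plausible way to merge the new module; treat the whole thing as a sketch, not an existing LLVM hook.

    #include "llvm/Module.h"
    #include "llvm/Linker.h"
    #include <string>

    // Hypothetical front-end entry point: build a fresh Module containing
    // the instantiation named by the mangled symbol, or return 0 if the
    // name does not correspond to anything the front end knows about.
    llvm::Module *frontendInstantiate(const std::string &mangledName);

    llvm::Function *resolveOrInstantiate(llvm::Module &Dest,
                                         const std::string &mangledName) {
      llvm::Function *F = Dest.getFunction(mangledName);
      if (F && !F->isDeclaration())
        return F;                          // first lookup succeeded

      // Lookup failed (or found only a declaration): ask the front end to
      // synthesize the instantiation into a second module, then fold that
      // module into the one being compiled.
      if (llvm::Module *Inst = frontendInstantiate(mangledName)) {
        std::string Err;
        bool Failed = llvm::Linker::LinkModules(&Dest, Inst, &Err);
        delete Inst;                       // assuming the caller still owns Src
        if (!Failed)
          return Dest.getFunction(mangledName);   // second lookup
      }
      return 0;                            // genuine failure
    }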
On Wed, 2008-03-26 at 23:48 +0100, Óscar Fuentes wrote:

> "Jonathan S. Shapiro" <shap at eros-os.com> writes:
>
> My front end is very similar to yours in its use of multiple
> instantiations on demand, etc.

Oscar: after you have a chance to read my recent reply to Gordon, would you be kind enough to let me know whether you still believe the situations are similar? Also, do you think that what I am looking for is likely to be generally useful?

> Answering your question: I'm not aware of any LLVM facility for what
> you describe. And if you find one, I would advise against using it.

I understand the argument for separation of concerns, but your last sentence concerns me. Why would you recommend against using this kind of mechanism if it exists?

Jonathan