thr3ads.net - llvm dev - [LLVMdev] Hooking the global symbol resolver [Mar 2008]

Jonathan S. Shapiro

2008-Mar-27 16:21 UTC

[LLVMdev] Hooking the global symbol resolver

On Thu, 2008-03-27 at 07:44 -0400, Gordon Henriksen wrote:
> In the context of a static compiler, I would recommend that you
> implement your own “on the side” symbol table in order to track this
> state and perform on-demand instantiation as required. It is
> worthwhile to consider the LLVM module to be a passive output sink,
> not an active object.

I think I understand what you are saying, but let me delve into this a
little further. My main point is that the technique we are after is
generally useful for languages having well-integrated template
mechanisms. The approach you advocate may still be the best approach,
but I'd like to make sure that we are having the same conversation.

Let me illustrate the problem concretely.

Consider the BitC primitive definition of lists:

  (defunion (list 'a)
    (cons 'a (list 'a))
    nil)

Modulo surface syntax, this looks on first inspection to be exactly like
the corresponding definitions in ML or Haskell. But in ML or Haskell
there is only one run-time "expansion" of this type, because all of the
possible element types occupy exactly one fundamental unit of storage.
This is not true for BitC types. On a 32-bit machine having suitable
alignment, the above definition guarantees that variables and bindings
of type

  (list int64)

are pointers to boxes whose first element is 64 bits (the int64) and
whose second element is 32 bits (the next pointer). That is: the type
must be instantiated with different concrete representations for
different uses. Since we had to do this anyway, we use the same
technique to resolve type classes, thereby eliminating the need for
dictionary pointers. The BitC procedure:

  (define (add-one x) (+ x 1))

has the type:

  (forall (Arith 'a) (fn ('a) 'a))

which (given our instantiation approach) is best imagined to be a
template-like construct -- albeit one that is fully checked for
consistency and whose expansion is known not to fail (barring memory
exhaustion within the compiler).
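
To make the instantiation point concrete, here is a rough C++ analogue of the two BitC definitions above. It is only a sketch: `ListCell`, `Wide`, and `add_one` are hypothetical names chosen for illustration, nil is modelled as a null pointer, and C++ templates are not checked up front the way the BitC forms are. It shows the one property that matters here: each concrete element type gets its own representation and its own code.

```cpp
#include <cstdint>

// Hypothetical analogue of (defunion (list 'a) (cons 'a (list 'a)) nil).
// The element is stored unboxed, so the cell layout depends on T.
template <typename T>
struct ListCell {
    T head;             // the 'a field: as wide as T itself
    ListCell<T>* tail;  // the next pointer; nullptr stands in for nil
};

// Hypothetical 128-bit element type, used only to show a second layout.
struct Wide {
    std::int64_t lo;
    std::int64_t hi;
};

// Hypothetical analogue of (define (add-one x) (+ x 1)). Every use at a
// new type produces a separate instantiation; no dictionary is passed.
template <typename T>
T add_one(T x) {
    return x + 1;
}

// Different element types force different concrete cell layouts.
static_assert(sizeof(ListCell<Wide>) > sizeof(ListCell<std::int64_t>),
              "each instantiation has its own representation");
```

Calling `add_one(41)` and `add_one(1.5)` yields two distinct compiled functions, which is the situation that makes static instantiation fragile across dynamic library versions.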

When a client program uses a dynamic library providing these sorts of
constructs, we have two choices:

  1. Generate them statically at static link time. This works,
     but it is not robust across dynamic library version changes
     (which is an endemic problem in languages that support both
     templates and dynamic linking).

  2. Generate them dynamically at load time (or on first call).
     This is what we want to do.

That is: we want to use a continuous compilation strategy, which is
precisely what LLVM is supposedly attempting to achieve.

If we adopt the approach that you suggest, we will end up implementing
our own "generate on demand" infrastructure that has to operate in
collusion with the dynamic loader. We know how to do that, but it is a
moderately dicey business. Basically we have to run a pre-resolver
before we emit code to an LLVM module, after which LLVM will run a
second resolver. I certainly agree that this will work.

But I didn't have in mind originally to view Modules as active things.
It was not my intention to "extend" a module in mid compile. Rather, it
was my thought that we could provide the unresolved symbol by emitting
and compiling a second module. Where the LLVM infrastructure currently
has something like this:

  if (!(sym = llvm_resolve_global(GlobalScope, symName)))
    some_failure_action();

it would now look something like:

  sym = llvm_resolve_global(GlobalScope, symName);
  if (!sym && frontend_has_symbol_generator
      && frontend_generate_symbol(symName))
    // Note: if frontend_generate_symbol() has succeeded, it will have
    // constructed some LLVM Module and called the LLVM compiler to
    // admit it, with the consequence that GlobalScope will have been
    // updated to contain a binding for the desired symbol.
    sym = llvm_resolve_global(GlobalScope, symName);
  if (!sym)
    some_failure_action();

I don't think that it's any more complicated than that. This is
basically the test that our static polyinstantiator runs right now.
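
For readers who want to experiment with the shape of this hook outside of LLVM, the following is a minimal self-contained sketch. Everything in it is an assumption made for illustration: `GlobalScope`, `lookup`, and `resolve_global` are hypothetical stand-ins, not LLVM APIs, and a symbol is reduced to an opaque address. The generator is expected to install the binding itself, mirroring the comment in the pseudocode above.

```cpp
#include <functional>
#include <map>
#include <string>

// Hypothetical stand-ins for illustration only; none of this is LLVM API.
using SymbolAddr = void*;

struct GlobalScope {
    std::map<std::string, SymbolAddr> bindings;
    // Optional front-end hook. It returns true if it generated and admitted
    // code that added a binding for the requested name to this scope.
    std::function<bool(GlobalScope&, const std::string&)> generator;
};

// Plain lookup: nullptr means "no binding yet".
SymbolAddr lookup(const GlobalScope& scope, const std::string& name) {
    auto it = scope.bindings.find(name);
    return it == scope.bindings.end() ? nullptr : it->second;
}

// Lookup with a single fallback to the front end, then one retry.
SymbolAddr resolve_global(GlobalScope& scope, const std::string& name) {
    SymbolAddr sym = lookup(scope, name);
    if (!sym && scope.generator && scope.generator(scope, name)) {
        // The generator claims success, so the binding should now exist.
        sym = lookup(scope, name);
    }
    return sym;  // still nullptr: the caller runs its failure action
}
```

Because the generator installs the binding, a second request for the same name is satisfied by the plain lookup and the front end is not consulted again.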

Note: I'm still not claiming that this is a good approach in the context
of LLVM. I don't have my head wrapped around LLVM enough to have an
opinion about that. I simply wanted to make sure that the question was
clearly framed.

I do, provisionally, think that this particular hook is consistent with
the notion of continuous compilation. It seems (to me) necessary (and
perhaps even sufficient) to let the front end participate in the
continuousness of continuous compilation.

shap

Óscar Fuentes

2008-Mar-27 20:22 UTC

[LLVMdev] Hooking the global symbol resolver

"Jonathan S. Shapiro" <shap at eros-os.com> writes:

[snip]
>   if (!(sym = llvm_resolve_global(GlobalScope, symName)))
>     some_failure_action();
>
> it would now look something like:
>
>   sym = llvm_resolve_global(GlobalScope, symName);
>   if (!sym && frontend_has_symbol_generator
>       && frontend_generate_symbol(symName))
>     // Note: if frontend_generate_symbol() has succeeded, it will have
>     // constructed some LLVM Module and called the LLVM compiler to
>     // admit it, with the consequence that GlobalScope will have been
>     // updated to contain a binding for the desired symbol.
>     sym = llvm_resolve_global(GlobalScope, symName);
>   if (!sym)
>     some_failure_action();
>
> I don't think that it's any more complicated than that. This is
> basically the test that our static polyinstantiator runs right now.
>
> Note: I'm still not claiming that this is a good approach in the context
> of LLVM. I don't have my head wrapped around LLVM enough to have an
> opinion about that. I simply wanted to make sure that the question was
> clearly framed.
>
> I do, provisionally, think that this particular hook is consistent with
> the notion of continuous compilation. It seems (to me) necessary (and
> perhaps even sufficient) to let the front end participate in the
> continuousness of continuous compilation.

I'm all for hooks and delegation, but the problem here is that your
proposal is not general enough and is hard to generalize. It does not
work for my project, for instance, although I face almost the same
requirements as you wrt dynamic generation. The symbol name is enough
for you, but not for me, and there is no way to teach LLVM about what
info I need.

This doesn't mean that the LLVM developers shouldn't consider your
proposal in the context of the typical LLVM user. Maybe your case is
common enough.

-- 
Oscar

Jonathan S. Shapiro

2008-Mar-27 20:30 UTC

[LLVMdev] Hooking the global symbol resolver

On Thu, 2008-03-27 at 21:22 +0100, Óscar Fuentes wrote:
> I'm all for hooks and delegation, but the problem here is that your
> proposal is not general enough and is hard to generalize. It does not
> work for my project, for instance, although I face almost the same
> requirements as you wrt dynamic generation. The symbol name is enough
> for you, but not for me, and there is no way to teach LLVM about what
> info I need.
Is this because you have a more complicated scenario, or is it because
your name mangling scheme is not sufficiently well designed?

The evolution of mangling schemes in C++ initially ignored
reversibility. This was a *huge* mistake, and it took years to correct
it. The original problem was linkers that could not handle very long
identifiers. Over time that issue was fixed, and eventually the world
converged on an invertible scheme. Modern compilers almost universally
use an invertible scheme today.

I know nothing at all about your language (though I would like to
correct that deficiency), but I am very confident that if an invertible
mangling scheme is possible for you in principle, the time spent to
develop one early will be repaid many times over later.
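
As a toy illustration of what "invertible" means here, the sketch below encodes a list of name components so that the original components can always be recovered from the mangled string. The scheme (decimal length, underscore, text) and the function names `mangle` and `demangle` are assumptions invented for this example; it is not the scheme used by any particular compiler.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy scheme: each component is written as <length>_<text>,
// e.g. {"add-one", "int64"} becomes "7_add-one5_int64".
std::string mangle(const std::vector<std::string>& parts) {
    std::string out;
    for (const std::string& p : parts) {
        out += std::to_string(p.size()) + "_" + p;
    }
    return out;
}

// Inverse of mangle. Returns false if the input is not well formed.
bool demangle(const std::string& s, std::vector<std::string>& parts) {
    parts.clear();
    std::size_t i = 0;
    while (i < s.size()) {
        std::size_t len = 0;
        std::size_t digits = 0;
        while (i < s.size() && s[i] >= '0' && s[i] <= '9') {
            len = len * 10 + static_cast<std::size_t>(s[i] - '0');
            if (len > s.size()) return false;  // longer than the input itself
            ++i;
            ++digits;
        }
        // A length must be present and followed by the separator.
        if (digits == 0 || i >= s.size() || s[i] != '_') return false;
        ++i;
        if (len > s.size() - i) return false;  // component runs off the end
        parts.push_back(s.substr(i, len));
        i += len;
    }
    return true;
}
```

The length prefix is what makes the round trip safe: a component may itself contain digits or underscores without confusing the decoder, so a resolver handed only the mangled name can still recover what was asked for.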

shap
