On Sat, 10 Sep 2005, Andreas Fredriksson wrote:
> Hey list,
>
> I'm looking for information on how programs that span multiple LLVM
> modules work at runtime, especially wrt. symbol handling when running
> in a JIT setting. To give some background, I'm developing a language
> that targets LLVM as a backend, and I'd like my translation units to
> map to LLVM modules as closely as possible.
Ok. Currently the LLVM JIT just knows about a single module. I think it
would be very useful to extend this to support multiple modules at a time,
where a function reference consults a symbol table to determine the right
module to compile from.
In the context of C/C++, imagine completely skipping the link step.
Instead of linking, you could just present the JIT with a list of .o files
to load and execute. If it could execute from multiple modules at a time,
it would "just work" as if linking had occurred.
> What I'm looking for here is something similar to how Java or Python
> handles intra-module dependencies at runtime, where they load modules (or
> classes, in the Java case) as necessary, and where different modules
> can cooperate during different runs of the same program depending on
> the code path that is taken.
I think this is another very logical application of this idea.
> Is it possible to get a hook call when a JITed module encounters a
> symbol reference it can't resolve locally?
Yes, sort of. Look at lib/ExecutionEngine/JIT/Intercept.cpp.
getPointerToNamedFunction contains logic that works like this:
1. If this is one of the very few functions the JIT knows about, handle
it.
2. Otherwise, call 'dlsym' on the local process to resolve the address.
3. Otherwise abort.
It would be pretty straightforward to extend that code, or the callers of
that code, to search multiple modules.
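As a rough illustration of the three-step lookup order described above, here is a minimal C++ sketch. It is not the code in Intercept.cpp: SymbolResolver, builtins, and processLookup are hypothetical names, the process-level lookup (dlsym in the real JIT) is passed in as a callback so the sketch stays platform-independent, and step 3 throws an exception instead of aborting so that a caller can decide what to do.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a process-level lookup such as dlsym().
using ProcessLookup = std::function<void *(const std::string &)>;

// Hypothetical resolver that mirrors the lookup order only.
struct SymbolResolver {
  std::map<std::string, void *> builtins; // step 1: functions known up front
  ProcessLookup processLookup;            // step 2: ask the host process

  void *resolve(const std::string &name) const {
    // 1. Handle the few functions the resolver knows about directly.
    auto it = builtins.find(name);
    if (it != builtins.end())
      return it->second;

    // 2. Otherwise ask the host process (dlsym in the real JIT).
    if (processLookup)
      if (void *addr = processLookup(name))
        return addr;

    // 3. Otherwise give up. The JIT aborts here; this sketch throws.
    throw std::runtime_error("undefined symbol: " + name);
  }
};
```

Searching additional modules would presumably slot in as an extra step between 2 and 3, or in the callers of resolve().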
> My current solution is
> based upon pessimizations that force the loading of all dependent
> modules, but that's wasteful in many cases when only some of those
> dependencies are actually required for execution.
Yup.
> That said, I would also like to examine the possibility to recompile
> modules in the running system on the fly from source, so that it is
> possible to update modules as long as their interfaces stay
> compatible. Can LLVM freeze the JIT in a safe place and unload
> modules?
Not really. However, it can do the equivalent thing: it can replace code
for functions that have already been compiled with new code (see
ExecutionEngine::recompileAndRelinkFunction). The semantics are that any
future invocation of the function calls the newly compiled code, while any
invocation already on the stack (currently executing) finishes in the old
code. In-flight invocations are left alone so that the JIT does not have
to keep track of potentially very expensive mapping information.
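To make those semantics concrete, here is a small self-contained C++ model. It does not use the JIT and is not how recompileAndRelinkFunction is implemented: CallSlot, gSlot, gTrace, oldBody, and newBody are all hypothetical names, and a plain function pointer stands in for "the code currently linked for this function".

```cpp
#include <cassert>
#include <string>
#include <vector>

using Fn = void (*)();

// Hypothetical call slot: every caller goes through `target`.
struct CallSlot {
  Fn target;
  void call() const { target(); }
  void relink(Fn newCode) { target = newCode; } // models "relinking"
};

CallSlot *gSlot = nullptr;       // the slot used by the functions below
std::vector<std::string> gTrace; // records which code actually ran

void newBody() { gTrace.push_back("new"); }

void oldBody() {
  gTrace.push_back("old: enter");
  gSlot->relink(newBody);        // replace the code while oldBody is on the stack
  gSlot->call();                 // a new call already reaches the new code
  gTrace.push_back("old: exit"); // the in-flight invocation finishes the old code
}
```

Calling the slot once while it points at oldBody records "old: enter", "new", "old: exit"; every later call records only "new".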
> I'm also curious to find out how the external symbols referenced from
> the C frontend are resolved (such as printf or other functions in
> libc). I assume there is a dlsym() call somewhere depending on the
> libs listed in the module, is this correct? Does this happen at module
> load time or at some later point while executing?
Yup, see above. These happen lazily as the process needs the symbols.
The address of 'printf' is inserted into the JIT's symbol table just like
any JIT'd function's address.
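A minimal sketch of that lazy behaviour, assuming a single table that holds both external and JIT'd addresses. LazySymbolTable, resolver, and resolverCalls are hypothetical names for illustration, not JIT data structures.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Hypothetical table shared by external symbols and JIT'd functions.
struct LazySymbolTable {
  std::map<std::string, void *> addresses;             // name -> known address
  std::function<void *(const std::string &)> resolver; // e.g. a dlsym wrapper
  int resolverCalls = 0;                               // how often we looked outside

  void *getAddress(const std::string &name) {
    // Already known (either JIT'd or resolved earlier): no lookup needed.
    auto it = addresses.find(name);
    if (it != addresses.end())
      return it->second;

    // First use: resolve now and remember the answer for later calls.
    ++resolverCalls;
    void *addr = resolver ? resolver(name) : nullptr;
    if (addr)
      addresses[name] = addr;
    return addr;
  }
};
```

Nothing is resolved at load time in this model; a symbol that is never referenced never triggers a lookup.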
> Finally, is the LLVM linker really required for a system like this? I
> know the JIT is happy executing my bytecode modules as long as they
> are self-contained, but on-demand loading is a requirement for this
> (test) project. Currently all I'm getting is a hard error from the
> runtime complaining that a referenced symbol is undefined.
Currently, yes, it does require this. However, I think it would be great
for the JIT to have a list of Modules that are currently 'open' that it
can generate code for, and for this list to be dynamically mutable. Any
help adding this functionality to the JIT would be greatly appreciated!
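One possible shape for such a list, as a sketch only. None of this exists in the JIT: ModuleInfo and OpenModules are hypothetical names, and a set of function names stands in for a real module's contents.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a module: just a name and the functions it defines.
struct ModuleInfo {
  std::string name;
  std::set<std::string> definedFunctions;
};

// Hypothetical mutable list of the modules the JIT may compile from.
class OpenModules {
  std::vector<const ModuleInfo *> modules;

public:
  void add(const ModuleInfo *m) { modules.push_back(m); }

  // Returns true if the module was open and has now been removed.
  bool remove(const ModuleInfo *m) {
    auto it = std::find(modules.begin(), modules.end(), m);
    if (it == modules.end())
      return false;
    modules.erase(it);
    return true;
  }

  // Finds the first open module that defines `fn`, or nullptr if none does.
  const ModuleInfo *findDefiner(const std::string &fn) const {
    for (const ModuleInfo *m : modules)
      if (m->definedFunctions.count(fn))
        return m;
    return nullptr;
  }
};
```

A function reference that cannot be resolved in the current module would consult findDefiner() to pick the module to compile from, before falling back to the process-level lookup.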
-Chris
--
http://nondot.org/sabre/
http://llvm.org/