Anders Alexandersson
2004-Apr-20 03:51 UTC
[LLVMdev] Dynamic updates of current executed code
Thanks!

The problem, though, is that the Ruby compiler is integrated into the compilation
of the program being executed, so that it can parse & compile dynamic code at
run-time. Therefore the calls to ExecutionEngine::getPointerToGlobal(F) need to
be made in LLVM code. Here is a simplified example in pseudocode of what we want
to do.

First, Ruby code is entered at run-time, received as a string, parsed and
compiled into the following code:

%-----------------------------------------------------
; External function
declare int %printf(sbyte*, ...)

; Custom function
int %puts_kernel( sbyte* %string ) {
        %tmp.0 = call int (sbyte*, ...)* %printf( sbyte* %string )
        ret int 0
}
%-----------------------------------------------------

This code is represented in the string variable
%dynamically_compiled_function_code below:

%-----------------------------------------------------
%dynamically_compiled_function_code = internal constant [LENGTH x sbyte]
        c"--String with the function code--\0A\00"

; Table of function pointer(s)
%kernel = type { int ( sbyte* )* }

int %main() {
        ; Create the kernel in memory, and get a pointer to the first function pointer
        %theKernel = malloc %kernel
        %FirstFunctionPTR = getelementptr %kernel* %theKernel, long 0, ubyte 0

        ; Load code
        %myNewFunction = %getPointerToGlobal(%dynamically_compiled_function_code)

        ; Write the memory address of myNewFunction() into the kernel struct
        store RETURNTYPE (PARAMTYPE*)* %myNewFunction,
              RETURNTYPE (PARAMTYPE*)** %FirstFunctionPTR

        ; Any code using the first function element in %kernel is now
        ; using the dynamically updated function!?

        ret int 0
}
%-----------------------------------------------------

The question mark is at this pseudocode row:

%myNewFunction = %getPointerToGlobal(%dynamically_compiled_function_code)

Is there an llvm version of the getPointerToGlobal() function as outlined, and
can the %myNewFunction pointer be used as described?

Also, does getPointerToGlobal() take human-readable code (.ll) or only binary
byte code (.bc)?

Is there a specification of how to write binary byte code directly, so we do
not have to externally call the llvm-as utility?

Best regards
Anders

-----Original Message-----
From: Chris Lattner <sabre at nondot.org>
To: llvmdev at cs.uiuc.edu
Date: Mon, 19 Apr 2004 01:56:12 -0500 (CDT)
Subject: Re: [LLVMdev] Dynamic updates of current executed code

On Mon, 19 Apr 2004, Anders Alexandersson wrote:

> Hello!
>
> I saw that you just got the recent llvm paper published in IEEE!
> Congratulations! :-)

Thanks!

> More issues regarding the Ruby compiler:
>
> Ruby supports the possibility of the user to enter new Ruby code during
> execution, after which it is executed. Also, all classes are open,
> meaning that a user is able to redefine a class overriding or replacing
> methods therein at run-time (this is deep...).

Sure, many dynamic languages are like this...

> My question is how the llvm-jitter works on a low level. Say for example
> that a user redefines a method during execution. My compiler (in llvm
> code form) takes care of compiling that code into llvm code dynamically.

Okay, at the low level, your class will have a hash table or vtable or
something that represents the methods in the class. This vtable or hash table
is a global LLVM variable. In the case of our C++ front-end, each object with a
virtual method has a pointer to a class descriptor global, and the class
descriptor global has a pointer to the vtable for the class.

> Now, how do the mechanisms work that load new llvm code and let it
> co-exist with the code already running? In this case I want to update a
> function pointer in the struct in the already running code representing
> the changed class, to point to the newly compiled code representing the
> newly entered method instead.

In this case, just don't mark the ruby vtables/hash tables as "constant", and
update the pointers in the global at runtime. All code will automatically start
executing the new method that you defined.

> Is this functionality at all accessible by an executing llvm program?

Sure. There are multiple ways of doing this. First, you want to codegen the new
method that was added, by creating an LLVM function for it and adding it to the
current module being run. You then call
ExecutionEngine::getPointerToGlobal(F), passing in the LLVM Function object
that you compiled. This will cause the function to be code generated and give
you a pointer to it.

Next, ask the execution engine for a pointer to the vtable or whatever you are
using for method dispatch, using the same method. Once you have the pointer to
the global in memory, and a pointer to the function you want to stick into it,
go ahead and do it. :)

The other option for *replacing* a method that has already been code generated
is to use the ExecutionEngine::recompileAndRelinkFunction method, which tells
the JIT to discard the previously compiled version of an LLVM function and
recompile it from scratch.

A lot of the details depend intrinsically on how you are representing Ruby
objects and method dispatch in general. That said, all of the needed LLVM
functionality should be in place. Feel free to ask if you have any other
questions. :)

-Chris

--
http://llvm.cs.uiuc.edu/
http://www.nondot.org/~sabre/Projects/

_______________________________________________
LLVM Developers mailing list
LLVMdev at cs.uiuc.edu
http://llvm.cs.uiuc.edu
http://mail.cs.uiuc.edu/mailman/listinfo/llvmdev

----------------------------------------------------------------
Anders Alexandersson
Masters student at the special year of Software Engineering, HTU Trollhättan
E-mail: anders.alexandersson at student.htu.se
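For concreteness, a minimal host-side C++ sketch of the steps Chris outlines in
the quoted message (codegen the new function, get a pointer to the dispatch
table, store the new pointer into it). The helper name UpdateMethodSlot and the
table layout are assumptions for illustration, not code from LLVM or from this
thread; it assumes the LLVM 1.x-era JIT headers.

// Sketch only: patch one slot of an in-memory dispatch table to point at a
// freshly compiled method.  UpdateMethodSlot is a hypothetical helper.
#include "llvm/Function.h"
#include "llvm/GlobalVariable.h"
#include "llvm/ExecutionEngine/ExecutionEngine.h"
using namespace llvm;

void UpdateMethodSlot(ExecutionEngine *EE, GlobalVariable *DispatchTable,
                      unsigned SlotIndex, Function *NewMethod) {
  // 1. Codegen the new method (if not done yet) and get its machine-code
  //    address.
  void *NewCode = EE->getPointerToGlobal(NewMethod);

  // 2. Find where the dispatch table lives in the running program's memory.
  void **Table = (void**)EE->getPointerToGlobal(DispatchTable);

  // 3. Overwrite the slot; already-running code that loads through this slot
  //    will call the new method from now on.
  Table[SlotIndex] = NewCode;
}

// To *replace* an already-compiled function in place instead, the thread
// points at:  EE->recompileAndRelinkFunction(ExistingFunction);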
On Tue, Apr 20, 2004 at 10:52:21AM +0200, Anders Alexandersson wrote:

> Is there an llvm version of the getPointerToGlobal() function as
> outlined, and can the %myNewFunction pointer be used as described?

Yes, see llvm/lib/ExecutionEngine/ExecutionEngine.cpp . It will compile the
given Function* to machine code, and return a pointer to you. In your LLVM
code, you will have to first cast your function pointer before storing it, but
it should work as you have it.

One thing is that you need a pointer to the currently-running instance of
ExecutionEngine to call methods on it... an easy quick hack would be to create
a static variable that the ExecutionEngine constructor writes its own pointer
(this) into, and have a C function (in ExecutionEngine.cpp) that returns that
pointer. Your code can then access the currently-running ExecutionEngine
through that pointer. There should be a cleaner solution to this...

> Also, does the getPointerToGlobal() take human readable code (.ll) or
> only binary byte code (.bc)?

ExecutionEngine::getPointerToGlobal() takes a Function*, which means it works
on the in-memory representation of an LLVM function. This implies the function
has already been parsed from text or read in from a binary file.

> Is there a specification of how to write binary byte code directly, so
> we do not have to externally call the llvm-as utility?

You probably don't want to construct bytecode directly by hand. ;) In case you
wish to know WHY you don't want to do that, see
llvm/lib/Bytecode/{Writer,Reader}/*

Here's an idea: in tools/llvm-as/llvm-as.cpp, you can see how the assembly
parser really does its job: it calls ParseAssemblyFile(InputFilename), which
returns a Module*. ParseAssemblyFile is defined in
llvm/lib/AsmParser/Parser.cpp; something similar to it, but one that accepts a
string instead of a filename, could be implemented.

However, there is a catch (as I see it, Chris may correct me): the entry point
to the parser parses a whole Module. The new function that you write may
reference other functions. When you parse a Module, it has to be complete, so
functions that are not defined in the module are "external". Having a
definition for those functions elsewhere will make those external functions
different (in terms of their location in memory), and hence give a different
value for Function*, which will confuse the ExecutionEngine, since it uses a
map from Function* to the memory address of its translation. There would be
similar problems with types. (Is this making sense?)

A solution would perhaps be a different entry point into the parser that parses
just a Function instead of a whole Module, and then resolves the types it finds
against the already-parsed Module in memory. I don't think this has been done
yet.

--
Misha Brukman :: http://misha.brukman.net :: http://llvm.cs.uiuc.edu
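A rough sketch of the "quick hack" Misha describes above; this is not existing
LLVM code, and the names RegisterExecutionEngine and
llvm_get_execution_engine are made up for illustration.

// Stash the running engine in a static and expose it through a plain C
// function that generated code, or the Ruby runtime, can call.
#include "llvm/ExecutionEngine/ExecutionEngine.h"

static llvm::ExecutionEngine *TheEngine = 0;

// Called once, right after constructing the engine (or from the
// ExecutionEngine constructor itself, as suggested above).
void RegisterExecutionEngine(llvm::ExecutionEngine *EE) { TheEngine = EE; }

// A C entry point that JIT'd LLVM code can declare and call to reach the
// currently-running engine.
extern "C" void *llvm_get_execution_engine() { return TheEngine; }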
On Tue, 20 Apr 2004, Anders Alexandersson wrote:

> Thanks!
>
> The problem, though, is that the Ruby compiler is integrated into the
> compilation of the program being executed, so that it can parse & compile
> dynamic code at run-time. Therefore the calls to
> ExecutionEngine::getPointerToGlobal(F) need to be made in LLVM code.
> Here is a simplified example in pseudocode of what we want to do:

Okay, I think the problem is that you're trying to integrate the compiler into
the program being compiled. Instead, you want to make a new "llvm tool", say,
"llvm-ruby" or something. The input to llvm-ruby is a ruby program, and the
output is whatever the program produces, just like a normal interpreter.

The key point is that you make the compilation process entirely JIT driven:
the first time a method or function is encountered, you compile it from ruby,
to llvm, to machine code. There are two aspects to this: ruby->LLVM (your part)
and LLVM->machine code (our part), but it's conceptually one ruby->machine code
step. If you make it JIT driven, dynamically loading code will not be a
problem, and you don't need to integrate the JIT into the program.

If you are interested in implementing this efficiently, I would *strongly*
recommend you design your ruby->LLVM converter to work with the C++ classes
used to represent LLVM. This will reduce the number of translations and
interfaces that the code has to go through (and as Misha pointed out, you would
otherwise have to package up each function into its own module). For examples
of this, look at llvm/projects/ModuleMaker and the Stacker front-end.

If this approach is acceptable to you, I would start with just a simple
*static* ruby -> LLVM compiler. Doing a static compiler first will make it
easier to get up and running quickly and let you focus on the important pieces
(the ruby -> LLVM mapping, such as how method dispatch works). However, when
building this, you should keep an eye on making it modular enough to allow
function/method-at-a-time compilation.

Once you have a static compiler working reasonably well, you can integrate the
JIT into it. Also, instead of parsing and compiling the ruby file from top to
bottom, you then start parsing and compiling on demand. This is a small change
in the top-level flow of the compiler, but almost all of the code you write
should work unmodified.

This approach should allow you to dynamically load code into the "ruby
interpreter" and have it JIT compiled at the same time. It has the added
advantage that you can do a purely static ruby compiler as well, so long as the
program doesn't dynamically load code.

Please let me know if I'm not making any sense here. :)

-Chris

--
http://llvm.cs.uiuc.edu/
http://www.nondot.org/~sabre/Projects/
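A rough sketch in the spirit of llvm/projects/ModuleMaker, building a trivial
function directly with the C++ classes as Chris recommends. It assumes the
2004-era API (Type::IntTy, ConstantSInt, explicit new Function/new BasicBlock);
exact signatures may have differed or changed since, and the function built
here is just a placeholder for what a ruby->LLVM front-end would emit.

// Build a module containing 'int main()' that returns 2 + 3, entirely through
// the C++ API, with no .ll text or external llvm-as step involved.
#include "llvm/Module.h"
#include "llvm/DerivedTypes.h"
#include "llvm/Constants.h"
#include "llvm/Instructions.h"
#include <vector>
using namespace llvm;

Module *makeTrivialModule() {
  // Module to hold the generated code.
  Module *M = new Module("ruby_generated");

  // Create 'int main()' and append it to the module.
  FunctionType *FT =
      FunctionType::get(Type::IntTy, std::vector<const Type*>(), false);
  Function *F = new Function(FT, Function::ExternalLinkage, "main", M);

  // One basic block that computes and returns 2 + 3.
  BasicBlock *BB = new BasicBlock("entry", F);
  Value *Two   = ConstantSInt::get(Type::IntTy, 2);
  Value *Three = ConstantSInt::get(Type::IntTy, 3);
  Instruction *Add =
      BinaryOperator::create(Instruction::Add, Two, Three, "sum");
  BB->getInstList().push_back(Add);
  BB->getInstList().push_back(new ReturnInst(Add));

  return M;
}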