Paul J. Lucas
2012-Oct-26 23:16 UTC
[LLVMdev] Using LLVM to serialize object state -- and performance
I have a legacy C++ application that constructs a tree of C++ objects (an iterator tree to implement a query language). I am trying to use LLVM to "serialize" the state of this tree to disk for later loading and execution (or to "compile" it to disk, if you prefer).

Each of the C++ iterator objects now has a codegen() member function that adds to the LLVM code of an llvm::Function. The LLVM code generated is a sequence of instructions that set up the arguments for and call the constructor of each C++ object. (I am using C "thunks" that provide a C API to LLVM to make C++ class constructor calls.) Hence, all the LLVM code, taken together into a single "reconstitute" function, is mostly a sequence of "call" instructions with a few "store" and "getelementptr" instructions here and there -- fairly straightforward LLVM code.

I then write out the LLVM IR code to disk and, at some later time, read it back in with ParseIR(), do getPointerToFunction(), execute that function, and the C++ iterator tree has been reconstituted.

This all works, but the JIT compile step is *slow*. For a sequence of about 8000 LLVM instructions (most of which are "call"), it takes several seconds to execute.

It occurred to me that I don't really want JIT compiling. I really want to compile the LLVM code to machine code and write that to disk so that when I read it back, I can just run it. The "reconstitute" function is only ever run once per query invocation, so there's no benefit from JIT compiling it since it will never be run a second or subsequent time.

Questions:

* Is what I'm doing with LLVM a "reasonable" thing to do with LLVM?
* If so, how can I speed it up? By generating machine code? If so, how?

I've looked at the source for llc, but that apparently only generates assembly source code, not object code.

- Paul
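For reference, each codegen() call described above amounts to roughly the following. This is an untested sketch against the LLVM 3.x C++ API of the time; the helper name emitBinaryNodeCall is illustrative only, and the thunk name is taken from an example later in this thread.

    // Untested sketch (LLVM 3.x API assumed; emitBinaryNodeCall is an
    // illustrative helper name, not part of the application).  Each node's
    // codegen() emits a call to its C thunk; the thunk returns the new
    // object as a void* (i8* in IR).
    #include "llvm/DerivedTypes.h"
    #include "llvm/IRBuilder.h"      // "llvm/Support/IRBuilder.h" before 3.2
    #include "llvm/Module.h"

    llvm::Value *emitBinaryNodeCall(llvm::Module &M, llvm::IRBuilder<> &B,
                                    llvm::Value *Left, llvm::Value *Right) {
      llvm::Type *VoidPtrTy = B.getInt8PtrTy();
      llvm::Type *ArgTys[] = { VoidPtrTy, VoidPtrTy };
      llvm::FunctionType *FT =
          llvm::FunctionType::get(VoidPtrTy, ArgTys, /*isVarArg=*/false);
      // Declares the thunk in the module if it is not already there.
      llvm::Constant *Thunk = M.getOrInsertFunction("T_BinaryNode_new_2Pv", FT);
      return B.CreateCall2(Thunk, Left, Right, "node");
    }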
Kaylor, Andrew
2012-Oct-27 00:32 UTC
[LLVMdev] Using LLVM to serialize object state -- and performance
I'm not sure I have a clear picture of what you're JIT'ing. If any of the JIT'ed functions call other JIT'ed functions, it may be difficult to find all the dependencies of a function and recreate them correctly on a subsequent load. Even if the JIT'ed functions only call non-JIT'ed functions, I think you'd need some confidence that the addresses of the called functions weren't being moved.

It's possible that what you're considering would work, but I don't think it's a scenario that the JIT intends to support.

It would be possible, however, to use the MCJIT engine and cache its results. It requires some modifications to the MCJIT engine, but nothing major (I know because my team has a patch in the works to do this, but it's blocked by some other things at the moment). MCJIT generates complete object images and then uses RuntimeDyld to load them. If you had a hook to save the generated object, you could use RuntimeDyld directly to load it later. There are other ways to generate the object image (i.e. without MCJIT), but I'm not sure they would be easier.

You basically just need to grab the Buffer that MCJIT::emitObject() has after it calls PM.run() and Buffer->flush() but before it passes it to Dyld.loadObject(). If you prefer, you could copy what MCJIT does and move it somewhere in your own code. There's not a lot to it.

-Andy
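For reference, a rough, untested sketch of that extraction point, done outside MCJIT via TargetMachine::addPassesToEmitFile() with CGFT_ObjectFile (the same code path llc uses with -filetype=obj). The function name saveObjectImage is illustrative, error handling is simplified, and the API names are the LLVM 3.x ones of the time; the TargetMachine could come from, e.g., EngineBuilder::selectTarget().

    // Untested sketch (LLVM 3.x API assumed; saveObjectImage is an
    // illustrative name, not an LLVM API).  Run the codegen passes into an
    // in-memory buffer -- essentially what MCJIT::emitObject() does -- and
    // write the resulting object image to disk for a later load.
    #include <fstream>
    #include "llvm/Module.h"
    #include "llvm/PassManager.h"
    #include "llvm/ADT/SmallVector.h"
    #include "llvm/ADT/StringRef.h"
    #include "llvm/Support/FormattedStream.h"
    #include "llvm/Support/raw_ostream.h"
    #include "llvm/Target/TargetMachine.h"

    bool saveObjectImage(llvm::Module &M, llvm::TargetMachine &TM,
                         const char *Path) {
      llvm::SmallVector<char, 4096> Buffer;
      llvm::raw_svector_ostream OS(Buffer);
      llvm::formatted_raw_ostream FOS(OS);

      llvm::PassManager PM;
      // Returns true if the target cannot emit object files directly.
      if (TM.addPassesToEmitFile(PM, FOS, llvm::TargetMachine::CGFT_ObjectFile))
        return false;
      PM.run(M);
      FOS.flush();

      llvm::StringRef Obj = OS.str();              // flushes the stream
      std::ofstream Out(Path, std::ios::out | std::ios::binary);
      Out.write(Obj.data(), Obj.size());
      return Out.good();
    }

On a later run, the saved image would be handed to RuntimeDyld (or, with the caching hook described above, back to MCJIT) instead of being recompiled from IR.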
Kaylor, Andrew
2012-Oct-30 16:25 UTC
[LLVMdev] Using LLVM to serialize object state -- and performance
Hi Paul,

I had an additional thought with regard to the performance issue you are seeing. As I understand it, you are generating a large number of functions that call other functions. If the functions being called are externals from the perspective of the JITed code that need to be resolved against some static code within the running executable, that's probably where the slowdown is occurring.

Whenever the JIT engine (either the legacy JIT or MCJIT) needs to resolve an external function, it calls JITMemoryManager::getPointerToNamedFunction to resolve the function address. The default JITMemoryManager implementation uses sys::DynamicLibrary::SearchForAddressOfSymbol to find the function. If you know all of the names and addresses of the functions that will need to be resolved, you can provide a custom memory manager implementation to optimize this external function resolution.

-Andy
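For reference, a minimal sketch of the kind of lookup table such a custom memory manager could consult, using the thunk from Paul's follow-up as the single illustrative entry. A real table would have one entry per thunk exported by libmylib.so, and the custom JITMemoryManager's getPointerToNamedFunction() override would check this table before falling back to the default dynamic-library search.

    // Untested sketch.  A prebuilt name -> address table makes external
    // symbol resolution a map lookup instead of a dynamic-library search.
    #include <map>
    #include <string>

    extern "C" void *T_BinaryNode_new_2Pv(void *left, void *right);

    class ThunkTable {
      std::map<std::string, void *> Addrs;
    public:
      ThunkTable() {
        Addrs["T_BinaryNode_new_2Pv"] = (void *)&T_BinaryNode_new_2Pv;
        // ... one entry per thunk in libmylib.so ...
      }
      void *lookup(const std::string &Name) const {
        std::map<std::string, void *>::const_iterator I = Addrs.find(Name);
        return I == Addrs.end() ? 0 : I->second;
      }
    };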
Paul J. Lucas
2012-Nov-06 15:43 UTC
[LLVMdev] Using LLVM to serialize object state -- and performance
Thanks for responding. Sorry for the delay in my reply, but I was dealing with Hurricane Sandy. Anyway....

My software build produces libmylib.so. The JIT'd function only calls external C functions in libmylib.so and not other JIT'd functions. The C functions are simple thunks to call constructors. For example, given:

    class BinaryNode : public Node {
    public:
      BinaryNode( Node *left, Node *right );
      // ...
    };

there exists a C thunk:

    void* T_BinaryNode_new_2Pv( void *left, void *right ) {
      return new BinaryNode( (Node*)left, (Node*)right );
    }

The JIT'd function is just a sequence of such calls to thunks to build up an object tree.

The idea is to generate LLVM code, write it out to disk, and terminate the current program's process; then, at some later time, start a new process for the program, read the previously generated LLVM code back in from disk, and call the JIT'd function to reconstitute the state of the tree just as it was.

Elsewhere in my code, I keep a set of llvm::Function*'s, one for each thunk. For each function, I use ExecutionEngine::addGlobalMapping() to bind the Function* to the actual thunk. The binding does use Module::getFunction(). Oddly, on Mac OS X, I only have to do this when my program is creating the LLVM code; on Linux, I also have to do it when my program is reading the LLVM code back in and trying to execute it.

Hopefully, I've explained this better. You then later wrote:

> The default JITMemoryManager implementation uses sys::DynamicLibrary::SearchForAddressOfSymbol to find the function. If you know all of the names and addresses of the functions that will need to be resolved, you can provide a custom memory manager implementation to optimize this external function resolution.

Based on my clarification, is this still the best course of action?

- Paul
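For reference, the per-thunk binding described above amounts to roughly the following: an untested sketch assuming the LLVM 3.x ExecutionEngine API, with the module already containing a declaration of the thunk under the same name and signature.

    // Untested sketch of binding a declared thunk to its address in
    // libmylib.so via ExecutionEngine::addGlobalMapping().
    #include "llvm/Module.h"
    #include "llvm/ExecutionEngine/ExecutionEngine.h"

    extern "C" void *T_BinaryNode_new_2Pv(void *left, void *right);

    void bindThunks(llvm::ExecutionEngine &EE, llvm::Module &M) {
      if (llvm::Function *F = M.getFunction("T_BinaryNode_new_2Pv"))
        EE.addGlobalMapping(F, (void *)&T_BinaryNode_new_2Pv);
      // ... repeat for each thunk ...
    }

As noted above, on Linux this binding step also has to be run after ParseIR() when the saved LLVM code is read back in.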