thr3ads.net - llvm dev - [LLVMdev] OCaml bindings to LLVM [Sep 2008]

Jon Harrop

2008-Sep-06 03:26 UTC

[LLVMdev] OCaml bindings to LLVM

I'm having another play with LLVM using the OCaml bindings for a forthcoming
OCaml Journal article and I have a couple of remarks:

Firstly, I noticed that the execute engine is very slow, taking milliseconds 
to call a JIT compiled function. Is this an inherent overhead or am I calling 
it incorrectly or is this something that can be optimized in the OCaml 
bindings?

Secondly, I happened to notice that JIT compiled code executed on the fly does 
not read from the stdin of the host OCaml program although it can write to 
stdout. Is this a bug?

Many thanks,
-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e

Daniel Dunbar

2008-Sep-08 16:23 UTC

[LLVMdev] OCaml bindings to LLVM

On Fri, Sep 5, 2008 at 8:26 PM, Jon Harrop
<jonathandeanharrop at googlemail.com> wrote:
> Firstly, I noticed that the execute engine is very slow, taking milliseconds
> to call a JIT compiled function. Is this an inherent overhead or am I calling
> it incorrectly or is this something that can be optimized in the OCaml
> bindings?

What is the signature of the function you are calling?

When calling a generated function via runFunction, the JIT handles some common
signatures but if it doesn't recognize the function signature it falls back on
generating a stub function on the fly. This generation is fairly expensive and
is probably the overhead you are seeing. There should be little more inherent
overhead than the cost of a function call if the stub path isn't being taken.

The simple solution (aside from fixing JIT) is to change your signature to
match one of the ones the JIT special cases (see JIT::runFunction). A nullary
one with arguments passed in globals works fine, if thread safety isn't a
concern.
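
The "nullary function with arguments passed in globals" pattern can be sketched in plain OCaml. This is only an analogy under stated assumptions: the ref cells stand in for LLVM global variables, entry stands in for the JIT-compiled function, and all names are hypothetical.

```ocaml
(* Hypothetical stand-ins: in a real program these slots would be LLVM
   global variables and [entry] would be the JIT-compiled function. *)
let arg_x = ref 0
let arg_y = ref 0
let result = ref 0

(* Nullary entry point: reads its inputs from the globals and writes its
   output to another global, so its signature is simply unit -> unit. *)
let entry () = result := !arg_x + !arg_y

(* Wrapper that restores an ordinary calling interface. It is not
   thread-safe: two concurrent callers would overwrite each other's slots. *)
let call_add x y =
  arg_x := x;
  arg_y := y;
  entry ();
  !result
```

The shared slots are the reason for the thread-safety caveat: nothing prevents a second caller from storing its arguments between the first caller's stores and its call to entry.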

 - Daniel

Jon Harrop

2008-Sep-08 19:10 UTC

[LLVMdev] OCaml bindings to LLVM

On Monday 08 September 2008 17:23:48 Daniel Dunbar wrote:
> On Fri, Sep 5, 2008 at 8:26 PM, Jon Harrop
> <jonathandeanharrop at googlemail.com> wrote:
> > Firstly, I noticed that the execute engine is very slow, taking
> > milliseconds to call a JIT compiled function. Is this an inherent
> > overhead or am I calling it incorrectly or is this something that can be
> > optimized in the OCaml bindings?
>
> What is the signature of the function you are calling?

  unit -> unit

So I am passing zero arguments and returning void.

> When calling a generated function via runFunction, the JIT handles some
> common signatures but if it doesn't recognize the function signature it
> falls back on generating a stub function on the fly. This generation is
> fairly expensive and is probably the overhead you are seeing. There should
> be little more inherent overhead than the cost of a function call if the
> stub path isn't being taken.
>
> The simple solution (aside from fixing JIT) is to change your signature to
> match one of the ones the JIT special cases (see JIT::runFunction). A
> nullary one with arguments passed in globals works fine, if thread safety
> isn't a concern.

I see. Looking at JIT::runFunction, passing one dummy int32 argument should do
the trick.

I'll see if I can write something a little cleverer on the OCaml side to 
run-time compile stubs either so that partial application can be used to 
share them or just memoize to reuse them.
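
The memoization idea can be sketched as follows. This is a minimal sketch, not code from the bindings: the signature key is assumed to be a plain string, and compile is a hypothetical caller-supplied function standing in for whatever actually generates a stub.

```ocaml
(* Hypothetical: a signature is just a string key such as "i32 -> void". *)
type signature = string

(* [memoize_stubs compile] returns a lookup function that calls [compile]
   at most once per distinct signature and reuses the result afterwards. *)
let memoize_stubs (compile : signature -> 'stub) : signature -> 'stub =
  let cache : (signature, 'stub) Hashtbl.t = Hashtbl.create 16 in
  fun sg ->
    match Hashtbl.find_opt cache sg with
    | Some stub -> stub
    | None ->
        (* First request for this signature: pay the compile cost once. *)
        let stub = compile sg in
        Hashtbl.add cache sg stub;
        stub
```

With this shape the expensive stub generation is paid once per signature rather than once per call, so repeated calls with the same signature only cost a hash-table lookup.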

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e

Gordon Henriksen

2008-Sep-08 19:17 UTC

[LLVMdev] OCaml bindings to LLVM

On 2008-09-05, at 23:26, Jon Harrop wrote:
> I'm having another play with LLVM using the OCaml bindings for a
> forthcoming OCaml Journal article and I have a couple of remarks:
>
> Firstly, I noticed that the execute engine is very slow, taking
> milliseconds to call a JIT compiled function. Is this an inherent
> overhead or am I calling it incorrectly or is this something that
> can be optimized in the OCaml bindings?

The high-level calling convention using GenericValue is going to be
very slow relative to a native function call. This is true in C++, but
even more so in OCaml, which must cons up a bunch of objects on the
heap for each call. To get the best performance, you would want to avoid
fine-grained calls into JIT'd code, e.g. by iterating over inputs
inside the JIT instead of outside.
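
The difference between fine-grained and coarse-grained calls can be illustrated with a small model in plain OCaml. Everything here is a hypothetical stand-in: jit_square and jit_square_all only simulate JIT-compiled functions, and the counter simulates the per-call cost of crossing from OCaml into JIT'd code.

```ocaml
(* Counts simulated crossings between OCaml and JIT-compiled code. *)
let crossings = ref 0

(* Hypothetical stand-in for a JIT'd scalar function, called per element. *)
let jit_square x =
  incr crossings;
  x * x

(* Hypothetical stand-in for a JIT'd function that loops over its input
   itself, so the whole array is processed in a single crossing. *)
let jit_square_all xs =
  incr crossings;
  Array.map (fun x -> x * x) xs

(* Fine-grained: one crossing per element. *)
let fine_grained xs = Array.map jit_square xs

(* Coarse-grained: one crossing in total, with the loop on the JIT side. *)
let coarse_grained xs = jit_square_all xs
```

Both versions compute the same result; only the number of crossings differs, which is where any fixed per-call overhead is paid.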


If you want to improve performance of the GenericValue-based
interface, I'd suggest trying to minimize the number and overhead of
allocations in your OCaml code, then look at the bindings themselves:

- If GenericValues can't be reused, add bindings to allow mutating  
them. Reuse the same 'n' instances for each call into JIT code. Yucky  
imperative data structures to the rescue.

- Write bindings for a heap-allocated GenericValue[] and wrap that in  
a custom block instead of heap-allocating each GenericValue  
individually. Of course such an array must be mutable. More imperative  
data structures!

- Try using placement new to initialize GenericValues inside of OCaml
blocks instead of new'ing them up on the C++ heap as is presently
done. This would be outside the bounds of standard C++, so it could
fail. This would require circumventing the C bindings, since they
cannot expose the C++ GenericValue class as a struct.

- Use OCaml variants for inputs (type GenericValue = Pointer of 'a |
Int of bits * value | ...) and convert those to a stack-based
SmallVector<GenericValue>. This will avoid finalizers on the OCaml
blocks. This doesn't work symmetrically for outputs, though. Likewise,
it involves going around the C bindings.
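
The variant type in the last suggestion is written as pseudocode: OCaml type names must start with a lowercase letter, and 'a would be an unbound type variable. A compilable version might look like the sketch below, where the constructor payloads (nativeint for addresses, a bit width paired with an int64 value) are assumptions for illustration and not part of the LLVM bindings.

```ocaml
(* Hypothetical OCaml-side representation of call arguments; constructor
   names and payload types are illustrative only. *)
type generic_value =
  | Pointer of nativeint   (* raw address *)
  | Int of int * int64     (* bit width, value *)
  | Float of float

(* Example consumer: a binding shim would pattern-match like this when
   converting each argument to a C++ GenericValue. Here it only prints. *)
let describe = function
  | Pointer p -> Printf.sprintf "ptr 0x%nx" p
  | Int (bits, v) -> Printf.sprintf "i%d %Ld" bits v
  | Float f -> Printf.sprintf "double %g" f
```

Because these are ordinary OCaml values with no custom blocks, the garbage collector needs no finalizers for them, which is the benefit the suggestion points at.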

But realize that a GenericValue-based interface will always be slow  
relative to a native call. If you have a specific performance goal  
though, you may be able to cheaply eliminate 'enough' overhead for  
your needs without much work. All of the above are relatively simple  
(should be doable in a day, modulo patch review).


For the very best performance, you really want to call the JIT'd
function directly, e.g.,

     let nf = native_function name m

where native_function has type string -> Llvm.llmodule -> 'a and nf has
some functional type, like int -> int -> int.

However, this is subject to the quirks and complexities of the OCaml
FFI (e.g., overflow arguments passed in a global array on x86, totally
nonstandard calling convention).

- If you know in advance the signature of the functions you're going
to call, you can write shims in C (similar to those in llvm_ocaml.c)
that will not add terribly much overhead. These wouldn't really be of
any use to anyone else, though.

- If not, you can generate the shims at runtime using LLVM (even
inline them into the callee), but will have to reimplement OCaml's FFI
macros for unwrapping values and tracking stack roots. This would take
considerably more effort to implement (esp. portably), but would be a
substantial improvement to the bindings if the helpers were
incorporated therein.

> Secondly, I happened to notice that JIT compiled code executed on  
> the fly does not read from the stdin of the host OCaml program  
> although it can write to stdout. Is this a bug?

This has nothing to do with LLVM.

— Gordon

Jon Harrop

2008-Sep-08 20:47 UTC

[LLVMdev] OCaml bindings to LLVM

On Monday 08 September 2008 20:17:01 Gordon Henriksen wrote:
> On 2008-09-05, at 23:26, Jon Harrop wrote:
> > I'm having another play with LLVM using the OCaml bindings for a
> > forthcoming
> > OCaml Journal article and I have a couple of remarks:
> >
> > Firstly, I noticed that the execute engine is very slow, taking
> > milliseconds to call a JIT compiled function. Is this an inherent
> > overhead or am I calling it incorrectly or is this something that
> > can be optimized in the OCaml bindings?
>
> The high-level calling convention using GenericValue is going to be
> very slow relative to a native function call. This is true in C++, but
> even moreso in Ocaml, which must cons up a bunch of objects on the
> heap for each call.

Unless tens of thousands of allocations are made for every call, I do not
believe that explains the performance discrepancy I quantified. A millisecond
is a long time in this context.

Does it spawn or fork a process?

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e
