I'm having another play with LLVM using the OCaml bindings for a forthcoming OCaml Journal article, and I have a couple of remarks:

Firstly, I noticed that the execution engine is very slow, taking milliseconds to call a JIT-compiled function. Is this an inherent overhead, am I calling it incorrectly, or is this something that can be optimized in the OCaml bindings?

Secondly, I happened to notice that JIT-compiled code executed on the fly does not read from the stdin of the host OCaml program, although it can write to stdout. Is this a bug?

Many thanks,

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e
On Fri, Sep 5, 2008 at 8:26 PM, Jon Harrop <jonathandeanharrop at googlemail.com> wrote:

> Firstly, I noticed that the execution engine is very slow, taking
> milliseconds to call a JIT-compiled function. Is this an inherent
> overhead, am I calling it incorrectly, or is this something that can
> be optimized in the OCaml bindings?

What is the signature of the function you are calling? When calling a generated function via runFunction, the JIT handles some common signatures directly, but if it doesn't recognize the signature it falls back on generating a stub function on the fly. That generation is fairly expensive and is probably the overhead you are seeing. If the stub path isn't being taken, there should be little more inherent overhead than the cost of a function call.

The simple solution (aside from fixing the JIT) is to change your signature to match one of the cases the JIT handles specially (see JIT::runFunction). A nullary function with arguments passed in globals works fine, if thread safety isn't a concern.

- Daniel
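[Editor's note: a sketch of the "nullary function, arguments in globals" trick Daniel describes, written from memory against the early (pre-context) OCaml bindings of that era; exact names — in particular how the ExecutionEngine is created — vary between LLVM versions, so treat this as illustrative rather than definitive.]

```ocaml
open Llvm
open Llvm_executionengine

let () =
  let m = create_module "jit_test" in
  (* A mutable global stands in for the function's argument. *)
  let arg = define_global "arg" (const_int i32_type 0) m in
  (* void f(void): a signature JIT::runFunction can dispatch without
     generating a stub on the fly. *)
  let fty = function_type void_type [||] in
  let f = define_function "f" fty m in
  let b = builder_at_end (entry_block f) in
  (* The body reads its "argument" from the global and writes back. *)
  let x = build_load arg "x" b in
  ignore (build_store (build_add x (const_int i32_type 1) "x1" b) arg b);
  ignore (build_ret_void b);
  let ee = ExecutionEngine.create m in
  (* Zero GenericValue arguments: the fast path in runFunction. *)
  ignore (ExecutionEngine.run_function f [||] ee)
```

The host stores the input into `arg` before each call and reads the result back out afterwards, trading thread safety for avoiding the stub-compilation cost on every distinct signature.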
On Monday 08 September 2008 17:23:48, Daniel Dunbar wrote:

> On Fri, Sep 5, 2008 at 8:26 PM, Jon Harrop
> <jonathandeanharrop at googlemail.com> wrote:
> > Firstly, I noticed that the execution engine is very slow, taking
> > milliseconds to call a JIT-compiled function. Is this an inherent
> > overhead, am I calling it incorrectly, or is this something that
> > can be optimized in the OCaml bindings?
>
> What is the signature of the function you are calling?

unit -> unit

So I am passing zero arguments and returning void.

> When calling a generated function via runFunction, the JIT handles
> some common signatures but if it doesn't recognize the function
> signature it falls back on generating a stub function on the fly.
> This generation is fairly expensive and is probably the overhead you
> are seeing. There should be little more inherent overhead than the
> cost of a function call if the stub path isn't being taken.
>
> The simple solution (aside from fixing the JIT) is to change your
> signature to match one of the ones the JIT special cases (see
> JIT::runFunction). A nullary one with arguments passed in globals
> works fine, if thread safety isn't a concern.

I see. Looking at JIT::runFunction, passing one dummy int32 argument should do the trick. I'll see if I can write something a little cleverer on the OCaml side to run-time compile stubs, either so that partial application can be used to share them, or just memoize them for reuse.

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e
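[Editor's note: the memoization Jon mentions could be sketched in plain OCaml as below; `compile_stub` is a hypothetical placeholder for whatever run-time compiles a calling stub for a given signature, not a real binding.]

```ocaml
(* Generic memoization: each distinct key pays the cost of [f] once. *)
let memoize (f : 'a -> 'b) : 'a -> 'b =
  let table = Hashtbl.create 16 in
  fun key ->
    try Hashtbl.find table key
    with Not_found ->
      let v = f key in
      Hashtbl.add table key v;
      v

(* Hypothetical usage: if [compile_stub : signature -> stub] is the
   expensive run-time stub compiler, then
     let get_stub = memoize compile_stub
   compiles a stub at most once per distinct signature, and subsequent
   calls with the same signature are just a hash lookup. *)
```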
On 2008-09-05, at 23:26, Jon Harrop wrote:

> I'm having another play with LLVM using the OCaml bindings for a
> forthcoming OCaml Journal article and I have a couple of remarks:
>
> Firstly, I noticed that the execution engine is very slow, taking
> milliseconds to call a JIT-compiled function. Is this an inherent
> overhead or am I calling it incorrectly or is this something that
> can be optimized in the OCaml bindings?

The high-level calling convention using GenericValue is going to be very slow relative to a native function call. This is true in C++, but even more so in OCaml, which must cons up a bunch of objects on the heap for each call. To get the best performance, you want to avoid fine-grained calls into JIT'd code, e.g. by iterating over inputs inside the JIT'd code instead of outside it.

If you want to improve the performance of the GenericValue-based interface, I'd suggest first minimizing the number and overhead of allocations in your OCaml code, then looking at the bindings themselves:

- If GenericValues can't be reused, add bindings to allow mutating them, and reuse the same 'n' instances for each call into JIT'd code. Yucky imperative data structures to the rescue.

- Write bindings for a heap-allocated GenericValue[] and wrap that in a custom block, instead of heap-allocating each GenericValue individually. Of course, such an array must be mutable. More imperative data structures!

- Try using placement new to initialize GenericValues inside of OCaml blocks, instead of new'ing them up on the C++ heap as is presently done. This would be outside the bounds of standard C++, so it could fail. It would also require circumventing the C bindings, since those cannot expose the C++ GenericValue class as a struct.

- Use OCaml variants for inputs (type generic_value = Pointer of 'a | Int of bits * value | ...) and convert them to a stack-based SmallVector<GenericValue>. This avoids finalizers on the OCaml blocks. It doesn't work symmetrically for outputs, though.
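[Editor's note: the variant-based representation in the last bullet might look like the following; the constructor names and payload types are illustrative, not part of the actual bindings.]

```ocaml
(* A hypothetical OCaml-side mirror of LLVM's GenericValue. A C binding
   would translate an array of these into a stack-allocated
   SmallVector<GenericValue> just before calling runFunction, so no
   C++-heap allocation or OCaml finalizers are involved. *)
type generic_value =
  | Pointer of nativeint       (* raw address passed through unchanged *)
  | Int of int * Int64.t       (* bit width and value *)
  | Float of float

(* Inputs for a call such as f(i32 42, double 3.14): *)
let args = [| Int (32, 42L); Float 3.14 |]
```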
Likewise, it involves going around the C bindings.

But realize that a GenericValue-based interface will always be slow relative to a native call. If you have a specific performance goal, though, you may be able to cheaply eliminate 'enough' overhead for your needs without much work. All of the above are relatively simple (should be doable in a day, modulo patch review).

For the very best performance, you really want to call the JIT'd function directly, e.g.

    let nf = native_function name m

where native_function has type string -> Llvm.llmodule -> 'a and nf has some functional type, like int -> int -> int. However, this is subject to the quirks and complexities of the OCaml FFI (e.g., overflow arguments passed in a global array on x86, totally nonstandard calling convention).

- If you know in advance the signatures of the functions you're going to call, you can write shims in C (similar to those in llvm_ocaml.c) that would add not terribly much overhead. These wouldn't really be of any use to anyone else, though.

- If not, you can generate the shims at runtime using LLVM (and even inline them into the callee), but you will have to reimplement OCaml's FFI macros for unwrapping values and tracking stack roots. This would take considerably more effort to implement (especially portably), but would be a substantial improvement to the bindings if the helpers were incorporated therein.

> Secondly, I happened to notice that JIT compiled code executed on
> the fly does not read from the stdin of the host OCaml program
> although it can write to stdout. Is this a bug?

This has nothing to do with LLVM.

— Gordon
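[Editor's note: the "known signature" shim approach could be sketched from the OCaml side as below; the shim name and the way the raw function pointer is obtained are hypothetical, merely in the style of llvm_ocaml.c, not real bindings.]

```ocaml
(* A hand-written C stub (not shown) named "llvm_ocaml_call_int2" would
   unbox the two ints, call through the raw code pointer with the native
   C calling convention, and box the result. One such external is needed
   per supported signature. *)
external call_int2 : nativeint -> int -> int -> int
  = "llvm_ocaml_call_int2"

(* Hypothetical usage, assuming [fptr] is the address of JIT'd code for
   an (i32, i32) -> i32 function obtained from the execution engine:

     let result = call_int2 fptr 3 4
*)
```

This pays only the cost of one C call plus boxing, rather than GenericValue construction, but as Gordon notes the shims are signature-specific and of little use to anyone else.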
On Monday 08 September 2008 20:17:01, Gordon Henriksen wrote:

> On 2008-09-05, at 23:26, Jon Harrop wrote:
> > I'm having another play with LLVM using the OCaml bindings for a
> > forthcoming OCaml Journal article and I have a couple of remarks:
> >
> > Firstly, I noticed that the execution engine is very slow, taking
> > milliseconds to call a JIT compiled function. Is this an inherent
> > overhead or am I calling it incorrectly or is this something that
> > can be optimized in the OCaml bindings?
>
> The high-level calling convention using GenericValue is going to be
> very slow relative to a native function call. This is true in C++,
> but even more so in OCaml, which must cons up a bunch of objects on
> the heap for each call.

Unless tens of thousands of allocations are made for every call, I do not believe that explains the performance discrepancy I quantified. A millisecond is a long time in this context. Does it spawn or fork a process?

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e
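[Editor's note: a minimal sketch of how the per-call overhead being debated here might be measured on the OCaml side; `run_jitted` is a placeholder for whatever invokes the JIT-compiled function.]

```ocaml
(* Mean wall-clock seconds per call over n iterations, using
   Unix.gettimeofday for sub-millisecond resolution. *)
let time_calls n run_jitted =
  let t0 = Unix.gettimeofday () in
  for _i = 1 to n do
    run_jitted ()
  done;
  let t1 = Unix.gettimeofday () in
  (t1 -. t0) /. float_of_int n
```

Comparing this figure against the same loop around an ordinary OCaml function call would separate LLVM's dispatch overhead from the cost of the call itself.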