thr3ads.net - llvm dev - [LLVMdev] OCaml bindings [Nov 2007]

If this information is useful, please help other people find it:
Share via:

Jon Harrop

2007-Nov-25 16:49 UTC

[LLVMdev] OCaml

On Sunday 25 November 2007 12:23, Gordon Henriksen
wrote:> On 2007-11-24, at 21:58, Jon Harrop wrote:
> > - Garbage collection tuned for functional programming
>
> http://llvm.org/docs/GarbageCollection.html
>
> I've been doing some interesting work on this front. Getting Lattner-
> cycles to have it reviewed and integrated is probably the biggest
> challenge; LLVM is a joy to work with even on major surgery like
> this. :)
>
> The goal is that LLVM should be able to take care of the code
> generation aspects of GC while leaving the runtime open-ended. It
> would be nice to provide a GC runtime along with LLVM, but I'm not
> entirely certain how realistic that is given how intertwined GC is
> with the object model. All of this is thoroughly treated in the above
> doc.
>
> Can you elaborate on what tuning you're looking for?
I'll give you a bit of background info:

I'm actually a natural scientist rather than a computer scientist and I am 
looking for a next-generation technical computing platform for Linux and Mac 
OS X that is open source but commerce friendly and provides features that can 
compete with the likes of Microsoft's new language F#.

I am more than willing to knuckle down on the project myself provided it will 
give me a platform that I can sell libraries for but I have no expertise in 
this field so I need all the help that projects like LLVM can give me! :-)

I have been working professionally with the OCaml language for several years 
now and find it to be enormously productive for two main reasons:

. Expressive: like ML
. Fast: like ML for symbolic code and like C/C++ for numeric code

This marriage of features allows OCaml to carve out a huge niche in scientific 
computing between languages like Mathematica and C++. Consequently, OCaml has 
garnered a lot of interest from the scientific community.

Although F# is similarly expressive and fast for numeric code it is slow for 
symbolic code because its run-time is inherited from .NET and is tuned for 
C#. In ordinary imperative languages like C#, values are rarely allocated and 
deallocated rapidly. However, in functional languages like F#, the 
distribution of value lifetimes is heavily geared toward a huge rate of 
allocation of very short-lived objects. Consequently, idiomatic functional 
code often runs up to 5x slower with F# than with OCaml because the .NET 
run-time is not tuned for this.

Now, the .NET platform obviously provides a great starting point for 
implementing languages like OCaml and is arguably more suitable than LLVM 
because it is higher-level. However, every problem is an opportunity. In this 
case, I believe LLVM would make it much easier to use a GC tuned for 
functional programming languages and, consequently, I think it will be quite 
feasible to get performance between that of OCaml and F# without too much 
difficulty.
> > - Exceptions
>
> http://llvm.org/docs/ExceptionHandling.html
>
> LLVM's exception support is tuned toward DWARF "zero-cost
exceptions,"
> i.e. C++ exception handling. Anton Korobeynikov and Duncan Sands (who
> is working on Ada) are probably the experts in this area.
Excellent. There is one thing that confuses me about this though. I 
benchmarked exception handling in OCaml and C++ a while ago and found OCaml 
to be ~6x faster and the best explanation I got was that C++ does not have 
zero-cost exceptions because it requires destructors to be called as the 
stack is unwound, whereas OCaml can just jump back and leave collection to 
the GC.

So does zero-cost exception handling in C++ refer to a special case where you 
can statically prove that there are no destructors to call, or something?
> > - Some interface to LLVM from OCaml
> >
> > What is the easiest way to interface a front-end written in OCaml
> > with an LLVM backend?
>
> The C and Ocaml bindings in the source tree are intended to cover
> precisely this scenario, and I would recommend them over .ll emission.
> Jan's remark is a bit out of date; the bindings are sufficient for
> code generation now. A few corners of the IR are still not fully
> covered, but extending the bindings to new methods is quite
> straightforward.
Fantastic.
> If ocamlc is on your path, then 'configure; make; make install'
should
> install the bindings in your ocaml lib. To link with them, compile
> your program with:
>
>      ocamlopt -cc g++
>
> The LLVM libraries currently bound are:
>
>      llvm.cmxa / .cma
>      llvm_bitwriter.cmxa / .cma
>      llvm_analysis.cmxa / .cma
>
> Their .mli files and the corresponding llvm-c headers (coupled with an
> understanding of the C++ API) are presently the best reference.
Right. I hadn't noticed they were already installed after llvm "make
install"
in:

  /usr/local/lib/ocaml/
> > I just rediscovered the OCaml bindings in bindings/ocaml (...). They
> > do indeed look quite complete but I can't find any examples using
> > them.
>
> See an example here, which an Ocaml program to emit the bitcode for a
> "hello world" program:
>
>      http://lists.cs.uiuc.edu/pipermail/llvmdev/2007-October/010996.html
>
> > I think a translation of the tutorial would be most welcome and
> > about 10x shorter. ;-)
>
> Ah, maybe. Patches are welcome. :)
Wow, this is just great!

I had to tweak your example to get it to compile. Some of the function names 
and signatures have changed (I'm using CVS LLVM) so I've updated them
and
just thrown away the booleans you were passing (no idea what they were for 
but it works ;-). Also, I think const_string maybe should null terminate the 
given string so I changed your example to pass it a null terminated string 
instead (nasty hack).

My code is:

open Printf
open Llvm

let main filename    let m = create_module filename in

   (* @greeting = global [14 x i8] c"Hello, world!\00" *)
   let greeting      define_global "greeting" (const_string
"Hello, world!\000") m in

   (* declare i32 @puts(i8* ) *)
   let puts      declare_function "puts"
       (function_type i32_type [|pointer_type i8_type|]) m in

   (* define i32 @main() { entry: *)
   let main = define_function "main" (function_type i32_type [| |]) m
in
   let at_entry = builder_at_end (entry_block main) in

   (* %tmp = getelementptr [14 x i8]* @greeting, i32 0, i32 0 *)
   let zero = const_int i32_type 0 in
   let str = build_gep greeting [| zero; zero |] "tmp" at_entry in

   (* call i32 @puts( i8* %tmp ) *)
   ignore (build_call puts [| str |] "" at_entry);

   (* ret void *)
   ignore (build_ret (const_null i32_type) at_entry);

   (* write the module to a file *)
   if not (Llvm_bitwriter.write_bitcode_file m filename) then exit 1;
   dispose_module m

let () = match Sys.argv with
  | [|_; filename|] -> main filename
  | _ -> main "a.out"

To use it I just do:

$ ocamlopt -dtypes -cc g++ -I /usr/local/lib/ocaml/ llvm.cmxa 
llvm_bitwriter.cmxa hellow.ml -o hellow
$ ./hellow run.bc
$ llc -f -march=c run.bc -o run.c
$ gcc run.c -o run
run.c:114: warning: conflicting types for built-in function ‘malloc’
run.c: In function ‘main’:
run.c:143: warning: return type of ‘main’ is not ‘int’
$ ./run
Hello, world!

How do I compile straight to native code without going via C? Can we use pipes 
to avoid generating intermediate files?
> > the OCaml bindings in bindings/ocaml (rather than the ones in test/
> > Bindings/OCaml!).
>
> The latter directory contains tests of the former!
Ah, I see. :-)

Shall we port the tutorial to OCaml?

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e

Gordon Henriksen

2007-Nov-25 19:31 UTC

head link

[LLVMdev] OCaml and GC

On Nov 25, 2007, at 11:49, Jon Harrop wrote:
> On Sunday 25 November 2007 12:23, Gordon Henriksen wrote:
>
>> On 2007-11-24, at 21:58, Jon Harrop wrote:
>>
>>> - Garbage collection tuned for functional programming
>>
>> http://llvm.org/docs/GarbageCollection.html
>>
>> I've been doing some interesting work on this front. [..] Can you  
>> elaborate on what tuning you're looking for?
>
> Although F# is similarly expressive and fast for numeric code it is  
> slow for symbolic code because its run-time is inherited from .NET  
> and is tuned for C#. In ordinary imperative languages like C#,  
> values are rarely allocated and deallocated rapidly. However, in  
> functional languages like F#, the distribution of value lifetimes is  
> heavily geared toward a huge rate of allocation of very short-lived  
> objects. Consequently, idiomatic functional code often runs up to 5x  
> slower with F# than with OCaml because the .NET run-time is not  
> tuned for this.
The GC infrastructure in LLVM is agnostic to the runtime algorithms,  
so I believe it should be suitable here. It is not hampered by "one  
collector to rule them all." Though with one hand He giveth, and the  
other He taketh away: there is no default collector runtime, either.

— Gordon

Gordon Henriksen

2007-Nov-25 20:01 UTC

head link

[LLVMdev] OCaml bindings

On Nov 25, 2007, at 11:49, Jon Harrop wrote:
> On Sunday 25 November 2007 12:23, Gordon Henriksen wrote:
>
>> If ocamlc is on your path, then 'configure; make; make install'
>> should install the bindings in your ocaml lib.
>
> Right. I hadn't noticed they were already installed after llvm
"make
> install" in:
>
>  /usr/local/lib/ocaml/

Right. They're installed in 'if $stdlib is beneath $prefix then  
$stdlib else $prefix/lib/ocaml'. :) You can force the matter with

     ./configure --with-ocaml-libdir=`ocamlc -where`
>> On 2007-11-24, at 21:58, Jon Harrop wrote:
>>
>>
>>> - Some interface to LLVM from OCaml
>>> What is the easiest way to interface a front-end written in OCaml  
>>> with an LLVM backend?
>>
>> The C and Ocaml bindings in the source tree are intended to cover  
>> precisely this scenario, and I would recommend them over .ll  
>> emission. Jan's remark is a bit out of date; the bindings are  
>> sufficient for code generation now. A few corners of the IR are  
>> still not fully covered, but extending the bindings to new methods  
>> is quite straightforward.
>>
>>> I just rediscovered the OCaml bindings in bindings/ocaml (...).  
>>> They do indeed look quite complete but I can't find any
examples
>>> using them.
>>
>> See an example here, which an Ocaml program to emit the bitcode for  
>> a "hello world" program:
>>
>>     http://lists.cs.uiuc.edu/pipermail/llvmdev/2007-October/010996.html
>>
>
> Wow, this is just great!
>
> I had to tweak your example to get it to compile. Some of the  
> function names and signatures have changed (I'm using CVS LLVM) so  
> I've updated them and just thrown away the booleans you were passing  
> (no idea what they were for but it works ;-).
Ah, right; I'd forgotten about those changes or I would've updated it  
for you. The booleans in particular were used for null-terminating  
strings and creating varargs function types, among others, but weren't  
self-documenting, so I introduced variant names instead. For instance:
> Also, I think const_string maybe should null terminate the given  
> string so I changed your example to pass it a null terminated string  
> instead (nasty hack).
const_stringz null-terminates the string. But adding \000 in the  
literal as you did is equivalent.
> My code is:
>
> [...]
>
> To use it I just do:
>
> $ ocamlopt -dtypes -cc g++ -I /usr/local/lib/ocaml/ llvm.cmxa  
> llvm_bitwriter.cmxa hellow.ml -o hellow
> $ ./hellow run.bc
>
> How do I compile straight to native code without going via C?
-march=c invokes a very unusual LLVM target which emits C code instead  
of assembly. Read http://llvm.org/cmds/llc.html and llc --help. Here:
> $ llc -f -march=c run.bc -o run.c
Simply run 'llc run.bc -o run.s' to generate native assembly code.  
 From there, you can use 'as' and 'ld' to assemble and link.

In my example, I had used the gcc driver to succinctly invoke 'as' and  
'ld' in the proper platform-specific manner, including linking with  
the C standard library (for printf). Since the input was already  
compiled to assembly code, 'gcc' did not invoke the C compiler.
> $ gcc run.c -o run
> run.c:114: warning: conflicting types for built-in function ‘malloc’
> run.c: In function ‘main’:
> run.c:143: warning: return type of ‘main’ is not ‘int’
> $ ./run
> Hello, world!
>
> Can we use pipes to avoid generating intermediate files?
For the llvm tools, yes. For instance:

     # optimize bitcode and disassemble to LLVM assembly
     opt -std-compile-opts < run.bc | llvm-dis

     # optimize bitcode, compile to native assembly, and assemble to  
native object
     opt -std-compile-opts < run.bc | llc | as -o run.o

However, writing the bitcode presently requires a temp file. This is a  
problem with the libraries at both ends:

  1. Standard C++ doesn't provide file-descriptor streams at all.  
(LLVM uses C++ iostreams.)
  2. Ocaml doesn't allow extracting the file descriptor from an  
Output_channel.

There are a variety of workarounds available for the bindings, but I  
have not yet pursued them. The simplest is a "write_bitcode_to_stdout"
function which uses the std::cout stream internally.
>> I think a translation of the tutorial would be most welcome and  
>> about 10x shorter. ;-)
>
> Shall we port the tutorial to OCaml?
By all means! I think you'd also stumble across some areas that are  
not bound yet, such as building pipelines to run optimizations in- 
process. I'd be happy to fill in any gaps you find. It would be a very  
useful exercise in that it would generate that feedback, improving  
both the C and Ocaml bindings.

You'll find that the LLVM community openly welcomes not only users and  
their contributions, so please jump in and get your hands dirty. The  
#llvm IRC channel on irc.oftc.net is also a great resource.

— Gordon

Jon Harrop

2007-Nov-25 21:04 UTC

head link

[LLVMdev] OCaml bindings

On Sunday 25 November 2007 20:01, Gordon Henriksen
wrote:> On Nov 25, 2007, at 11:49, Jon Harrop wrote:
> > On Sunday 25 November 2007 12:23, Gordon Henriksen wrote:
> >> If ocamlc is on your path, then 'configure; make; make
install'
> >> should install the bindings in your ocaml lib.
> >
> > Right. I hadn't noticed they were already installed after llvm
"make
> > install" in:
> >
> >  /usr/local/lib/ocaml/
>
> Right. They're installed in 'if $stdlib is beneath $prefix then
> $stdlib else $prefix/lib/ocaml'. :) You can force the matter with
>
>      ./configure --with-ocaml-libdir=`ocamlc -where`
For some reason my /etc/ocamlfind.conf was wrong so that didn't do what I 
would have expected.
> > I had to tweak your example to get it to compile. Some of the
> > function names and signatures have changed (I'm using CVS LLVM) so
> > I've updated them and just thrown away the booleans you were
passing
> > (no idea what they were for but it works ;-).
>
> Ah, right; I'd forgotten about those changes or I would've updated
it
> for you. The booleans in particular were used for null-terminating
> strings and creating varargs function types, among others, but weren't
>
> self-documenting, so I introduced variant names instead. For instance:
> > Also, I think const_string maybe should null terminate the given
> > string so I changed your example to pass it a null terminated string
> > instead (nasty hack).
>
> const_stringz null-terminates the string. But adding \000 in the
> literal as you did is equivalent.
Great.
> > My code is:
> >
> > [...]
> >
> > To use it I just do:
> >
> > $ ocamlopt -dtypes -cc g++ -I /usr/local/lib/ocaml/ llvm.cmxa
> > llvm_bitwriter.cmxa hellow.ml -o hellow
> > $ ./hellow run.bc
> >
> > How do I compile straight to native code without going via C?
>
> -march=c invokes a very unusual LLVM target which emits C code instead
>
> of assembly. Read http://llvm.org/cmds/llc.html and llc --help. Here:
> > $ llc -f -march=c run.bc -o run.c
>
> Simply run 'llc run.bc -o run.s' to generate native assembly code.
>  From there, you can use 'as' and 'ld' to assemble and
link.
>
> In my example, I had used the gcc driver to succinctly invoke 'as'
and
> 'ld' in the proper platform-specific manner, including linking with
> the C standard library (for printf). Since the input was already
> compiled to assembly code, 'gcc' did not invoke the C compiler.
I've done the same now: works like a treat!
> > Can we use pipes to avoid generating intermediate files?
>
> For the llvm tools, yes. For instance:
>
>      # optimize bitcode and disassemble to LLVM assembly
>      opt -std-compile-opts < run.bc | llvm-dis
>
>      # optimize bitcode, compile to native assembly, and assemble to
> native object
>      opt -std-compile-opts < run.bc | llc | as -o run.o
>
> However, writing the bitcode presently requires a temp file. This is a
> problem with the libraries at both ends:
>
>   1. Standard C++ doesn't provide file-descriptor streams at all.
> (LLVM uses C++ iostreams.)
>   2. Ocaml doesn't allow extracting the file descriptor from an
> Output_channel.
>
> There are a variety of workarounds available for the bindings, but I
> have not yet pursued them. The simplest is a
"write_bitcode_to_stdout"
> function which uses the std::cout stream internally.
Right. That would probably do the trick. Anyone wanting JIT functionality from 
OCaml will presumably want that.
> >> I think a translation of the tutorial would be most welcome and
> >> about 10x shorter. ;-)
> >
> > Shall we port the tutorial to OCaml?
>
> By all means! I think you'd also stumble across some areas that are
> not bound yet, such as building pipelines to run optimizations in-
> process. I'd be happy to fill in any gaps you find. It would be a very
> useful exercise in that it would generate that feedback, improving
> both the C and Ocaml bindings.
Brilliant. I'll keep hacking away at this then. I'm just working on an
OCaml
port of the (hard-coded) Fibonacci example. Then I'll try compiling 
expression trees on the OCaml side and we can start doing more interesting 
things.
> You'll find that the LLVM community openly welcomes not only users and
> their contributions, so please jump in and get your hands dirty. The
> #llvm IRC channel on irc.oftc.net is also a great resource.
Awesome. I'm not a big user of IRC but I'll check it out.

Many thanks,
-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e

Gordon Henriksen

2007-Nov-25 21:40 UTC

head link

[LLVMdev] OCaml and Exceptions

On Nov 25, 2007, at 11:49, Jon Harrop wrote:
> On Sunday 25 November 2007 12:23, Gordon Henriksen wrote:
>
>> On 2007-11-24, at 21:58, Jon Harrop wrote:
>>
>>> - Exceptions
>>
>> http://llvm.org/docs/ExceptionHandling.html
>>
>> LLVM's exception support is tuned toward DWARF "zero-cost  
>> exceptions," i.e. C++ exception handling. Anton Korobeynikov and  
>> Duncan Sands (who is working on Ada) are probably the experts in  
>> this area.
>
> Excellent.
>
> So does zero-cost exception handling in C++ refer to a special case  
> where you can statically prove that there are no destructors to  
> call, or something?
"Zero cost" refers to the overhead introduced in the case when an  
exception is not thrown at all.

C++ has a guiding principal that "you don't pay for it if you don't
use it." In the case of exceptions, this had pretty far-reaching  
implications, since each stack-based object calls for a destructor  
cleanup block. The resolution was for implementations to introduce no  
runtime overhead whatsoever for a try/catch block if no exception is  
raised, thus "zero cost". Throw actually became more expensive as a  
result of these implementations.
> There is one thing that confuses me about this though. I benchmarked  
> exception handling in OCaml and C++ a while ago and found OCaml to  
> be ~6x faster and the best explanation I got was that C++ does not  
> have zero-cost exceptions
Microbenchmarking raise vs. throw is not very realistic. ZCEH makes a  
complicated performance tradeoff wherein the exceptional case is  
penalized in order to optimize the common case. Since C++ code has  
many implicit try/finally blocks (each stack-based object with a  
destructor), this is a clear win. Ocaml code is quite different.
> because it requires destructors to be called as the stack is  
> unwound, whereas OCaml can just jump back and leave collection to  
> the GC.
GC in general will reduce the amount of cleanup code, which should  
reduce DWARF EH's actual overhead (fewer cleanup blocks means smaller  
tables and accordingly cheaper unwinds), especially when unwinding  
deep stacks. Objective-C 2 on Leopard may be one of the few systems  
which has this combination of properties. (But exceptions in Cocoa  
code are discouraged even more strongly than in C++.)

Note that there is nothing in LLVM that prevents alternative exception  
handling regimes, they just won't benefit from the existing  
infrastructure, and will not interoperate with the DWARF runtime.

— Gordon

Jon Harrop

2007-Nov-25 22:00 UTC

head link

[LLVMdev] OCaml and Exceptions

On Sunday 25 November 2007 21:40, Gordon Henriksen
wrote:> On Nov 25, 2007, at 11:49, Jon Harrop wrote:
> > So does zero-cost exception handling in C++ refer to a special case
> > where you can statically prove that there are no destructors to
> > call, or something?
>
> "Zero cost" refers to the overhead introduced in the case when an
> exception is not thrown at all.
Ah, I see.
> > There is one thing that confuses me about this though. I benchmarked
> > exception handling in OCaml and C++ a while ago and found OCaml to
> > be ~6x faster and the best explanation I got was that C++ does not
> > have zero-cost exceptions
>
> Microbenchmarking raise vs. throw is not very realistic...
Just to clarify: the benchmark compared the performance of RB tree 
implementations using exceptions to terminate early when an already-present 
element was inserted. I found that it is more efficient in C++ (using g++) to 
mimic the exception using nested "if"s rather than throw and catch.
Although
simple, this benchmark was of practical importance to me at the time.
> GC in general will reduce the amount of cleanup code, which should
> reduce DWARF EH's actual overhead (fewer cleanup blocks means smaller
> tables and accordingly cheaper unwinds), especially when unwinding
> deep stacks.
I'm not quite sure I understand this (I don't even know what DWARF EH
is!) but
are "cleanup blocks" the construct that I would be using to build
exception
handlers?
> Objective-C 2 on Leopard may be one of the few systems 
> which has this combination of properties. (But exceptions in Cocoa
> code are discouraged even more strongly than in C++.)
Ok.
> Note that there is nothing in LLVM that prevents alternative exception
> handling regimes, they just won't benefit from the existing
> infrastructure, and will not interoperate with the DWARF runtime.
Ok. Might as well start by reusing as much as possible. Exceptions are very 
common in OCaml though. Is my 6x result a fair quantitative estimate of how 
much faster exceptions could be make in this system for my compiler if it 
were customized?

If so, I don't think that is a problem: exceptions are 600x slower in F#
than
in OCaml!

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e

Seemingly Similar Threads

Search for more seemingly similar threads

llvm dev - Nov 2007 - [LLVMdev] OCaml bindings

[LLVMdev] OCaml

[LLVMdev] OCaml and GC

[LLVMdev] OCaml bindings

[LLVMdev] OCaml bindings

[LLVMdev] OCaml and Exceptions

[LLVMdev] OCaml and Exceptions

Seemingly Similar Threads