thr3ads.net - llvm dev - [LLVMdev] Fibonacci example in OCaml [Nov 2007]

If this information is useful, please help other people find it:
Share via:

Gordon Henriksen

2007-Nov-26 16:21 UTC

[LLVMdev] Fibonacci example in OCaml

On Nov 26, 2007, at 00:47, Jon Harrop wrote:
> Here is a complete 104-line native code compiler for a tiny subset  
> of OCaml that is expressive enough to compile an external Fibonacci  
> program:
>
> [...]
>
> I was kind of hoping that function pointers would just magically  
> work, so this:
>
>  do (if 1 <= 2 then fib else fib) 40
>
> would run, but instead it produces this error:
>
> $ llc -f run.bc -o run.s
> llc: bitcode didn't read correctly.
> Reason: Invalid instruction with no BB
Try 'Llvm_analysis.assert_valid_module m;' before you write bitcode to  
figure out where things went awry. ('dump_module m;' may also help.)  
GIGO applies (but garbage-in/segfault-out is more likely).

Unfortunately, even if the bindings were more strongly typed, it would  
still be structurally possible to build invalid LLVM code, so you've  
just got to take care not to violate the invariants, then use the  
verifier as a cross-check.

Using an asserts build of LLVM is also helpful, but you should have  
this if you built from SVN.
> I assume I need a real type system that can handle both ints and  
> function pointers?
Something like that.

If you're fond of Ocaml's tagged object model (and it does have some  
nice properties), you'll want to slam every value to some generic  
value type and use lots of bitcasts. The value type is either the  
native integer (i32 or i64 depending on target) or a pointer type,  
probably the recursive pointer type:

   let value_ty      let temp_ty = opaque_type () in
     let h = handle_to_type temp_ty in
     refine_type temp_ty (pointer_type temp_ty);
     type_of_handle h

Since LLVM does not have an intptr type, you'd need to know your  
target and parameterize your codegen accordingly to use the integer  
types.

The bitcasts have no runtime cost, and LLVM optimization passes can  
simplify them. Your compiler's semantic analysis phase can be  
responsible for proving type-safety, so it isn't strictly necessary to  
propagate type annotations on expressions into the codegen phase so  
long as the AST nodes themselves are not polymorphic.

However, the potential exists to significantly reduce memory pressure  
(avoiding boxing) by using a typed object model instead of a tagged  
one. In this case, propagating types throughout would be necessary.  
This decision has very significant implications for the compilation of  
polymorphic functions.
> After that, I'll add compound types and a memory leak. ;-)

Indeed.

— Gordon

Jon Harrop

2007-Nov-26 19:18 UTC

head link

[LLVMdev] Fibonacci example in OCaml

On Monday 26 November 2007 16:21, Gordon Henriksen
wrote:> Try 'Llvm_analysis.assert_valid_module m;' before you write bitcode
to
> figure out where things went awry. ('dump_module m;' may also
help.)
> GIGO applies (but garbage-in/segfault-out is more likely).
Ok, thanks for the tip.
> Unfortunately, even if the bindings were more strongly typed, it would
> still be structurally possible to build invalid LLVM code, so you've
> just got to take care not to violate the invariants, then use the
> verifier as a cross-check.
I suspect the OCaml bindings to LLVM could express more of these constraints 
in the static type system. Has anyone tried to leverage this?

I'd certainly like to do this for a higher-level representation. I have lots
of ideas but I suspect that is OT for this list.
> Using an asserts build of LLVM is also helpful, but you should have
> this if you built from SVN.
I did, yes.
> > I assume I need a real type system that can handle both ints and
> > function pointers?
>
> Something like that.
>
> If you're fond of Ocaml's tagged object model (and it does have
some
> nice properties), you'll want to slam every value to some generic
> value type and use lots of bitcasts. The value type is either the
> native integer (i32 or i64 depending on target) or a pointer type,
> probably the recursive pointer type:
>
>    let value_ty >      let temp_ty = opaque_type () in
>      let h = handle_to_type temp_ty in
>      refine_type temp_ty (pointer_type temp_ty);
>      type_of_handle h
>
> Since LLVM does not have an intptr type, you'd need to know your
> target and parameterize your codegen accordingly to use the integer
> types.
>
> The bitcasts have no runtime cost, and LLVM optimization passes can
> simplify them. Your compiler's semantic analysis phase can be
> responsible for proving type-safety, so it isn't strictly necessary to
> propagate type annotations on expressions into the codegen phase so
> long as the AST nodes themselves are not polymorphic.
>
> However, the potential exists to significantly reduce memory pressure
> (avoiding boxing) by using a typed object model instead of a tagged
> one. In this case, propagating types throughout would be necessary.
> This decision has very significant implications for the compilation of
> polymorphic functions.
So this is the part where I start asking really basic questions. :-)

As I understand it, the simplest approach is to box all values, which means 
that every value is stored as a struct containing the type (an enum) and the 
value itself (inline if it is no bigger than a pointer or as a pointer to a 
larger data structure otherwise). Something like this:

enum runtype {
  Int;
  Float;
  Array;
}

struct array {
  int length;
  box *a;
}

union value {
  int n;    // 64-bit
  float x;  // 64-bit
  array* u; // >64-bit
}

struct box {
  runtype t;
  value v;
}

So you'd create an int with:

box make_int(int n) {
  box b;
  b.t = Int;
  b.v.n = n;
  return b;
}

Is that right or are values stored as a box*?

I'd rather box everything rather than tag ints to start with. I'll
relegate
that to a potential optimization.

What of this can LLVM's optimizer optimize away for me?

So I have to work out how to generate IR that handles those data structures.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e

Gordon Henriksen

2007-Nov-26 20:05 UTC

head link

[LLVMdev] Fibonacci example in OCaml

On Nov 26, 2007, at 14:18, Jon Harrop wrote:
> On Monday 26 November 2007 16:21, Gordon Henriksen wrote:
>>
>
>> Unfortunately, even if the bindings were more strongly typed, it  
>> would still be structurally possible to build invalid LLVM code, so  
>> you've just got to take care not to violate the invariants, then  
>> use the verifier as a cross-check.
>
> I suspect the OCaml bindings to LLVM could express more of these  
> constraints in the static type system. Has anyone tried to leverage  
> this?
Any specific ideas?
>>> I assume I need a real type system that can handle both ints and
>>> function pointers?
>>
>
> As I understand it, the simplest approach is to box all values,  
> which means that every value is stored as a struct containing the  
> type (an enum) and the value itself (inline if it is no bigger than  
> a pointer or as a pointer to a larger data structure otherwise).
>
> I'd rather box everything rather than tag ints to start with. I'll
> relegate that to a potential optimization.
You could go about it that way, sure. Memory pressure will be very high.
> What of this can LLVM's optimizer optimize away for me?
Not much. LLVM won't really change your memory layout. The features  
which did perform dramatic memory reorganization were excised from the  
source tree due to patent infringement issues. (They remain in the  
repository on a branch for UIUC research.) These were also targeted  
toward manual memory management.

It should be possible for LLVM to perform intra- or interprocedural  
escape analysis and lower heap allocations to stack allocations, but I  
don't believe this is implemented. Garbage collection would also  
complicate it, since LLVM would need to recognize GC allocator  
functions as such. This would be an excellent project if you are  
interested in contributing to LLVM itself.
> So I have to work out how to generate IR that handles those data  
> structures.

Should be straightforward. Read the getelementptr FAQ. Unions are  
handled with pointer bitcasts.

http://llvm.org/docs/GetElementPtr.html

— Gordon

Possibly Parallel Threads

Search for more possibly parallel threads

llvm dev - Nov 2007 - [LLVMdev] Fibonacci example in OCaml

[LLVMdev] Fibonacci example in OCaml

[LLVMdev] Fibonacci example in OCaml

[LLVMdev] Fibonacci example in OCaml

Possibly Parallel Threads