thr3ads.net - llvm dev - [LLVMdev] improving the ocaml binding's type safety [Mar 2008]

If this information is useful, please help other people find it:
Share via:

Erick Tryzelaar

2008-Mar-15 08:03 UTC

[LLVMdev] improving the ocaml binding's type safety

I was talking to Gordon on #llvm earlier, and he challenged me with
coming up with a way to improve the ocaml binding's type safety. We
can't go letting haskell beat us now, can we? I think I got an easy
solution with phantom types.

For those who don't know what the problem is, the ocaml bindings share
one type between whole class branches (like values). This means we
need to downcast and assert to protect against, say, trying to get the
parameter list of a constant. We could try to use ocaml objects to
model the c++ classes, but then we run into issues with memory
management and complexity. Phantom types are extra types that help
with type checking, but are thrown away afterwards. This lets us
create tags to distinguish between values and functions without any
overhead. When we combine these with polymorphic variants, we get
something that's very similar to inheritance.

Here's an example. Polymorphic variants are similar to regular
variants, but the names can be applied to anything. So, you can have:

let a = `Foo 2
let b = `Foo "bar"

And it's not a type error. You specify the type like this:

type a = [ `Foo ] (* specify a type that can only be a `Foo. *)
type b = [ `Foo | `Bar ] (* type can either be `Foo or `Bar. *)
type c = [ a | `Bar ]   (* equivalent to b. *)
type d = [< `Foo | `Bar ] (* match any polymorphic variant that is a
subset of [ `Foo | `Bar ], like [ `Foo ]. *)
type e = [> `Foo | `Bar ] (* match any polymorphic variant that has a
superset of [ `Foo | `Bar ] like [ `Foo | `Baz ] or even `Baz, since
without brackets `Baz has the type of all the possible variants. *)

So with this we can construct a simplified type hierarchy:

type llvalue = [ `Value ]
type llfunction = [ llvalue | `Function ]

Finally, we'll create a polymorphic type that holds the phantom type:

type 'a t

And then define functions that work on these types. Since we want a
function that can take an llvalue, we need to specify it like this:

val use_value: [> `Value ] t -> unit
val use_function: [> `Function ] t -> unit

For llvalue and llfunction it's obvious that it'll match, but won't
it
also match `Foo? Yes it will. We'll work around this by making the
type 't' abstract. Then, only our module can create values of it.
We'll provide helper functions to create llvalues and llfunctions,
which will preserve our invariants:

val make_value : unit -> llvalue t
val make_function : unit -> llfunction t


Here's the full example:

foo.mli:
type 'a t
type llvalue = [ `Value ]
type llfunction = [ llvalue | `Function ]

val use_value : [> `Value] t -> unit
val use_function : [> `Function] t -> unit

val make_value : unit -> llvalue t
val make_function : unit -> llfunction t


Finally here are some examples of uses that typecheck

Foo.use_value (Foo.make_value ())
Foo.use_value (Foo.make_function ())
Foo.use_function (Foo.make_function ())

And just to be sure, this fails:

Bar.use_function (Bar.make_value ())

with:
This expression has type Foo.llvalue Foo.t but is here used with type
  ([> `Function ] as 'a) Foo.t
Type Foo.llvalue = [ `Value ] is not compatible with type 'a
The first variant type does not allow tag(s) `Function


Does this sound like it could work, or am I missing something?

Jon Harrop

2008-Mar-15 14:09 UTC

head link

[LLVMdev] improving the ocaml binding's type safety

For all LLVM users out there: what kinds of errors do you spend the most time 
debugging?

On Saturday 15 March 2008 08:03:47 Erick Tryzelaar
wrote:> Does this sound like it could work, or am I missing something?
Excellent idea. You might also consider building a data structure on the OCaml 
side that is equivalent to the LLVM code:

  module Op = struct
    type int_op = [ `IAdd | ... ]
    type float_op = [ `FAdd | ... ]
    type bool_op = [ `ILe of int_op * int_op | ... ]
    type t = [ int_op | float_op | bool_op | ... ]
  end

Now you have int_op and bool_op as subtypes of Op.t.

Note that neither of these approaches can be as expressive as using GADTs 
which, I assume, is what the Haskellers have done. On the other hand, this 
will probably catch errors that people don't make anyway (hence my opening 
question).

There is an overhead in building these intermediate data structures but I 
expect it will be insignificant. You also have the advantage that you can 
print out the blocks that will be passed to LLVM.

There are other errors that you can catch using a static type system as well. 
Several ops can only appear either at the start (e.g. phi node) or end (e.g. 
unconditional branch) of a block. These should be taken out of the set of 
general ops and blocks should have different types of start and end:

  module Block = struct
    type start = [ `None | Phi of ... ]
    type end = [ `Return of Op.t | `Branch of Block.t ref ]
    type t = start * Op.t list * end
  end

Another practical advance I'd like to see is the LLVM bindings catching 
exceptions on the C++ side and reraising them in OCaml. At the moment you 
cannot get a stacktrace from an OCaml program that dies due to an exception 
being raised in LLVM which is a bit frustrating because you have to guess 
where the program died in the OCaml code.

Still, this is all good stuff and I believe OCaml and LLVM have a rosy future 
together. :-)

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e

Gordon Henriksen

2008-Mar-15 15:49 UTC

head link

[LLVMdev] improving the ocaml binding's type safety

On 2008-03-15, at 10:09, Jon Harrop wrote:
> There is an overhead in building these intermediate data structures
Converting enums to tags, yes. Phantom types no.
> Several ops can only appear either at the start (e.g. phi node) or  
> end (e.g. unconditional branch) of a block. These should be taken  
> out of the set of general ops and blocks should have different types  
> of start and end:
>
>  module Block = struct
>    type start = [ `None | Phi of ... ]
>    type end = [ `Return of Op.t | `Branch of Block.t ref ]
>    type t = start * Op.t list * end
>  end
You're reinventing the object model here. There's no reason you  
shouldn't build your own IR separate from LLVM's, though. Just write  
an output algorithm to convert your IR into an llmodule using the  
bindings so you can interoperate with the LLVM infrastructure.
> Another practical advance I'd like to see is the LLVM bindings  
> catching exceptions on the C++ side and reraising them in OCaml. At  
> the moment you cannot get a stacktrace from an OCaml program that  
> dies due to an exception being raised in LLVM which is a bit  
> frustrating because you have to guess where the program died in the  
> OCaml code.
LLVM doesn't use exceptions. assert() failures just print to stderr  
and call abort() [= exit(1)], so they can't be caught. Even if they  
could, the stack couldn't be unwound (because the C++ exception tables  
are not emitted), so the program would leak memory and leave leave the  
heap in inconsistent state, inevitably leading to crashes. So this  
isn't going to happen.

I'm not disputing that this is a pain point, however. I wouldn't be  
opposed to patches which check preconditions in the ocaml bindings  
layer and raise exceptions before calling into the C bindings.

— Gordon

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20080315/07696bd8/attachment.html>

Gordon Henriksen

2008-Mar-15 15:52 UTC

head link

[LLVMdev] improving the ocaml binding's type safety

Hi Erick,

On 2008-03-15, at 04:03, Erick Tryzelaar wrote:
> I was talking to Gordon on #llvm earlier, and he challenged me with  
> coming up with a way to improve the ocaml binding's type safety. I  
> think I got an easy solution with phantom types.
This could be a good step. I'm not sure I can predict all of the  
implications; I'd suggest you work up a proof of concept.

The Value hierarchy is very complicated; it will take much work to  
apply this technique there, so I would suggest that be a second step.  
A similar but smaller problem presents itself in the Type hierarchy,  
which is (relatively speaking) simple. You may be able to implement  
type safety for Type as a layer on top of the existing bindings, or at  
least on top of the existing glue; you should have sufficient  
flexibility to do so because we do expose the TypeID enum via  
classify_type.

In order to be useful, I think it will also be necessary to provide a  
new class of functions, a set of type checking helpers which play the  
same role as bool T::isa<U>(T*), U* T::cast<U>(T*), and U*  
T::dyn_cast<U>(T*):

(** @see [bool Type::isa<FunctionType>(Type*)] *)
val is_function_type : lltype t -> bool

(** @see [FunctionType *Type::dyn_cast<FunctionType>(Type*)]
     @raise Invalid_cast if the conversion is impossible. *)
val as_function_type : lltype t -> `FunctionTy t

For Value, it will be important to push the ISA routines through all  
of the layers (esp. for Value), but I think you can implement ISA  
routines in Ocaml relatively painlessly for Type.
> Here's the full example:
>
> foo.mli:
> type 'a t
> type llvalue = [ `Value ]
> type llfunction = [ llvalue | `Function ]
>
> val use_value : [> `Value] t -> unit
> val use_function : [> `Function] t -> unit
>
> val make_value : unit -> llvalue t
> val make_function : unit -> llfunction t
>
> Does this sound like it could work, or am I missing something?

I think your sum types are incorrect here. Value is abstract; I think  
you want to say something like:

type llglobal = [ `GlobalVariable | `Function | ... ]
type llconst = [ llglobal | `ConstantInt | ... ]
type llany = [ llconst | ... ]

With only the concrete classes given tags. (doxygen is your friend  
here.) Try mocking up the type system for the concrete types  
GlobalVariable, Function, ConstantFP and ConstantInt; the abstract  
types GlobalValue, Value and Constant; and the functions  
set_value_name, linkage, declare_global, set_initializer,  
declare_function, entry_block, param, const_float, const_int, and  
const_array. That should be a reasonable survey.
> We can't go letting haskell beat us now, can we?
To truly compete with Bryan's Haskell bindings in terms of type  
safety, you'll need to go beyond preventing the cast macros in the C  
bindings from asserting and actually take over some of the work  
performed by the verifier. This is very difficult to do correctly  
outside of toy programs. For instance, given const_add x y, there are  
several conditions that would be interesting for the type checker to  
verify:

(* Easy; is encoded in the C++ type system. *)
isa<Constant>(x) && isa<Constant>(y)
(* Harder; assertions in C++. *)
x->getType() == y->getType()
x->getType()->isIntOrIntVector() || x->getType()->isFPOrFPVector()

I was very impressed that Bryan made build_call statically type check!

Still, there are also some problems that I don't think are tractable.  
Consider param 1 f. Even if the type checker knows the type of f via  
something like `Function of ('a v -> 'b v -> 'c v) v, I
don't think
it's possible for the expression to have the correct type. Thought it  
might be possible with param0 f, param1 f, etc. And clearly params f  
can't have a type significantly more restricted than llvalue v array.

Speaking of which, Bryan, if you're still reading the list, do you  
have any plans for submitting your Haskell bindings to the project?
>
— Gordon

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20080315/ea279b74/attachment.html>

Erick Tryzelaar

2008-Mar-15 21:49 UTC

head link

[LLVMdev] improving the ocaml binding's type safety

So just to compare, here are two different typesafe phantom type
implementations. One is bottom down, the other bottom up. This is an
example of the following functions:

string Value::getName()
bool Constant::isNull()
bool GlobalValue::isDeclaration()
bool GlobalVariable::isGlobalConstant()
bool Function::isVarArg()

Driver code:

val make_constant : unit -> llconstant t
val make_global_variable : unit -> llglobalvariable t
val make_function : unit -> llfunction t

(* typesafe *)
value_name (make_constant ());;
value_name (make_global_variable ());;
value_name (make_function ());;
is_null (make_constant ());;
is_declaration (make_global_variable ());;
is_declaration (make_function ());;
is_global_constant (make_global_variable ());;
is_var_arg (make_function ());;

(* type errors *)
is_null (make_global_variable ());;
is_null (make_function ());;
is_declaration (make_constant ());;
is_global_constant (make_constant ());;
is_var_arg (make_constant ());;
is_var_arg (make_global_variable ());;


Bottom up. The advantage of this one is that we'll always be typesafe
since we're saying the arguments must be a subset of the specified
variants, and thus all the variants are closed. The disadvantage is
that if we want to add another type we have to edit all the type
definitions. This also means that we can't add another library that
subclasses from something and still use these functions:

type 'a t
type llfunction = [ `Function ]
type llglobalvariable = [ `GlobalVariable ]
type llglobalvalue = [ llfunction | llglobalvariable ]
type llconstant = [ `Constant ]
type llvalue = [ llconstant | llglobalvalue ]

val value_name : [<  llvalue] t -> string
val is_null : [<  llconstant] t -> bool
val is_declaration : [< llglobalvalue] t -> bool
val is_global_constant : [< llglobalvariable] t -> bool
val is_var_arg : [< llfunction] t -> bool


The other way is top down. This lets us extend our types, but
sacrifices some type safety, as we're saying that the arguments are a
superset of the variants. We can control this by limiting who can
create 't's:

type 'a t

type llvalue = [ `Value ]
type llconstant = [ llvalue | `Constant ]
type llglobalvalue = [ llvalue | `GlobalValue ]
type llglobalvariable = [ llglobalvalue | `GlobalVariable ]
type llfunction = [ llglobalvalue | `Function ]

val value_name : [> `Value] t -> string
val is_null : [> `Constant] t -> bool
val is_declaration : [> `GlobalValue] t -> bool
val is_global_constant : [> `GlobalVariable] t -> bool
val is_var_arg : [> `Function] t -> bool


So what's better to use?

Seemingly Similar Threads

Search for more maybe matching threads

llvm dev - Mar 2008 - [LLVMdev] improving the ocaml binding's type safety

[LLVMdev] improving the ocaml binding's type safety

[LLVMdev] improving the ocaml binding's type safety

[LLVMdev] improving the ocaml binding's type safety

[LLVMdev] improving the ocaml binding's type safety

[LLVMdev] improving the ocaml binding's type safety

Seemingly Similar Threads