I was talking to Gordon on #llvm earlier, and he challenged me to come up with a way to improve the OCaml bindings' type safety. We can't go letting Haskell beat us now, can we? I think I have an easy solution using phantom types.

For those who don't know what the problem is: the OCaml bindings share one type across whole class hierarchies (like values). This means we need to downcast and assert to protect against, say, trying to get the parameter list of a constant. We could try to use OCaml objects to model the C++ classes, but then we run into issues with memory management and complexity.

Phantom types are extra types that help with type checking but are thrown away afterwards. This lets us create tags to distinguish between values and functions without any overhead. When we combine them with polymorphic variants, we get something very similar to inheritance. Here's an example.

Polymorphic variants are similar to regular variants, but the names can be applied to anything. So you can have:

  let a = `Foo 2
  let b = `Foo "bar"

and it's not a type error. You specify the type like this:

  type a = [ `Foo ]          (* a type that can only be `Foo *)
  type b = [ `Foo | `Bar ]   (* either `Foo or `Bar *)
  type c = [ a | `Bar ]      (* equivalent to b *)
  type d = [< `Foo | `Bar ]  (* matches any polymorphic variant that is a subset of
                                [ `Foo | `Bar ], like [ `Foo ] *)
  type e = [> `Foo | `Bar ]  (* matches any polymorphic variant that includes at least
                                `Foo and `Bar, like [ `Foo | `Bar | `Baz ] *)

So with this we can construct a simplified type hierarchy:

  type llvalue = [ `Value ]
  type llfunction = [ llvalue | `Function ]

Next, we create a polymorphic type that holds the phantom type:

  type 'a t

and then define functions that work on these types. Since we want a function that can take an llvalue, we specify it like this:

  val use_value : [> `Value ] t -> unit
  val use_function : [> `Function ] t -> unit

For llvalue and llfunction it's obvious that they'll match, but won't this also match `Foo? Yes, it will. We work around that by making the type 't' abstract, so only our module can create values of it, and we provide helper functions to create llvalues and llfunctions, which preserve our invariants:

  val make_value : unit -> llvalue t
  val make_function : unit -> llfunction t

Here's the full example, foo.mli:

  type 'a t
  type llvalue = [ `Value ]
  type llfunction = [ llvalue | `Function ]

  val use_value : [> `Value] t -> unit
  val use_function : [> `Function] t -> unit

  val make_value : unit -> llvalue t
  val make_function : unit -> llfunction t

Finally, here are some uses that typecheck:

  Foo.use_value (Foo.make_value ())
  Foo.use_value (Foo.make_function ())
  Foo.use_function (Foo.make_function ())

And just to be sure, this fails:

  Foo.use_function (Foo.make_value ())

with:

  This expression has type Foo.llvalue Foo.t but is here used with type
    ([> `Function ] as 'a) Foo.t
  Type Foo.llvalue = [ `Value ] is not compatible with type 'a
  The first variant type does not allow tag(s) `Function

Does this sound like it could work, or am I missing something?
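To make this concrete, here is a minimal foo.ml that could sit behind that interface (a sketch only; a real implementation would wrap actual LLVM values rather than unit):

  (* foo.ml: the phantom parameter exists only in the interface,
     so there is no runtime representation and no overhead. *)
  type 'a t = unit

  let use_value _ = print_endline "used a value"
  let use_function _ = print_endline "used a function"

  let make_value () = ()
  let make_function () = ()

With this implementation the three calls above compile, and the last one is rejected with the error shown.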
For all LLVM users out there: what kinds of errors do you spend the most time debugging?

On Saturday 15 March 2008 08:03:47 Erick Tryzelaar wrote:
> Does this sound like it could work, or am I missing something?

Excellent idea. You might also consider building a data structure on the OCaml side that is equivalent to the LLVM code:

  module Op = struct
    type int_op = [ `IAdd | ... ]
    type float_op = [ `FAdd | ... ]
    type bool_op = [ `ILe of int_op * int_op | ... ]
    type t = [ int_op | float_op | bool_op | ... ]
  end

Now you have int_op and bool_op as subtypes of Op.t.

Note that neither of these approaches can be as expressive as using GADTs, which, I assume, is what the Haskellers have done. On the other hand, this will probably catch errors that people don't make anyway (hence my opening question). There is an overhead in building these intermediate data structures, but I expect it will be insignificant. You also have the advantage that you can print out the blocks that will be passed to LLVM.

There are other errors that you can catch using a static type system as well. Several ops can only appear either at the start (e.g. a phi node) or the end (e.g. an unconditional branch) of a block. These should be taken out of the set of general ops, and blocks should have distinct types for their start and end:

  module Block = struct
    type start = [ `None | `Phi of ... ]
    type end = [ `Return of Op.t | `Branch of Block.t ref ]
    type t = start * Op.t list * end
  end

Another practical advance I'd like to see is the LLVM bindings catching exceptions on the C++ side and reraising them in OCaml. At the moment you cannot get a stack trace from an OCaml program that dies due to an exception being raised in LLVM, which is a bit frustrating because you have to guess where the program died in the OCaml code.

Still, this is all good stuff and I believe OCaml and LLVM have a rosy future together. :-)

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e
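As a self-contained illustration of the Op idea (the constructor set here is invented for the example; the real one would mirror LLVM's instruction set), the subtyping and the "print the blocks before handing them to LLVM" advantage look like this:

  module Op = struct
    type int_op = [ `IConst of int | `IAdd of int_op * int_op ]
    type bool_op = [ `ILe of int_op * int_op ]
    type t = [ int_op | bool_op ]

    (* Render an op as text, e.g. for inspecting a block before emitting it. *)
    let rec to_string : t -> string = function
      | `IConst n -> string_of_int n
      | `IAdd (a, b) -> "(add " ^ to_string (a :> t) ^ " " ^ to_string (b :> t) ^ ")"
      | `ILe (a, b) -> "(icmp sle " ^ to_string (a :> t) ^ " " ^ to_string (b :> t) ^ ")"
  end

  (* An int_op is accepted anywhere an Op.t is expected: *)
  let () = print_endline (Op.to_string (`IAdd (`IConst 1, `IConst 2)))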
Gordon Henriksen
2008-Mar-15 15:49 UTC
[LLVMdev] improving the ocaml binding's type safety
On 2008-03-15, at 10:09, Jon Harrop wrote:

> There is an overhead in building these intermediate data structures

Converting enums to tags, yes. Phantom types, no.

> Several ops can only appear either at the start (e.g. phi node) or
> end (e.g. unconditional branch) of a block. These should be taken
> out of the set of general ops and blocks should have different types
> of start and end:
>
>   module Block = struct
>     type start = [ `None | `Phi of ... ]
>     type end = [ `Return of Op.t | `Branch of Block.t ref ]
>     type t = start * Op.t list * end
>   end

You're reinventing the object model here. There's no reason you shouldn't build your own IR separate from LLVM's, though. Just write an output algorithm to convert your IR into an llmodule using the bindings so you can interoperate with the LLVM infrastructure.

> Another practical advance I'd like to see is the LLVM bindings
> catching exceptions on the C++ side and reraising them in OCaml. At
> the moment you cannot get a stack trace from an OCaml program that
> dies due to an exception being raised in LLVM, which is a bit
> frustrating because you have to guess where the program died in the
> OCaml code.

LLVM doesn't use exceptions. assert() failures just print to stderr and call abort() [= exit(1)], so they can't be caught. Even if they could, the stack couldn't be unwound (because the C++ exception tables are not emitted), so the program would leak memory and leave the heap in an inconsistent state, inevitably leading to crashes. So this isn't going to happen.

I'm not disputing that this is a pain point, however. I wouldn't be opposed to patches which check preconditions in the OCaml bindings layer and raise exceptions before calling into the C bindings.

— Gordon
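A precondition-checking wrapper of that sort might look like the sketch below. The exception name and wrapper are invented for illustration; the only assumption about the existing bindings is that they expose Llvm.params and Llvm.param on function values.

  exception Binding_precondition of string

  (* Validate the index on the OCaml side and raise a catchable exception,
     instead of letting the C++ side assert and abort. Assumes f is already
     known to be a function value. *)
  let param_checked (f : Llvm.llvalue) (i : int) : Llvm.llvalue =
    let n = Array.length (Llvm.params f) in
    if i < 0 || i >= n then
      raise (Binding_precondition
               (Printf.sprintf "param: index %d out of range for a function with %d parameters" i n))
    else
      Llvm.param f i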
Gordon Henriksen
2008-Mar-15 15:52 UTC
[LLVMdev] improving the ocaml binding's type safety
Hi Erick,

On 2008-03-15, at 04:03, Erick Tryzelaar wrote:

> I was talking to Gordon on #llvm earlier, and he challenged me to
> come up with a way to improve the OCaml bindings' type safety. I
> think I have an easy solution using phantom types.

This could be a good step. I'm not sure I can predict all of the implications; I'd suggest you work up a proof of concept.

The Value hierarchy is very complicated, and it will take much work to apply this technique there, so I would suggest that be a second step. A similar but smaller problem presents itself in the Type hierarchy, which is (relatively speaking) simple. You may be able to implement type safety for Type as a layer on top of the existing bindings, or at least on top of the existing glue; you should have sufficient flexibility to do so because we do expose the TypeID enum via classify_type.

To be useful, I think it will also be necessary to provide a new class of functions: a set of type-checking helpers which play the same role as isa<>, cast<>, and dyn_cast<> do in C++:

  (** @see [isa<FunctionType>(Type*)] *)
  val is_function_type : lltype t -> bool

  (** @see [dyn_cast<FunctionType>(Type*)]
      @raise Invalid_cast if the conversion is impossible. *)
  val as_function_type : lltype t -> [ `FunctionTy ] t

For Value, it will be important to push these ISA routines through all of the layers, but I think you can implement them in OCaml relatively painlessly for Type.

> Here's the full example, foo.mli:
>
>   type 'a t
>   type llvalue = [ `Value ]
>   type llfunction = [ llvalue | `Function ]
>
>   val use_value : [> `Value] t -> unit
>   val use_function : [> `Function] t -> unit
>
>   val make_value : unit -> llvalue t
>   val make_function : unit -> llfunction t
>
> Does this sound like it could work, or am I missing something?

I think your sum types are incorrect here. Value is abstract; I think you want to say something like:

  type llglobal = [ `GlobalVariable | `Function | ... ]
  type llconst = [ llglobal | `ConstantInt | ... ]
  type llany = [ llconst | ... ]

with only the concrete classes given tags. (doxygen is your friend here.)

Try mocking up the type system for the concrete types GlobalVariable, Function, ConstantFP and ConstantInt; the abstract types GlobalValue, Value and Constant; and the functions set_value_name, linkage, declare_global, set_initializer, declare_function, entry_block, param, const_float, const_int, and const_array. That should be a reasonable survey.

> We can't go letting Haskell beat us now, can we?

To truly compete with Bryan's Haskell bindings in terms of type safety, you'll need to go beyond preventing the cast macros in the C bindings from asserting and actually take over some of the work performed by the verifier. This is very difficult to do correctly outside of toy programs. For instance, given const_add x y, there are several conditions that would be interesting for the type checker to verify:

  (* Easy; encoded in the C++ type system. *)
  isa<Constant>(x) && isa<Constant>(y)

  (* Harder; assertions in C++. *)
  x->getType() == y->getType()
  x->getType()->isIntOrIntVector() || x->getType()->isFPOrFPVector()

I was very impressed that Bryan made build_call statically type check! Still, there are also some problems that I don't think are tractable. Consider param 1 f. Even if the type checker knows the type of f via something like `Function of ('a v -> 'b v -> 'c v) v, I don't think it's possible for that expression to have the correct type.
It might be possible with param0 f, param1 f, etc., though. And clearly params f can't have a type significantly more restricted than llvalue v array.

Speaking of which, Bryan, if you're still reading the list: do you have any plans for submitting your Haskell bindings to the project?

— Gordon
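For reference, a rough starting point for the mock-up Gordon asks for above might look like the interface fragment below. Everything in it is illustrative: the tag set is trimmed, the signatures are guesses, and the auxiliary types (lltype, llmodule, ...) are left abstract rather than taken from the real bindings.

  type 'a v         (* a value carrying a phantom tag *)
  type lltype       (* placeholders standing in for the real binding types *)
  type llmodule
  type llbasicblock
  type linkage

  (* Only concrete classes get tags; the abstract classes are unions of them.
     Many concrete classes are elided. *)
  type llglobalvariable = [ `GlobalVariable ]
  type llfunction = [ `Function ]
  type llconstantint = [ `ConstantInt ]
  type llconstantfp = [ `ConstantFP ]
  type llglobalvalue = [ llglobalvariable | llfunction ]
  type llconstant = [ llglobalvalue | llconstantint | llconstantfp ]
  type llvalue = [ llconstant | `Argument | `Instruction ]

  val set_value_name : string -> [< llvalue ] v -> unit
  val linkage : [< llglobalvalue ] v -> linkage
  val declare_global : lltype -> string -> llmodule -> llglobalvariable v
  val set_initializer : [< llconstant ] v -> llglobalvariable v -> unit
  val declare_function : string -> lltype -> llmodule -> llfunction v
  val entry_block : llfunction v -> llbasicblock
  val param : llfunction v -> int -> llvalue v
  val const_int : lltype -> int -> llconstantint v
  val const_float : lltype -> float -> llconstantfp v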
So, just to compare, here are two different typesafe phantom type implementations: one is top down, the other bottom up. Both cover the following functions:

  string Value::getName()
  bool Constant::isNull()
  bool GlobalValue::isDeclaration()
  bool GlobalVariable::isGlobalConstant()
  bool Function::isVarArg()

Driver code:

  val make_constant : unit -> llconstant t
  val make_global_variable : unit -> llglobalvariable t
  val make_function : unit -> llfunction t

  (* typesafe *)
  value_name (make_constant ());;
  value_name (make_global_variable ());;
  value_name (make_function ());;
  is_null (make_constant ());;
  is_declaration (make_global_variable ());;
  is_declaration (make_function ());;
  is_global_constant (make_global_variable ());;
  is_var_arg (make_function ());;

  (* type errors *)
  is_null (make_global_variable ());;
  is_null (make_function ());;
  is_declaration (make_constant ());;
  is_global_constant (make_constant ());;
  is_var_arg (make_constant ());;
  is_var_arg (make_global_variable ());;

First, bottom up. The advantage here is that we're always typesafe, since we say the arguments must be a subset of the specified variants, so all the variants are closed. The disadvantage is that if we want to add another type, we have to edit all the type definitions. It also means that another library can't subclass one of these types and still use these functions:

  type 'a t
  type llfunction = [ `Function ]
  type llglobalvariable = [ `GlobalVariable ]
  type llglobalvalue = [ llfunction | llglobalvariable ]
  type llconstant = [ `Constant ]
  type llvalue = [ llconstant | llglobalvalue ]

  val value_name : [< llvalue] t -> string
  val is_null : [< llconstant] t -> bool
  val is_declaration : [< llglobalvalue] t -> bool
  val is_global_constant : [< llglobalvariable] t -> bool
  val is_var_arg : [< llfunction] t -> bool

The other way is top down. This lets us extend our types, but sacrifices some type safety, since we say the arguments are a superset of the variants. We can control this by limiting who can create 't's:

  type 'a t
  type llvalue = [ `Value ]
  type llconstant = [ llvalue | `Constant ]
  type llglobalvalue = [ llvalue | `GlobalValue ]
  type llglobalvariable = [ llglobalvalue | `GlobalVariable ]
  type llfunction = [ llglobalvalue | `Function ]

  val value_name : [> `Value] t -> string
  val is_null : [> `Constant] t -> bool
  val is_declaration : [> `GlobalValue] t -> bool
  val is_global_constant : [> `GlobalVariable] t -> bool
  val is_var_arg : [> `Function] t -> bool

So, which is better to use?
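Either way, the implementation side can be a zero-overhead wrapper around the existing untyped values. A minimal sketch for a few of the accessors above, assuming the current untyped bindings expose Llvm.value_name, Llvm.is_null and Llvm.is_declaration (the make_* functions are omitted):

  (* The phantom-typed value is just a synonym for the untyped llvalue,
     so the tags exist only at compile time. *)
  type 'a t = Llvm.llvalue

  let value_name v = Llvm.value_name v
  let is_null c = Llvm.is_null c
  let is_declaration g = Llvm.is_declaration g

The same .ml satisfies either of the two interfaces; only the signatures differ.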