On Nov 26, 2007, at 14:18, Jon Harrop wrote:> On Monday 26 November 2007 16:21, Gordon Henriksen wrote: >> > >> Unfortunately, even if the bindings were more strongly typed, it >> would still be structurally possible to build invalid LLVM code, so >> you've just got to take care not to violate the invariants, then >> use the verifier as a cross-check. > > I suspect the OCaml bindings to LLVM could express more of these > constraints in the static type system. Has anyone tried to leverage > this?Any specific ideas?>>> I assume I need a real type system that can handle both ints and >>> function pointers? >> > > As I understand it, the simplest approach is to box all values, > which means that every value is stored as a struct containing the > type (an enum) and the value itself (inline if it is no bigger than > a pointer or as a pointer to a larger data structure otherwise). > > I'd rather box everything rather than tag ints to start with. I'll > relegate that to a potential optimization.You could go about it that way, sure. Memory pressure will be very high.> What of this can LLVM's optimizer optimize away for me?Not much. LLVM won't really change your memory layout. The features which did perform dramatic memory reorganization were excised from the source tree due to patent infringement issues. (They remain in the repository on a branch for UIUC research.) These were also targeted toward manual memory management. It should be possible for LLVM to perform intra- or interprocedural escape analysis and lower heap allocations to stack allocations, but I don't believe this is implemented. Garbage collection would also complicate it, since LLVM would need to recognize GC allocator functions as such. This would be an excellent project if you are interested in contributing to LLVM itself.> So I have to work out how to generate IR that handles those data > structures.Should be straightforward. Read the getelementptr FAQ. Unions are handled with pointer bitcasts. http://llvm.org/docs/GetElementPtr.html — Gordon
On Monday 26 November 2007 20:05, Gordon Henriksen wrote:> On Nov 26, 2007, at 14:18, Jon Harrop wrote: > > On Monday 26 November 2007 16:21, Gordon Henriksen wrote: > >> Unfortunately, even if the bindings were more strongly typed, it > >> would still be structurally possible to build invalid LLVM code, so > >> you've just got to take care not to violate the invariants, then > >> use the verifier as a cross-check. > > > > I suspect the OCaml bindings to LLVM could express more of these > > constraints in the static type system. Has anyone tried to leverage > > this? > > Any specific ideas?Provide a type enumerating the valid terminators and restrict the last instruction in a block to be a terminator. Something like this: type terminator = [ `ret of llvalue | `br of llvalue ] type instruction [ terminator | `add of llvalue * llvalue | `sub of llvalue * llvalue ] type block = instruction list * terminator If you want to avoid having a data structure representing the input to LLVM then you can probably achieve the same result using combinators, e.g. by having the building functions for terminators change an [`incomplete] block into a [`complete] block. However, that might make the bindings harder to use. Use phantom types to track the type of each llvalue: type 'a llvalue This could prevent my OCaml code with a function pointer error from compiling. For example, the "define_function" function would return a value of the type: [ `function ] llvalue and the build_call function would require a value of that type, so you could not accidentally pass it an int. I would use polymorphic variants more, particularly for enums and types that are only used once (e.g. "linkage" and "visibility"). So types would be `i32 rather than i32_type and int_predicate and real_predicate would become overlapping sum types, e.g. `ugt is valid for both. I'd also rather see structuring than identifier bloat, e.g.: module Linkage = struct type linkage [ `External | `Link_once | `Weak | `Appending | `Internal | `Dllimport | `Dllexport | `External_weak | `Ghost ] external linkage : llvalue -> linkage = "llvm_linkage" end> > I'd rather box everything rather than tag ints to start with. I'll > > relegate that to a potential optimization. > > You could go about it that way, sure. Memory pressure will be very high.Let's ignore that for now and get something up and running. I think I can unbox within expressions easily enough.> > What of this can LLVM's optimizer optimize away for me? > > Not much. LLVM won't really change your memory layout. The features > which did perform dramatic memory reorganization were excised from the > source tree due to patent infringement issues. (They remain in the > repository on a branch for UIUC research.) These were also targeted > toward manual memory management.That's a shame. I'm not sure how this works but presumably distros are free to add that functionality back in provided they are not in geographical areas affected by software patents?> It should be possible for LLVM to perform intra- or interprocedural > escape analysis and lower heap allocations to stack allocations, but I > don't believe this is implemented. Garbage collection would also > complicate it, since LLVM would need to recognize GC allocator > functions as such. This would be an excellent project if you are > interested in contributing to LLVM itself.Perhaps such optimizations would be better done at a higher-level, above the level of the GC? So that would be ideal for a higher-level VM on top of LLVM.> > So I have to work out how to generate IR that handles those data > > structures. > > Should be straightforward. Read the getelementptr FAQ. Unions are > handled with pointer bitcasts. > > http://llvm.org/docs/GetElementPtr.htmlBrilliant, thanks. -- Dr Jon D Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/products/?e
On 2007-11-26, at 21:12, Jon Harrop wrote:> Provide a type enumerating the valid terminators and restrict the > last instruction in a block to be a terminator. Something like this: > > type terminator = [ `ret of llvalue | `br of llvalue ] > type instruction > [ terminator > | `add of llvalue * llvalue > | `sub of llvalue * llvalue ] > type block = instruction list * terminator > > If you want to avoid having a data structure representing the input > to LLVM then you can probably achieve the same result using > combinators, e.g. by having the building functions for terminators > change an [`incomplete] block into a [`complete] block. However, > that might make the bindings harder to use.Intermediate states are important, though they might be invalid. This is inherited from the underlying object model. I don't think validity is a concept that can effectively be grafted in through the type system of a binding.> Use phantom types to track the type of each llvalue: > > type 'a llvalueThis does not cover the full generality of the IR. The first argument to a call instruction need not be a Function. Rather, the type of the value must be pointer to function. Still, those phantom types may be a solution for binding the Value hierarchy without introducing gratuitous static casts. (The argument to set_visibility must be a GlobalValue, for instance.) Can you represent multiple-level inheritance? Value -> GlobalValue -> GlobalVariable, say.> I would use polymorphic variants more, particularly for enums and > types that are only used once (e.g. "linkage" and "visibility"). So > types would be `i32 rather than i32_typeTypes are not enums, they're first-class objects.> and int_predicate and real_predicate would become overlapping sum > types, e.g. `ugt is valid for both.These variant types were set up to have a 1:1 correspondence with the C ++ enums, and I'd prefer to keep that. There's also no overlap for integer and FP predicates (unsigned greater than is not unordered greater than).> I'd also rather see structuring than identifier bloat, e.g.: > > module Linkage = struct > type linkage > [ `External > | ... > | `Ghost ] > > external linkage : llvalue -> linkage = "llvm_linkage" > endThis is a fair idea for grouping enums.>> It should be possible for LLVM to perform intra- or interprocedural >> escape analysis and lower heap allocations to stack allocations, >> but I don't believe this is implemented. Garbage collection would >> also complicate it, since LLVM would need to recognize GC allocator >> functions as such. This would be an excellent project if you are >> interested in contributing to LLVM itself. > > Perhaps such optimizations would be better done at a higher-level, > above the level of the GC? So that would be ideal for a higher-level > VM on top of LLVM.Escape analysis is perfectly practical on the LLVM representation. Reorganizing data structures is probably best done by the language front end. A functional language is the ideal host for such experiments. The closest LLVM does is SROA. — Gordon
Hi, Gordon Henriksen wrote:> > What of this can LLVM's optimizer optimize away for me? > > Not much. LLVM won't really change your memory layout. The features > which did perform dramatic memory reorganization were excised from the > source tree due to patent infringement issues. (They remain in the > repository on a branch for UIUC research.) These were also targeted > toward manual memory management.Concern about software patents is a shame. Do those behind LLVM support the patent holders in this case, or are they just concerned about being hassled over infringement. If the latter, could an approach be made to one of the anti-software patent organisations to see if prior art can be found, or else look at moving the software outside of the scope of the applicable patent law? Cheers, Ralph.
On Wed, 28 Nov 2007, Ralph Corderoy wrote:> Concern about software patents is a shame. Do those behind LLVM support > the patent holders in this case, or are they just concerned about being > hassled over infringement. If the latter, could an approach be made to > one of the anti-software patent organisations to see if prior art can be > found, or else look at moving the software outside of the scope of the > applicable patent law?The patents in question are based on unification points-to analysis held by Microsoft. Working around them is easy, just don't use those techniques. Unfortunately, DSA is designed and built around them, so it can't be "simply modified" to not use them. Also, as far as software patents go, they are fairly reasonable. I don't think there was any prior art in that area. -Chris -- http://nondot.org/sabre/ http://llvm.org/