I'm a newcomer to llvm, but what you've done so far is very impressive.
Llvm is a godsend to anybody who is attempting to implement their own
their own language. :-) My company is considering using llvm as the
backend for a small matlab-like language for scientific computation; our
other option is MSIL.
After reading through the documentation, I noticed that llvm seems to
have one major limitation -- the lack of parametric polymorphism. I would
like to compile code such as the following:
max <T extends Comparable>(T a, T b) {
if (a > b) return a; else return b;
}
There are, of course, various ways to implement the above code. I could
compile the above function to a fully generic version with boxed arguments,
but that is very slow for scalar types. I could also take the C++ template
route, and generate different IR code for every type instantiation. However,
I have spent way too much time fighting with templates and code bloat to
like that idea.
I believe that type instantiation should ideally be handled by llvm, rather
than the high-level language. First of all, there are a lot of
optimization passes
that are type-invariant; it would be nice to be able to partially
optimize the code
before instantiation. Second, type substitution is very similar to many of the
other optimizations that llvm already does, such as inlining, constant
propagation, and so on. And third, I am planning to use llvm in JIT mode, and
it just makes more sense (to me) to instantiate such functions on demand, at
run-time.
Are there any plans to add such capability to llvm? I tried looking through the
list archives for any discussion, but the archives are not searchable (or I have
not figured out how to search them) so I didn't find much; feel free
to point me
to the proper place.
How difficult would it be to add such a capability to llvm? I was thinking of
marking type variables like T as opaque types for the initial codegen, and then
writing a custom pass that instantiates them to real types. However, I
don't
know if that would confuse or break other parts of the compiler infrastructure;
parametric polymorphism is not necessarily a trivial modification.
My personal background is in type theory; I received my doctorate from the
functional programming group at the University of Edinburgh. I love the fact
that llvm uses a typed assembly language, but the actual type system that
is currently used is pretty limited; it seems to be mostly a copy of C.
I know that compatibility with C is very important to llvm for obvious reasons,
but IMHO, the single biggest problem in making different languages talk to
one another is the type system. I'm not a fan of Microsoft's common
type
system because it's far too OOP-centric, but the basic idea is a good one.
I think llvm would really benefit from having a much stronger, but
still low-level
and language-neutral type theory; it would enable cross-platform multi-language
libraries to be developed in any language, and then distributed as llvm IR.
(The other major necessity is an accurate and high-performance garbage
collector, but the intrinsics for that are already in place.)
Is anyone on this list familiar with System F, System F_sub, or System
F^\omega_sub? They comprise the basic, standard theories of parametric
polymorphism used in the academic world, and have been around for about
20 years. You can obviously get more sophisticated, but the System-F series
of calculi have the advantage that they are simple, well-known, off-the-shelf
solutions. Pick one, plug it into llvm, and you have a type system that can
compete with the JVM or .NET in terms of functionality, without being OOP
centric or sacrificing language neutrality. (System F is low-level --
OOP can be
easily implemented on top of it).
-DeLesley
On Tue, Feb 17, 2009 at 12:26, DeLesley SpamBox <delesley.spambox at googlemail.com> wrote:> After reading through the documentation, I noticed that llvm seems to > have one major limitation -- the lack of parametric polymorphism.I think the problem is deeper than that, in that LLVM has no official concept of a subtype, so I don't see how the idea of polymorphism could be defined in it.> I would like to compile code such as the following: > > max <T extends Comparable>(T a, T b) { > if (a > b) return a; else return b; > } >Also, "Comparable" implies some kind of function associated with the type in order to actually perform it, and LLVM has no such association mechanism. I'd argue that structural typing means that it cannot, since I don't want std::pair<double, double>'s operator< to work on std::complex<double, double>, despite them having the same representation.> There are, of course, various ways to implement the above code. I could > compile the above function to a fully generic version with boxed arguments, > but that is very slow for scalar types. I could also take the C++ template > route, and generate different IR code for every type instantiation. However, > I have spent way too much time fighting with templates and code bloat to > like that idea. >I'd be curious to see if inlining + mem2reg would be able to automatically unbox that example, if you always generate auto-boxing code. Even for things too big to inline, there may be a general pass to convert the mallocs from boxing into allocas if nothing captures the argument pointers. I'm by no means an LLVM or compiler expert, though, so I could be quite wrong. ~ Scott
Hi DeLesley,
This is by design. LLVM's type system is very low-level; it doesn't
even have a concept of types as most languages reason about them. For
instance:
struct ColorRGB { float r, g, b; }
struct ColorHSV { float h, s, v; }
struct ColorHSV { float h, s, v; }
struct Point3D { float x, y, z; }
all become the same type { float, float, float } in LLVM IR, as you
know. Expecting it to directly support generics seems a third-order-of-
magnitude leap of faith. :) But there is good news for the faithful…
>
In regards instantiation on demand, you can do this today:
1. Create a facility such that you can determine which type-generic
function x specialized types to instantiate based upon a symbol name.
This could be a registry or a name mangling scheme.
2. Implement your own ModuleProvider, overriding materializeFunction
to perform specialization rather than loading the type from disk.
http://llvm.org/doxygen/classllvm_1_1ModuleProvider.html
You'll be in complete control of which insantiation, of course.
One could argue that LLVM could have much better support for type
genericity by simply allowing full use of abstract data types (those
containing opaque types) to be valid in IR, but not for codegen. In
general, this creates a second-class "abstract function." The
consequences of this would then need to ripple through the design. I
wouldn't expect the impact to be too high, for the most part; LLVM
already deals with objects of unknown size.
Still, there are a large number of potential foibles here. For
instance, passing an argument can require platform-specific
contortions to conform to the platform ABI, and these contortions
depend on information from the high-level (C) type which is not
recoverable from LLVM's structural types.
Specialization in this scheme would entail a modification of the
existing CloneFunction algorithm. The method
Type::refineAbstractTypeTo is not useful here, because it would
destroy the original type-generic template.
http://llvm.org/doxygen/namespacellvm.html#82ca1ea30b8e181ed30dc10bdd1bfbad
Instead of creating a literal copy, the algorithm would need to
inspect the type of each IR object, replacing abstract data types (as
LLVM already supports) with concrete ones as required by the
instantiation. It could also jump through platform ABI hoops at call
sites if required, but that would be quite complex.
But were I you, I wouldn't hold my breath waiting for someone else to
implement it for me. :)
The answer today is to generate IR anew for each specialization, using
the materializeFunction in a managed VM environment, or doing so
statically. Improved support for specialization would be an
interesting capability to add to LLVM's toolbox, but I would not
expect LLVM to fully internalize support for one particular
instantiation type system and scheme anytime in the foreseeable future.
As for code bloat, you could take .NET's compromise example and use
the same code instantiation (but different metadata in your VM) for
reference types, but create full specializations for value types.
Given your concerns, you clearly have strong ideas about how type
specialization should be implemented; why do you think having LLVM
make the decision for you internally would be better than making the
decision yourself, as you can do today?
I'll let others comment on the alternate type system. I wouldn't
expect this to happen, personally.
>
On 2009-02-17, at 12:26, DeLesley SpamBox wrote:
> I'm a newcomer to llvm, but what you've done so far is very
> impressive.
> Llvm is a godsend to anybody who is attempting to implement their own
> their own language. :-) My company is considering using llvm as the
> backend for a small matlab-like language for scientific computation;
> our
> other option is MSIL.
>
> After reading through the documentation, I noticed that llvm seems to
> have one major limitation -- the lack of parametric polymorphism.
> I would
> like to compile code such as the following:
>
> max <T extends Comparable>(T a, T b) {
> if (a > b) return a; else return b;
> }
>
> There are, of course, various ways to implement the above code. I
> could
> compile the above function to a fully generic version with boxed
> arguments,
> but that is very slow for scalar types. I could also take the C++
> template
> route, and generate different IR code for every type instantiation.
> However,
> I have spent way too much time fighting with templates and code
> bloat to
> like that idea.
> I believe that type instantiation should ideally be handled by llvm,
> rather
> than the high-level language.
> First of all, there are a lot of
> optimization passes
> that are type-invariant; it would be nice to be able to partially
> optimize the code
> before instantiation. Second, type substitution is very similar to
> many of the
> other optimizations that llvm already does, such as inlining, constant
> propagation, and so on.
> And third, I am planning to use llvm in JIT mode, and
> it just makes more sense (to me) to instantiate such functions on
> demand, at
> run-time.
> Are there any plans to add such capability to llvm? I tried looking
> through the
> list archives for any discussion, but the archives are not
> searchable (or I have
> not figured out how to search them) so I didn't find much; feel free
> to point me
> to the proper place.
>
> How difficult would it be to add such a capability to llvm? I was
> thinking of
> marking type variables like T as opaque types for the initial
> codegen, and then
> writing a custom pass that instantiates them to real types.
> However, I don't
> know if that would confuse or break other parts of the compiler
> infrastructure;
> parametric polymorphism is not necessarily a trivial modification.
>
> My personal background is in type theory; I received my doctorate
> from the
> functional programming group at the University of Edinburgh. I love
> the fact
> that llvm uses a typed assembly language, but the actual type system
> that
> is currently used is pretty limited; it seems to be mostly a copy of
> C.
>
> I know that compatibility with C is very important to llvm for
> obvious reasons,
> but IMHO, the single biggest problem in making different languages
> talk to
> one another is the type system. I'm not a fan of Microsoft's
common
> type
> system because it's far too OOP-centric, but the basic idea is a
> good one.
> I think llvm would really benefit from having a much stronger, but
> still low-level
> and language-neutral type theory; it would enable cross-platform
> multi-language
> libraries to be developed in any language, and then distributed as
> llvm IR.
> (The other major necessity is an accurate and high-performance garbage
> collector, but the intrinsics for that are already in place.)
>
> Is anyone on this list familiar with System F, System F_sub, or System
> F^\omega_sub? They comprise the basic, standard theories of
> parametric
> polymorphism used in the academic world, and have been around for
> about
> 20 years. You can obviously get more sophisticated, but the System-
> F series
> of calculi have the advantage that they are simple, well-known, off-
> the-shelf
> solutions. Pick one, plug it into llvm, and you have a type system
> that can
> compete with the JVM or .NET in terms of functionality, without
> being OOP
> centric or sacrificing language neutrality. (System F is low-level --
> OOP can be
> easily implemented on top of it).
— Gordon
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20090218/cf773f85/attachment.html>
On Tuesday 17 February 2009 17:26:13 DeLesley SpamBox wrote:> I'm a newcomer to llvm, but what you've done so far is very impressive. > Llvm is a godsend to anybody who is attempting to implement their own > their own language. :-) My company is considering using llvm as the > backend for a small matlab-like language for scientific computation;Very interesting. My company is prototyping a high-level VM built upon LLVM that is also designed for scientific computing.> our other option is MSIL.We decided not to target MSIL because it will be impossible to compete with Microsoft's F#. Building upon LLVM offers *much* better performance and platform independence. If the start of this year has been anything to go by, LLVM also has much better commercial potential and I believe we could be earning more revenue from it than from .NET in 12 months time.> After reading through the documentation, I noticed that llvm seems to > have one major limitation -- the lack of parametric polymorphism. I would > like to compile code such as the following: > > max <T extends Comparable>(T a, T b) { > if (a > b) return a; else return b; > } > > There are, of course, various ways to implement the above code. I could > compile the above function to a fully generic version with boxed arguments, > but that is very slow for scalar types. I could also take the C++ template > route, and generate different IR code for every type instantiation. > However, I have spent way too much time fighting with templates and code > bloat to like that idea.I am using the latter approach and it is easy to implement and works well so far, although our test base in tiny so bloat is not a problem.> I believe that type instantiation should ideally be handled by llvm, rather > than the high-level language. First of all, there are a lot of > optimization passes > that are type-invariant; it would be nice to be able to partially > optimize the code > before instantiation. Second, type substitution is very similar to many of > the other optimizations that llvm already does, such as inlining, constant > propagation, and so on. And third, I am planning to use llvm in JIT mode, > and it just makes more sense (to me) to instantiate such functions on > demand, at run-time.Excellent idea.> Are there any plans to add such capability to llvm?I do not believe so.> How difficult would it be to add such a capability to llvm? I was thinking > of marking type variables like T as opaque types for the initial codegen, > and then writing a custom pass that instantiates them to real types. > However, I don't know if that would confuse or break other parts of the > compiler infrastructure; parametric polymorphism is not necessarily a > trivial modification.What complications do you forsee?> My personal background is in type theory; I received my doctorate from the > functional programming group at the University of Edinburgh. I love the > fact that llvm uses a typed assembly language, but the actual type system > that is currently used is pretty limited; it seems to be mostly a copy of > C.Yes. LLVM has augmented the functionality required for C with some low-level but critical features like tail call elimination but it does not implement any of the higher-level features you may have expected. After all, it is the LLvm. ;-) LLVM is ideal for building HLVMs and CLRs though.> I know that compatibility with C is very important to llvm for obvious > reasons, but IMHO, the single biggest problem in making different languages > talk to one another is the type system. I'm not a fan of Microsoft's > common type system because it's far too OOP-centric, but the basic idea is > a good one.Absolutely.> I think llvm would really benefit from having a much stronger, > but still low-level > and language-neutral type theory; it would enable cross-platform > multi-language libraries to be developed in any language, and then > distributed as llvm IR.The prospect of higher-level VMs is certainly the single most compelling aspect of LLVM for me (and many other people, I believe) but I am not convinced that such functionality deserves a place in LLVM itself. You can implement parametric polymorphism easily on top of LLVM today. LLVM's optimizations will be run (probably redundantly) on multiple instantiations but the HLVM will also be rewriting code, e.g. to interoperate with the GC and to perform optimizations like instantiating higher-order functions for a given function argument. Rather than trying to push high-level type system features into the LLVM I would opt for an HLVM that provided not only a decent type system but also garbage collection, reflection and any other high-level features of interest. I would also prioritize getting HLVM 1.0 shipped over the technical aspects of integration otherwise you are likely to end up with nothing.> (The other major necessity is an accurate and > high-performance garbage collector, but the intrinsics for that are already > in place.)I heard that LLVM's GC API is immature and largely untested so I chose to assume an uncooperative environment instead.> Is anyone on this list familiar with System F, System F_sub, or System > F^\omega_sub?I am only vaguely aware of System F because I use MLs extensively and HM is a relative.> They comprise the basic, standard theories of parametric > polymorphism used in the academic world, and have been around for about > 20 years. You can obviously get more sophisticated, but the System-F > series of calculi have the advantage that they are simple, well-known, > off-the-shelf solutions. Pick one, plug it into llvm, and you have a type > system that can compete with the JVM or .NET in terms of functionality, > without being OOP centric or sacrificing language neutrality. (System F is > low-level -- OOP can be > easily implemented on top of it).Sounds like a dream come true. Where's the catch? ;-) -- Dr Jon Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?e
Yes, I'm familiar with System F and its siblings. It's not as easy as just plugging that into LLVM and you get something good. Plugging them together would take hard work, and then you have to do specialization at the LLVM level anyway to get good code. I don't think the LLVM should have polymorphism, like Jon Harrop I think that should be dealt with at a higher level. -- Lennart On Tue, Feb 17, 2009 at 5:26 PM, DeLesley SpamBox <delesley.spambox at googlemail.com> wrote:> Is anyone on this list familiar with System F, System F_sub, or System > F^\omega_sub? They comprise the basic, standard theories of parametric > polymorphism used in the academic world, and have been around for about > 20 years. You can obviously get more sophisticated, but the System-F series > of calculi have the advantage that they are simple, well-known, off-the-shelf > solutions. Pick one, plug it into llvm, and you have a type system that can > compete with the JVM or .NET in terms of functionality, without being OOP > centric or sacrificing language neutrality. (System F is low-level -- > OOP can be > easily implemented on top of it).