> I think the problem is deeper than that, in that LLVM has no official
> concept of a subtype, so I don't see how the idea of polymorphism
> could be defined in it.

Parametric polymorphism is different from subtype polymorphism; you can
have one without the other. Parametric polymorphism just means that you
can use type variables (like T) in the IR, which are later instantiated
to actual types.

>> max <T extends Comparable>(T a, T b) {
>>   if (a > b) return a; else return b;
>> }
>
> Also, "Comparable" implies some kind of function associated with the
> type in order to actually perform it...

True, but I'm not worried about that; method tables are easy to add once
type substitutions are allowed. Dropping down to a mythical low-level
language that I call template C, we would have:

  struct ComparableDict<T> {
    bool (*equals)(T a, T b);
    bool (*leq)(T a, T b);
    bool (*geq)(T a, T b);
  };

  T max<T>(ComparableDict<T>* dict, T a, T b) {
    if ((*dict->geq)(a, b)) return a; else return b;
  }

This mechanism duplicates the dictionary-passing used in Haskell type
classes. The correct dictionary for any given type is inferred by the
high-level language, but it is passed as an argument at the LLVM level.
Notice that there is no subtyping. The only things I need from LLVM are:

(1) The ability to define a parameterized type, e.g. ComparableDict<T>.
(2) The ability to define a function that is parameterized by a type,
    e.g. max<T>.
(3) The ability to substitute a type for a type variable, e.g.
    specialize max to max<int>.

In order to get good performance, you will need to instantiate max with
both a type T and the dictionary for T. In other words, max must be
partially evaluated with respect to a given type -and- a given
dictionary. While this is obviously a complication, partial evaluation
with respect to constant arguments is something that LLVM is already
quite capable of. If there is not a pass that does it already, I could
easily write one.
The problem is that I have no way of declaring a parameterized type or a
parameterized function within the LLVM IR.

Note also that when I talk about "types", I am referring to low-level
LLVM types -- scalars, records, pointers, etc. I'm not referring to
complex functional types with pattern matching as found in Haskell, or
complex OOP classes as found in the JVM or .NET, with constructors,
methods, and all that jazz.

  -DeLesley
On Wed, Feb 18, 2009 at 12:32, DeLesley SpamBox
<delesley.spambox at googlemail.com> wrote:
>> I think the problem is deeper than that, in that LLVM has no official
>> concept of a subtype, so I don't see how the idea of polymorphism
>> could be defined in it.
>
> Parametric polymorphism is different from subtype polymorphism; you
> can have one without the other. Parametric polymorphism just means
> that you can use type variables (like T) in the IR, which are later
> instantiated to actual types.

I was thinking of the "T extends Comparable" part, which does involve
subtype polymorphism. Apologies if I'm getting terms mixed up.

> True, but I'm not worried about that; method tables are easy to add
> once type substitutions are allowed. Dropping down to a mythical
> low-level language that I call template C, we would have:
>
>   struct ComparableDict<T> {
>     bool (*equals)(T a, T b);
>     bool (*leq)(T a, T b);
>     bool (*geq)(T a, T b);
>   }
>
>   T max<T>(ComparableDict<T>* dict, T a, T b) {
>     if ((*dict->geq)(a,b)) return a; else return b;
>   }

What do the parameterized types give you that you don't get from using
opaque instead of T? Type checking has already passed by the time it
reaches this level, and you would simply add the appropriate bitcasts
when generating the functions to be added to the dictionaries. You
could even safely bitcast integral and floating-point values to pass
them through the opaque to avoid boxing.

> In order to get good performance, you will need to instantiate max
> with both a type T, and the dictionary for T. In other words, max
> must be partially evaluated with respect to a given type -and- a
> given dictionary. While this is obviously a complication, partial
> evaluation with respect to constant arguments is something that LLVM
> is already quite capable of. If there is not a pass that does it
> already, I could easily write one.
> The problem is that I have no way of declaring a parameterized type
> or a parameterized function within the LLVM IR.

I wonder if the "instantiation" could be done instead as a normal pass
taking advantage of a general call-site-context-sensitive
inter-procedural points-to analysis. Getting rid of the indirection in
the generic dictionary seems like the same problem as devirtualizing
method calls, so a unified solution would be nice.

On Wed, Feb 18, 2009 at 14:53, DeLesley Hutchins
<delesley.spambox at googlemail.com> wrote:
> Moreover, specialization should really be done at the codegen level
> in order to do it properly. C++ templates are a great example of
> why *NOT* to do specialization within the high-level language.

But specialization (in the C++ template sense) is also a great example
of why it's needed in the host language, as is overloading.

> The generated code for getSecond() needs to know the size of T in
> order to calculate the correct offset into the record. However,
> we still don't need to generate a bazillion specialized copies;
> we can simply pass the size of T as an argument:
>
>   void getSecond(int sizeT, void* pair, void* result) {
>     return memcpy(result, ((char*) pair) + sizeT, sizeT);
>   }

Of course, that only works for POD types, so you still need a
different instantiation for std::string, std::vector, ... There's no
possibility of implementing C++'s complicated lookup and resolution
rules down at the LLVM level, and since you need the instantiations
just to parse C++, using it as an example of why LLVM should do
instantiation seems flawed.

Once you restrict yourself to generics, you're only ever passing
pointers around, which, as you said, is the (relatively) easy case,
since you don't need the type information at all once past the
front-end.

Thanks for putting up with the newbie,
~ Scott
> I was thinking of the "T extends Comparable" part, which does involve
> subtype polymorphism. Apologies if I'm getting terms mixed up.

It was a bad example -- not close enough to actual LLVM. :-)

> What do the parameterized types give you that you don't get from
> using opaque instead of T?

Possibly nothing. I don't really understand the limitations of opaque.
Is it possible to declare a structure type { int, opaque, int }, and
then use getelementptr to retrieve the third element? I'm guessing
not, because there's no way for the code generator to calculate the
correct offset. If T is a type variable, then the type { int, T, int }
should be valid. Eventually, of course, the native code generator will
need to know the size of T, but that shouldn't worry other
optimization passes that are only dealing with the IR.

> Type checking has already passed by the time it reaches this level,
> and you would simply add the appropriate bitcasts when generating
> the functions to be added to the dictionaries.

Assuming that all values are 32-bit. Oops -- can't use doubles, long
doubles, or compile to 64-bit architectures. Doh! ;-)

> I wonder if the "instantiation" could be done instead as a normal
> pass

Yes, and it should.

> inter-procedural points-to analysis. Getting rid of the indirection
> in the generic dictionary seems like the same problem as
> devirtualizing method calls, so a unified solution would be nice.

Devirtualization is actually a lot trickier, since it relies on
whole-program analysis and type information from the high-level
language; we need to know that a class C has no subclasses in order to
devirtualize its methods. Getting rid of the dictionary indirection is
a simple matter of constant propagation and inlining; no type wizardry
required.
(This is one reason why I'm not a big fan of OOP type systems.)

> But specialization (in the C++ template sense) is also a great
> example of why it's needed in the host language, as is overloading.

There's no reason why specialization -has- to be done by LLVM. Idiotic
languages like C++ can do it themselves, badly, if they want. ;-) I
want LLVM to do it for me because it can do a better job. ;-)

>> void getSecond(int sizeT, void* pair, void* result) {
>>   return memcpy(result, ((char*) pair) + sizeT, sizeT);
>> }
>
> Of course, that only works for POD types, so you still need a
> different instantiation for std::string, std::vector, ...

Every LLVM type is a POD type. It's either a scalar, a vector, an
array, a struct, or a pointer. Every one of those types has a fixed
size. The C++ compiler is supposed to translate complicated thingies
like std::string into a POD type that LLVM can understand.

> There's no possibility of implementing C++'s complicated lookup and
> resolution rules down at the LLVM level,

Why on earth would we want to do anything like C++ lookup? I tried
writing a C++ parser once, and I think the Obama administration could
easily use it as a Geneva-Convention-friendly alternative to
waterboarding suspected terrorists. :-)

> Once you restrict yourself to generics, you're only ever passing
> pointers around, which, as you said, is the easy case (relatively),
> since you don't need the type information at all once past the
> front-end.

Yes, and Java generics are dog-slow because they can only use
pointers. Try implementing a generic complex-number class in Java, and
watch the two-order-of-magnitude drop in performance on scientific
code.

  -DeLesley