> I think the problem is deeper than that, in that LLVM has no official
> concept of a subtype, so I don't see how the idea of polymorphism
> could be defined in it.

Parametric polymorphism is different from subtype polymorphism; you can
have one without the other. Parametric polymorphism just means that you
can use type variables (like T) in the IR, which are later instantiated
to actual types.

>> max <T extends Comparable>(T a, T b) {
>>   if (a > b) return a; else return b;
>> }
>
> Also, "Comparable" implies some kind of function associated with the
> type in order to actually perform it...

True, but I'm not worried about that; method tables are easy to add once
type substitutions are allowed. Dropping down to a mythical low-level
language that I call template C, we would have:

  struct ComparableDict<T> {
    bool (*equals)(T a, T b);
    bool (*leq)(T a, T b);
    bool (*geq)(T a, T b);
  };

  T max<T>(ComparableDict<T>* dict, T a, T b) {
    if ((*dict->geq)(a, b)) return a; else return b;
  }

This mechanism duplicates the dictionary-passing used in Haskell type
classes. The correct dictionary for any given type is inferred by the
high-level language, but it is passed as an argument at the LLVM level.
Notice that there is no subtyping. The only things I need from LLVM are:

(1) The ability to define a parameterized type, e.g. ComparableDict<T>.
(2) The ability to define a function that is parameterized by a type,
    e.g. max<T>.
(3) The ability to substitute a type for a type variable, e.g.
    specialize max to max<int>.

In order to get good performance, you will need to instantiate max with
both a type T and the dictionary for T. In other words, max must be
partially evaluated with respect to a given type -and- a given
dictionary. While this is obviously a complication, partial evaluation
with respect to constant arguments is something that LLVM is already
quite capable of. If there is not a pass that does it already, I could
easily write one.
The problem is that I have no way of declaring a parameterized type or a
parameterized function within the LLVM IR.

Note also that when I talk about "types", I am referring to low-level
LLVM types -- scalars, records, pointers, etc. I'm not referring to
complex functional types with pattern matching as found in Haskell, or
complex OOP classes as found in the JVM or .NET, with constructors,
methods, and all that jazz.

  -DeLesley
On Wed, Feb 18, 2009 at 12:32, DeLesley SpamBox
<delesley.spambox at googlemail.com> wrote:
>> I think the problem is deeper than that, in that LLVM has no official
>> concept of a subtype, so I don't see how the idea of polymorphism
>> could be defined in it.
>
> Parametric polymorphism is different from subtype polymorphism; you
> can have one without the other. Parametric polymorphism just means
> that you can use type variables (like T) in the IR, which are later
> instantiated to actual types.

I was thinking of the "T extends Comparable" part, which does involve
subtype polymorphism. Apologies if I'm getting terms mixed up.

> True, but I'm not worried about that; method tables are easy to add
> once type substitutions are allowed. Dropping down to a mythical
> low-level language that I call template C, we would have:
>
>   struct ComparableDict<T> {
>     bool (*equals)(T a, T b);
>     bool (*leq)(T a, T b);
>     bool (*geq)(T a, T b);
>   }
>
>   T max<T>(ComparableDict<T>* dict, T a, T b) {
>     if ((*dict->geq)(a,b)) return a; else return b;
>   }

What do the parameterized types give you that you don't get from using
opaque instead of T? Type checking has already passed by the time it
reaches this level, and you would simply add the appropriate bitcasts
when generating the functions to be added to the dictionaries. You
could even safely bitcast integral and floating-point values to pass
them through the opaque to avoid boxing.

> In order to get good performance, you will need to instantiate max
> with both a type T, and the dictionary for T. In other words, max
> must be partially evaluated with respect to a given type -and- a
> given dictionary. While this is obviously a complication, partial
> evaluation with respect to constant arguments is something that LLVM
> is already quite capable of. If there is not a pass that does it
> already, I could easily write one.
> The problem is that I have no way of declaring a parameterized type
> or a parameterized function within the LLVM IR.

I wonder if the "instantiation" could be done instead as a normal pass
taking advantage of a general call-site-context-sensitive
inter-procedural points-to analysis. Getting rid of the indirection in
the generic dictionary seems like the same problem as devirtualizing
method calls, so a unified solution would be nice.

On Wed, Feb 18, 2009 at 14:53, DeLesley Hutchins
<delesley.spambox at googlemail.com> wrote:
> Moreover, specialization should really be done at the codegen level
> in order to do it properly. C++ templates are a great example of
> why *NOT* to do specialization within the high-level language.

But specialization (in the C++ template sense) is also a great example
of why it's needed in the host language, as is overloading.

> The generated code for getSecond() needs to know the size of T in
> order to calculate the correct offset into the record. However,
> we still don't need to generate a bazillion specialized copies;
> we can simply pass the size of T as an argument:
>
>   void getSecond(int sizeT, void* pair, void* result) {
>     return memcpy(result, ((char*) pair) + sizeT, sizeT);
>   }

Of course, that only works for POD types, so you still need a
different instantiation for std::string, std::vector, ... There's no
possibility of implementing C++'s complicated lookup and resolution
rules down at the LLVM level, and since you need the instantiations
just to parse C++, using it as an example of why LLVM should do
instantiation seems flawed.

Once you restrict yourself to generics, you're only ever passing
pointers around, which, as you said, is the (relatively) easy case,
since you don't need the type information at all once past the
front-end.

Thanks for putting up with the newbie,
~ Scott
> I was thinking of the "T extends Comparable" part, which does involve
> subtype polymorphism. Apologies if I'm getting terms mixed up.

It was a bad example -- not close enough to actual LLVM. :-)

> What do the parameterized types give you that you don't get from
> using opaque instead of T?

Possibly nothing. I don't really understand the limitations of opaque.
Is it possible to declare a structure type { int, opaque, int }, and
then use getelementptr to retrieve the third element? I'm guessing
not, because there's no way for the code generator to calculate the
correct offset. If T is a type variable, then the type { int, T, int }
should be valid. Eventually, of course, the native code generator will
need to know the size of T, but that shouldn't worry other
optimization passes that are only dealing with the IR.

> Type checking has already passed by the time it reaches this level,
> and you would simply add the appropriate bitcasts when generating
> the functions to be added to the dictionaries.

Assuming that all values are 32-bit. Oops -- can't use doubles, long
doubles, or compile to 64-bit architectures. Doh! ;-)

> I wonder if the "instantiation" could be done instead as a normal
> pass

Yes, and it should.

> inter-procedural points-to analysis. Getting rid of the indirection
> in the generic dictionary seems like the same problem as
> devirtualizing method calls, so a unified solution would be nice.

Devirtualization is actually a lot trickier, since it relies on
whole-program analysis and type information from the high-level
language; we need to know that a class C has no subclasses in order to
devirtualize its methods. Getting rid of the dictionary indirection is
a simple matter of constant propagation and inlining; no type wizardry
required.
(This is one reason why I'm not a big fan of OOP type systems.)

> But specialization (in the C++ template sense) is also a great
> example of why it's needed in the host language, as is overloading.

There's no reason why specialization -has- to be done by LLVM. Idiotic
languages like C++ can do it themselves, badly, if they want. ;-) I
want LLVM to do it for me because it can do a better job. ;-)

>> void getSecond(int sizeT, void* pair, void* result) {
>>   return memcpy(result, ((char*) pair) + sizeT, sizeT);
>> }
>
> Of course, that only works for POD types, so you still need a
> different instantiation for std::string, std::vector, ...

Every LLVM type is a POD type. It's either a scalar, a vector, an
array, a struct, or a pointer. Every one of those types has a fixed
size. The C++ compiler is supposed to translate complicated thingies
like std::string into a POD type that LLVM can understand.

> There's no possibility of implementing C++'s complicated lookup and
> resolution rules down at the LLVM level,

Why on earth would we want to do anything like C++ lookup? I tried
writing a C++ parser once, and I think the Obama administration could
easily use it as a Geneva-Convention-friendly alternative to
waterboarding suspected terrorists. :-)

> Once you restrict yourself to generics, you're only ever passing
> pointers around, which, as you said, is the easy case (relatively),
> since you don't need the type information at all once past the
> front-end.

Yes, and Java generics are dog-slow because they can only use
pointers. Try implementing a generic complex-number class in Java, and
watch the two-order-of-magnitude drop in performance on scientific
code.

  -DeLesley