Responding to everyone in one post:

Eli Friedman wrote:
>> At the moment, the only way I know to achieve this right now is to
>> create a struct that is as large as the largest type in the union and
>> then bitcast it to access the fields contained within. However, that
>> requires that the frontend know the sizes of all of the various
>> low-level types (the "size_t" problem, which has been discussed
>> before), otherwise you get problems trying to mix pointer and
>> non-pointer types.
>
> I thought we had a reasonable solution to the sizeof problem: the
> getelementptr trick. Why isn't that usable here?

As Chris points out, that won't work for type declarations.

Nick Lewycky wrote:
> It has to come down to a fixed size at some point, using target-specific
> knowledge. Is there any reason you couldn't use 'opaque' here and
> resolve it once you have the necessary information?

Perhaps I've never quite understood the role of bitcode files. Assume I
have a compiler that generates .bc files, and a linker which combines
them into a runnable program. At which point is the target machine
chosen, in the compiler or in the linker? The assumption that I have made
up to this point is that the compiler produces .bc files which are
target-neutral, and the linker then generates a target-specific program.
(I realize that other design choices are possible.)

In order to do what you suggest, the linker needs to know all of the
possible types that the union type can be. That means that the compiler
has to transmit this knowledge to the linker in some way. Since the only
communication between the compiler and the linker is bitcode (at least,
in my current design), the types need to be represented either in-band
(that is, representable within LLVM IR) or out-of-band (via some other
mechanism). It would seem cleaner if the information could be represented
in-band.

Jon Harrop wrote:
> I think very few people are still using C-style unions so there is
> probably little desire to put them into LLVM itself. Class hierarchies
> and algebraic datatypes are much more common and use a different
> representation (that LLVM can already handle with bitcasts).
> Specifically, as a pointer to one of several struct types that can vary
> in size.
>
> I am considering a design along the following lines. All algebraic
> datatypes are stored as a pointer to an int tag:
>
>   i32 *

Exactly. I'm not especially interested in C-style unions, I'm interested
in discriminated unions. But the actual discriminator field is easily
represented in LLVM IR already, so there's no need to extend the IR to
support them. That's why I am only asking for C-style union support -
it's a lower-level primitive that you can use to build discriminated
unions on top of.

The problem with varying-size structs is that it would be nice to be able
to pass them as function parameters and function return values using
LLVM's support of structs as first-class data types. That requires that
the code that handles argument marshalling know the size of the largest
possible type that can be passed.

> The tag conveys the true type of the value and, indeed, it may even be a
> pointer to comprehensive run-time type information. So you load the tag
> and examine it in order to determine what type to cast to. Then you
> bitcast to another type if you need to load any arguments (that appear
> after the tag).
> For example, an int option type might cast as follows:
>
>   None   (tag=0) -> {i32} *
>   Some n (tag=1) -> {i32, i32} *

Since the language I am working on is statically typed, I was going to
select a type discrimination strategy based on the actual types involved.
For example, in the case of a union of reference types, there's no need
for a discriminator field, since reference types already carry around
their own type information. On the other hand, if you have a union of two
enum types of limited range, the high bits of the enum would be
available. In cases such as int or float where all the bits are used, a
separate discriminator field would be needed.

Chris Lattner wrote:
> I don't have a specific problem with adding this. The cost of doing
> so is that it adds (a small amount of) complexity to a lot of places
> that walk the type graphs. The only pass that I predict will be
> difficult to update to handle this is the BasicAA pass, which reasons
> about symbolic (not concrete) offsets and should return mustalias in
> the appropriate cases. Also, to validate this, I think llvm-gcc
> should start generating this for C unions where possible.
>
> If you're interested in implementing this and seeing all the details
> of the implementation through to the end, I don't see significant
> problems. I think adding a simple union type would make more sense
> than adding first-class support for a *discriminated* union.

Well, we'll see. Parsers and type hierarchies I grok. Backends and code
generation, not so much. It will be a while before I am forced to solve
this problem one way or another, but eventually I will.

-- Talin
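A minimal sketch of the tagged representation being discussed, written in
the typed-pointer IR syntax of the time; the type names, tag values, and
the accessor function are illustrative only, not anything LLVM defines.
The first lines also show the "getelementptr trick" for computing a size
as a constant, which works for values but not, as noted above, inside a
type declaration:

  ; "getelementptr trick": sizeof(%some) as a constant expression.
  %none = type { i32 }            ; tag only
  %some = type { i32, i32 }       ; tag, then the payload
  @some.size = constant i32 ptrtoint (%some* getelementptr (%some* null, i32 1) to i32)

  define i32 @get_or_default(%none* %opt, i32 %default) {
  entry:
    ; Every variant starts with the tag, so it can be read through the
    ; smallest layout.
    %tag.ptr = getelementptr %none* %opt, i32 0, i32 0
    %tag = load i32* %tag.ptr
    %is.some = icmp eq i32 %tag, 1
    br i1 %is.some, label %some.case, label %none.case

  some.case:
    ; Only after inspecting the tag do we bitcast to the larger layout
    ; and load the payload.
    %s = bitcast %none* %opt to %some*
    %val.ptr = getelementptr %some* %s, i32 0, i32 1
    %val = load i32* %val.ptr
    ret i32 %val

  none.case:
    ret i32 %default
  }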
On Thursday 01 January 2009 00:57:50 Talin wrote:
> Jon Harrop wrote:
> > I think very few people are still using C-style unions so there is
> > probably little desire to put them into LLVM itself. Class hierarchies
> > and algebraic datatypes are much more common and use a different
> > representation (that LLVM can already handle with bitcasts).
> > Specifically, as a pointer to one of several struct types that can vary
> > in size.
> >
> > I am considering a design along the following lines. All algebraic
> > datatypes are stored as a pointer to an int tag:
> >
> >   i32 *
>
> Exactly. I'm not especially interested in C-style unions, I'm interested
> in discriminated unions. But the actual discriminator field is easily
> represented in LLVM IR already, so there's no need to extend the IR to
> support them. That's why I am only asking for C-style union support -
> it's a lower-level primitive that you can use to build discriminated
> unions on top of.

I don't think you would want to build discriminated unions on top of
C-style unions though.

> The problem with varying-size structs is that it would be nice to be
> able to pass them as function parameters and function return values
> using LLVM's support of structs as first-class data types.

As I understand it, LLVM does not provide first-class structs, and
attempting to return arbitrary structs from functions will randomly
break. I believe you need to implement that yourself in whatever way your
target language requires.

> That requires that the code that handles argument marshalling know the
> size of the largest possible type that can be passed.

Pass a pointer and that size will always be one word. You can still stack
allocate the space if you must, and LLVM's optimization passes will do
their best to completely remove that overhead when possible.

> > The tag conveys the true type of the value and, indeed, it may even be
> > a pointer to comprehensive run-time type information. So you load the
> > tag and examine it in order to determine what type to cast to. Then
> > you bitcast to another type if you need to load any arguments (that
> > appear after the tag). For example, an int option type might cast as
> > follows:
> >
> >   None   (tag=0) -> {i32} *
> >   Some n (tag=1) -> {i32, i32} *
>
> Since the language I am working on is statically typed, I was going to
> select a type discrimination strategy based on the actual types
> involved. For example, in the case of a union of reference types,
> there's no need for a discriminator field,

Only if they are all different reference types. Otherwise you might have
the situation:

  type t = Negate of t | Factorial of t

where both constructors carry an argument of the same type (t) and you
still need a tag to distinguish between the cases (Negate and Factorial)
in the union.

> since reference types already
> carry around their own type information. On the other hand, if you have
> a union of two enum types of limited range, the high bits of the enum
> would be available. In cases such as int or float where all the bits
> are used, a separate discriminator field would be needed.

There is certainly some scope here, and I have been thinking about
similar things. I would like both run-time type information and efficient
pattern matching over algebraic datatypes. The run-time type information
can be a field in every reference type that points to its run-time type.
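A minimal illustration of the Negate/Factorial case above: both
constructors carry a single argument of type t, so they can share one
struct layout and differ only in the tag value. Type and function names
are illustrative, in the typed-pointer syntax of the time:

  ; type t = Negate of t | Factorial of t
  ; One layout serves both cases; only the tag value (say, 0 = Negate,
  ; 1 = Factorial) tells them apart.
  %t = type { i32, %t* }          ; { tag, argument }

  define i1 @is_negate(%t* %v) {
    %tag.ptr = getelementptr %t* %v, i32 0, i32 0
    %tag = load i32* %tag.ptr
    %r = icmp eq i32 %tag, 0
    ret i1 %r
  }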
When pattern matching, you usually want to bisect the number of cases
covered in the union to perform a balanced search and achieve the best
performance. That usually assumes the presence of consecutive integers as
tags, so the use of effectively-random pointers would throw that off a
bit, especially if the run-time type information is subjected to a moving
garbage collector. Alternatively, I could put a redundant tag in each
value on the heap...

--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e
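A sketch of what tag-based matching can look like at the IR level: a
switch on consecutive integer tags, which the code generator is then free
to lower as a jump table or a balanced comparison tree. The type, tags,
and case bodies are illustrative only, again in the older typed-pointer
syntax:

  %expr = type { i32 }            ; common header: the tag

  define i32 @eval(%expr* %e) {
  entry:
    %tag.ptr = getelementptr %expr* %e, i32 0, i32 0
    %tag = load i32* %tag.ptr
    switch i32 %tag, label %unknown [ i32 0, label %lit
                                      i32 1, label %neg
                                      i32 2, label %fact ]
  lit:                              ; real case bodies elided
    ret i32 0
  neg:
    ret i32 1
  fact:
    ret i32 2
  unknown:
    unreachable
  }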
On Jan 1, 2009, at 6:25 AM, Jon Harrop wrote:
>> Exactly. I'm not especially interested in C-style unions, I'm interested
>> in discriminated unions. But the actual discriminator field is easily
>> represented in LLVM IR already, so there's no need to extend the IR to
>> support them. That's why I am only asking for C-style union support -
>> it's a lower-level primitive that you can use to build discriminated
>> unions on top of.
>
> I don't think you would want to build discriminated unions on top of
> C-style unions though.

Why?

>> The problem with varying-size structs is that it would be nice to be
>> able to pass them as function parameters and function return values
>> using LLVM's support of structs as first-class data types.
>
> As I understand it, LLVM does not provide first-class structs and
> attempting to return arbitrary structs from functions will randomly
> break. I believe you need to implement that yourself in whatever way
> your target language requires.

This is a current implementation deficiency, not a design limitation. We
welcome patches to fix this.

-Chris
On Dec 31, 2008, at 4:57 PM, Talin wrote:
> Nick Lewycky wrote:
>> It has to come down to a fixed size at some point, using
>> target-specific knowledge. Is there any reason you couldn't use
>> 'opaque' here and resolve it once you have the necessary information?
>
> Perhaps I've never quite understood the role of bitcode files. Assume I
> have a compiler that generates .bc files, and a linker which combines
> them into a runnable program. At which point is the target machine
> chosen, in the compiler or in the linker? The assumption that I have
> made up to this point is that the compiler produces .bc files which are
> target-neutral, and the linker then generates a target-specific program.
> (I realize that other design choices are possible.)

Well, there are two answers to this. Ideally, LLVM would work as you say.
Unfortunately, some of the current implementation has been tainted by
LLVM's primary development being guided by the needs of C compilers. C
compilers inherently need the front-end to encode target-specific
information in the IR.

However, I really do want to serve portable languages better. Ideally,
the first time that target-specificity comes into play is at the "llc"
level. I think we are very close to this, but I believe there are still a
few problem areas. That said, I don't think this is a reason *not to add*
the 'portable union' feature, I just think it's a reason we *haven't
already* added it.

> Exactly. I'm not especially interested in C-style unions, I'm interested
> in discriminated unions. But the actual discriminator field is easily
> represented in LLVM IR already, so there's no need to extend the IR to
> support them. That's why I am only asking for C-style union support -
> it's a lower-level primitive that you can use to build discriminated
> unions on top of.

Yes, I completely agree with you.

-Chris
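A rough sketch of the 'opaque' approach Nick suggested: the target-neutral
module leaves the union's layout abstract, and a later stage supplies the
concrete definition once the member sizes are known, with linking
resolving the opaque type against it. The module split, type name, and
chosen layout are illustrative only:

  ; --- frontend.ll: target-neutral, layout of %u deliberately unknown ---
  %u = type opaque

  define i32 @read_tag(%u* %p) {
    ; The tag is assumed to live at the front, so it can be read without
    ; knowing the full layout.
    %q = bitcast %u* %p to i32*
    %tag = load i32* %q
    ret i32 %tag
  }

  ; --- sizes.ll: emitted once target sizes are known; linking the two ---
  ; --- modules resolves the opaque %u to this concrete definition     ---
  %u = type { i32, [16 x i8] }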