I've been thinking about how to represent unions or "disjoint types" in LLVM IR. At the moment, the only way I know of to achieve this is to create a struct that is as large as the largest type in the union and then bitcast it to access the fields contained within. However, that requires that the frontend know the sizes of all of the various low-level types (the "size_t" problem, which has been discussed before); otherwise you get problems trying to mix pointer and non-pointer types.

It seems to me that adding a union type to the IR would be a logical extension to the language. The syntax for declaring a union would be similar to that of declaring a struct. To access a union member, you would use GetElementPtr, just as if it were a struct. The only difference is that in this case, the GEP doesn't actually modify the address; it merely returns the input argument as a different type. In all other ways, unions would be treated like structs, except that the size of the union would always be the size of the largest member, and all of the fields within the union would be located at relative offset zero.

One issue is how to represent the union in text form. One alternative is to use the same syntax as for structs, except with the comma replaced by a vertical bar, as in this example:

    {int|float}

The drawback to this approach is that you can't tell whether it is a struct or a union until after you have parsed the first element. If that turns out to be too inconvenient for the parser, then some unique two-character sequence would have to be used instead, such as:

    {|int|float|}

Another idea is to use struct syntax with a keyword prefix:

    union {int, float}

The specific details of the syntax I will leave to others.

Unions could of course be combined with other types:

    {{int|float}, bool} *
    n = getelementptr i32 0, i32 0, i32 1

So in the above example, the GEP returns a pointer to the float field.

-- 
-- Talin
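Spelled out with actual IR types (i1 standing in for bool; the union syntax itself is of course hypothetical and not accepted by the current parser), the example above would look something like:

    %T = type {{i32|float}, i1}

    define float* @float_field(%T* %p) {
      ; indices: 0 = the pointed-to %T, 0 = the union member, 1 = the float inside the union
      ; the last index only retypes the pointer; a union member is always at offset zero
      %f = getelementptr %T* %p, i32 0, i32 0, i32 1
      ret float* %f
    }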
On Tue, Dec 30, 2008 at 12:41 PM, Talin <viridia at gmail.com> wrote:
> I've been thinking about how to represent unions or "disjoint types" in LLVM IR.

This has been rejected in the past due to insufficient utility: bitcasts have worked perfectly well so far...

> At the moment, the only way I know of to achieve this is to create a struct that is as large as the largest type in the union and then bitcast it to access the fields contained within. However, that requires that the frontend know the sizes of all of the various low-level types (the "size_t" problem, which has been discussed before); otherwise you get problems trying to mix pointer and non-pointer types.

I thought we had a reasonable solution to the sizeof problem: the getelementptr trick. Why isn't that usable here?

-Eli
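For reference, the getelementptr trick Eli mentions looks roughly like this (using a made-up %MyType; note that the result still has to be given some fixed integer width, here i32):

    %MyType = type { i32, float }

    ; "sizeof(%MyType)": the address one element past a null %MyType*,
    ; converted to an integer; the backend folds this to a constant
    @mytype.size = constant i32 ptrtoint (%MyType* getelementptr (%MyType* null, i32 1) to i32)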
Talin wrote:
> I've been thinking about how to represent unions or "disjoint types" in LLVM IR. At the moment, the only way I know of to achieve this is to create a struct that is as large as the largest type in the union and then bitcast it to access the fields contained within. However, that requires that the frontend know the sizes of all of the various low-level types (the "size_t" problem, which has been discussed before); otherwise you get problems trying to mix pointer and non-pointer types.

It has to come down to a fixed size at some point, using target-specific knowledge. Is there any reason you couldn't use 'opaque' here and resolve it once you have the necessary information?

Nick
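A rough sketch of that approach, with made-up names (the two snippets show the same type before and after resolution; in practice the resolution would go through the type-refinement API rather than by editing the .ll text):

    ; as emitted by the target-neutral frontend: the union's layout is left open
    %MyUnion = type opaque

    ; later, once target sizes are known (say, 64-bit pointers, so the largest
    ; member of a union of i32 and i8* needs 8 bytes), it is resolved to:
    %MyUnion = type { i64 }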
On Tuesday 30 December 2008 20:41:04 Talin wrote:
> I've been thinking about how to represent unions or "disjoint types" in LLVM IR. At the moment, the only way I know of to achieve this is to create a struct that is as large as the largest type in the union and then bitcast it to access the fields contained within. However, that requires that the frontend know the sizes of all of the various low-level types (the "size_t" problem, which has been discussed before); otherwise you get problems trying to mix pointer and non-pointer types.
>
> It seems to me that adding a union type to the IR would be a logical extension to the language...

I think very few people are still using C-style unions, so there is probably little desire to put them into LLVM itself. Class hierarchies and algebraic datatypes are much more common, and they use a different representation that LLVM can already handle with bitcasts: a pointer to one of several struct types that can vary in size.

I am considering a design along the following lines. All algebraic datatypes are stored as a pointer to an int tag:

    i32 *

The tag conveys the true type of the value; indeed, it may even be a pointer to comprehensive run-time type information. So you load the tag and examine it in order to determine what type to cast to. Then you bitcast to another type if you need to load any arguments (which appear after the tag).

For example, an int option type might cast as follows:

    None   (tag=0) -> {i32} *
    Some n (tag=1) -> {i32, i32} *

Assuming LLVM keeps the first i32 field of any struct in the same place, this provides the required uniformity for algebraic datatypes and will also scale to full class hierarchies. I assume the C++ front-end for Clang does something similar to implement classes?

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e
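In IR, the scheme Jon describes might look roughly like this (the function name and tag values are illustrative only):

    define i32 @option_get(i32* %opt) {
    entry:
      %tag = load i32* %opt                     ; 0 = None, 1 = Some n
      %is_some = icmp eq i32 %tag, 1
      br i1 %is_some, label %some, label %none

    some:
      %s = bitcast i32* %opt to {i32, i32}*     ; reinterpret as the 'Some' layout
      %np = getelementptr {i32, i32}* %s, i32 0, i32 1
      %n = load i32* %np                        ; the payload stored after the tag
      ret i32 %n

    none:
      ret i32 0                                 ; no payload; return a default
    }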
On Dec 30, 2008, at 12:41 PM, Talin wrote:
> I've been thinking about how to represent unions or "disjoint types" in LLVM IR. At the moment, the only way I know of to achieve this is to create a struct that is as large as the largest type in the union and then bitcast it to access the fields contained within. However, that requires that the frontend know the sizes of all of the various low-level types (the "size_t" problem, which has been discussed before); otherwise you get problems trying to mix pointer and non-pointer types.

That's an interesting point. As others have pointed out, we've resisted having a union type because it isn't strictly needed for the current set of front-ends. If a front-end is trying to generate target-independent IR, though, I can see the utility. The "gep trick" won't work for type generation.

> It seems to me that adding a union type to the IR would be a logical extension to the language. The syntax for declaring a union would be similar to that of declaring a struct. To access a union member, you would use GetElementPtr, just as if it were a struct. The only difference is that in this case, the GEP doesn't actually modify the address; it merely returns the input argument as a different type. In all other ways, unions would be treated like structs, except that the size of the union would always be the size of the largest member, and all of the fields within the union would be located at relative offset zero.

Yes, your proposal makes sense. For syntax, I'd suggest:

    u{ i32, float }

> Unions could of course be combined with other types:
>
>     {{int|float}, bool} *
>     n = getelementptr i32 0, i32 0, i32 1
>
> So in the above example, the GEP returns a pointer to the float field.

I don't have a specific problem with adding this. The cost of doing so is that it adds (a small amount of) complexity to a lot of places that walk the type graphs. The only pass that I predict will be difficult to update is BasicAA, which reasons about symbolic (not concrete) offsets and should return mustalias in the appropriate cases. Also, to validate this, I think llvm-gcc should start generating this for C unions where possible.

If you're interested in implementing this and seeing all the details of the implementation through to the end, I don't see significant problems. I think adding a simple union type would make more sense than adding first-class support for a *discriminated* union.

-Chris
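With the suggested syntax, the aliasing point would look something like this (hypothetical syntax; with today's IR the same effect requires a bitcast):

    %U = type u{ i32, float }

    define float @pun(%U* %u) {
      %ip = getelementptr %U* %u, i32 0, i32 0   ; i32* view of the storage
      %fp = getelementptr %U* %u, i32 0, i32 1   ; float* view of the same storage
      ; both GEPs return the same address, so BasicAA should report mustalias
      store i32 0, i32* %ip
      %f = load float* %fp
      ret float %f
    }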
Responding to everyone in one post:

Eli Friedman wrote:
>> At the moment, the only way I know of to achieve this is to create a struct that is as large as the largest type in the union and then bitcast it to access the fields contained within. However, that requires that the frontend know the sizes of all of the various low-level types (the "size_t" problem, which has been discussed before); otherwise you get problems trying to mix pointer and non-pointer types.
>
> I thought we had a reasonable solution to the sizeof problem: the getelementptr trick. Why isn't that usable here?

As Chris points out, that won't work for type declarations.

Nick Lewycky wrote:
> It has to come down to a fixed size at some point, using target-specific knowledge. Is there any reason you couldn't use 'opaque' here and resolve it once you have the necessary information?

Perhaps I've never quite understood the role of bitcode files. Assume I have a compiler that generates .bc files and a linker which combines them into a runnable program. At which point is the target machine chosen, in the compiler or in the linker? The assumption I have made up to this point is that the compiler produces .bc files which are target-neutral, and the linker then generates a target-specific program. (I realize that other design choices are possible.)

In order to do what you suggest, the linker needs to know all of the possible types that the union can hold. That means the compiler has to transmit this knowledge to the linker in some way. Since the only communication between the compiler and the linker is bitcode (at least in my current design), the types need to be represented either in-band (that is, within LLVM IR itself) or out-of-band (via some other mechanism). It would seem cleaner if the information could be represented in-band.

Jon Harrop wrote:
> I think very few people are still using C-style unions, so there is probably little desire to put them into LLVM itself. Class hierarchies and algebraic datatypes are much more common, and they use a different representation that LLVM can already handle with bitcasts: a pointer to one of several struct types that can vary in size.
>
> I am considering a design along the following lines. All algebraic datatypes are stored as a pointer to an int tag:
>
>     i32 *

Exactly. I'm not especially interested in C-style unions; I'm interested in discriminated unions. But the actual discriminator field is easily represented in LLVM IR already, so there's no need to extend the IR to support them. That's why I am only asking for C-style union support: it's a lower-level primitive that you can use to build discriminated unions on top of.

The problem with varying-size structs is that it would be nice to be able to pass them as function parameters and return values using LLVM's support for structs as first-class data types. That requires that the code which handles argument marshalling know the size of the largest possible type that can be passed.

> The tag conveys the true type of the value; indeed, it may even be a pointer to comprehensive run-time type information. So you load the tag and examine it in order to determine what type to cast to. Then you bitcast to another type if you need to load any arguments (which appear after the tag).
> For example, an int option type might cast as follows:
>
>     None   (tag=0) -> {i32} *
>     Some n (tag=1) -> {i32, i32} *

Since the language I am working on is statically typed, I was going to select a type-discrimination strategy based on the actual types involved. For example, in the case of a union of reference types, there's no need for a discriminator field, since reference types already carry around their own type information. On the other hand, if you have a union of two enum types of limited range, the high bits of the enum would be available. In cases such as int or float, where all of the bits are used, a separate discriminator field would be needed.

Chris Lattner wrote:
> I don't have a specific problem with adding this. The cost of doing so is that it adds (a small amount of) complexity to a lot of places that walk the type graphs. The only pass that I predict will be difficult to update is BasicAA, which reasons about symbolic (not concrete) offsets and should return mustalias in the appropriate cases. Also, to validate this, I think llvm-gcc should start generating this for C unions where possible.
>
> If you're interested in implementing this and seeing all the details of the implementation through to the end, I don't see significant problems. I think adding a simple union type would make more sense than adding first-class support for a *discriminated* union.

Well, we'll see. Parsers and type hierarchies I grok; backends and code generation, not so much. It will be a while before I am forced to solve this problem one way or another, but eventually I will.

-- Talin
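Putting the two ideas together, a discriminated int-or-float value might be laid out as a tag plus a C-style union payload (again using the hypothetical u{...} syntax), which also gives the argument-marshalling code a fixed size to work with:

    ; tag: 0 means the i32 member is live, 1 means the float member is live
    %Variant = type { i32, u{ i32, float } }

    ; because the payload is as large as its largest member, a %Variant has a
    ; known size and can be passed and returned by value like any other struct
    declare %Variant @wrap_float(float)
    declare void @consume(%Variant)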
I wanted to mention, by the way, that my need/desire for this hasn't gone away :)

And my wish list still includes support for something like uintptr_t - a primitive integer type that is defined to always be the same size as a pointer, however large or small that may be on a given platform. (That way the frontend doesn't need to know how big a pointer is and can generate the same IR for both 32-bit and 64-bit targets.)

-- Talin

Chris Lattner wrote:
> That's an interesting point. As others have pointed out, we've resisted having a union type because it isn't strictly needed for the current set of front-ends. If a front-end is trying to generate target-independent IR, though, I can see the utility. The "gep trick" won't work for type generation.
>
> [...]
>
> If you're interested in implementing this and seeing all the details of the implementation through to the end, I don't see significant problems. I think adding a simple union type would make more sense than adding first-class support for a *discriminated* union.
>
> -Chris
Chris Lattner wrote:
> I don't have a specific problem with adding this. The cost of doing so is that it adds (a small amount of) complexity to a lot of places that walk the type graphs. The only pass that I predict will be difficult to update is BasicAA, which reasons about symbolic (not concrete) offsets and should return mustalias in the appropriate cases. Also, to validate this, I think llvm-gcc should start generating this for C unions where possible.
>
> If you're interested in implementing this and seeing all the details of the implementation through to the end, I don't see significant problems. I think adding a simple union type would make more sense than adding first-class support for a *discriminated* union.
>
> -Chris

So, I spent a little bit of time looking at how to code this. Rather than defining a brand-new LLVM type, I think the best approach is to generalize the existing StructType. We can change the "isPacked" flag (which is stored in the SubclassData of the type) into a three-valued enum called StructLayout, with members UnpackedLayout, PackedLayout, and UnionLayout. After that, it's mostly a case of hunting down everywhere that calls "isPacked" and adding the appropriate case. :)