thr3ads.net - llvm dev - [LLVMdev] Suggestion: Support union types in IR [May 2009]

If this information is useful, please help other people find it:
Share via:

Talin

2009-May-08 05:49 UTC

[LLVMdev] Suggestion: Support union types in IR

Chris Lattner wrote:> On Dec 30, 2008, at 12:41 PM, Talin wrote:
>   
>> I've been thinking about how to represent unions or "disjoint
types"
>> in LLVM IR. At the moment, the only way I know to achieve this right  
>> now is to create a struct that is as large as the largest type in  
>> the union and then bitcast it to access the fields contained within.  
>> However, that requires that the frontend know the sizes of all of  
>> the various low-level types (the "size_t" problem, which has
been
>> discussed before), otherwise you get problems trying to mix pointer  
>> and non-pointer types.
>>     
>
> That's an interesting point.  As others have pointed out, we've  
> resisted having a union type because it isn't strictly needed for the  
> current set of front-ends.  If a front-end is trying to generate  
> target-independent IR though, I can see the utility.  The "gep
trick"
> won't work for type generation.
>
>   
>> It seems to me that adding a union type to the IR would be a logical  
>> extension to the language. The syntax for declaring a union would be  
>> similar to that of declaring a struct. To access a union member, you  
>> would use GetElementPointer, just as if it were a struct. The only  
>> difference is that in this case, the GEP doesn't actually modify
the
>> address, it merely returns the input argument as a different type.  
>> In all other ways, unions would be treated like structs, except that  
>> the size of the union would always be the size of the largest  
>> member, and all of the fields within the union would be located  
>> located at relative offset zero.
>>     
>
> Yes, your proposal makes sense, for syntax, I'd suggest:  u{ i32,
float}
>
>   
>> Unions could of course be combined with other types:
>>
>>    {{int|float}, bool} *
>>    n = getelementptr i32 0, i32 0, i32 1
>>
>> So in the above example, the GEP returns a pointer to the float field.
>>     
>
> I don't have a specific problem with adding this.  The cost of doing  
> so is that it adds (a small amount of) complexity to a lot of places  
> that walk the type graphs.  The only pass that I predict will be  
> difficult to update to handle this is the BasicAA pass, which reasons  
> about symbolic (not concrete) offsets and should return mustalias in  
> the appropriate cases.  Also, to validate this, I think llvm-gcc  
> should start generating this for C unions where possible.
>
> If you're interested in implementing this and seeing all the details  
> of the implementation through to the end, I don't see significant  
> problems.  I think adding a simple union type would make more sense  
> than adding first-class support for a *discriminated* union.
>   So, I spent a little bit of time looking at how to code this. Rather 
than defining a brand-new LLVM type, I think the best approach is to 
generalize the existing StructType. We can change the "isPacked" flag 
(which is stored in the SubclassData of the type) to a 3-valued enum 
called StructLayout, which has 3 members: UnpackedLayout, PackedLayout, 
and UnionLayout.

After that, it's mostly a case of hunting down everywhere that calls 
"isPacked", and adding the appropriate case.
:)> -Chris
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>

Nick Lewycky

2009-May-08 06:10 UTC

head link

[LLVMdev] Suggestion: Support union types in IR

Talin wrote:> Chris Lattner wrote:
>> On Dec 30, 2008, at 12:41 PM, Talin wrote:
>>   
>>> I've been thinking about how to represent unions or
"disjoint types"
>>> in LLVM IR. At the moment, the only way I know to achieve this
right
>>> now is to create a struct that is as large as the largest type in  
>>> the union and then bitcast it to access the fields contained
within.
>>> However, that requires that the frontend know the sizes of all of  
>>> the various low-level types (the "size_t" problem, which
has been
>>> discussed before), otherwise you get problems trying to mix pointer
>>> and non-pointer types.
>>>     
>> That's an interesting point.  As others have pointed out, we've
>> resisted having a union type because it isn't strictly needed for
the
>> current set of front-ends.  If a front-end is trying to generate  
>> target-independent IR though, I can see the utility.  The "gep
trick"
>> won't work for type generation.
>>
>>   
>>> It seems to me that adding a union type to the IR would be a
logical
>>> extension to the language. The syntax for declaring a union would
be
>>> similar to that of declaring a struct. To access a union member,
you
>>> would use GetElementPointer, just as if it were a struct. The only
>>> difference is that in this case, the GEP doesn't actually
modify the
>>> address, it merely returns the input argument as a different type.
>>> In all other ways, unions would be treated like structs, except
that
>>> the size of the union would always be the size of the largest  
>>> member, and all of the fields within the union would be located  
>>> located at relative offset zero.
>>>     
>> Yes, your proposal makes sense, for syntax, I'd suggest:  u{ i32,
float}
>>
>>   
>>> Unions could of course be combined with other types:
>>>
>>>    {{int|float}, bool} *
>>>    n = getelementptr i32 0, i32 0, i32 1
>>>
>>> So in the above example, the GEP returns a pointer to the float
field.
>>>     
>> I don't have a specific problem with adding this.  The cost of
doing
>> so is that it adds (a small amount of) complexity to a lot of places  
>> that walk the type graphs.  The only pass that I predict will be  
>> difficult to update to handle this is the BasicAA pass, which reasons  
>> about symbolic (not concrete) offsets and should return mustalias in  
>> the appropriate cases.  Also, to validate this, I think llvm-gcc  
>> should start generating this for C unions where possible.
>>
>> If you're interested in implementing this and seeing all the
details
>> of the implementation through to the end, I don't see significant  
>> problems.  I think adding a simple union type would make more sense  
>> than adding first-class support for a *discriminated* union.
>>   
> So, I spent a little bit of time looking at how to code this. Rather 
> than defining a brand-new LLVM type, I think the best approach is to 
> generalize the existing StructType. We can change the "isPacked"
flag
> (which is stored in the SubclassData of the type) to a 3-valued enum 
> called StructLayout, which has 3 members: UnpackedLayout, PackedLayout, 
> and UnionLayout.
> 
> After that, it's mostly a case of hunting down everywhere that calls 
> "isPacked", and adding the appropriate case. :)
Hi Talin, thanks for looking in to this.

I think there's already a lot of code that makes the assumption that the 
different members of a StructType represent distinct storage. Why is it 
easier to co-opt StructType than to create a new 'UnionType' under 
CompositeType? What's the tradeoff?

Nick

Talin

2009-May-08 17:07 UTC

head link

[LLVMdev] Suggestion: Support union types in IR

Nick Lewycky wrote:> Hi Talin, thanks for looking in to this.
>
> I think there's already a lot of code that makes the assumption that
the
> different members of a StructType represent distinct storage. Why is it 
> easier to co-opt StructType than to create a new 'UnionType' under 
> CompositeType? What's the tradeoff?
>   Well, from a pure lines-of-code perspective, the struct and union types 
share about 90% of their implementation, and it seemed wrong to 
duplicate that code. I thought about factoring out a common base class, 
however that seemed more disruptive to the code base than I really 
wanted to deal with.

However, I haven't really looked into the parts of LLVM that actually do 
the analysis. (I'm mostly a front-end guy.) So I don't yet understand 
how much trouble such a change would cause.> Nick
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>

Talin

2009-May-12 05:46 UTC

head link

[LLVMdev] Suggestion: Support union types in IR

Nick Lewycky wrote:> Hi Talin, thanks for looking in to this.
>
> I think there's already a lot of code that makes the assumption that
the
> different members of a StructType represent distinct storage. Why is it 
> easier to co-opt StructType than to create a new 'UnionType' under 
> CompositeType? What's the tradeoff?
>
>   So, I'm not sure what path to take now, or how to proceed on this. 
Anyone have an opinion as to whether unions should be a brand new type, 
or a variation of struct? The union class will share a lot of the same 
internal stuff as struct, such as the methods for serializing and 
managing the memory for the member list.

In the mean time, I've decided to use an evil hack to get around the 
lack of proper union support. What I do is to examine all of the types 
in the union and estimate which one will be the largest. However, since 
my compiler is platform-independent, and doesn't know how big a pointer 
is, my calculation for "largest" is fairly complex - I calculate the 
size of all of the union members based on the assumption that pointers 
are 32-bits, and then filter out all members which are smaller than some 
other member in the set. I then do the same calculation but this time 
assume that pointers are 64 bits. Now I have two "largest" sets, I
then
intersect them to see if they have any entries in common. If they do, I 
pick one, and create a struct containing that type. This struct is 
guaranteed to be large enough to hold any union member on either a 
32-bit or 64-bit platform. If the two sets don't have any entries in 
common, then I just fail and print a message saying wait until proper 
union support is done :)

This hack at least gets me far enough so that in my language I can now 
compile and run code that looks like this:

  def testSimpleUnionType() {
    var x:int or float;
    x = 1;
    Debug.assertTrue(x isa int);
    Debug.assertFalse(x isa float);
    Debug.assertFalse(x isa String);
    x = 1.0;
    Debug.assertTrue(x isa float);
    Debug.assertFalse(x isa int);
    Debug.assertFalse(x isa String);
  }

-- Talin

Possibly Parallel Threads

Search for more reasonably related threads

llvm dev - May 2009 - [LLVMdev] Suggestion: Support union types in IR

[LLVMdev] Suggestion: Support union types in IR

[LLVMdev] Suggestion: Support union types in IR

[LLVMdev] Suggestion: Support union types in IR

[LLVMdev] Suggestion: Support union types in IR

Possibly Parallel Threads