Chris Lattner wrote:> On Dec 30, 2008, at 12:41 PM, Talin wrote: > >> I've been thinking about how to represent unions or "disjoint types" >> in LLVM IR. At the moment, the only way I know to achieve this right >> now is to create a struct that is as large as the largest type in >> the union and then bitcast it to access the fields contained within. >> However, that requires that the frontend know the sizes of all of >> the various low-level types (the "size_t" problem, which has been >> discussed before), otherwise you get problems trying to mix pointer >> and non-pointer types. >> > > That's an interesting point. As others have pointed out, we've > resisted having a union type because it isn't strictly needed for the > current set of front-ends. If a front-end is trying to generate > target-independent IR though, I can see the utility. The "gep trick" > won't work for type generation. > > >> It seems to me that adding a union type to the IR would be a logical >> extension to the language. The syntax for declaring a union would be >> similar to that of declaring a struct. To access a union member, you >> would use GetElementPointer, just as if it were a struct. The only >> difference is that in this case, the GEP doesn't actually modify the >> address, it merely returns the input argument as a different type. >> In all other ways, unions would be treated like structs, except that >> the size of the union would always be the size of the largest >> member, and all of the fields within the union would be located >> located at relative offset zero. >> > > Yes, your proposal makes sense, for syntax, I'd suggest: u{ i32, float} > > >> Unions could of course be combined with other types: >> >> {{int|float}, bool} * >> n = getelementptr i32 0, i32 0, i32 1 >> >> So in the above example, the GEP returns a pointer to the float field. >> > > I don't have a specific problem with adding this. The cost of doing > so is that it adds (a small amount of) complexity to a lot of places > that walk the type graphs. The only pass that I predict will be > difficult to update to handle this is the BasicAA pass, which reasons > about symbolic (not concrete) offsets and should return mustalias in > the appropriate cases. Also, to validate this, I think llvm-gcc > should start generating this for C unions where possible. > > If you're interested in implementing this and seeing all the details > of the implementation through to the end, I don't see significant > problems. I think adding a simple union type would make more sense > than adding first-class support for a *discriminated* union. >So, I spent a little bit of time looking at how to code this. Rather than defining a brand-new LLVM type, I think the best approach is to generalize the existing StructType. We can change the "isPacked" flag (which is stored in the SubclassData of the type) to a 3-valued enum called StructLayout, which has 3 members: UnpackedLayout, PackedLayout, and UnionLayout. After that, it's mostly a case of hunting down everywhere that calls "isPacked", and adding the appropriate case. :)> -Chris > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > >
Talin wrote:> Chris Lattner wrote: >> On Dec 30, 2008, at 12:41 PM, Talin wrote: >> >>> I've been thinking about how to represent unions or "disjoint types" >>> in LLVM IR. At the moment, the only way I know to achieve this right >>> now is to create a struct that is as large as the largest type in >>> the union and then bitcast it to access the fields contained within. >>> However, that requires that the frontend know the sizes of all of >>> the various low-level types (the "size_t" problem, which has been >>> discussed before), otherwise you get problems trying to mix pointer >>> and non-pointer types. >>> >> That's an interesting point. As others have pointed out, we've >> resisted having a union type because it isn't strictly needed for the >> current set of front-ends. If a front-end is trying to generate >> target-independent IR though, I can see the utility. The "gep trick" >> won't work for type generation. >> >> >>> It seems to me that adding a union type to the IR would be a logical >>> extension to the language. The syntax for declaring a union would be >>> similar to that of declaring a struct. To access a union member, you >>> would use GetElementPointer, just as if it were a struct. The only >>> difference is that in this case, the GEP doesn't actually modify the >>> address, it merely returns the input argument as a different type. >>> In all other ways, unions would be treated like structs, except that >>> the size of the union would always be the size of the largest >>> member, and all of the fields within the union would be located >>> located at relative offset zero. >>> >> Yes, your proposal makes sense, for syntax, I'd suggest: u{ i32, float} >> >> >>> Unions could of course be combined with other types: >>> >>> {{int|float}, bool} * >>> n = getelementptr i32 0, i32 0, i32 1 >>> >>> So in the above example, the GEP returns a pointer to the float field. >>> >> I don't have a specific problem with adding this. The cost of doing >> so is that it adds (a small amount of) complexity to a lot of places >> that walk the type graphs. The only pass that I predict will be >> difficult to update to handle this is the BasicAA pass, which reasons >> about symbolic (not concrete) offsets and should return mustalias in >> the appropriate cases. Also, to validate this, I think llvm-gcc >> should start generating this for C unions where possible. >> >> If you're interested in implementing this and seeing all the details >> of the implementation through to the end, I don't see significant >> problems. I think adding a simple union type would make more sense >> than adding first-class support for a *discriminated* union. >> > So, I spent a little bit of time looking at how to code this. Rather > than defining a brand-new LLVM type, I think the best approach is to > generalize the existing StructType. We can change the "isPacked" flag > (which is stored in the SubclassData of the type) to a 3-valued enum > called StructLayout, which has 3 members: UnpackedLayout, PackedLayout, > and UnionLayout. > > After that, it's mostly a case of hunting down everywhere that calls > "isPacked", and adding the appropriate case. :)Hi Talin, thanks for looking in to this. I think there's already a lot of code that makes the assumption that the different members of a StructType represent distinct storage. Why is it easier to co-opt StructType than to create a new 'UnionType' under CompositeType? What's the tradeoff? Nick
Nick Lewycky wrote:> Hi Talin, thanks for looking in to this. > > I think there's already a lot of code that makes the assumption that the > different members of a StructType represent distinct storage. Why is it > easier to co-opt StructType than to create a new 'UnionType' under > CompositeType? What's the tradeoff? >Well, from a pure lines-of-code perspective, the struct and union types share about 90% of their implementation, and it seemed wrong to duplicate that code. I thought about factoring out a common base class, however that seemed more disruptive to the code base than I really wanted to deal with. However, I haven't really looked into the parts of LLVM that actually do the analysis. (I'm mostly a front-end guy.) So I don't yet understand how much trouble such a change would cause.> Nick > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >
Nick Lewycky wrote:> Hi Talin, thanks for looking in to this. > > I think there's already a lot of code that makes the assumption that the > different members of a StructType represent distinct storage. Why is it > easier to co-opt StructType than to create a new 'UnionType' under > CompositeType? What's the tradeoff? > >So, I'm not sure what path to take now, or how to proceed on this. Anyone have an opinion as to whether unions should be a brand new type, or a variation of struct? The union class will share a lot of the same internal stuff as struct, such as the methods for serializing and managing the memory for the member list. In the mean time, I've decided to use an evil hack to get around the lack of proper union support. What I do is to examine all of the types in the union and estimate which one will be the largest. However, since my compiler is platform-independent, and doesn't know how big a pointer is, my calculation for "largest" is fairly complex - I calculate the size of all of the union members based on the assumption that pointers are 32-bits, and then filter out all members which are smaller than some other member in the set. I then do the same calculation but this time assume that pointers are 64 bits. Now I have two "largest" sets, I then intersect them to see if they have any entries in common. If they do, I pick one, and create a struct containing that type. This struct is guaranteed to be large enough to hold any union member on either a 32-bit or 64-bit platform. If the two sets don't have any entries in common, then I just fail and print a message saying wait until proper union support is done :) This hack at least gets me far enough so that in my language I can now compile and run code that looks like this: def testSimpleUnionType() { var x:int or float; x = 1; Debug.assertTrue(x isa int); Debug.assertFalse(x isa float); Debug.assertFalse(x isa String); x = 1.0; Debug.assertTrue(x isa float); Debug.assertFalse(x isa int); Debug.assertFalse(x isa String); } -- Talin