I wanted to mention, by the way, that my need/desire for this hasn't gone
away :)

And my wish list still includes support for something like uintptr_t - a
primitive integer type that is defined to always be the same size as a
pointer, however large or small that may be on different platforms. (So
that the frontend doesn't need to know how big a pointer is and can
generate the same IR that works on both 32-bit and 64-bit platforms.)

-- Talin

Chris Lattner wrote:
> On Dec 30, 2008, at 12:41 PM, Talin wrote:
>
>> I've been thinking about how to represent unions or "disjoint types"
>> in LLVM IR. At the moment, the only way I know to achieve this right
>> now is to create a struct that is as large as the largest type in
>> the union and then bitcast it to access the fields contained within.
>> However, that requires that the frontend know the sizes of all of
>> the various low-level types (the "size_t" problem, which has been
>> discussed before), otherwise you get problems trying to mix pointer
>> and non-pointer types.
>
> That's an interesting point. As others have pointed out, we've
> resisted having a union type because it isn't strictly needed for the
> current set of front-ends. If a front-end is trying to generate
> target-independent IR though, I can see the utility. The "gep trick"
> won't work for type generation.
>
>> It seems to me that adding a union type to the IR would be a logical
>> extension to the language. The syntax for declaring a union would be
>> similar to that of declaring a struct. To access a union member, you
>> would use GetElementPointer, just as if it were a struct. The only
>> difference is that in this case, the GEP doesn't actually modify the
>> address, it merely returns the input argument as a different type.
>> In all other ways, unions would be treated like structs, except that
>> the size of the union would always be the size of the largest
>> member, and all of the fields within the union would be located
>> at relative offset zero.
>
> Yes, your proposal makes sense, for syntax, I'd suggest: u{ i32, float}
>
>> Unions could of course be combined with other types:
>>
>>    {{int|float}, bool} *
>>    n = getelementptr i32 0, i32 0, i32 1
>>
>> So in the above example, the GEP returns a pointer to the float field.
>
> I don't have a specific problem with adding this. The cost of doing
> so is that it adds (a small amount of) complexity to a lot of places
> that walk the type graphs. The only pass that I predict will be
> difficult to update to handle this is the BasicAA pass, which reasons
> about symbolic (not concrete) offsets and should return mustalias in
> the appropriate cases. Also, to validate this, I think llvm-gcc
> should start generating this for C unions where possible.
>
> If you're interested in implementing this and seeing all the details
> of the implementation through to the end, I don't see significant
> problems. I think adding a simple union type would make more sense
> than adding first-class support for a *discriminated* union.
>
> -Chris
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
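For readers who want to see the layout rules from the proposal concretely, here is a rough C analogue. It is only a sketch and not LLVM IR: the names `IntOrFloat`, `Tagged`, `float_field` and `check_union_layout` are made up for illustration, and C's `union` is assumed to be a fair stand-in for the proposed `u{ i32, float }`. It shows that taking the address of a union member changes the type but not the address, and that the union is at least as large as its largest member.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C stand-in for the proposed u{ i32, float } type. */
union IntOrFloat {
    int32_t i;
    float   f;
};

/* Hypothetical C stand-in for { u{ i32, float }, bool }. */
struct Tagged {
    union IntOrFloat value;
    bool             is_float;
};

/* Address of the float member. As with the GEP described in the thread,
   selecting a union member yields the same address under another type. */
float *float_field(struct Tagged *t) {
    return &t->value.f;
}

/* Checks the two layout properties the proposal relies on:
   every member sits at offset zero, and the union is at least
   as large as its largest member. */
void check_union_layout(void) {
    assert(offsetof(union IntOrFloat, i) == 0);
    assert(offsetof(union IntOrFloat, f) == 0);
    assert(sizeof(union IntOrFloat) >= sizeof(int32_t));
    assert(sizeof(union IntOrFloat) >= sizeof(float));
}
```

The C compiler knows the target's sizes when it lays this out, which is exactly the knowledge a target-independent frontend would like to leave to LLVM.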
On May 5, 2009, at 8:09 PM, Talin wrote:
> I wanted to mention, by the way, that my need/desire for this hasn't
> gone away :)
>
> And my wish list still includes support for something like uintptr_t - a
> primitive integer type that is defined to always be the same size as a
> pointer, however large or small that may be on different platforms. (So
> that the frontend doesn't need to know how big a pointer is and can
> generate the same IR that works on both 32-bit and 64-bit platforms.)

Why not just use a pointer, such as i8*?

-Chris
Chris Lattner wrote:
> Why not just use a pointer, such as i8*?
>
> -Chris

Suppose I have an STL-like container that has 'begin' and 'end'
pointers. Now I want to find the size() of the container. Since you
cannot subtract pointers in LLVM IR, you have to cast them to an integer
type first. But what integer type do you cast them to? I suppose you
could simply always cast them to i64 and hope that the backend will
generate efficient code for the subtraction, but I have no way of
knowing this.

Now, I'm going to anticipate what I think will be your next argument,
which is that at some point I must know the size of the result, since I
am eventually assigning the result of size() to some integer variable.
Which is true; however, if the size of that eventual variable is smaller
than a pointer, then I want to check it for overflow before I do the
assignment. I don't want to just do a blind cast and have the top bits
be lopped off.

The problem of checking for overflow when assigning from an integer of
unknown size to an integer of known size is left as an exercise for the
reader.
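To make the size() scenario concrete, here is a minimal C sketch of what the frontend would like to express. It is not LLVM IR, and the helper names `byte_distance` and `checked_narrow_u32` are hypothetical. It assumes a flat address space in which converting pointers to `uintptr_t` and subtracting gives the byte distance, which holds on mainstream platforms but is not guaranteed by the C standard.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Byte distance between two pointers, computed in a pointer-sized
   integer. Assumes both point into the same buffer, begin <= end,
   and a flat address space. */
uintptr_t byte_distance(const void *begin, const void *end) {
    assert((uintptr_t)begin <= (uintptr_t)end);
    return (uintptr_t)end - (uintptr_t)begin;
}

/* Narrows a pointer-sized value to 32 bits only if it fits.
   Returns false (and leaves *out untouched) on overflow, instead of
   silently dropping the top bits. On a 32-bit target the comparison
   can never be true, and the compiler is free to remove it. */
bool checked_narrow_u32(uintptr_t value, uint32_t *out) {
    if (value > UINT32_MAX) {
        return false;
    }
    *out = (uint32_t)value;
    return true;
}
```

In C the overflow check is easy to write because the compiler knows how wide `uintptr_t` is on the target. The difficulty described above is that a frontend emitting target-independent IR has no equivalent type to write the same comparison against.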