As a side effect of bug 178 (Stacker not handling 64-bit pointers on Solaris), I got thinking about a union type for LLVM. Is there any good reason that LLVM shouldn't support unions? This is essentially a structure that has its members all at the same address rather than at sequential addresses. I know there are various issues with unions (alignment, etc.) but wouldn't it make sense to provide a union type that deals with all those issues in a platform independent way? The reason this comes up is because the idiom of saving space by using a memory location for storing different types of data is quite frequent. For example, suppose you want to store both an "int" and a "char*" in a single slot in an array. Each slot can have only one or the other type of value at any given time. There are three ways to do this: structure, casting, or union: 1: % foo = type { int, char* }; 2: % foo = type { int }; 3: % foo = union { int, char* }; Number 3 doesn't exist in LLVM and is what I'm proposing. In the first case, we incur a memory object that has both an int and a char* with non-overlapping sequential addresses. This wastes space since both the int and the char* will never be concurrently used. In the second case we have just an int that could be casted to a char* but that might cause undefined results if the size of a char* is larger than the size of an int. The third option, union, is the compromise. It says, "make the memory object as large as the largest element but have all elements start at (or near) the same memory address". The "or near" part is necessary because alignment rules might cause one of the members to start at a non-zero offset from the start of the memory object. In my particular case in Stacker, I have tried to do something like: %foo = global [10 x int]; void %func() { %int_ptr = getelementptr [10 x int]* %foo, long 0, long 0; %int_val = load int* %int_ptr; %char_ptr = cast int %int_val to char*; %oops = load char* %char_ptr; } The above will probably work on a 32-bit platform where pointers are the same size as int. However, on a 64-bit platform, a pointer is the same size as a long and the value retrieved from the array would actually span two entries in the array, one of which could have been corrupted by a previous write of an integer into the array. Yes, I know, I should have chosen the pointer type as the basis for the array .. but then, it wouldn't work reliably on an 8086 with segmented memory model :) While various tests for word sizes and alignment rules could be used, this problem is _gracefully_ handled by unions. To rewrite the example above we would use something like: %a_union = union { int, char* }; %foo = global [ 10 x %a_union ]; void %func() { %int_ptr = getelementptr [10 x %a_union]* %foo, long 0, long 0, ubyte 0; %int_val = load int* %int_ptr; %char_ptr = getelementptr [10 x %a_union]* %foo, long 0, long 0, ubyte 1; %good = load char* %char_ptr; } This effectively does the same thing as the first but the union takes care of the word length and alignment issues for us. If anyone thinks that unions are bad ideas, I challenge you to create a computer that doesn't support an OR operation. For data structures, unions fill the same role: structures are AND, unions are OR. Unions only get dicey when they are incorrectly disambiguated .. but that's a source language compiler writer's problem. What think ye? Reid. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20031219/05a75823/attachment.sig>
On Fri, 19 Dec 2003, Reid Spencer wrote:> As a side effect of bug 178 (Stacker not handling 64-bit pointers on > Solaris), I got thinking about a union type for LLVM. Is there any > good reason that LLVM shouldn't support unions? This is essentially a > structure that has its members all at the same address rather than at > sequential addresses. I know there are various issues with unions > (alignment, etc.) but wouldn't it make sense to provide a union type > that deals with all those issues in a platform independent way?This is intentionally not part of the LLVM type-system, because it is redundant. If you compile a C program that uses a union, for example, the C front-end will turn it into a type (often a structure) that contains only one of the element types (usually the largest one, perhaps modified to have the correct alignment). To access the other "parts" of the union, LLVM casts are inserted to convert it to the appropriate type.> 3: % foo = union { int, char* }; > > Number 3 doesn't exist in LLVM and is what I'm proposing.A union type is not needed if you encode some simple properties of the target (like the pointer size) into the bytecode file, which we do with the C/C++ front-end. The only question then is how to make _portable_ bytecode files with "unions". I'm not really sure what the answer is here. I would really like to avoid adding a new union type, as it is not needed at the LLVM level, and it seems like high-level languages can map source-level unions onto existing LLVM operations. In Stacker, for example, would this really solve the problem? You could, for example, write a program that pushes a pointer, then pops off an int. This would work fine on a 32-bit target, but obviously not on a 64-bit one. Since stacker doesn't "protect" its end users (in a memory safety sense), I think that a 64-bit stacker target should just push 64-bit pointers like it would push 64 bit integer types: just take up two slots. Is there anything wrong with this approach? Using a union for the stacker stack on a 64-bit machine would waste a ton of space when integers are pushed.> While various tests for word sizes and alignment rules could be used, > this problem is _gracefully_ handled by unions. To rewrite the exampleThe problem with adding unions is that it would require modifying _all of the LLVM code_ that looks at the type-system, and it doesn't seem like it gives us anything fundamentally new (like a vector type would). Also, forcing the front-end to generate casts is an important feature of the LLVM type-system: it makes it obvious that something non-type-safe is happening.> If anyone thinks that unions are bad ideas, I challenge you to create a > computer that doesn't support an OR operation. For data structures, > unions fill the same role: structures are AND, unions are OR. Unions > only get dicey when they are incorrectly disambiguated .. but that's a > source language compiler writer's problem.The problem isn't that we can't effectively represent this, the problem is that it's not clear what the best way to do it is. :) -Chris -- http://llvm.cs.uiuc.edu/ http://www.nondot.org/~sabre/Projects/
On Fri, 2003-12-19 at 15:32, Chris Lattner wrote:> This is intentionally not part of the LLVM type-system, because it is > redundant. If you compile a C program that uses a union, for example, the > C front-end will turn it into a type (often a structure) that contains > only one of the element types (usually the largest one, perhaps modified > to have the correct alignment). > > To access the other "parts" of the union, LLVM casts are inserted to > convert it to the appropriate type.Okay, that's fair. IMO a union type would make compiler writing easier, but I understand the minimalist approach that LLVM needs to maintain.> A union type is not needed if you encode some simple properties of the > target (like the pointer size) into the bytecode file, which we do with > the C/C++ front-end. The only question then is how to make _portable_ > bytecode files with "unions". I'm not really sure what the answer is > here.Me either :(> I would really like to avoid adding a new union type, as it is not needed > at the LLVM level, and it seems like high-level languages can map > source-level unions onto existing LLVM operations. In Stacker, for > example, would this really solve the problem? You could, for example, > write a program that pushes a pointer, then pops off an int. This would > work fine on a 32-bit target, but obviously not on a 64-bit one.Stack mistakes resulting from the Stacker source, as in this case, are the problem of the Stacker programmer, not the compiler. Its possible but not valid to do the operation you suggested. If you push a pointer, you should pop it with something that expects a pointer and knows how to use it correctly. The issue in my mind is by how much one increments the stack index when a pointer is pushed. The answer in the current implementation is always "1". That works fine on a 32-bit platform because the stack array element is 32-bits (int). Both ints and pointers fit in 32-bits and incrementing the index by 1 moves the index by 32-bits. When you move to a machine that has 64-bit pointers and 32-bit ints, then this needs to change so that the index is incremented by 2 when a pointer is pushed. There's a number of ways to solve this, the union type is one of them but I understand your reasons for not wanting it in the LLVM Assembly language.> > Since stacker doesn't "protect" its end users (in a memory safety sense), > I think that a 64-bit stacker target should just push 64-bit pointers like > it would push 64 bit integer types: just take up two slots. Is there > anything wrong with this approach?Nope. In fact, what I think I'll implement is just a 64-bit stack. That is, the base type of the stack will be "long" instead of int. LLVM assures me that this is 64-bits. It is large enough to hold a pointer on all supported platforms. By doing this, I don't have to mess with the index increment, its still always 1. This also has the added advantage of increasing the range of integer values and better supporting floating point should that become a future feature.> Using a union for the stacker stack on > a 64-bit machine would waste a ton of space when integers are pushed.Why? The union would still be 64-bits long. If we increase the integer value size to 64-bits it won't be a waste, it'll be a "feature" :)> The problem with adding unions is that it would require modifying _all of > the LLVM code_ that looks at the type-system, and it doesn't seem like it > gives us anything fundamentally new (like a vector type would). Also, > forcing the front-end to generate casts is an important feature of the > LLVM type-system: it makes it obvious that something non-type-safe is > happening.Okay! I'm convinced! Reid. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20031219/e8b3851c/attachment.sig>