On Wed, Oct 15, 2008 at 8:28 AM, Chris Lattner <clattner at apple.com> wrote:
>> I do think however that it's a bit dangerous to combine static constants
>> across compilation units.
>
> GCC does the same things with strings in some cases. You shouldn't
> depend on this behavior if you want portable code.

Combining is explicitly allowed for strings in C:

6.5.2.5p8: "String literals, and compound literals with const-qualified types, need not designate distinct objects."

This isn't allowed for distinct declarations.

6.5.9p6: "Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at its beginning) or function, both are pointers to one past the last element of the same array object, or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space."

6.2.4: "An object exists, has a constant address, and retains its last-stored value throughout its lifetime."

6.7: "A definition of an identifier is a declaration for that identifier that: - for an object, causes storage to be reserved for that object;"

There isn't any other reasonable interpretation of the standard. Also, the only semantics that "const" has in C is that writing to an object with a const-qualified type is illegal.

> If you avoid
> marking the global variable const, you should have better luck.

Not marking the variable const only makes the problem more obscure. Testcase in C:

    static char x = 1, y = 1;
    int c() { char* u = &x; char* v = &y; return u == v; }
    int d() { return x+y; }

Running this through "llvm-gcc -O0 -emit-llvm -c -x c - -o - | opt -mem2reg -globalopt -constmerge -instcombine" has precisely the same effect: c returns 1.

It's conceivable that globalopt+constmerge could do even crazier stuff.
Potential example: suppose we have two mallocs, and store the allocated pointers into globals. GlobalOpt knows how to turn mallocs into statically allocated globals. Then suppose there's exactly one store to each of these mallocs: GlobalOpt knows how to turn these into constant globals. Then, ConstMerge or the AsmPrinter will actually merge them, so the computed address ends up being the same. Therefore, we conclude that malloc(1) == malloc(1) can be true in some situations! Now, this doesn't actually work because the malloc elimination transformation isn't quite aggressive enough, but making this work wouldn't require any controversial changes.

This bug actually manifests itself in two places: one is ConstantMerge, the other is the AsmPrinter. It's non-trivial to fix because it's really a design bug: we assume that constant==mergeable, which simply isn't true. There are a few different ways of fixing this; however, I think the only real option is to add a new "mergeable" linkage type.

-Eli

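[Editorial sketch, not part of the original message.] The three cases discussed above can be put side by side in one small C file. The function names are hypothetical and chosen only for illustration. On a conforming implementation the first check must yield 0, the second must never yield 1, and the third may yield either value, since string literals need not be distinct objects.

```c
#include <assert.h>
#include <stdlib.h>

/* Two distinct named objects, as in the test case above. */
static char x = 1, y = 1;

/* Must return 0 on a conforming implementation: x and y are
 * different objects, so their addresses compare unequal. */
int named_objects_share_address(void) {
    char *u = &x;
    char *v = &y;
    return u == v;
}

/* May return 0 or 1: identical string literals are allowed,
 * but not required, to designate the same object. */
int string_literals_share_address(void) {
    const char *a = "foo";
    const char *b = "foo";
    return a == b;
}

/* Returns -1 if either allocation failed; otherwise reports whether
 * two simultaneously live allocations compare equal (must not be 1). */
int live_mallocs_share_address(void) {
    char *p = malloc(1);
    char *q = malloc(1);
    int result = (p == NULL || q == NULL) ? -1 : (p == q);
    free(p);
    free(q);
    return result;
}

/* Hypothetical self-check that bundles the three expectations. */
void check_identity_guarantees(void) {
    int lit = string_literals_share_address();
    assert(named_objects_share_address() == 0);
    assert(live_mallocs_share_address() != 1);
    assert(lit == 0 || lit == 1);
}
```

Calling check_identity_guarantees() from a test driver should pass silently on a conforming toolchain; the miscompile described above would trip the first assertion.
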
On Oct 15, 2008, at 7:34 PM, Eli Friedman wrote:
> This bug actually manifests itself in two places: one is
> ConstantMerge, the other is the AsmPrinter. It's non-trivial to fix
> because it's really a design bug: we assume that constant==mergeable,
> which simply isn't true. There are a few different ways of fixing
> this; however, I think the only real option is to add a new
> "mergeable" linkage type.

Eli, I don't disagree with you on any specific detail here. I think there are decent solutions to this if anyone cares enough. My only point is that this is an existing problem with other compilers. On darwin, for example, x and y can get the same address, because x and y end up in the 'cstring' section which is coalesced by the linker:

    static const char x[] = "foo";
    static const char y[] = "foo";
    void *X() { return x; }
    void *Y() { return y; }

This is clearly invalid, and a well-known problem. I agree that neither LLVM nor GCC should do this; however, no one has cared enough to fix it yet. If anyone cares enough to do so, I'm happy to help discuss various design points: I don't think this is very difficult.

-Chris

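[Editorial sketch, not part of the original message.] One way to observe whether a given toolchain coalesces the two arrays is to compare the returned addresses at run time. The helper name arrays_were_coalesced is hypothetical, and the accessors are declared to return const void * here (an adjustment to the snippet above) so that the const qualifier is not discarded.

```c
#include <assert.h>
#include <string.h>

static const char x[] = "foo";
static const char y[] = "foo";

/* Accessors as in the example above, with const preserved. */
const void *X(void) { return x; }
const void *Y(void) { return y; }

/* Returns 0 when x and y occupy distinct storage (the conforming
 * result) and 1 when the compiler or linker has coalesced them. */
int arrays_were_coalesced(void) {
    /* The contents are identical either way; only identity is in question. */
    assert(memcmp(X(), Y(), sizeof x) == 0);
    return X() == Y();
}
```

Note that an optimizer may fold the comparison at compile time, so a result of 0 does not prove that the linker left the two arrays apart; inspecting the symbol addresses in the linked binary is the more reliable check.
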
On Oct 15, 2008, at 10:11 PM, Chris Lattner wrote:
> Eli, I don't disagree with you on any specific detail here. I think
> there are decent solutions to this if anyone cares enough. My only
> point is that this is an existing problem with other compilers. On
> darwin, for example, x and y can get the same address, because x and y
> end up in the 'cstring' section which is coalesced by the linker:
>
> static const char x[] = "foo";
> static const char y[] = "foo";
> void *X() { return x; }
> void *Y() { return y; }
>
> This is clearly invalid, and a well-known problem. I agree that
> neither LLVM nor GCC should do this; however, no one has cared
> enough to fix it yet. If anyone cares enough to do so, I'm happy to
> help discuss various design points: I don't think this is very
> difficult.

Actually, we do this on purpose. It is a huge win for us and we don't want to stop doing it, even if that means not exactly conforming to the standard. Essentially, we wish the standard had chosen otherwise. gcc handles this choice like this:

    @item -fmerge-all-constants
    Attempt to merge identical constants and identical variables.

    This option implies @option{-fmerge-constants}. In addition to
    @option{-fmerge-constants} this considers e.g.@: even constant
    initialized arrays or initialized constant variables with integral
    or floating point types. Languages like C or C++ require each
    non-automatic variable to have distinct location, so using this
    option will result in non-conforming behavior.

In reality, we should handle this by setting flag_merge_constants = 2 in darwin.c, and documenting it. All code that does merging should be testing flag_merge_constants and respecting it. People who don't want merging are then free to turn it off.

It is this experience that gives rise to the idea that changing the standard to permit merging would not be a bad idea. If they did this, then we wouldn't even need the flag, well, wouldn't need it except for legacy code.
Internally, one would still want it to support previous language standards.
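
[Editorial sketch, not part of the original message.] The suggestion that all merging code should test flag_merge_constants can be illustrated with a small predicate. Everything below is hypothetical: the enum, the struct, and may_merge are stand-ins written for this note, not GCC or LLVM code. The sketch assumes the levels 0 (no merging), 1 (-fmerge-constants) and 2 (-fmerge-all-constants), in line with the "flag_merge_constants = 2" setting mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the flag_merge_constants levels (assumed 0/1/2). */
enum merge_level {
    MERGE_NONE = 0,      /* merging disabled */
    MERGE_CONSTANTS = 1, /* merge only where the language permits it */
    MERGE_ALL = 2        /* also merge named constant variables (non-conforming) */
};

/* Hypothetical description of an object considered for merging. */
struct candidate {
    bool is_const;            /* contents never change */
    bool address_significant; /* a named object whose identity the language guarantees */
};

/* Being constant is necessary but not sufficient: the policy level decides
 * whether objects with a guaranteed distinct address may be merged. */
bool may_merge(enum merge_level level, const struct candidate *c) {
    assert(c != NULL);
    if (!c->is_const || level == MERGE_NONE)
        return false;
    if (level == MERGE_CONSTANTS)
        return !c->address_significant;
    return true; /* MERGE_ALL */
}
```

Under this sketch a string literal (constant, address not significant) merges at level 1 or 2, while a named const variable merges only at level 2, which is the separation of "constant" from "mergeable" argued for earlier in the thread.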