On Wed, Oct 15, 2008 at 11:09 PM, Mike Stump <mrs at apple.com> wrote:
> On Oct 15, 2008, at 10:11 PM, Chris Lattner wrote:
>> Eli, I don't disagree with you on any specific detail here.  I think
>> there are decent solutions to this if anyone cares enough.  My only
>> point is that this is an existing problem with other compilers.  On
>> darwin, for example, x and y can get the same address, because x and y
>> end up in the 'cstring' section, which is coalesced by the linker:
>>
>> static const char x[] = "foo";
>> static const char y[] = "foo";
>> const void *X(void) { return x; }
>> const void *Y(void) { return y; }
>>
>> This is clearly invalid, and a well-known problem.  I agree that
>> neither LLVM nor GCC should do this; however, no one has cared
>> enough to fix it yet.  If anyone cares enough to do so, I'm happy to
>> help discuss various design points: I don't think this is very
>> difficult.
>
> Actually, we do this on purpose.  It is a huge win for us and we don't
> want to not do this, even if that means not exactly conforming to the
> standard.

Hmm... so the issue is that it's good for code size to merge objects
with static duration that are marked in the source as const in
C/ObjC/C++, even when we can't prove it's correct?  That sounds
generally reasonable, although it would be mildly surprising for anyone
coding according to the standard.

It doesn't really change the issue, though; we want the merging to be a
front-end option, and we still need a solution that handles variables
that get marked constant by the optimizer.

-Eli
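For concreteness, here is a minimal sketch of how the x/y example quoted
above can bite code written to the standard. The helpers addresses_equal
and tag_of are hypothetical names made up for illustration; they do not
come from the thread or from any compiler. The sketch assumes only that
the toolchain may or may not coalesce the two arrays.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

static const char x[] = "foo";
static const char y[] = "foo";

/* Return const-qualified pointers so no qualifier is discarded. */
const void *X(void) { return x; }
const void *Y(void) { return y; }

/* x and y are distinct objects according to the standard, so this should
   be false.  If the toolchain coalesces them, it can come out true. */
bool addresses_equal(void) { return X() == Y(); }

/* Hypothetical helper: uses an object's address as an identity tag, which
   is only sound if distinct objects really have distinct addresses. */
int tag_of(const void *p) {
    assert(p == X() || p == Y());
    return p == X() ? 1 : 2;   /* yields 1 for both x and y if merged */
}
```

Whether tag_of(Y()) yields 2 or 1 depends on whether the two arrays were
merged, which is exactly the behavior under discussion.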
On Oct 16, 2008, at 1:50 AM, Eli Friedman wrote:
> Hmm... so the issue is that it's good for codesize to merge objects
> with static duration that are marked in the source as const in
> C/ObjC/C++, even when we can't prove it's correct?

Yes.

> That sounds generally reasonable, although it would be mildly
> surprising for anyone coding according to the standard.

Apparently not. :-)  In practice, I've seen one bug report for it in 5
years.

> It doesn't really change the issue, though; we want the merging to be
> a front-end option, and we still need a solution which handles
> variables that get marked by the optimizer.

I think so.  If we could get C/C++ to just bless merging, and then
support only that and ignore legacy standards and legacy code, we might
be able to leave it as is.
On Thu, Oct 16, 2008 at 12:35 PM, Mike Stump <mrs at apple.com> wrote:
>> It doesn't really change the issue, though; we want the merging to be
>> a front-end option, and we still need a solution which handles
>> variables that get marked by the optimizer.
>
> I think so.  If we could get C/C++ to just bless merging and then just
> support that and ignore legacy standards and legacy code, we might be
> able to leave it as is.

The only allowance I can think of that's general enough to allow
everything LLVM knows how to do at the moment is "the result of an
equality comparison between two pointers to objects with distinct base
objects is undefined".  (See
http://lists.cs.uiuc.edu/pipermail/llvmdev/2008-October/017769.html
and
http://lists.cs.uiuc.edu/pipermail/llvmdev/2008-October/017747.html
for the examples I'm thinking of.)  I strongly doubt we can get away
with that.

Here's a more concrete version of the solution I'm proposing: we add a
new optional marking to constant globals, say "mergeable".  There are
two reasonable semantics.  One is that the result of an equality
comparison of a pointer into this global with a pointer into any other
similarly marked global is undefined.  This is actually slightly more
aggressive than what the current standard seems to allow even for
string constants, but it seems reasonable.  The more conservative
definition is just to say that the compiler chooses whether any pair of
mergeable globals is a pair of distinct objects, which is roughly how
the current C99 standard defines string merging.

This has the following effects on the current optimizers: constmerge
only merges globals marked mergeable.  Only mergeable constants are
emitted into mergeable sections in assembly.  If we use the
conservative definition of mergeable, we also have to fix any code that
assumes distinct globals don't get merged, like the code that folds
equality comparisons between distinct globals to false.  Optionally, we
add a new optimization step that marks constants as mergeable when
their address isn't taken.

-Eli
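To make the proposal a bit more tangible, here is a rough model of it in
plain C. Everything in it is an assumption made for illustration: struct
global, const_merge, and fold_address_eq are invented names, not LLVM
data structures or passes, and initializers are modeled as NUL-terminated
strings for simplicity. It sketches the conservative semantics only: a
merge needs both globals to be constant and marked mergeable, and an
address comparison between two mergeable globals can no longer be folded
to false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a global variable; not LLVM's representation. */
struct global {
    const char *name;
    const char *init;     /* initializer, NUL-terminated for simplicity */
    bool is_constant;
    bool mergeable;       /* the proposed optional marking */
    size_t merged_into;   /* index of the global that represents this one */
};

/* Model of constmerge under the proposal: fold g[i] into an earlier g[j]
   only when both are constant, both are marked mergeable, and their
   initializers match.  Returns the number of globals folded away. */
size_t const_merge(struct global *g, size_t n) {
    assert(g != NULL || n == 0);
    size_t folded = 0;
    for (size_t i = 0; i < n; i++) {
        g[i].merged_into = i;
        if (!g[i].is_constant || !g[i].mergeable)
            continue;
        for (size_t j = 0; j < i; j++) {
            if (g[j].merged_into == j && g[j].is_constant &&
                g[j].mergeable && strcmp(g[j].init, g[i].init) == 0) {
                g[i].merged_into = j;
                folded++;
                break;
            }
        }
    }
    return folded;
}

enum fold_result { FOLD_FALSE, FOLD_TRUE, FOLD_UNKNOWN };

/* Conservative semantics: comparing the addresses of two different
   globals folds to false only if at least one of them is not mergeable;
   otherwise the answer is left to whoever decides the final layout. */
enum fold_result fold_address_eq(const struct global *a,
                                 const struct global *b) {
    if (a == b)
        return FOLD_TRUE;
    if (a->mergeable && b->mergeable)
        return FOLD_UNKNOWN;
    return FOLD_FALSE;
}
```

With three constant globals that all hold "foo", of which only the first
two are marked mergeable, const_merge folds the second into the first and
leaves the third alone, and fold_address_eq declines to fold a comparison
between the first two.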