On Tue, Mar 12, 2013 at 7:21 PM, Krzysztof Parzyszek <kparzysz at codeaurora.org> wrote:> On 3/12/2013 12:13 PM, Manman Ren wrote: >> >> Given >> struct A { >> int x; >> int y; >> }; >> struct B { >> A a; >> int z; >> }; >> struct C { >> B b1; >> B b2; >> }; >> struct D { >> C c; >> }; >> >> with struct-access-path aware TBAA, C::b1.a.x does not alias with >> D::c.b2.a.x. >> without it, the 2 scalar accesses can alias since both have int type. > > I browsed the 2012 standard for a while and I didn't see anything that would > make this illegal: > > char *p = malloc(enough_bytes); > intptr_t x = reinterpret_cast<intptr_t>(p); > x += offsetof(C, b2); > D &vd = *reinterpret_cast<D*>(p); > C &vc = *reinterpret_cast<C*>(x); > vd.c.b2.a.x = 1; // ..accessing the same > int t = vc.b1.a.x; // ..storage > > I don't think that the path through the type structure is really sufficient.There are simpler examples of this kind for C++, because placement new can change the dynamic type of the object (I actually haven't looked to see if they changed this in 2012, but it was definitely legal in C++98): #include <new> struct Foo { long i; }; struct Bar { void *p; }; long foo(int n) { Foo *f = new Foo; f->i = 1; for (int i=0; i<n; ++i) { Bar *b = new (f) Bar; b->p = 0; f = new (f) Foo; f->i = i; } return f->i; } Both access to the same memory, both will end up with completely different access paths, both legal by TBAA rules, but access path alone will claim no-alias. (This is taken from http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29286)
Someone privately asked me to explain this example, so here goes ...> There are simpler examples of this kind for C++, because placement > new can change the dynamic type of the object (I actually haven't > looked to see if they changed this in 2012, but it was definitely > legal in C++98): > > #include <new> > struct Foo { long i; }; > struct Bar { void *p; }; > long foo(int n) > { > Foo *f = new Foo; > f->i = 1; > for (int i=0; i<n; ++i) > { > Bar *b = new (f) Bar; > b->p = 0; > f = new (f) Foo; > f->i = i; > } > return f->i; > } > > Both access to the same memory, both will end up with completely > different access paths, both legal by TBAA rules, but access path > alone will claim no-alias. > > (This is taken from http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29286)At the language level, both f and b are not the same "object". The lifetime of b ends with the f = new (f) Foo. This is actually irrelevant at the IR level, however. What this transforms into at the IR level is something like this. I've elided the fact that it will really start out one level indirected, and then get promoted to regs, and pretended we have a load/store at offset instead of getelementptr. #include <new> struct Foo { long i; }; struct Bar { void *p; }; long foo(int n) { Foo *f1 = call operator new store value 1 into offset 0 of f for (int i=0; i<n; ++i) { Foo *f4 = phi(f1, f2) Bar *b = cast f4 to type Bar * store value 0 into offset 0 of b Foo *f2 = cast b to type Foo * store value i into offset of f2 } Foo *f3 = phi(f1, f2) temp = load offset 0 of f3 return temp; } If you annotate the loads/stores with access paths, these paths will say "no alias" if you ask if the stores alias. Thus, you will feel free to reorder the stores. If you reorder the loop stores, you will return 0 instead of n. The language guarantees the function to return n. The cause of this is simple: C++ provides a way to legally and dynamically change the TBAA type of a pointer. Without evaluating the placement new's, you can't know what that new type is. You also need what the thing you placement new'd pointed to to know what now legally shares memory, despite TBAA, and even if the objects in that memory do not share the same lifetime. At the IR level, it's all just memory, loads and stores, and if you are asked "do these pointers alias", the answer is "yes", and if you annotate them with TBAA paths, the paths, as described, this will cause you to say "no". FWIW: GCC does three things now in this case: 1. These are transformed into "change dynamic type expressions" 2. For the higher level IR, it unions the type's TBAA info and the location's points-to sets when it sees a "change dynamic type expression" 3. For the lower level, where this won't work, we assume the pointer can alias anything)
I personally believe that in the context of type-based AA, correctness is a subjective term:-). If the user smell something fishy, it is up to user to disable such opt, there is no other way around. TBAA is just to find the a sweet spot between precision & safeness. Unfortunately, in the context of TBAA, precision & safeness usually come at each other's expense... It would be nice if we can union the dynamic types of each elements in point-to set. However, we certainly can live without if the program being compiled is well typed. I don't think it is a dispensable tool for this kind of opt. For some nasty license issue, I'm not allowed to peruse and discuss gcc internals. I'm just using the gcc 4.6.3 binary shipped with Ubuntu to compile following snippet. I'm wondering why the p->x is promoted. In the light of "dynamic type", shouldn't p and q's point-to sets are "everything whose addr is taken". ------------------------------ typedef struct { int x; }T1; typedef struct { int y; }T2; int foo(T1 *p, T2 *q) { p->x = 1; q->y = 4; return p->x; } -------------------------- On 03/12/2013 11:09 PM, Daniel Berlin wrote:> Someone privately asked me to explain this example, so here goes ... > >> There are simpler examples of this kind for C++, because placement >> new can change the dynamic type of the object (I actually haven't >> looked to see if they changed this in 2012, but it was definitely >> legal in C++98): >> >> #include <new> >> struct Foo { long i; }; >> struct Bar { void *p; }; >> long foo(int n) >> { >> Foo *f = new Foo; >> f->i = 1; >> for (int i=0; i<n; ++i) >> { >> Bar *b = new (f) Bar; >> b->p = 0; >> f = new (f) Foo; >> f->i = i; >> } >> return f->i; >> } >> >> Both access to the same memory, both will end up with completely >> different access paths, both legal by TBAA rules, but access path >> alone will claim no-alias. >> >> (This is taken from http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29286) > > At the language level, both f and b are not the same "object". The > lifetime of b ends with the f = new (f) Foo. > This is actually irrelevant at the IR level, however. > > What this transforms into at the IR level is something like this. > I've elided the fact that it will really start out one level > indirected, and then get promoted to regs, and pretended we have a > load/store at offset instead of getelementptr. > > > #include <new> > struct Foo { long i; }; > struct Bar { void *p; }; > long foo(int n) > { > Foo *f1 = call operator new > store value 1 into offset 0 of f > > for (int i=0; i<n; ++i) > { > Foo *f4 = phi(f1, f2) > Bar *b = cast f4 to type Bar * > store value 0 into offset 0 of b > Foo *f2 = cast b to type Foo * > store value i into offset of f2 > } > Foo *f3 = phi(f1, f2) > temp = load offset 0 of f3 > return temp; > } > > If you annotate the loads/stores with access paths, these paths will > say "no alias" if you ask if the stores alias. > Thus, you will feel free to reorder the stores. > If you reorder the loop stores, you will return 0 instead of n. > The language guarantees the function to return n. > > The cause of this is simple: C++ provides a way to legally and > dynamically change the TBAA type of a pointer. Without evaluating the > placement new's, you can't know what that new type is. You also need > what the thing you placement new'd pointed to to know what now legally > shares memory, despite TBAA, and even if the objects in that memory do > not share the same lifetime. > > At the IR level, it's all just memory, loads and stores, and if you > are asked "do these pointers alias", the answer is "yes", and if you > annotate them with TBAA paths, the paths, as described, this will > cause you to say "no". > > FWIW: GCC does three things now in this case: > 1. These are transformed into "change dynamic type expressions" > 2. For the higher level IR, it unions the type's TBAA info and the > location's points-to sets when it sees a "change dynamic type > expression" > 3. For the lower level, where this won't work, we assume the pointer > can alias anything) > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev