Daniel, Hal, I'm trying to figure out what we would need to do in terms of the current approach, accepting its limitations, to support unions, member arrays, aggregate accesses and type identifiers and what I get looks suspiciously simple. Can you please check it out and let me know if I'm missing something? For unions: TBAA, regardless of a specific approach, cannot guarantee correct answers for couples of accesses that involve pointers to union members. This means that in terms of TBAA two access paths that come through unions may overlap only if the most-enclosing unions in these paths are the same union. Furthermore, two accesses with the same base union type overlap if their offset ranges overlap. The access types do not matter. Otherwise, they are considered not to overlap and with the current approach we can't say better. This sounds like during scanning through type trees in TypeBasedAAResult::Aliases() we should just stop at the first union type we have run into and see if: a) that union is the base type of the other access and b) the offset ranges overlap. No need to traverse through any union members. Similarly, in MDNode::getMostGenericTBAA() we do not consider nodes that represent union-enclosed types. This means the most generic type may be an aggregate or a union, in which case we return null node unless aggreagte accesses are supported. To distinct unions from other types we can have a special group for them. Alternatively, we can use type identification nodes, see below. For aggregate accesses: The comment in TypeBasedAliasAnalysis.cpp suggests: // TODO: We need to check if AccessType of TagA encloses AccessType of // TagB to support aggregate AccessType. If yes, return true. Consider these two accesses: struct A { int i; }; struct B { struct A a; } *b; struct X { int i; } *x; b->a x->i Following the rule in the comment, the accesses are allowed to overlap, which is wrong. Instead, we should check if the access type of one access encloses the base type of the other. And we should only check this if neither base type encloses the other. This relies on traversing through types, but again, we shall never follow union-enclosed type nodes as for them to potentially overlap we require explicit base specification. For member arrays: There are no accesses to arrays, only accesses to array elements. In terms of TBAA, an access to an array element is no different than an access to an object of the element type. And an access to a member array element is no different than an access to a member of the element type. All this provided offsets remain the same. The overall array size does not matter because structure members are known not to overlap and we never differentiate between union members. For type identifiers: We can add a language-specific node to type metadata tuples that guarantees uniqueness of type nodes that are not interchangeable in terms of TBAA. This is supposed to resolve issues like this one: https://bugs.llvm.org/show_bug.cgi?id=8572 Thanks, --
On Sat, Aug 19, 2017 at 6:29 AM, Ivan A. Kosarev via llvm-dev < llvm-dev at lists.llvm.org> wrote:> Daniel, Hal, > > I'm trying to figure out what we would need to do in terms of the current > approach, accepting its limitations,We don't have to accept the limitations? We already know it's pretty trivial to extend it, and we aren't against extending it at all.> to support unions, member arrays, aggregate accessesand type identifiers and what I get looks suspiciously simple.> Can you please check it out and let me know if I'm missing something? >> > For aggregate accesses: > > The comment in TypeBasedAliasAnalysis.cpp suggests: > > // TODO: We need to check if AccessType of TagA encloses AccessType of > // TagB to support aggregate AccessType. If yes, return true. > > Consider these two accesses: > > struct A { int i; }; > struct B { struct A a; } *b; > > struct X { int i; } *x; > > b->a > x->i > > Following the rule in the comment, the accesses are allowed to overlap, > which is wrong. >i don't believe this to be wrong in terms of the C/C++ TBAA rules. C rule:> An object shall have its stored value accessed only by an lvalue > expression that has one of the following types: > > - a type compatible with the effective type of the object, > > > - a qualified version of a type compatible with the effective type of > the object, > > > - a type that is the signed or unsigned type corresponding to the > effective type of the object, > > > - a type that is the signed or unsigned type corresponding to a > qualified version of the effective type of the object, > > > - an aggregate or union type that includes one of the aforementioned > types among its members (including, recursively, a member of a subaggregate > or contained union), or > > > - a character type. > >This is an aggregate type that includes a type compatible with the effective type of the object. In particular, x->i is an lvalue expression of type "int" b->a is an lvalue expression of type "struct A" "struct A" is an aggregate type that includes "int" among its members. Therefore, the b->a object may access x->i by TBAA These rules are not based on the name of the type, only the types contained in the structs. TBAA cannot distinguish types solely based on name. C++ is similar: 3.10.10> If a program attempts to access the stored value of an object through a > glvalue of other than one of the > following types the behavior is undefined:54 > — the dynamic type of the object, > — a cv-qualified version of the dynamic type of the object, > — a type similar (as defined in 4.4 ) to the dynamic type of the object, > — a type that is the signed or unsigned type corresponding to the dynamic > type of the object, > — a type that is the signed or unsigned type corresponding to a > cv-qualified version of the dynamic type > of the object, > — an aggregate or union type that includes one of the aforementioned types > among its elements or nonstatic > data members (including, recursively, an element or non-static data member > of a subaggregate > or contained union), > — a type that is a (possibly cv-qualified) base class type of the dynamic > type of the object, > — a char or unsigned char type.This is not going to end up with a different result in this case. Do you agree? -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170819/2158b238/attachment.html>
(and note: It's not clear GCC agrees with my interpretation. I can get it to seem to answer both ways. Depending on what i do, and looking at optimizer dumps, i can get it to ignore the access to x->i, and assume it has no effect on b->a, *and* get it to assume it does, and cause reloads of x->i) +Richard in case he has any thoughts. On Sat, Aug 19, 2017 at 9:00 AM, Daniel Berlin <dberlin at dberlin.org> wrote:> > > On Sat, Aug 19, 2017 at 6:29 AM, Ivan A. Kosarev via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > >> Daniel, Hal, >> >> I'm trying to figure out what we would need to do in terms of the current >> approach, accepting its limitations, > > > We don't have to accept the limitations? > We already know it's pretty trivial to extend it, and we aren't against > extending it at all. > > >> to support unions, member arrays, aggregate accesses > > and type identifiers and what I get looks suspiciously simple. > > > > >> Can you please check it out and let me know if I'm missing something? >> > >> >> For aggregate accesses: >> >> The comment in TypeBasedAliasAnalysis.cpp suggests: >> >> // TODO: We need to check if AccessType of TagA encloses AccessType of >> // TagB to support aggregate AccessType. If yes, return true. >> >> Consider these two accesses: >> >> struct A { int i; }; >> struct B { struct A a; } *b; >> >> struct X { int i; } *x; >> >> b->a >> x->i >> >> Following the rule in the comment, the accesses are allowed to overlap, >> which is wrong. >> > > i don't believe this to be wrong in terms of the C/C++ TBAA rules. > > C rule: > >> An object shall have its stored value accessed only by an lvalue >> expression that has one of the following types: >> >> - a type compatible with the effective type of the object, >> >> >> - a qualified version of a type compatible with the effective type of >> the object, >> >> >> - a type that is the signed or unsigned type corresponding to the >> effective type of the object, >> >> >> - a type that is the signed or unsigned type corresponding to a >> qualified version of the effective type of the object, >> >> >> - an aggregate or union type that includes one of the aforementioned >> types among its members (including, recursively, a member of a subaggregate >> or contained union), or >> >> >> - a character type. >> >> > This is an aggregate type that includes a type compatible with the > effective type of the object. In particular, > x->i is an lvalue expression of type "int" > b->a is an lvalue expression of type "struct A" > "struct A" is an aggregate type that includes "int" among its members. > Therefore, the b->a object may access x->i by TBAA > > These rules are not based on the name of the type, only the types > contained in the structs. > > TBAA cannot distinguish types solely based on name. > > C++ is similar: > > 3.10.10 > >> If a program attempts to access the stored value of an object through a >> glvalue of other than one of the >> following types the behavior is undefined:54 >> — the dynamic type of the object, >> — a cv-qualified version of the dynamic type of the object, >> — a type similar (as defined in 4.4 ) to the dynamic type of the object, >> — a type that is the signed or unsigned type corresponding to the dynamic >> type of the object, >> — a type that is the signed or unsigned type corresponding to a >> cv-qualified version of the dynamic type >> of the object, >> — an aggregate or union type that includes one of the aforementioned >> types among its elements or nonstatic >> data members (including, recursively, an element or non-static data >> member of a subaggregate >> or contained union), >> — a type that is a (possibly cv-qualified) base class type of the dynamic >> type of the object, >> — a char or unsigned char type. > > > > This is not going to end up with a different result in this case. > > Do you agree? > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170819/7ededbee/attachment.html>
Daniel, > This is an aggregate type that includes a type compatible > with the effective type of the object. In particular, > x->i is an lvalue expression of type "int" > b->a is an lvalue expression of type "struct A" > "struct A" is an aggregate type that includes "int" among > its members. > Therefore, the b->a object may access x->i by TBAA My understanding is in x->i there are two accesses by lvalue, of which the first one is (*x) and that lvalue is of type that is not compatible with [any part of] struct A. Would you agree? Thanks, -- -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170819/93cb0594/attachment.html>