> > > > For two given access sequences we can determine if the accessed > objects are allowed to overlap by the rules of the input > language.Sadly, this is where this becomes "unlikely to want to use to replace TBAA", at least for me. It may still be a thing we want anyway. This scheme is really an encoding of C/C++ TBAA info so it can be read by LLVM and requires that *LLVM* have some set of rules that it enforces about that scheme. But that scheme is still very language specific in how it is used. GCC still has something in between this and LLVM, where the language rules are a bit encoded (but not as much as you have). We (and gcc) have deliberately avoided such schemes, in favor of transforming the info into abstract set trees that then tag loads and stores. The encoding of "struct path" tbaa, is just a way of trading space vs time in that encoding. We trade walking time for the space of transitive closure, etc. None of the TBAA in LLVM really has any *real* relation to the original type system rules, and that is deliberate. I've argued for years that calling it "TBAA" just badly confuses people, and i believe here is a good example :) So i don't think what you've written can be used to replace TBAA. However, i *do* believe it would be useful to further optimize C and C++ programs in LLVM by using access paths. -------------- next part -------------- An HTML attachment was scrubbed... URL: <lists.llvm.org/pipermail/llvm-dev/attachments/20170817/dc45cc46/attachment.html>
On 08/17/2017 04:49 PM, Daniel Berlin via llvm-dev wrote:> > > > For two given access sequences we can determine if the accessed > objects are allowed to overlap by the rules of the input > language. > > > Sadly, this is where this becomes "unlikely to want to use to replace > TBAA", at least for me. It may still be a thing we want anyway.The rules proposed by Ivan for handling C/C++ seem pretty generic. We generally explain our current TBAA rules by saying that they're generic but motivated by C/C++ rules. I could say the same thing about this proposed system with its proposed rules.> > This scheme is really an encoding of C/C++ TBAA info so it can be read > by LLVM and requires that *LLVM* have some set of rules that it > enforces about that scheme. > But that scheme is still very language specific in how it is used. > GCC still has something in between this and LLVM, where the language > rules are a bit encoded (but not as much as you have). > > We (and gcc) have deliberately avoided such schemes, in favor of > transforming the info into abstract set trees that then tag loads and > stores. > > The encoding of "struct path" tbaa, is just a way of trading space vs > time in that encoding. We trade walking time for the space of > transitive closure, etc.If the provided statistic of 15% holds up, maybe which way we go in this trade-off space doesn't matter much?> > None of the TBAA in LLVM really has any *real* relation to the > original type system rules, and that is deliberate. I've argued for > years that calling it "TBAA" just badly confuses people, and i believe > here is a good example :) > > So i don't think what you've written can be used to replace TBAA.So *if* we just take the proposed rules for C/C++ as the rules for the scheme in general, I'm not sure this is true. There is an isomorphism it seems (we could auto-upgrade even, if we'd like, by mapping the current offset value onto the field id of this scheme (we'd need to use a different root name, however, so everything would remain consistent under LTO)). -Hal> > However, i *do* believe it would be useful to further optimize C and > C++ programs in LLVM by using access paths. > > > > > > > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev-- Hal Finkel Lead, Compiler Technology and Programming Languages Leadership Computing Facility Argonne National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... URL: <lists.llvm.org/pipermail/llvm-dev/attachments/20170817/c2fe89e3/attachment.html>
On Thu, Aug 17, 2017 at 4:49 PM, Hal Finkel <hfinkel at anl.gov> wrote:> > On 08/17/2017 04:49 PM, Daniel Berlin via llvm-dev wrote: > > >> >> For two given access sequences we can determine if the accessed >> objects are allowed to overlap by the rules of the input >> language. > > > Sadly, this is where this becomes "unlikely to want to use to replace > TBAA", at least for me. It may still be a thing we want anyway. > > > The rules proposed by Ivan for handling C/C++ seem pretty generic. >Uh? " [X...] is allowed to overlap with [S1..., X..., S2...] and the most generic access sequence is [X...]. [X1..., X2...] is allowed to overlap with [S1..., X1...] with the most generic access sequence to be [X1...]. [X1..., U, U::m1, X2...] is allowed to overlap with [S1..., X1..., U, U::m2, S2...] for a union U of an unknown effective type, provided m1 != m2 and the most generic access sequence is [X1..., U]. If neither of the given sequences contains the leading access type of the other, then they are allowed to overlap if the leading access type of one sequence is a direct or indirect field of the final access type of the other sequence and then the most generic access sequence consists of a single element, which is that final access type. For the purpose of determining whether one type is a direct or indirect member of another type unions are considered to have no members as accesses to members of unions are only allowed to overlap if they have the base union object explicitly specified." This sounds super specific to me. It talks about unions, and how they behave in C/C++, etc. It has initial sequence rules, etc. These would make no sense for "Rust", "Go", or "Haskell" at all. We generally explain our current TBAA rules by saying that they're generic> but motivated by C/C++ rules. >We do say that but that's not really what our implementation does in any way.> I could say the same thing about this proposed system with its proposed > rules. >It is, IMHO, nowhere near as generic.> >> This scheme is really an encoding of C/C++ TBAA info so it can be read by > LLVM and requires that *LLVM* have some set of rules that it enforces about > that scheme. > But that scheme is still very language specific in how it is used. > GCC still has something in between this and LLVM, where the language rules > are a bit encoded (but not as much as you have). > > We (and gcc) have deliberately avoided such schemes, in favor of > transforming the info into abstract set trees that then tag loads and > stores. > > The encoding of "struct path" tbaa, is just a way of trading space vs time > in that encoding. We trade walking time for the space of transitive > closure, etc. > > > If the provided statistic of 15% holds up, maybe which way we go in this > trade-off space doesn't matter much? >Sure, that part i'm indifferent about.> > > > None of the TBAA in LLVM really has any *real* relation to the original > type system rules, and that is deliberate. I've argued for years that > calling it "TBAA" just badly confuses people, and i believe here is a good > example :) > > So i don't think what you've written can be used to replace TBAA. > > > So *if* we just take the proposed rules for C/C++ as the rules for the > scheme in general, I'm not sure this is true. >Okay, let me try to rephrase a little more precisely: It is almost certainly true that we *could* make *any* particular lowering work, at different costs to the front ends and middle ends. IE i totally agree that the power of all of these systems (with small extensions) are equal. and also, for the record, i certainly think the rules/scheme described here would almost certainly produce better optimization results than what is currently implemented. *But* I don't see how you use this as a general scheme without ending up forcing the other frontends to lower things to look as if it was a C/C++ based type system. This is definitely *not* true today. They only have to lower things to look like a tree of sets. Having been down this path before, it's definitely easier to lower things to look like trees of sets than like groups of C/C++ structs/bitfields/etc that follow certain rules. This is one of the reasons I'm honestly having a lot of trouble seeing how this is is definitively better than what IBM's proposal was, which seemed both more generic and incremental, and handled most concerns. The only practical difference i see is that this just changes the scheme from one where the language rules are mostly in the front ends to one where the language rules are also partially in llvm. To me that isn't necessarily better, just different (heck, i'm even more used to that system because it's a push closer to gcc :P) So overall I don't see one as particularly better than the other. You seem to. So i'd really like to understand your perspective on this in terms of the pro/con that you are seeing. I mean, i'm really not even opposed (though others may be) in principle to saying "hey, let's just have a set of language specific tbaa passes with semi-common metadata". I'd just like to understand why we would change tradeoffs, etc. -------------- next part -------------- An HTML attachment was scrubbed... URL: <lists.llvm.org/pipermail/llvm-dev/attachments/20170817/478f35fa/attachment-0001.html>
Thanks Daniel, I've read through all your recent messages carefully and what you say is very interesting because it sounds like you know how to represent everything we need to support C/C++ TBAA rules in a more generic way that expresses things in terms of type identifiers and byte offsets. If that is correct, could you please consider this: struct A { int i, j; }; union { struct S1 { struct A a; } s1; struct S2 { struct A a; } s2; } u; For the definitions above, can you tell how representations of these accesses: 1) u.s1.a.i 2) u.s1.a.j // Does not overlap with #1. and this couple of accesses: 3) u.s1.a.i 4) u.s2.a.j // Does overlap with #3. can be differentiated in the approach you are talking about? Thanks, --
Daniel, Hal, I'm trying to figure out what we would need to do in terms of the current approach, accepting its limitations, to support unions, member arrays, aggregate accesses and type identifiers and what I get looks suspiciously simple. Can you please check it out and let me know if I'm missing something? For unions: TBAA, regardless of a specific approach, cannot guarantee correct answers for couples of accesses that involve pointers to union members. This means that in terms of TBAA two access paths that come through unions may overlap only if the most-enclosing unions in these paths are the same union. Furthermore, two accesses with the same base union type overlap if their offset ranges overlap. The access types do not matter. Otherwise, they are considered not to overlap and with the current approach we can't say better. This sounds like during scanning through type trees in TypeBasedAAResult::Aliases() we should just stop at the first union type we have run into and see if: a) that union is the base type of the other access and b) the offset ranges overlap. No need to traverse through any union members. Similarly, in MDNode::getMostGenericTBAA() we do not consider nodes that represent union-enclosed types. This means the most generic type may be an aggregate or a union, in which case we return null node unless aggreagte accesses are supported. To distinct unions from other types we can have a special group for them. Alternatively, we can use type identification nodes, see below. For aggregate accesses: The comment in TypeBasedAliasAnalysis.cpp suggests: // TODO: We need to check if AccessType of TagA encloses AccessType of // TagB to support aggregate AccessType. If yes, return true. Consider these two accesses: struct A { int i; }; struct B { struct A a; } *b; struct X { int i; } *x; b->a x->i Following the rule in the comment, the accesses are allowed to overlap, which is wrong. Instead, we should check if the access type of one access encloses the base type of the other. And we should only check this if neither base type encloses the other. This relies on traversing through types, but again, we shall never follow union-enclosed type nodes as for them to potentially overlap we require explicit base specification. For member arrays: There are no accesses to arrays, only accesses to array elements. In terms of TBAA, an access to an array element is no different than an access to an object of the element type. And an access to a member array element is no different than an access to a member of the element type. All this provided offsets remain the same. The overall array size does not matter because structure members are known not to overlap and we never differentiate between union members. For type identifiers: We can add a language-specific node to type metadata tuples that guarantees uniqueness of type nodes that are not interchangeable in terms of TBAA. This is supposed to resolve issues like this one: bugs.llvm.org/show_bug.cgi?id=8572 Thanks, --