On 08/16/2017 01:59 PM, Ivan A. Kosarev via llvm-dev wrote:> Hal, Daniel, > > Thanks for your responses. Here's a quick formal introduction to the > proposed approach follows. I also attached a couple files to put some > visibility on implementation details. We'd love to hear from you > gentlemen and LLVM community in general on how this fits what you know > about TBAA. > > > Overview > =======> > With this writing we propose a new approach to the TBAA mechanism > designed to overcome current issues such as: > - inability to represent accesses to aggregates, unions, union > members, bit fieldsWhat exactly do you do, or plan to do, for bit fields? A single load/store can access more than one field. This is true for other kinds of things too, but, in particularly common for bit fields. (see also my comment below about !tbaa.struct).> , fields of union and aggregate types, > including array members and > - lack of a mean to identify user types defined in different > translation units. > > As of today, we have a local patch that implements this approach. > This new implementation is known to be at least as functionally > complete as the current one. Additionally, with this patch on top > we improved SROA to propagate TBAA information, thus making sure > the new features work as expected. > > We should be able to upload that patch for review in a few days.Do you have a Clang patch as well?> > > The Approach > ===========> > With the proposed approach we represent accesses as sequences > that contain all accessed types and fields in order. For example > for this access: > > struct T { union U { struct S { int i1, i2; } s1, s2; } u1, u2; } t; > t.u1.s1.i1 > > we generate an access sequence of the form: > > [T, T::u1, U, U::s1, S, S::i1, int] > > An array is allowed to overlap with any object of its element > type, including other arrays of the same element type, so an > access to an array element is represented as an access to an > object of the element type: > > int a[7]; > a[5] > > [int] > > In case of a multi-dimensional array this rule applies > recursively: > > int a[7][9]; > a[3][5] > > [int] > > For member arrays we specify the member field as usual so we can > distinct it from other array members: > > struct S { int a[7], b[9]; } s; > s.a[5] > > [S, S::a, int] > > Similarly to the scalar and struct-path approaches, we consider > every type to be a member of a type group it explicitly refers > to. Here's how the tree that describes relations between type > groups would look like for the example above: > > <tbaa_root> > |- <may_alias> > |- <representation_byte> > |-<structure> > | |- S > |- int > > The <vtable_pointer> group has a special meaning and is used to > describe accesses to virtual table pointers.As in the current scheme, this group is a peer to <may_alias> above?> Similarly, the > <union> type group includes all union types and used by the TBAA > implementation to distinct union types from other types. The > <may_alias> group is technically equivalent to > <representation_byte> and supposed to be a group for > may_alias-marked types.Of all of these groups above, only <union> has special semantics, correct?> > For two given access sequences we can determine if the accessed > objects are allowed to overlap by the rules of the input > language. Here's the list of rules complete enough to support C > and C++. Ellipsis elements denote sequences of zero or more > elements.Which is presumed to match in both accesses, correct?> For other input languages more rules can be supported, > if necessary. > > [X...] is allowed to overlap with [S1..., X..., S2...] > and the most generic access sequence is [X...]. > > [X1..., X2...] is allowed to overlap with [S1..., X1...] > with the most generic access sequence to be [X1...]. > > [X1..., U, U::m1, X2...] is allowed to overlap with > [S1..., X1..., U, U::m2, S2...] > for a union U of an unknown effective type, provided m1 != m2 > and the most generic access sequence is [X1..., U]. > > If neither of the given sequences contains the leading access > type of the other, then they are allowed to overlap if the > leading access type of one sequence is a direct or indirect field > of the final access type of the other sequence and then the most > generic access sequence consists of a single element, which is > that final access type.I'm not following the above paragraph. Can you please explain from where this comes?> > For the purpose of determining whether one type is a direct or > indirect member of another type unions are considered to have no > members as accesses to members of unions are only allowed to > overlap if they have the base union object explicitly specified. > > Otherwise, given sequences overlap if there is a type group that > includes both the leading access types and the most generic > access sequence consists of the smallest common type group as its > only element. > > See the attached TypeBasedAliasAnalysis.cpp file and specifically > the MatchAccessSequences() function for how these rules can be > implemented. > > TBAA information is encoded as metadata nodes, as usual. Load and > store instructions refer to access sequences: > store %struct.T* %p, %struct.T** %p.addr, align 8, !tbaa !11 > > A type node is either a terminal type node that names a root type > group: > !0 = !{ !"<tbaa_root>" } > > or a non-terminal type node that names a type and refers to a > type group it belongs to: > !1 = !{ !0, !"int" } > > Record types also refer to their field descriptors: > !3 = !{ !0, !"S", !9 } > > An access node is either a terminal access node that refers to > the corresponding access type: > !5 = !{ !1 } > !9 = !{ !3 } > > or a member node that refers to a structure/class or union field > descriptor and a subsequent access path node: > !7 = !{ !type_group, !field_id, !field_offset, !field_size } > !11 = !{ !5, !9, !7 } > > For a field node the first element refers to its type. The > purpose of other elements is to make the field node unique. Their > meaning is unspecified. Currently the other members for C and C++ > are the field name, bit offset and bit size of the member, butI suspect that we'll want to define a meaning to the size/offset fields. One thing to think about is that, today, we also have !tbaa.struct metadata. This is put on memcpy, etc. to indicate what fields are defined (the rest is padding which the memcpy does not need to actually copy). This is somewhat akin to the bitfield problem (i.e. a single load can access multiple fields in the bitfield). If we can represent that, maybe we can use the same thing to replace !tbaa.struct as well. We could specify size/offset in bits to handle bitfields.> this may change in future and front ends for other input > languages may act differently, so TBAA implementation in the > codegen shall not rely on specific shape or meaning of these > elements. > > For types that are interchangeable for purposes of TBAA it is > important to encode them identically so that descriptors of > interchangeable types defined in different modules merge into > same metadata nodes. > > Structure/class fields are specified in the order of declaration. > For union fields there is a canonical order that guarantee that > definitions of the same union type will result in identical > descriptors regardless of the order of member declarations. > Currently we sort union fields with key (field_id, field_offset, > field_size). > > C++ tagged types with no linkage are to be encoded as "distinct" > nodes to guarantee their uniqueness. > > The support for !tbaa.struct information is to be replaced with > plain !tbaa tags representing accesses to the corresponding > record types. > > Another attached file, the tbaa.cpp one, is a test case that can > give an idea what encoded TBAA metadata may look like. > > > Space and Performance Analysis > =============================> > In terms of metadata size, with the new approach we generate > about 15% more of metadata nodes. The ratio of the total number > of TBAA nodes to the amount of code remains very low, meaning the > size of TBAA metadata is unlikely to be a problem any time soon. > > From the performance perspective, the proposed approach differs > from the current one in that: > - it does not traverse through record types if one of the access > sequences to match contains the leading access type of the > other and > - it never traverses union types. > > We thus expect that the new approach is at least as efficient as > the current one. Our experiments do not indicate any sensible > difference in performance between the implementations.Sounds good. Can you be more specific about your experiments. What kinds of code did you try? Thanks again, Hal> > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev-- Hal Finkel Lead, Compiler Technology and Programming Languages Leadership Computing Facility Argonne National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170817/db1f88ae/attachment.html>
Hello Hal, > What exactly do you do, or plan to do, for bit fields? A single > load/store can access more than one field. This is true for > other kinds of things too, but, in particularly common for bit > fields. (see also my comment below about !tbaa.struct). Given how Clang currently generates code for bit field accesses, it seems the right thing to do is to annotate corresponding loads and stores as ones accessing whole storage units these bit fields map to. So this is the plan. With the proposed approach to TBAA we can do better, but the rest of LLVM is obviously not ready to this yet. > Do you have a Clang patch as well? Yes, we have both Clang and LLVM patches. Not everything of what can be supported is already there, but at least we know we pass all 'check-all' tests with the new implementation in place of the scalar and struct-path ones and we see how current issues can be resolved with this new approach. > As in the current scheme, this group is a peer to <may_alias> > above? Yes, <vtable_pointer> and <may_alias> are peers. Just like in the current implementation. This reminds me another aspect of the discussion: the C/C++ TBAA rules are not completely settled yet and we may want to change some bits as time flows. Or make them configurable. The proposed approach is designed with this possibility in mind and provides greater flexibility in this regard. > Of all of these groups above, only <union> has special > semantics, correct? Yes, currently. But we consider type groups as one of the natural ways to extend the implementation to support other input languages and their dialects. >> For two given access sequences we can determine if the >> accessed objects are allowed to overlap by the rules of the >> input language. Here's the list of rules complete enough to >> support C and C++. Ellipsis elements denote sequences of zero >> or more elements. > > Which is presumed to match in both accesses, correct? Yes, same-named elements shall match. Thanks for the catch! >> If neither of the given sequences contains the leading access >> type of the other, then they are allowed to overlap if the >> leading access type of one sequence is a direct or indirect >> field of the final access type of the other sequence and then >> the most generic access sequence consists of a single element, >> which is that final access type. > I'm not following the above paragraph. Can you please explain > from where this comes? Consider this: struct A { int i; } *a; struct B { struct A a; }; struct M { struct B b; }; struct N { struct M m; } *n; Now, access sequences for accesses 'n->m' and 'a->i' do not have common elements, but are still allowed to overlap as 'a' may be pointing to 'n->m.b.a'. >> For a field node the first element refers to its type. The >> purpose of other elements is to make the field node unique. >> Their meaning is unspecified. Currently the other members for >> C and C++ are the field name, bit offset and bit size of the >> member, but > I suspect that we'll want to define a meaning to the > size/offset fields. One thing to think about is that, today, we > also have !tbaa.struct metadata. This is put on memcpy, etc. to > indicate what fields are defined (the rest is padding which the > memcpy does not need to actually copy). This is somewhat akin > to the bitfield problem (i.e. a single load can access multiple > fields in the bitfield). If we can represent that, maybe we can > use the same thing to replace !tbaa.struct as well. We could > specify size/offset in bits to handle bitfields. Yes, I was thinking about representing parts of objects when working on SROA where we have to annotate accesses to slices as if they are accesses to whole objects. This is still a task in progress. And by the way, another thing to think about is that in some cases we may need an instruction to be associated with more than one access sequence. For example, memcpy() calls that initialize and copy aggregates read and write something at the same time. So we probably should consider this as a potential solution to the slices problem. As to !tbaa.struct, the only place where we analyze this tag is during simplification of the memcpy/memmove/memset() calls for aggregates. Since in the mainline we can only represent accesses to scalars, this code generates TBAA info for replacing loads and stores only if the aggregate comprises a single scalar field, if I read it correctly. Furthermore, it seems there are no tests that would fail if this code is thrown away completely. This effectively looks like there's no any !tbaa.struct at the moment. Anyway, in our local patch we replaced it with plain !tbaa. > Sounds good. Can you be more specific about your experiments. > What kinds of code did you try? We are playing with sources from llvm-project/test-suite, like spirit.cpp, for example. Unfortunately, Clang does not generate TBAA info in so many cases that even from this 600Kb source code with lots of memory accesses we only get about 150 TBAA metadata nodes and this makes measuring more or less meaningless. I would expect we to pay more attention to how much memory TBAA information consumes as we support more features, but in the same time I wouldn't expect a huge overhead as we only encode what is positively necessary. Thanks, --
On 08/18/2017 09:48 AM, Ivan A. Kosarev via llvm-dev wrote:> Hello Hal, > > > What exactly do you do, or plan to do, for bit fields? A single > > load/store can access more than one field. This is true for > > other kinds of things too, but, in particularly common for bit > > fields. (see also my comment below about !tbaa.struct). > > Given how Clang currently generates code for bit field accesses, it > seems the right thing to do is to annotate corresponding loads and > stores as ones accessing whole storage units these bit fields map to. > So this is the plan.In your proposed scheme, exactly how to do you this? How do you represent a single load accessing multiple fields?> With the proposed approach to TBAA we can do better, but the rest of > LLVM is obviously not ready to this yet.What does this mean. How could it do better?> > > Do you have a Clang patch as well? > > Yes, we have both Clang and LLVM patches.Can you provide the Clang patch as well?> Not everything of what can be supported is already there, but at least > we know we pass all 'check-all' tests with the new implementation in > place of the scalar and struct-path ones and we see how current issues > can be resolved with this new approach. > > > As in the current scheme, this group is a peer to <may_alias> > > above? > > Yes, <vtable_pointer> and <may_alias> are peers. Just like in the > current implementation. This reminds me another aspect of the > discussion: the C/C++ TBAA rules are not completely settled yet and we > may want to change some bits as time flows. Or make them configurable. > The proposed approach is designed with this possibility in mind and > provides greater flexibility in this regard.I understand. I'm not convinced that we want flexibility, however. I believe that we want a frontend-language-independent set of rules, and a straightforward representation necessary to implement those rules, just one that is more expressive than what we have today. As TBAA has been, and as the IR has been, TBAA should continue to be inspired by the needs of language frontends, however, I have no interest in directly importing the semantics of other languages into LLVM itself (if not absolutely necessary). This is how we develop the LLVM IR as well: it is an independently-defined system, and where we need additional capabilities to meet the needs of clients, we extend the IR while keeping it an independently-defined system.> > > Of all of these groups above, only <union> has special > > semantics, correct? > > Yes, currently. But we consider type groups as one of the natural ways > to extend the implementation to support other input languages and > their dialects. > > >> For two given access sequences we can determine if the > >> accessed objects are allowed to overlap by the rules of the > >> input language. Here's the list of rules complete enough to > >> support C and C++. Ellipsis elements denote sequences of zero > >> or more elements. > > > > Which is presumed to match in both accesses, correct? > > Yes, same-named elements shall match. Thanks for the catch! > > >> If neither of the given sequences contains the leading access > >> type of the other, then they are allowed to overlap if the > >> leading access type of one sequence is a direct or indirect > >> field of the final access type of the other sequence and then > >> the most generic access sequence consists of a single element, > >> which is that final access type. > > > I'm not following the above paragraph. Can you please explain > > from where this comes? > > Consider this: > > struct A { int i; } *a; > struct B { struct A a; }; > > struct M { struct B b; }; > struct N { struct M m; } *n; > > Now, access sequences for accesses 'n->m' and 'a->i' do not have > common elements, but are still allowed to overlap as 'a' may be > pointing to 'n->m.b.a'.Okay, now I understand what you mean.> > >> For a field node the first element refers to its type. The > >> purpose of other elements is to make the field node unique. > >> Their meaning is unspecified. Currently the other members for > >> C and C++ are the field name, bit offset and bit size of the > >> member, but > > > I suspect that we'll want to define a meaning to the > > size/offset fields. One thing to think about is that, today, we > > also have !tbaa.struct metadata. This is put on memcpy, etc. to > > indicate what fields are defined (the rest is padding which the > > memcpy does not need to actually copy). This is somewhat akin > > to the bitfield problem (i.e. a single load can access multiple > > fields in the bitfield). If we can represent that, maybe we can > > use the same thing to replace !tbaa.struct as well. We could > > specify size/offset in bits to handle bitfields. > > Yes, I was thinking about representing parts of objects when working > on SROA where we have to annotate accesses to slices as if they are > accesses to whole objects. This is still a task in progress.Sounds good. In the backend (i.e., at the SDAG/MI levels) we specifically track offsets into scalars for a similar reason. Tracking offsets here may make sense too.> > And by the way, another thing to think about is that in some cases we > may need an instruction to be associated with more than one access > sequence. For example, memcpy() calls that initialize and copy > aggregates read and write something at the same time. So we probably > should consider this as a potential solution to the slices problem.I agree. In the memcpy case, or more generally for argmemonly functions, I'd like to be able to mark specific arguments with metadata specifically. This will require an infrastructure enhancement to support metadata on function arguments, or some scheme for otherwise encoding metadata arguments.> > As to !tbaa.struct, the only place where we analyze this tag is during > simplification of the memcpy/memmove/memset() calls for aggregates.I believe that's correct.> Since in the mainline we can only represent accesses to scalars, this > code generates TBAA info for replacing loads and stores only if the > aggregate comprises a single scalar field, if I read it correctly.That appears to be correct. That should certainly be improved.> Furthermore, it seems there are no tests that would fail if this code > is thrown away completely. This effectively looks like there's no any > !tbaa.struct at the moment.By that standard, we'd throw away all parts of the compiler without 100% regression-test coverage. In such cases, we should add regression tests. I agree, however, that it is not currently used for much. The intent was that, when expanding memcpy (and friends), we could avoid copying padding bytes in cases where we expanded the intrinsics into individual loads/stores. We should still consider that use case in this context.> > Anyway, in our local patch we replaced it with plain !tbaa. > > > Sounds good. Can you be more specific about your experiments. > > What kinds of code did you try? > > We are playing with sources from llvm-project/test-suite, like > spirit.cpp, for example. Unfortunately, Clang does not generate TBAA > info in so many cases that even from this 600Kb source code with lots > of memory accesses we only get about 150 TBAA metadata nodes and this > makes measuring more or less meaningless. I would expect we to pay > more attention to how much memory TBAA information consumes as we > support more features, but in the same time I wouldn't expect a huge > overhead as we only encode what is positively necessary.Okay. Thanks again, Hal> > Thanks, >-- Hal Finkel Lead, Compiler Technology and Programming Languages Leadership Computing Facility Argonne National Laboratory