thr3ads.net - llvm dev - [llvm-dev] RFC: Resolving TBAA issues [Aug 2017]

If this information is useful, please help other people find it:
Share via:

Daniel Berlin via llvm-dev

2017-Aug-14 17:29 UTC

[llvm-dev] RFC: Representing unions in TBAA

It's hard to say.
What you've described sounds close to a neutral type system implemented in
metadata.
In particular, ". It also defines a set of language-neutral formal rules
that LLVM codegen follows to determine whether a given pair of accesses are
allowed to overlap by rules of the input language. "
and "the base type followed by field descriptors"
etc

Despite the name, our current TBAA does not require or represent types.  It
represents a translation of language access rules into a language of
hierarchical sets, that are represented by a  tree with weighted edges.

If you are actually attempting to represent neutral types, certainly, the
approach can work, but probably not represent exact semantics for all
languages.

Most LLVM metadata also tries to avoid understanding the language, instead
modeling the effects.

For example, it's unlikely we'd use metadata to say "This is a
struct field
access to a, and this is one to b" and use that in analysis.  Because it
requires the semantics be at the LLVM level, and understand something about
the language.

Instead, we'd usually say "this is an access to offset 0 of memory,
with
size 4, and this is an  access to offset 4 of memory, with size 8", with
the with the semantic that accesses tagged in such a manner can only
overlap if the offset, size ranges overlap.  That semantic is language
independent.

But again, this is all very theoretical. I'd be very interested to see what
you came up with.

On Mon, Aug 14, 2017 at 10:10 AM, Ivan A. Kosarev <ivan at kosarev.info>
wrote:
> Sure, I will provide those. I just wanted to make sure this doesn't
sound
> like what you know will not work for some reasons I'm not aware of.
>
>
> On 14/08/17 20:04, Daniel Berlin wrote:
>
> Do you have a formal description of your approach with examples?
> I have a bit of trouble visualizing exactly what your approach does.
>
> On Mon, Aug 14, 2017 at 9:58 AM, Ivan A. Kosarev via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> Hello Steven, Hal and Daniel,
>>
>> Thanks a lot for your discussion; it really helps with summarizing
>> current TBAA issues and ways to resolve them.
>>
>> Do you guys know anything of the current status of the proposed change?
>> Steven, will you please let us know if the work is in progress and if
there
>> is any ETA you can share?
>>
>> I'm asking because we are working on an alternative approach that
not
>> only supports accesses to union members, bit fields, fields of
aggregate
>> and union types, but also allows to represent accesses to aggregates
and
>> unions the same way we do it for scalars so that !tbaa.struct is
replaced
>> with plain !tbaa, meaning TBAA information can be propagated uniformly
>> regardless of types of accessed objects. As a consequence, it supports
>> identification of user types defined in different translation units,
even
>> if some of them are written in C and others are in C++. It also defines
a
>> set of language-neutral formal rules that LLVM codegen follows to
determine
>> whether a given pair of accesses are allowed to overlap by rules of the
>> input language. As of today, we know this implementation covers all
>> currently supported TBAA functionality reflected in the test suites and
to
>> test the new functionality we have SROA improved to preserve TBAA
>> information.
>>
>> The point is, our approach does not try to describe accesses as (type,
>> offset) pairs and instead represents access sequences explicitly
beginning
>> from the base type followed by field descriptors, which is what makes
the
>> approach so flexible. TypeBasedAAResult::Aliases() and
>> MDNode::getMostGenericTBAA() are a bit more complex than they used to
be
>> (they actually use the same internal function), but rely exclusively on
>> linear scans of access sequences unless we have a situation when have
to
>> check if one of the accessed types is the type of a member of the other
>> one, in which case it seems we just have to traverse through fields
>> recursively no matter what.
>>
>> So, I wonder if this or similar approaches have ever been considered
>> before and what are the cons, if there are any sounded. Do you think it
is
>> worth to consider it now?
>>
>> Thanks again,
>>
>> --
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170814/390c6bf6/attachment.html>

Ivan A. Kosarev via llvm-dev

2017-Aug-16 18:59 UTC

head link

[llvm-dev] RFC: Resolving TBAA issues

Hal, Daniel,

Thanks for your responses. Here's a quick formal introduction to the 
proposed approach follows. I also attached a couple files to put some 
visibility on implementation details. We'd love to hear from you 
gentlemen and LLVM community in general on how this fits what you know 
about TBAA.


Overview
=======
With this writing we propose a new approach to the TBAA mechanism
designed to overcome current issues such as:
  - inability to represent accesses to aggregates, unions, union
    members, bit fields, fields of union and aggregate types,
    including array members and
  - lack of a mean to identify user types defined in different
    translation units.

As of today, we have a local patch that implements this approach.
This new implementation is known to be at least as functionally
complete as the current one. Additionally, with this patch on top
we improved SROA to propagate TBAA information, thus making sure
the new features work as expected.

We should be able to upload that patch for review in a few days.


The Approach
===========
With the proposed approach we represent accesses as sequences
that contain all accessed types and fields in order. For example
for this access:

   struct T { union U { struct S { int i1, i2; } s1, s2; } u1, u2; } t;
   t.u1.s1.i1

we generate an access sequence of the form:

   [T, T::u1, U, U::s1, S, S::i1, int]

An array is allowed to overlap with any object of its element
type, including other arrays of the same element type, so an
access to an array element is represented as an access to an
object of the element type:

   int a[7];
   a[5]

   [int]

In case of a multi-dimensional array this rule applies
recursively:

   int a[7][9];
   a[3][5]

   [int]

For member arrays we specify the member field as usual so we can
distinct it from other array members:

   struct S { int a[7], b[9]; } s;
   s.a[5]

   [S, S::a, int]

Similarly to the scalar and struct-path approaches, we consider
every type to be a member of a type group it explicitly refers
to. Here's how the tree that describes relations between type
groups would look like for the example above:

   <tbaa_root>
   |- <may_alias>
     |- <representation_byte>
       |-<structure>
       | |- S
       |- int

The <vtable_pointer> group has a special meaning and is used to
describe accesses to virtual table pointers. Similarly, the
<union> type group includes all union types and used by the TBAA
implementation to distinct union types from other types. The
<may_alias> group is technically equivalent to
<representation_byte> and supposed to be a group for
may_alias-marked types.

For two given access sequences we can determine if the accessed
objects are allowed to overlap by the rules of the input
language. Here's the list of rules complete enough to support C
and C++. Ellipsis elements denote sequences of zero or more
elements. For other input languages more rules can be supported,
if necessary.

   [X...] is allowed to overlap with [S1..., X..., S2...]
   and the most generic access sequence is [X...].

   [X1..., X2...] is allowed to overlap with [S1..., X1...]
   with the most generic access sequence to be [X1...].

   [X1..., U, U::m1, X2...] is allowed to overlap with
   [S1..., X1..., U, U::m2, S2...]
   for a union U of an unknown effective type, provided m1 != m2
   and the most generic access sequence is [X1..., U].

If neither of the given sequences contains the leading access
type of the other, then they are allowed to overlap if the
leading access type of one sequence is a direct or indirect field
of the final access type of the other sequence and then the most
generic access sequence consists of a single element, which is
that final access type.

For the purpose of determining whether one type is a direct or
indirect member of another type unions are considered to have no
members as accesses to members of unions are only allowed to
overlap if they have the base union object explicitly specified.

Otherwise, given sequences overlap if there is a type group that
includes both the leading access types and the most generic
access sequence consists of the smallest common type group as its
only element.

See the attached TypeBasedAliasAnalysis.cpp file and specifically
the MatchAccessSequences() function for how these rules can be
implemented.

TBAA information is encoded as metadata nodes, as usual. Load and
store instructions refer to access sequences:
   store %struct.T* %p, %struct.T** %p.addr, align 8, !tbaa !11

A type node is either a terminal type node that names a root type
group:
   !0 = !{ !"<tbaa_root>" }

or a non-terminal type node that names a type and refers to a
type group it belongs to:
   !1 = !{ !0, !"int" }

Record types also refer to their field descriptors:
   !3 = !{ !0, !"S", !9 }

An access node is either a terminal access node that refers to
the corresponding access type:
   !5 = !{ !1 }
   !9 = !{ !3 }

or a member node that refers to a structure/class or union field
descriptor and a subsequent access path node:
   !7 = !{ !type_group, !field_id, !field_offset, !field_size }
   !11 = !{ !5, !9, !7 }

For a field node the first element refers to its type. The
purpose of other elements is to make the field node unique. Their
meaning is unspecified. Currently the other members for C and C++
are the field name, bit offset and bit size of the member, but
this may change in future and front ends for other input
languages may act differently, so TBAA implementation in the
codegen shall not rely on specific shape or meaning of these
elements.

For types that are interchangeable for purposes of TBAA it is
important to encode them identically so that descriptors of
interchangeable types defined in different modules merge into
same metadata nodes.

Structure/class fields are specified in the order of declaration.
For union fields there is a canonical order that guarantee that
definitions of the same union type will result in identical
descriptors regardless of the order of member declarations.
Currently we sort union fields with key (field_id, field_offset,
field_size).

C++ tagged types with no linkage are to be encoded as "distinct"
nodes to guarantee their uniqueness.

The support for !tbaa.struct information is to be replaced with
plain !tbaa tags representing accesses to the corresponding
record types.

Another attached file, the tbaa.cpp one, is a test case that can
give an idea what encoded TBAA metadata may look like.


Space and Performance Analysis
=============================
In terms of metadata size, with the new approach we generate
about 15% more of metadata nodes. The ratio of the total number
of TBAA nodes to the amount of code remains very low, meaning the
size of TBAA metadata is unlikely to be a problem any time soon.

 From the performance perspective, the proposed approach differs
from the current one in that:
  - it does not traverse through record types if one of the access
    sequences to match contains the leading access type of the
    other and
  - it never traverses union types.

We thus expect that the new approach is at least as efficient as
the current one. Our experiments do not indicate any sensible
difference in performance between the implementations.

-- 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: tbaa.cpp
Type: text/x-c++src
Size: 11239 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170816/09ae1d7e/attachment.cpp>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: TypeBasedAliasAnalysis.cpp
Type: text/x-c++src
Size: 24293 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170816/09ae1d7e/attachment-0001.cpp>

Daniel Berlin via llvm-dev

2017-Aug-17 21:49 UTC

head link

[llvm-dev] RFC: Resolving TBAA issues

>
>
>
> For two given access sequences we can determine if the accessed
> objects are allowed to overlap by the rules of the input
> language.

Sadly, this is where this becomes "unlikely to want to use to replace
TBAA", at least for me. It may still be a thing we want anyway.

This scheme is really an encoding of C/C++ TBAA info so it can be read by
LLVM and requires that *LLVM* have some set of rules that it enforces about
that scheme.
But that scheme is still very language specific in how it is used.
GCC still has something in between this and LLVM, where the language rules
are a bit encoded (but not as much as you have).

We (and gcc) have deliberately avoided such schemes, in favor of
transforming the info into abstract set trees that then tag loads and
stores.

The encoding of "struct path" tbaa, is just a way of trading space vs
time
in that encoding.  We trade walking time for the space of transitive
closure, etc.

None of the TBAA in LLVM really has any *real* relation to the original
type system rules, and that is deliberate.  I've argued for years that
calling it "TBAA" just badly confuses people, and i believe here is a
good
example :)

So i don't think what you've written can be used to replace TBAA.

However, i *do* believe it would be useful to further optimize C and C++
programs in LLVM by using access paths.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170817/dc45cc46/attachment.html>

Hal Finkel via llvm-dev

2017-Aug-18 00:11 UTC

head link

[llvm-dev] RFC: Resolving TBAA issues

On 08/16/2017 01:59 PM, Ivan A. Kosarev via llvm-dev
wrote:> Hal, Daniel,
>
> Thanks for your responses. Here's a quick formal introduction to the 
> proposed approach follows. I also attached a couple files to put some 
> visibility on implementation details. We'd love to hear from you 
> gentlemen and LLVM community in general on how this fits what you know 
> about TBAA.
>
>
> Overview
> =======>
> With this writing we propose a new approach to the TBAA mechanism
> designed to overcome current issues such as:
>  - inability to represent accesses to aggregates, unions, union
>    members, bit fields
What exactly do you do, or plan to do, for bit fields? A single 
load/store can access more than one field. This is true for other kinds 
of things too, but, in particularly common for bit fields. (see also my 
comment below about !tbaa.struct).
> , fields of union and aggregate types,
>    including array members and
>  - lack of a mean to identify user types defined in different
>    translation units.
>
> As of today, we have a local patch that implements this approach.
> This new implementation is known to be at least as functionally
> complete as the current one. Additionally, with this patch on top
> we improved SROA to propagate TBAA information, thus making sure
> the new features work as expected.
>
> We should be able to upload that patch for review in a few days.
Do you have a Clang patch as well?
>
>
> The Approach
> ===========>
> With the proposed approach we represent accesses as sequences
> that contain all accessed types and fields in order. For example
> for this access:
>
>   struct T { union U { struct S { int i1, i2; } s1, s2; } u1, u2; } t;
>   t.u1.s1.i1
>
> we generate an access sequence of the form:
>
>   [T, T::u1, U, U::s1, S, S::i1, int]
>
> An array is allowed to overlap with any object of its element
> type, including other arrays of the same element type, so an
> access to an array element is represented as an access to an
> object of the element type:
>
>   int a[7];
>   a[5]
>
>   [int]
>
> In case of a multi-dimensional array this rule applies
> recursively:
>
>   int a[7][9];
>   a[3][5]
>
>   [int]
>
> For member arrays we specify the member field as usual so we can
> distinct it from other array members:
>
>   struct S { int a[7], b[9]; } s;
>   s.a[5]
>
>   [S, S::a, int]
>
> Similarly to the scalar and struct-path approaches, we consider
> every type to be a member of a type group it explicitly refers
> to. Here's how the tree that describes relations between type
> groups would look like for the example above:
>
>   <tbaa_root>
>   |- <may_alias>
>     |- <representation_byte>
>       |-<structure>
>       | |- S
>       |- int
>
> The <vtable_pointer> group has a special meaning and is used to
> describe accesses to virtual table pointers. 
As in the current scheme, this group is a peer to <may_alias> above?
> Similarly, the
> <union> type group includes all union types and used by the TBAA
> implementation to distinct union types from other types. The
> <may_alias> group is technically equivalent to
> <representation_byte> and supposed to be a group for
> may_alias-marked types.
Of all of these groups above, only <union> has special semantics, correct?
>
> For two given access sequences we can determine if the accessed
> objects are allowed to overlap by the rules of the input
> language. Here's the list of rules complete enough to support C
> and C++. Ellipsis elements denote sequences of zero or more
> elements.
Which is presumed to match in both accesses, correct?
> For other input languages more rules can be supported,
> if necessary.
>
>   [X...] is allowed to overlap with [S1..., X..., S2...]
>   and the most generic access sequence is [X...].
>
>   [X1..., X2...] is allowed to overlap with [S1..., X1...]
>   with the most generic access sequence to be [X1...].
>
>   [X1..., U, U::m1, X2...] is allowed to overlap with
>   [S1..., X1..., U, U::m2, S2...]
>   for a union U of an unknown effective type, provided m1 != m2
>   and the most generic access sequence is [X1..., U].
>
> If neither of the given sequences contains the leading access
> type of the other, then they are allowed to overlap if the
> leading access type of one sequence is a direct or indirect field
> of the final access type of the other sequence and then the most
> generic access sequence consists of a single element, which is
> that final access type.
I'm not following the above paragraph. Can you please explain from where 
this comes?
>
> For the purpose of determining whether one type is a direct or
> indirect member of another type unions are considered to have no
> members as accesses to members of unions are only allowed to
> overlap if they have the base union object explicitly specified.
>
> Otherwise, given sequences overlap if there is a type group that
> includes both the leading access types and the most generic
> access sequence consists of the smallest common type group as its
> only element.
>
> See the attached TypeBasedAliasAnalysis.cpp file and specifically
> the MatchAccessSequences() function for how these rules can be
> implemented.
>
> TBAA information is encoded as metadata nodes, as usual. Load and
> store instructions refer to access sequences:
>   store %struct.T* %p, %struct.T** %p.addr, align 8, !tbaa !11
>
> A type node is either a terminal type node that names a root type
> group:
>   !0 = !{ !"<tbaa_root>" }
>
> or a non-terminal type node that names a type and refers to a
> type group it belongs to:
>   !1 = !{ !0, !"int" }
>
> Record types also refer to their field descriptors:
>   !3 = !{ !0, !"S", !9 }
>
> An access node is either a terminal access node that refers to
> the corresponding access type:
>   !5 = !{ !1 }
>   !9 = !{ !3 }
>
> or a member node that refers to a structure/class or union field
> descriptor and a subsequent access path node:
>   !7 = !{ !type_group, !field_id, !field_offset, !field_size }
>   !11 = !{ !5, !9, !7 }
>
> For a field node the first element refers to its type. The
> purpose of other elements is to make the field node unique. Their
> meaning is unspecified. Currently the other members for C and C++
> are the field name, bit offset and bit size of the member, but
I suspect that we'll want to define a meaning to the size/offset fields. 
One thing to think about is that, today, we also have !tbaa.struct 
metadata. This is put on memcpy, etc. to indicate what fields are 
defined (the rest is padding which the memcpy does not need to actually 
copy). This is somewhat akin to the bitfield problem (i.e. a single load 
can access multiple fields in the bitfield). If we can represent that, 
maybe we can use the same thing to replace !tbaa.struct as well. We 
could specify size/offset in bits to handle bitfields.
> this may change in future and front ends for other input
> languages may act differently, so TBAA implementation in the
> codegen shall not rely on specific shape or meaning of these
> elements.
>
> For types that are interchangeable for purposes of TBAA it is
> important to encode them identically so that descriptors of
> interchangeable types defined in different modules merge into
> same metadata nodes.
>
> Structure/class fields are specified in the order of declaration.
> For union fields there is a canonical order that guarantee that
> definitions of the same union type will result in identical
> descriptors regardless of the order of member declarations.
> Currently we sort union fields with key (field_id, field_offset,
> field_size).
>
> C++ tagged types with no linkage are to be encoded as "distinct"
> nodes to guarantee their uniqueness.
>
> The support for !tbaa.struct information is to be replaced with
> plain !tbaa tags representing accesses to the corresponding
> record types.
>
> Another attached file, the tbaa.cpp one, is a test case that can
> give an idea what encoded TBAA metadata may look like.
>
>
> Space and Performance Analysis
> =============================>
> In terms of metadata size, with the new approach we generate
> about 15% more of metadata nodes. The ratio of the total number
> of TBAA nodes to the amount of code remains very low, meaning the
> size of TBAA metadata is unlikely to be a problem any time soon.
>
> From the performance perspective, the proposed approach differs
> from the current one in that:
>  - it does not traverse through record types if one of the access
>    sequences to match contains the leading access type of the
>    other and
>  - it never traverses union types.
>
> We thus expect that the new approach is at least as efficient as
> the current one. Our experiments do not indicate any sensible
> difference in performance between the implementations.
Sounds good. Can you be more specific about your experiments. What kinds 
of code did you try?

Thanks again,
Hal
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170817/db1f88ae/attachment.html>

Xinliang David Li via llvm-dev

2017-Aug-21 01:47 UTC

head link

[llvm-dev] RFC: Resolving TBAA issues

Hi Ivan, thanks for writing it up. This is a pretty long thread, and there
are many good points brought up in the heated discussions.  Here is my take
on many of the points that have been mentioned:

1) The type based aliasing IR annotation should be generic enough to
represent aliasing constraints for any frontend languages;
2) The offset based aliasing rules and type based aliasing rule have a lot
in common. The difference is that the offset based aliasing rule require
that the two accesses to have same base pointer value, while type based
aliasing rules specifies whether two base pointers can possibly point to
the same underlying object, and if yes, what are the legally allowed
offsets at which the two base pointers can be aligned;
3) Type specific information is usually not interesting to offset based
aliasing rules if offsets can be folded into constant. It is when there is
a variable involved in the indexing that makes language guaranteed bounds
information also useful.

Given the above, the type based aliasing annotation can be represented as
list of <base_type, offset> pairs, as well as the original base pointer
type in the source.  Each pair represent a way to access the memory legally
allowed by the language -- this field can be accessed with a base pointer
of type 'base_type' at 'offset'.

For instance:

struct A { struct B { struct C {int m, n; } c1; int i, j; } b1, b2; int k;
};
struct A*ap;
struct B *bp;
struct C *cp;
int *ip;

(1) access ap->b2.c1.n has following annotation:

  {A*, [ <A, 12>, <B, 4>, <C,4>, <int, 0>] }

What it means

  Access (1) may only be aliased with another access if that access's base
pointer is type A and the offset is 12, or type B at offset 4, or type C at
offset 4, or type int at offset 0.  Also access (1) is originally accessed
via path <A, 12>.

(2) access ap->k has only

  {A*, [<A, 16>, <int, 0>}

(3) access bp->c1.n has

  {B*, [<B, 4>, <int, 0>]}

(4) Access bp->j has

  {B*, [<B, 12>, <int, 0>]}

(5) access *ip (implicitly) has

  {int *, <int, 0>}
>From the above, we can see (5) is aliased with all the above, and (3) isalso aliased with (1).

With this representation, the type based alias rules implementation
becomes: given two memory accesses, align the pointer of one access to the
other. If it can be done, they are not aliased. If yes, the use the same
base+offset+access_size aliasing rule to check overlapping.

For languages that have rules about access bounds of member arrays, the
offset information can be replaced with offset + range. By default, the
range is the size of the array type.

David


On Wed, Aug 16, 2017 at 11:59 AM, Ivan A. Kosarev via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Hal, Daniel,
>
> Thanks for your responses. Here's a quick formal introduction to the
> proposed approach follows. I also attached a couple files to put some
> visibility on implementation details. We'd love to hear from you
gentlemen
> and LLVM community in general on how this fits what you know about TBAA.
>
>
> Overview
> =======>
> With this writing we propose a new approach to the TBAA mechanism
> designed to overcome current issues such as:
>  - inability to represent accesses to aggregates, unions, union
>    members, bit fields, fields of union and aggregate types,
>    including array members and
>  - lack of a mean to identify user types defined in different
>    translation units.
>
> As of today, we have a local patch that implements this approach.
> This new implementation is known to be at least as functionally
> complete as the current one. Additionally, with this patch on top
> we improved SROA to propagate TBAA information, thus making sure
> the new features work as expected.
>
> We should be able to upload that patch for review in a few days.
>
>
> The Approach
> ===========>
> With the proposed approach we represent accesses as sequences
> that contain all accessed types and fields in order. For example
> for this access:
>
>   struct T { union U { struct S { int i1, i2; } s1, s2; } u1, u2; } t;
>   t.u1.s1.i1
>
> we generate an access sequence of the form:
>
>   [T, T::u1, U, U::s1, S, S::i1, int]
>
> An array is allowed to overlap with any object of its element
> type, including other arrays of the same element type, so an
> access to an array element is represented as an access to an
> object of the element type:
>
>   int a[7];
>   a[5]
>
>   [int]
>
> In case of a multi-dimensional array this rule applies
> recursively:
>
>   int a[7][9];
>   a[3][5]
>
>   [int]
>
> For member arrays we specify the member field as usual so we can
> distinct it from other array members:
>
>   struct S { int a[7], b[9]; } s;
>   s.a[5]
>
>   [S, S::a, int]
>
> Similarly to the scalar and struct-path approaches, we consider
> every type to be a member of a type group it explicitly refers
> to. Here's how the tree that describes relations between type
> groups would look like for the example above:
>
>   <tbaa_root>
>   |- <may_alias>
>     |- <representation_byte>
>       |-<structure>
>       | |- S
>       |- int
>
> The <vtable_pointer> group has a special meaning and is used to
> describe accesses to virtual table pointers. Similarly, the
> <union> type group includes all union types and used by the TBAA
> implementation to distinct union types from other types. The
> <may_alias> group is technically equivalent to
> <representation_byte> and supposed to be a group for
> may_alias-marked types.
>
> For two given access sequences we can determine if the accessed
> objects are allowed to overlap by the rules of the input
> language. Here's the list of rules complete enough to support C
> and C++. Ellipsis elements denote sequences of zero or more
> elements. For other input languages more rules can be supported,
> if necessary.
>
>   [X...] is allowed to overlap with [S1..., X..., S2...]
>   and the most generic access sequence is [X...].
>
>   [X1..., X2...] is allowed to overlap with [S1..., X1...]
>   with the most generic access sequence to be [X1...].
>
>   [X1..., U, U::m1, X2...] is allowed to overlap with
>   [S1..., X1..., U, U::m2, S2...]
>   for a union U of an unknown effective type, provided m1 != m2
>   and the most generic access sequence is [X1..., U].
>
> If neither of the given sequences contains the leading access
> type of the other, then they are allowed to overlap if the
> leading access type of one sequence is a direct or indirect field
> of the final access type of the other sequence and then the most
> generic access sequence consists of a single element, which is
> that final access type.
>
> For the purpose of determining whether one type is a direct or
> indirect member of another type unions are considered to have no
> members as accesses to members of unions are only allowed to
> overlap if they have the base union object explicitly specified.
>
> Otherwise, given sequences overlap if there is a type group that
> includes both the leading access types and the most generic
> access sequence consists of the smallest common type group as its
> only element.
>
> See the attached TypeBasedAliasAnalysis.cpp file and specifically
> the MatchAccessSequences() function for how these rules can be
> implemented.
>
> TBAA information is encoded as metadata nodes, as usual. Load and
> store instructions refer to access sequences:
>   store %struct.T* %p, %struct.T** %p.addr, align 8, !tbaa !11
>
> A type node is either a terminal type node that names a root type
> group:
>   !0 = !{ !"<tbaa_root>" }
>
> or a non-terminal type node that names a type and refers to a
> type group it belongs to:
>   !1 = !{ !0, !"int" }
>
> Record types also refer to their field descriptors:
>   !3 = !{ !0, !"S", !9 }
>
> An access node is either a terminal access node that refers to
> the corresponding access type:
>   !5 = !{ !1 }
>   !9 = !{ !3 }
>
> or a member node that refers to a structure/class or union field
> descriptor and a subsequent access path node:
>   !7 = !{ !type_group, !field_id, !field_offset, !field_size }
>   !11 = !{ !5, !9, !7 }
>
> For a field node the first element refers to its type. The
> purpose of other elements is to make the field node unique. Their
> meaning is unspecified. Currently the other members for C and C++
> are the field name, bit offset and bit size of the member, but
> this may change in future and front ends for other input
> languages may act differently, so TBAA implementation in the
> codegen shall not rely on specific shape or meaning of these
> elements.
>
> For types that are interchangeable for purposes of TBAA it is
> important to encode them identically so that descriptors of
> interchangeable types defined in different modules merge into
> same metadata nodes.
>
> Structure/class fields are specified in the order of declaration.
> For union fields there is a canonical order that guarantee that
> definitions of the same union type will result in identical
> descriptors regardless of the order of member declarations.
> Currently we sort union fields with key (field_id, field_offset,
> field_size).
>
> C++ tagged types with no linkage are to be encoded as "distinct"
> nodes to guarantee their uniqueness.
>
> The support for !tbaa.struct information is to be replaced with
> plain !tbaa tags representing accesses to the corresponding
> record types.
>
> Another attached file, the tbaa.cpp one, is a test case that can
> give an idea what encoded TBAA metadata may look like.
>
>
> Space and Performance Analysis
> =============================>
> In terms of metadata size, with the new approach we generate
> about 15% more of metadata nodes. The ratio of the total number
> of TBAA nodes to the amount of code remains very low, meaning the
> size of TBAA metadata is unlikely to be a problem any time soon.
>
> From the performance perspective, the proposed approach differs
> from the current one in that:
>  - it does not traverse through record types if one of the access
>    sequences to match contains the leading access type of the
>    other and
>  - it never traverses union types.
>
> We thus expect that the new approach is at least as efficient as
> the current one. Our experiments do not indicate any sensible
> difference in performance between the implementations.
>
> --
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170820/a9068eec/attachment.html>

Possibly Parallel Threads

Search for more seemingly similar threads

llvm dev - Aug 2017 - RFC: Resolving TBAA issues

[llvm-dev] RFC: Representing unions in TBAA

[llvm-dev] RFC: Resolving TBAA issues

[llvm-dev] RFC: Resolving TBAA issues

[llvm-dev] RFC: Resolving TBAA issues

[llvm-dev] RFC: Resolving TBAA issues

Possibly Parallel Threads