Manman Ren
2013-Mar-27  20:11 UTC
[LLVMdev] PROPOSAL: struct-access-path aware TBAA (new version)
Hello,
After discussions with Daniel, Dan and others, here is an updated proposal for
struct-access-path aware TBAA.
Given an example
struct A {
  int x;
  int y;
};
struct B {
  A a;
  int z;
};
struct C {
  B b1;
  B b2;
  int *p;
};
struct D {
  C c;
};
The purpose of struct-path-aware TBAA is to say
"C::b1.a" will alias with "B::a.x", "C::b1.a" will
alias with "B",
"C::b1.a" does not alias with "D::c.b2.a.x".
"A::x" does not alias with "A::y"
We can potentially disambiguate aggregate accesses and scalar accesses.
The current scalar TBAA will say
"C::b1.a.x" will alias with "D::c.b2.a.x" since both are
accesses to int.
"A::x" will alias with "A::y"
This proposal is about the format of metadata and how to implement alias
queries.
--------- Option A
A> metadata for the type DAG
   1> the existing scalar nodes
      !0 = metadata !{metadata !"any pointer", metadata !1}
      !1 = metadata !{metadata !"omnipotent char", metadata !2}
      !2 = metadata !{metadata !"Simple C/C++ TBAA"}
      !4 = metadata !{metadata !"int", metadata !1}
   2> the struct nodes
      A struct node has a unique name plus a list of field types
      For example, struct node for "C" should look like
      !5 = metadata !{"C", metadata !3, metadata !3, metadata !0}
      where !3 is the struct node for "B"
   The type DAG for the example:
   int <------------- A <- B <-- C <-- D
   int <-------------------- B
   any pointer <--------------- C
B> We call metadata attached to scalar accesses scalar tags and metadata
   attached to aggregate accesses aggregate tags.
   Each tag can be a sequence of field accesses and nodes in the type DAG.
   For example, the tag for "C::b2.a.x" is
   !7 := [ "tbaa.path", !C, "C::b2", !B, "B::a",
!A, "A::x", !int ] ;C::b2.a.x
   where !C, !B, !A are metadata nodes in the type DAG, "C::b2"
"B::a" "A::x" are strings.
C> alias(x, y) = alias_rule(x, y) or alias_rule(y, x), where alias_rule(x, y)
is
   if the first element of 'x' does not enclose the first element of
'y'
     return NoAlias
   if the last element of 'x' encloses the first element of 'y'
via type DAG,
     return Alias
   Check the first element of 'y', if it is not in 'x', return
NoAlias
   Repeat until we reach the end of either 'x' or 'y'
     Check the next element of 'y' with the next element of 'x'
     if not the same, return false.
-------------- Option B
Option B is to include offset of each field in the type DAG and let tag
be offset based.
A> metadata for the type DAG
   1> the existing scalar nodes
   2> the struct nodes
      A struct node has a unique name plus a list of pairs (offset, field type)
      For example, struct node for "C" should look like
      !5 = metadata !{"C", 0, metadata !3, 12, metadata !3, 24,
metadata !0}
      where !3 is the struct node for "B"
   The type DAG for the example:
   int <- 0 <- A
   int <- 4 <- A <- 0 <- B <- 0  <- C <- 0 <- D
   int <-------------- 8 <- B <- 12 <- C
   any pointer <--------------- 24 <- C
B> The tag will be (outermost struct type node, offset, size, the last type
node)
   For example, the tag for C::b2.a.x will be
   !7 = [ "tbaa.offset", !C, 12, 4, !int] ;C::b2.a.x
   The last field of the tag is to handle alias(D::c, A::x), the tags will be
   [!D, 0, 32, !C] and [!A, 0, 4, !int], we only need to check whether !C
encloses
   !A.
C> Consider access pair "C::b1.a.x" and "D::c.b2.a.x",
the tags are [!C, 0, 4, !int]
   and [!D, 12, 4, !int].
   To check if they alias, we first check if !D encloses !C. If no, we return
   NoAlias. Since !D does enclose !C in our example, we start tracing from !D
   and follow the edge with the correct offset by
   adjusting the offset to be relative to the next node. We will reach C in the
   type DAG, now we have a common type node and can compare the (offset, size)
   directly. [!D, 12, 4] will be adjusted to [!C, 12, 4], since it does not
   overlap with [!C, 0, 4], we say they do not alias.
   alias_rule(x, y) is
   if the outermost type of 'x' does not enclose the outermost type of
'y',
     return NoAlias
   if the last type of 'x' encloses the outermost type of 'y'
     return Alias
   Start from the outermost type of 'x'
   Repeat until we reach the outermost type of 'y'
     follow the edge with the correct offset in the type DAG
     adjust the offset to be relative to the next node
   Compare the adjusted offset, size pair for 'x' against 'y'
   if they overlap, return Alias
   Otherwise, return NoAlias
------------------ Comparison
The complexity of alias queries: similar
The space for metadata: option A should require more space because the tag is a
  sequence of nodes and strings.
I will work on Option B.
------------------ Holes
We use BasicAA to perform sanity check. If BasicAA returns MayAlias, TBAA will
kick off with full speed.
Watch out for cases where the incoming arguments have different types but they
do alias. Scalar TBAA has the same issue but we don't see bugs coming up. If
we see bugs coming up with struct-path aware TBAA, we can do less aggressive
TBAA or apply similar tricks as GCC.
Scalar TBAA will say NoAlias for the placement new example under
test/Analysis/TypeBasedAliasAnalysis/placement-tbaa.ll.
If we optimize placement-tbaa.ll with -O1 -gvn, BaiscAA will start saying
MustAlias. This may cause problem.
Xinliang David Li
2013-Mar-28  16:39 UTC
[LLVMdev] PROPOSAL: struct-access-path aware TBAA (new version)
Nice proposal. A possibly simplified way of doing TBAA is: 1) decompose an access into a 3-tuples: <BaseType, AccessType, BitOffset> -- there is no need to remember the full access path. 2) If the access types are incompatible, return noAlias (scalar TBAA rule) 3) if there is no containment relationship between two BaseTypes, return noAlias 4) Otherwise, align the the enclosed base type into the enclosing type (for each enclosed instance), and adjust the offset. If there is no overlap, return noAlias, otherwise return mayAlias. For instance, "C::b1.a.y" AND "D::c.b2.a.x C::b1.a.y --> <C, int, 32> D::c.b2.a.x --> <D, int, 96> Base Type C is enclosed in D, so the first access is normalized to <D, int, 32> which does not overlap with <D, int, 96>. For more complicated cases, such as p->a[i].x.b[j].y, TBAA can choose to normalize the access via projection before creating the tuple --> p->a[0].x.b[0].y. Make sure MustAlias is not wrongly returned for this case. Cheers, David On Wed, Mar 27, 2013 at 1:11 PM, Manman Ren <mren at apple.com> wrote:> > Hello, > > After discussions with Daniel, Dan and others, here is an updated proposal > for struct-access-path aware TBAA. > > Given an example > struct A { > int x; > int y; > }; > struct B { > A a; > int z; > }; > struct C { > B b1; > B b2; > int *p; > }; > struct D { > C c; > }; > > The purpose of struct-path-aware TBAA is to say > "C::b1.a" will alias with "B::a.x", "C::b1.a" will alias with "B", > "C::b1.a" does not alias with "D::c.b2.a.x". > "A::x" does not alias with "A::y" > We can potentially disambiguate aggregate accesses and scalar accesses. > > The current scalar TBAA will say > "C::b1.a.x" will alias with "D::c.b2.a.x" since both are accesses to int. > "A::x" will alias with "A::y" > > This proposal is about the format of metadata and how to implement alias > queries. > --------- Option A > A> metadata for the type DAG > 1> the existing scalar nodes > !0 = metadata !{metadata !"any pointer", metadata !1} > !1 = metadata !{metadata !"omnipotent char", metadata !2} > !2 = metadata !{metadata !"Simple C/C++ TBAA"} > !4 = metadata !{metadata !"int", metadata !1} > 2> the struct nodes > A struct node has a unique name plus a list of field types > For example, struct node for "C" should look like > !5 = metadata !{"C", metadata !3, metadata !3, metadata !0} > where !3 is the struct node for "B" > > The type DAG for the example: > int <------------- A <- B <-- C <-- D > int <-------------------- B > any pointer <--------------- C > > B> We call metadata attached to scalar accesses scalar tags and metadata > attached to aggregate accesses aggregate tags. > Each tag can be a sequence of field accesses and nodes in the type DAG. > For example, the tag for "C::b2.a.x" is > !7 := [ "tbaa.path", !C, "C::b2", !B, "B::a", !A, "A::x", !int ] > ;C::b2.a.x > where !C, !B, !A are metadata nodes in the type DAG, "C::b2" "B::a" > "A::x" are strings. > C> alias(x, y) = alias_rule(x, y) or alias_rule(y, x), where alias_rule(x, > y) is > if the first element of 'x' does not enclose the first element of 'y' > return NoAlias > if the last element of 'x' encloses the first element of 'y' via type > DAG, > return Alias > Check the first element of 'y', if it is not in 'x', return NoAlias > Repeat until we reach the end of either 'x' or 'y' > Check the next element of 'y' with the next element of 'x' > if not the same, return false. > > -------------- Option B > Option B is to include offset of each field in the type DAG and let tag > be offset based. > A> metadata for the type DAG > 1> the existing scalar nodes > 2> the struct nodes > A struct node has a unique name plus a list of pairs (offset, field > type) > For example, struct node for "C" should look like > !5 = metadata !{"C", 0, metadata !3, 12, metadata !3, 24, metadata > !0} > where !3 is the struct node for "B" > The type DAG for the example: > int <- 0 <- A > int <- 4 <- A <- 0 <- B <- 0 <- C <- 0 <- D > int <-------------- 8 <- B <- 12 <- C > any pointer <--------------- 24 <- C > > B> The tag will be (outermost struct type node, offset, size, the last > type node) > For example, the tag for C::b2.a.x will be > !7 = [ "tbaa.offset", !C, 12, 4, !int] ;C::b2.a.x > The last field of the tag is to handle alias(D::c, A::x), the tags will > be > [!D, 0, 32, !C] and [!A, 0, 4, !int], we only need to check whether !C > encloses > !A. > > C> Consider access pair "C::b1.a.x" and "D::c.b2.a.x", the tags are [!C, > 0, 4, !int] > and [!D, 12, 4, !int]. > To check if they alias, we first check if !D encloses !C. If no, we > return > NoAlias. Since !D does enclose !C in our example, we start tracing from > !D > and follow the edge with the correct offset by > adjusting the offset to be relative to the next node. We will reach C > in the > type DAG, now we have a common type node and can compare the (offset, > size) > directly. [!D, 12, 4] will be adjusted to [!C, 12, 4], since it does not > overlap with [!C, 0, 4], we say they do not alias. > > alias_rule(x, y) is > if the outermost type of 'x' does not enclose the outermost type of 'y', > return NoAlias > if the last type of 'x' encloses the outermost type of 'y' > return Alias > Start from the outermost type of 'x' > Repeat until we reach the outermost type of 'y' > follow the edge with the correct offset in the type DAG > adjust the offset to be relative to the next node > Compare the adjusted offset, size pair for 'x' against 'y' > if they overlap, return Alias > Otherwise, return NoAlias > > ------------------ Comparison > The complexity of alias queries: similar > The space for metadata: option A should require more space because the tag > is a > sequence of nodes and strings. > I will work on Option B. > > ------------------ Holes > We use BasicAA to perform sanity check. If BasicAA returns MayAlias, TBAA > will > kick off with full speed. > > Watch out for cases where the incoming arguments have different types but > they > do alias. Scalar TBAA has the same issue but we don't see bugs coming up. > If > we see bugs coming up with struct-path aware TBAA, we can do less > aggressive > TBAA or apply similar tricks as GCC. > > Scalar TBAA will say NoAlias for the placement new example under > test/Analysis/TypeBasedAliasAnalysis/placement-tbaa.ll. > If we optimize placement-tbaa.ll with -O1 -gvn, BaiscAA will start saying > MustAlias. This may cause problem. > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130328/dcbf6eb6/attachment.html>