thr3ads.net - llvm dev - [LLVMdev] PROPOSAL: struct-access-path aware TBAA [Mar 2013]

If this information is useful, please help other people find it:
Share via:

Manman Ren

2013-Mar-11 18:41 UTC

[LLVMdev] PROPOSAL: struct-access-path aware TBAA

Based on discussions with John McCall

We currently focus on field accesses of structs, more specifically, on fields
that are scalars or structs.

Fundamental rules from C11
--------------------------
An object shall have its stored value accessed only by an lvalue expression that
has one of the following types: [footnote: The intent of this list is to specify
those circumstances in which an object may or may not be aliased.]
1. a type compatible with the effective type of the object,
2. a qualified version of a type compatible with the effective type of the
object,
3. a type that is the signed or unsigned type corresponding to the effective
type of the object,
4. a type that is the signed or unsigned type corresponding to a qualified
version of the effective type of the object,
5. an aggregate or union type that includes one of the aforementioned types
among its members (including, recursively, a member of a subaggregate or
contained union), or
6. a character type.

Example   
-------  
  struct A {
    int x;
    int y;
  };
  struct B { 
    A a;
    int z;
  };
  struct C {
    B b1;
    B b2; 
    int *p;
  };
    
Type DAG:
  int <- A::x <- A  
  int <- A::y <- A <- B::a <- B <- C::b1 <- C
  int <----------------- B::z <- B <- C::b2 <- C
  any pointer <--------------------- C::p  <- C 

The type DAG has two types of TBAA nodes:
1> the existing scalar nodes
2> the struct nodes (this is different from the current tbaa.struct)
   A struct node has a unique name plus a list of pairs (field name, field
type).
   For example, struct node for "C" should look like
   !4 = metadata !{"C", "C::b1", metadata !3,
"C::b2", metadata !3, "C::p", metadata !2}
   where !3 is the struct node for "B", !2 is pointer type.

Given a field access
  struct B *bp = ...;
  bp->a.x = 5;
we annotate it as B::a.x.
This creates an obvious hierarchy of specificity:
  'B::a.x'
    |-> 'A::x'
      |-> 'int'
Each step outwards adds extra contextual information for an access to an
'int' object.
We call each step inwards the super-type relation. It peels a field access from
the left.

There exists a second hierarchy, that of containment:
  'B::a.x'
    |-> 'B::a'
      |-> 'B'
Each step outwards "extends" the object into a subobject. Each step
inwards "retracts" the object into an container.
"retraction" peels a field access from the right;
"extension" appends a field access at the right.

Aliasing Rules
--------------
alias(x,y) = rule(x,y) or rule(y,x)
rule(x,y) has 3 cases:
I> 'x' and 'y' do not have a common struct type
  rule_case1(x,y) returns true if the last element of 'x' can reach
'y' via type DAG
  |  x  |
         extends
  |    z         |
      super-type
           |  y  |
II> 'x' and 'y' have a common struct type
  The field access starting from the first common type should match until the
end of 'x' or 'y'.
  case a:
  |  x   |
     super-type
      |z'|
        extends
      |   y      |
  case b:
  |  x            |
        retracts
  |   z     |
      super-type
      |  y  |

Combine the three cases to get rule(x,y):
Check the first element of 'y', if it is not in 'x', return
rule_case1(x,y)
Check the next element of 'y' with the next element of 'x', if
not the same, return false.
When we reach the end of either 'x' or 'y', return true.

If we only care about alias between scalar accesses, the above rules can be
simplified.
alias(scalar access x, scalar access y) = super_type(x,y) or super_type(y,x)
where super_type(x,y) is true if 'x' can reach 'y' via
super-type relation.

Implementing the Hierarchy
--------------------------
We can attach metadata to both scalar accesses and aggregate accesses. Let's
call scalar tags and aggregate tags.
Each tag can be a sequence of nodes in the type DAG.
!C::b2.a.x := [ "tbaa.path", !C, "C::b2", !B,
"B::a", !A, "A::x", !int ]

Comments and questions are welcome.

Manman

Krzysztof Parzyszek

2013-Mar-11 19:14 UTC

head link

[LLVMdev] PROPOSAL: struct-access-path aware TBAA

On 3/11/2013 1:41 PM, Manman Ren wrote:>
>    |  x  |
>           extends
>    |    z         |
>        super-type
>             |  y  |
What does this mean?  "The type of z extends the type of x, and the type 
of z is a super-type of y"?

-Krzysztof

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, 
hosted by The Linux Foundation

Daniel Berlin

2013-Mar-11 20:17 UTC

head link

[LLVMdev] PROPOSAL: struct-access-path aware TBAA

On Mon, Mar 11, 2013 at 11:41 AM, Manman Ren <mren at apple.com>
wrote:> Based on discussions with John McCall
>
> We currently focus on field accesses of structs, more specifically, on
fields that are scalars or structs.
>
> Fundamental rules from C11
> --------------------------
> An object shall have its stored value accessed only by an lvalue expression
that has one of the following types: [footnote: The intent of this list is to
specify those circumstances in which an object may or may not be aliased.]
> 1. a type compatible with the effective type of the object,
> 2. a qualified version of a type compatible with the effective type of the
object,
> 3. a type that is the signed or unsigned type corresponding to the
effective type of the object,
> 4. a type that is the signed or unsigned type corresponding to a qualified
version of the effective type of the object,
> 5. an aggregate or union type that includes one of the aforementioned types
among its members (including, recursively, a member of a subaggregate or
contained union), or
> 6. a character type.
>
> Example
> -------
>   struct A {
>     int x;
>     int y;
>   };
>   struct B {
>     A a;
>     int z;
>   };
>   struct C {
>     B b1;
>     B b2;
>     int *p;
>   };
>
> Type DAG:
>   int <- A::x <- A
>   int <- A::y <- A <- B::a <- B <- C::b1 <- C
>   int <----------------- B::z <- B <- C::b2 <- C
>   any pointer <--------------------- C::p  <- C
>
> The type DAG has two types of TBAA nodes:
> 1> the existing scalar nodes
> 2> the struct nodes (this is different from the current tbaa.struct)
>    A struct node has a unique name plus a list of pairs (field name, field
type).
>    For example, struct node for "C" should look like
>    !4 = metadata !{"C", "C::b1", metadata !3,
"C::b2", metadata !3, "C::p", metadata !2}
>    where !3 is the struct node for "B", !2 is pointer type.
>
> Given a field access
>   struct B *bp = ...;
>   bp->a.x = 5;
> we annotate it as B::a.x.
In the case of multiple structures containing substructures, how are
you differentiating?

IE given

struct A {
struct B b;
}
struct C {
struct B b;
}

How do you know the above struct B *bp =...; is B::b from C and not B::b from A?

(I agree you can know in the case of direct aggregates, but I argue
you have no way to know in the case of pointer arguments without
interprocedural analysis)
It gets worse the more levels you have.

ie if you add
struct B {
struct E e;
}

and have struct E *e = ...
how do you know it's the B::e contained in struct C, or the B::e
contained in struct A?


Again, i agree you can do both scalar and direct aggregates, but not
aggregates and scalars through pointers.

> Implementing the Hierarchy
> --------------------------
> We can attach metadata to both scalar accesses and aggregate accesses.
Let's call scalar tags and aggregate tags.
> Each tag can be a sequence of nodes in the type DAG.
> !C::b2.a.x := [ "tbaa.path", !C, "C::b2", !B,
"B::a", !A, "A::x", !int ]
>
This can get quite deep quite quickly.
Are there actually cases where knowing the outermost aggregate type +
{byte offset, size} of field access does not give you the exact same
disambiguation capability?

I ask because in GCC, we determined the answer was "no", since
pointers require special analysis anyway.

Manman Ren

2013-Mar-11 20:53 UTC

head link

[LLVMdev] PROPOSAL: struct-access-path aware TBAA

On Mar 11, 2013, at 12:14 PM, Krzysztof Parzyszek wrote:
> On 3/11/2013 1:41 PM, Manman Ren wrote:
>> 
>>   |  x  |
>>          extends
>>   |    z         |
>>       super-type
>>            |  y  |
> 
> What does this mean?  "The type of z extends the type of x, and the
type of z is a super-type of y"?Extends x into a subobject z, and z is a super-type of y.

-Manman> 
> -Krzysztof
> 
> -- 
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted
by The Linux Foundation
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Manman Ren

2013-Mar-11 21:06 UTC

head link

[LLVMdev] PROPOSAL: struct-access-path aware TBAA

On Mar 11, 2013, at 1:17 PM, Daniel Berlin <dberlin at dberlin.org> wrote:
> On Mon, Mar 11, 2013 at 11:41 AM, Manman Ren <mren at apple.com>
wrote:
>> Based on discussions with John McCall
>> 
>> We currently focus on field accesses of structs, more specifically, on
fields that are scalars or structs.
>> 
>> Fundamental rules from C11
>> --------------------------
>> An object shall have its stored value accessed only by an lvalue
expression that has one of the following types: [footnote: The intent of this
list is to specify those circumstances in which an object may or may not be
aliased.]
>> 1. a type compatible with the effective type of the object,
>> 2. a qualified version of a type compatible with the effective type of
the object,
>> 3. a type that is the signed or unsigned type corresponding to the
effective type of the object,
>> 4. a type that is the signed or unsigned type corresponding to a
qualified version of the effective type of the object,
>> 5. an aggregate or union type that includes one of the aforementioned
types among its members (including, recursively, a member of a subaggregate or
contained union), or
>> 6. a character type.
>> 
> 
>> Example
>> -------
>>  struct A {
>>    int x;
>>    int y;
>>  };
>>  struct B {
>>    A a;
>>    int z;
>>  };
>>  struct C {
>>    B b1;
>>    B b2;
>>    int *p;
>>  };
>> 
>> Type DAG:
>>  int <- A::x <- A
>>  int <- A::y <- A <- B::a <- B <- C::b1 <- C
>>  int <----------------- B::z <- B <- C::b2 <- C
>>  any pointer <--------------------- C::p  <- C
>> 
>> The type DAG has two types of TBAA nodes:
>> 1> the existing scalar nodes
>> 2> the struct nodes (this is different from the current tbaa.struct)
>>   A struct node has a unique name plus a list of pairs (field name,
field type).
>>   For example, struct node for "C" should look like
>>   !4 = metadata !{"C", "C::b1", metadata !3,
"C::b2", metadata !3, "C::p", metadata !2}
>>   where !3 is the struct node for "B", !2 is pointer type.
>> 
>> Given a field access
>>  struct B *bp = ...;
>>  bp->a.x = 5;
>> we annotate it as B::a.x.
> 
> In the case of multiple structures containing substructures, how are
> you differentiating?
> 
> IE given
> 
> struct A {
> struct B b;
> }
> struct C {
> struct B b;
> }
> 
> How do you know the above struct B *bp =...; is B::b from C and not B::b
from A?
> 
> (I agree you can know in the case of direct aggregates, but I argue
> you have no way to know in the case of pointer arguments without
> interprocedural analysis) 
> It gets worse the more levels you have.
> 
> ie if you add
> struct B {
> struct E e;
> }
> 
> and have struct E *e = ...
> how do you know it's the B::e contained in struct C, or the B::e
> contained in struct A?
> 
> 
> Again, i agree you can do both scalar and direct aggregates, but not
> aggregates and scalars through pointers.
> 
I don't have immediate plan of pointer analysis. For the given example, we
will treat field accesses from bp (pointer to struct B) as B::x.x, bp can be
from either C or A.
> 
>> Implementing the Hierarchy
>> --------------------------
>> We can attach metadata to both scalar accesses and aggregate accesses.
Let's call scalar tags and aggregate tags.
>> Each tag can be a sequence of nodes in the type DAG.
>> !C::b2.a.x := [ "tbaa.path", !C, "C::b2", !B,
"B::a", !A, "A::x", !int ]
>> 
> 
> This can get quite deep quite quickly.
> Are there actually cases where knowing the outermost aggregate type +
> {byte offset, size} of field access does not give you the exact same
> disambiguation capability?The answer is no. We should be able to deduce the access path from {byte offset,
size}.
However, I don't know an easy way to check alias(x,y) given {byte offset,
size} for x and y.

Thanks,
Manman
> 
> I ask because in GCC, we determined the answer was "no", since
> pointers require special analysis anyway.

Shuxin Yang

2013-Mar-12 22:44 UTC

head link

[LLVMdev] PROPOSAL: struct-access-path aware TBAA

On 3/11/13 1:17 PM, Daniel Berlin wrote:> On Mon, Mar 11, 2013 at 11:41 AM, Manman Ren <mren at apple.com>
wrote:
>> Based on discussions with John McCall
>>
>> We currently focus on field accesses of structs, more specifically, on
fields that are scalars or structs.
>>
>> Fundamental rules from C11
>> --------------------------
>> An object shall have its stored value accessed only by an lvalue
expression that has one of the following types: [footnote: The intent of this
list is to specify those circumstances in which an object may or may not be
aliased.]
>> 1. a type compatible with the effective type of the object,
>> 2. a qualified version of a type compatible with the effective type of
the object,
>> 3. a type that is the signed or unsigned type corresponding to the
effective type of the object,
>> 4. a type that is the signed or unsigned type corresponding to a
qualified version of the effective type of the object,
>> 5. an aggregate or union type that includes one of the aforementioned
types among its members (including, recursively, a member of a subaggregate or
contained union), or
>> 6. a character type.
>>
>> Example
>> -------
>>    struct A {
>>      int x;
>>      int y;
>>    };
>>    struct B {
>>      A a;
>>      int z;
>>    };
>>    struct C {
>>      B b1;
>>      B b2;
>>      int *p;
>>    };
>>
>> Type DAG:
>>    int <- A::x <- A
>>    int <- A::y <- A <- B::a <- B <- C::b1 <- C
>>    int <----------------- B::z <- B <- C::b2 <- C
>>    any pointer <--------------------- C::p  <- C
>>
>> The type DAG has two types of TBAA nodes:
>> 1> the existing scalar nodes
>> 2> the struct nodes (this is different from the current tbaa.struct)
>>     A struct node has a unique name plus a list of pairs (field name,
field type).
>>     For example, struct node for "C" should look like
>>     !4 = metadata !{"C", "C::b1", metadata !3,
"C::b2", metadata !3, "C::p", metadata !2}
>>     where !3 is the struct node for "B", !2 is pointer type.
>>
>> Given a field access
>>    struct B *bp = ...;
>>    bp->a.x = 5;
>> we annotate it as B::a.x.
> In the case of multiple structures containing substructures, how are
> you differentiating?
>
> IE given
>
> struct A {
> struct B b;
> }
> struct C {
> struct B b;
> }
>
> How do you know the above struct B *bp =...; is B::b from C and not B::b
from A?If I understand correct, the proposed graph is DAG, not tree, and it 
should be able to tackling the case
where a type is included more than once.

On the other hand, the info which is annotated to the memory access is 
kind of "immediate enclosing aggregate type",
which should be unique.
>
> (I agree you can know in the case of direct aggregates, but I argue
> you have no way to know in the case of pointer arguments without
> interprocedural analysis)
> It gets worse the more levels you have.
>
> ie if you add
> struct B {
> struct E e;
> }
>
> and have struct E *e = ...
> how do you know it's the B::e contained in struct C, or the B::e
> contained in struct A?
>
>
> Again, i agree you can do both scalar and direct aggregates, but not
> aggregates and scalars through pointers.
>
>That is exactly the power of TBAA.  If the both memory accesses are 
direct load/store,
it dose not even need TBAA, check their base/offset/size is sufficient 
for disambiguation.

Seemingly Similar Threads

Search for more maybe matching threads

llvm dev - Mar 2013 - [LLVMdev] PROPOSAL: struct-access-path aware TBAA

[LLVMdev] PROPOSAL: struct-access-path aware TBAA

[LLVMdev] PROPOSAL: struct-access-path aware TBAA

[LLVMdev] PROPOSAL: struct-access-path aware TBAA

[LLVMdev] PROPOSAL: struct-access-path aware TBAA

[LLVMdev] PROPOSAL: struct-access-path aware TBAA

[LLVMdev] PROPOSAL: struct-access-path aware TBAA

Seemingly Similar Threads