thr3ads.net - llvm dev - [LLVMdev] PROPOSAL: struct-access-path aware TBAA [Mar 2013]

If this information is useful, please help other people find it:
Share via:

Manman Ren

2013-Mar-11 22:45 UTC

[LLVMdev] PROPOSAL: struct-access-path aware TBAA

On Mar 11, 2013, at 2:37 PM, Daniel Berlin <dberlin at dberlin.org> wrote:
> On Mon, Mar 11, 2013 at 2:06 PM, Manman Ren <mren at apple.com>
wrote:
>> 
>> On Mar 11, 2013, at 1:17 PM, Daniel Berlin <dberlin at
dberlin.org> wrote:
>> 
>>> On Mon, Mar 11, 2013 at 11:41 AM, Manman Ren <mren at
apple.com> wrote:
>>>> Based on discussions with John McCall
>>>> 
>>>> We currently focus on field accesses of structs, more
specifically, on fields that are scalars or structs.
>>>> 
>>>> Fundamental rules from C11
>>>> --------------------------
>>>> An object shall have its stored value accessed only by an
lvalue expression that has one of the following types: [footnote: The intent of
this list is to specify those circumstances in which an object may or may not be
aliased.]
>>>> 1. a type compatible with the effective type of the object,
>>>> 2. a qualified version of a type compatible with the effective
type of the object,
>>>> 3. a type that is the signed or unsigned type corresponding to
the effective type of the object,
>>>> 4. a type that is the signed or unsigned type corresponding to
a qualified version of the effective type of the object,
>>>> 5. an aggregate or union type that includes one of the
aforementioned types among its members (including, recursively, a member of a
subaggregate or contained union), or
>>>> 6. a character type.
>>>> 
>>> 
>>>> Example
>>>> -------
>>>> struct A {
>>>>   int x;
>>>>   int y;
>>>> };
>>>> struct B {
>>>>   A a;
>>>>   int z;
>>>> };
>>>> struct C {
>>>>   B b1;
>>>>   B b2;
>>>>   int *p;
>>>> };
>>>> 
>>>> Type DAG:
>>>> int <- A::x <- A
>>>> int <- A::y <- A <- B::a <- B <- C::b1 <- C
>>>> int <----------------- B::z <- B <- C::b2 <- C
>>>> any pointer <--------------------- C::p  <- C
>>>> 
>>>> The type DAG has two types of TBAA nodes:
>>>> 1> the existing scalar nodes
>>>> 2> the struct nodes (this is different from the current
tbaa.struct)
>>>>  A struct node has a unique name plus a list of pairs (field
name, field type).
>>>>  For example, struct node for "C" should look like
>>>>  !4 = metadata !{"C", "C::b1", metadata !3,
"C::b2", metadata !3, "C::p", metadata !2}
>>>>  where !3 is the struct node for "B", !2 is pointer
type.
>>>> 
>>>> Given a field access
>>>> struct B *bp = ...;
>>>> bp->a.x = 5;
>>>> we annotate it as B::a.x.
>>> 
>>> In the case of multiple structures containing substructures, how
are
>>> you differentiating?
>>> 
>>> IE given
>>> 
>>> struct A {
>>> struct B b;
>>> }
>>> struct C {
>>> struct B b;
>>> }
>>> 
>>> How do you know the above struct B *bp =...; is B::b from C and not
B::b from A?
>>> 
>>> (I agree you can know in the case of direct aggregates, but I argue
>>> you have no way to know in the case of pointer arguments without
>>> interprocedural analysis)
>>> It gets worse the more levels you have.
>>> 
>>> ie if you add
>>> struct B {
>>> struct E e;
>>> }
>>> 
>>> and have struct E *e = ...
>>> how do you know it's the B::e contained in struct C, or the
B::e
>>> contained in struct A?
>>> 
>>> 
>>> Again, i agree you can do both scalar and direct aggregates, but
not
>>> aggregates and scalars through pointers.
>>> 
>> 
>> I don't have immediate plan of pointer analysis. For the given
example, we will treat field accesses from bp (pointer to struct B) as B::x.x,
bp can be from either C or A.
>> 
>>> 
>>>> Implementing the Hierarchy
>>>> --------------------------
>>>> We can attach metadata to both scalar accesses and aggregate
accesses. Let's call scalar tags and aggregate tags.
>>>> Each tag can be a sequence of nodes in the type DAG.
>>>> !C::b2.a.x := [ "tbaa.path", !C, "C::b2",
!B, "B::a", !A, "A::x", !int ]
>>>> 
>>> 
>>> This can get quite deep quite quickly.
>>> Are there actually cases where knowing the outermost aggregate type
+
>>> {byte offset, size} of field access does not give you the exact
same
>>> disambiguation capability?
>> The answer is no. We should be able to deduce the access path from
{byte offset, size}.
>> However, I don't know an easy way to check alias(x,y) given {byte
offset, size} for x and y.
> 
> Well, that part is easy, it's just overlap, once you know they are
> both contained in the same outermost type.
How do you figure out the outermost common type of two access 'x' and
'y'?
Once that is figured out, it should be easy to check alias('x',
'y').> 
> You have to track what the types are in the frontend to generate the
> access path anyway, so you can compute the offsets + sizes into those
> types at the same time.
Generating the offsets+sizes are not hard.> 
> Essentially, it it translating your representation into something that
> can be intersected in constant time, unless i'm missing something
> about how you are annotating the access paths.If we can query alias('x','y') in constant time, irrelevant to
the depth of each access, that will be great.

-Manman

Daniel Berlin

2013-Mar-11 23:23 UTC

head link

[LLVMdev] PROPOSAL: struct-access-path aware TBAA

On Mon, Mar 11, 2013 at 3:45 PM, Manman Ren <mren at apple.com>
wrote:>
> On Mar 11, 2013, at 2:37 PM, Daniel Berlin <dberlin at dberlin.org>
wrote:
>
>> On Mon, Mar 11, 2013 at 2:06 PM, Manman Ren <mren at apple.com>
wrote:
>>>
>>> On Mar 11, 2013, at 1:17 PM, Daniel Berlin <dberlin at
dberlin.org> wrote:
>>>
>>>> On Mon, Mar 11, 2013 at 11:41 AM, Manman Ren <mren at
apple.com> wrote:
>>>>> Based on discussions with John McCall
>>>>>
>>>>> We currently focus on field accesses of structs, more
specifically, on fields that are scalars or structs.
>>>>>
>>>>> Fundamental rules from C11
>>>>> --------------------------
>>>>> An object shall have its stored value accessed only by an
lvalue expression that has one of the following types: [footnote: The intent of
this list is to specify those circumstances in which an object may or may not be
aliased.]
>>>>> 1. a type compatible with the effective type of the object,
>>>>> 2. a qualified version of a type compatible with the
effective type of the object,
>>>>> 3. a type that is the signed or unsigned type corresponding
to the effective type of the object,
>>>>> 4. a type that is the signed or unsigned type corresponding
to a qualified version of the effective type of the object,
>>>>> 5. an aggregate or union type that includes one of the
aforementioned types among its members (including, recursively, a member of a
subaggregate or contained union), or
>>>>> 6. a character type.
>>>>>
>>>>
>>>>> Example
>>>>> -------
>>>>> struct A {
>>>>>   int x;
>>>>>   int y;
>>>>> };
>>>>> struct B {
>>>>>   A a;
>>>>>   int z;
>>>>> };
>>>>> struct C {
>>>>>   B b1;
>>>>>   B b2;
>>>>>   int *p;
>>>>> };
>>>>>
>>>>> Type DAG:
>>>>> int <- A::x <- A
>>>>> int <- A::y <- A <- B::a <- B <- C::b1 <-
C
>>>>> int <----------------- B::z <- B <- C::b2 <- C
>>>>> any pointer <--------------------- C::p  <- C
>>>>>
>>>>> The type DAG has two types of TBAA nodes:
>>>>> 1> the existing scalar nodes
>>>>> 2> the struct nodes (this is different from the current
tbaa.struct)
>>>>>  A struct node has a unique name plus a list of pairs
(field name, field type).
>>>>>  For example, struct node for "C" should look
like
>>>>>  !4 = metadata !{"C", "C::b1", metadata
!3, "C::b2", metadata !3, "C::p", metadata !2}
>>>>>  where !3 is the struct node for "B", !2 is
pointer type.
>>>>>
>>>>> Given a field access
>>>>> struct B *bp = ...;
>>>>> bp->a.x = 5;
>>>>> we annotate it as B::a.x.
>>>>
>>>> In the case of multiple structures containing substructures,
how are
>>>> you differentiating?
>>>>
>>>> IE given
>>>>
>>>> struct A {
>>>> struct B b;
>>>> }
>>>> struct C {
>>>> struct B b;
>>>> }
>>>>
>>>> How do you know the above struct B *bp =...; is B::b from C and
not B::b from A?
>>>>
>>>> (I agree you can know in the case of direct aggregates, but I
argue
>>>> you have no way to know in the case of pointer arguments
without
>>>> interprocedural analysis)
>>>> It gets worse the more levels you have.
>>>>
>>>> ie if you add
>>>> struct B {
>>>> struct E e;
>>>> }
>>>>
>>>> and have struct E *e = ...
>>>> how do you know it's the B::e contained in struct C, or the
B::e
>>>> contained in struct A?
>>>>
>>>>
>>>> Again, i agree you can do both scalar and direct aggregates,
but not
>>>> aggregates and scalars through pointers.
>>>>
>>>
>>> I don't have immediate plan of pointer analysis. For the given
example, we will treat field accesses from bp (pointer to struct B) as B::x.x,
bp can be from either C or A.
>>>
>>>>
>>>>> Implementing the Hierarchy
>>>>> --------------------------
>>>>> We can attach metadata to both scalar accesses and
aggregate accesses. Let's call scalar tags and aggregate tags.
>>>>> Each tag can be a sequence of nodes in the type DAG.
>>>>> !C::b2.a.x := [ "tbaa.path", !C,
"C::b2", !B, "B::a", !A, "A::x", !int ]
>>>>>
>>>>
>>>> This can get quite deep quite quickly.
>>>> Are there actually cases where knowing the outermost aggregate
type +
>>>> {byte offset, size} of field access does not give you the exact
same
>>>> disambiguation capability?
>>> The answer is no. We should be able to deduce the access path from
{byte offset, size}.
>>> However, I don't know an easy way to check alias(x,y) given
{byte offset, size} for x and y.
>>
>> Well, that part is easy, it's just overlap, once you know they are
>> both contained in the same outermost type.
>
> How do you figure out the outermost common type of two access 'x'
and 'y'?
How are you annotating each access with a path as you go right now?
It should be the same thing to just annotate it then.
Without thinking too hard:
If it's all direct aggregate access in the original source, whatever
you see as the leftmost part should work.

I haven't thought hard about it, but I believe it should wrok.


Note that again, neither works for pointers unless you can guarantee
you have the entire access path, which is hard:

union u {
struct A a;
struct B b;
};

int foo(struct A *a, struct B *b) {
....

}

void bar() {
union u whoops;
foo(&whoops.a, &whoops.b);
}

If you annotated foo with a partial access path, you may or may not
get the right answer.
You would think your type dag solves this, because you'll see they
share a supertype.

But it's actually worse than that:

file: bar.h
 union u {
struct A a;
struct B b;
};

int foo(struct A *, struct B*);

file: foo.c

int foo(struct A *a, struct B *b) {
....

}

file: bar.c

#include "bar.h"

void bar() {
union u whoops;
foo(&whoops.a, &whoops.b);
}


You will never see compiling  foo.c that tells you both types are in a
union together, or how to annotate these access paths.

Hence the "you need to punt on pointers without more analysis" claim I
made :)

> Once that is figured out, it should be easy to check alias('x',
'y').
>>
>> You have to track what the types are in the frontend to generate the
>> access path anyway, so you can compute the offsets + sizes into those
>> types at the same time.
> Generating the offsets+sizes are not hard.
>>
>> Essentially, it it translating your representation into something that
>> can be intersected in constant time, unless i'm missing something
>> about how you are annotating the access paths.
> If we can query alias('x','y') in constant time, irrelevant
to the depth of each access, that will be great.
>
> -Manman
>

Manman Ren

2013-Mar-11 23:56 UTC

head link

[LLVMdev] PROPOSAL: struct-access-path aware TBAA

On Mar 11, 2013, at 4:23 PM, Daniel Berlin <dberlin at dberlin.org> wrote:
> On Mon, Mar 11, 2013 at 3:45 PM, Manman Ren <mren at apple.com>
wrote:
>> 
>> On Mar 11, 2013, at 2:37 PM, Daniel Berlin <dberlin at
dberlin.org> wrote:
>> 
>>> On Mon, Mar 11, 2013 at 2:06 PM, Manman Ren <mren at
apple.com> wrote:
>>>> 
>>>> On Mar 11, 2013, at 1:17 PM, Daniel Berlin <dberlin at
dberlin.org> wrote:
>>>> 
>>>>> On Mon, Mar 11, 2013 at 11:41 AM, Manman Ren <mren at
apple.com> wrote:
>>>>>> Based on discussions with John McCall
>>>>>> 
>>>>>> We currently focus on field accesses of structs, more
specifically, on fields that are scalars or structs.
>>>>>> 
>>>>>> Fundamental rules from C11
>>>>>> --------------------------
>>>>>> An object shall have its stored value accessed only by
an lvalue expression that has one of the following types: [footnote: The intent
of this list is to specify those circumstances in which an object may or may not
be aliased.]
>>>>>> 1. a type compatible with the effective type of the
object,
>>>>>> 2. a qualified version of a type compatible with the
effective type of the object,
>>>>>> 3. a type that is the signed or unsigned type
corresponding to the effective type of the object,
>>>>>> 4. a type that is the signed or unsigned type
corresponding to a qualified version of the effective type of the object,
>>>>>> 5. an aggregate or union type that includes one of the
aforementioned types among its members (including, recursively, a member of a
subaggregate or contained union), or
>>>>>> 6. a character type.
>>>>>> 
>>>>> 
>>>>>> Example
>>>>>> -------
>>>>>> struct A {
>>>>>>  int x;
>>>>>>  int y;
>>>>>> };
>>>>>> struct B {
>>>>>>  A a;
>>>>>>  int z;
>>>>>> };
>>>>>> struct C {
>>>>>>  B b1;
>>>>>>  B b2;
>>>>>>  int *p;
>>>>>> };
>>>>>> 
>>>>>> Type DAG:
>>>>>> int <- A::x <- A
>>>>>> int <- A::y <- A <- B::a <- B <- C::b1
<- C
>>>>>> int <----------------- B::z <- B <- C::b2
<- C
>>>>>> any pointer <--------------------- C::p  <- C
>>>>>> 
>>>>>> The type DAG has two types of TBAA nodes:
>>>>>> 1> the existing scalar nodes
>>>>>> 2> the struct nodes (this is different from the
current tbaa.struct)
>>>>>> A struct node has a unique name plus a list of pairs
(field name, field type).
>>>>>> For example, struct node for "C" should look
like
>>>>>> !4 = metadata !{"C", "C::b1",
metadata !3, "C::b2", metadata !3, "C::p", metadata !2}
>>>>>> where !3 is the struct node for "B", !2 is
pointer type.
>>>>>> 
>>>>>> Given a field access
>>>>>> struct B *bp = ...;
>>>>>> bp->a.x = 5;
>>>>>> we annotate it as B::a.x.
>>>>> 
>>>>> In the case of multiple structures containing
substructures, how are
>>>>> you differentiating?
>>>>> 
>>>>> IE given
>>>>> 
>>>>> struct A {
>>>>> struct B b;
>>>>> }
>>>>> struct C {
>>>>> struct B b;
>>>>> }
>>>>> 
>>>>> How do you know the above struct B *bp =...; is B::b from C
and not B::b from A?
>>>>> 
>>>>> (I agree you can know in the case of direct aggregates, but
I argue
>>>>> you have no way to know in the case of pointer arguments
without
>>>>> interprocedural analysis)
>>>>> It gets worse the more levels you have.
>>>>> 
>>>>> ie if you add
>>>>> struct B {
>>>>> struct E e;
>>>>> }
>>>>> 
>>>>> and have struct E *e = ...
>>>>> how do you know it's the B::e contained in struct C, or
the B::e
>>>>> contained in struct A?
>>>>> 
>>>>> 
>>>>> Again, i agree you can do both scalar and direct
aggregates, but not
>>>>> aggregates and scalars through pointers.
>>>>> 
>>>> 
>>>> I don't have immediate plan of pointer analysis. For the
given example, we will treat field accesses from bp (pointer to struct B) as
B::x.x, bp can be from either C or A.
>>>> 
>>>>> 
>>>>>> Implementing the Hierarchy
>>>>>> --------------------------
>>>>>> We can attach metadata to both scalar accesses and
aggregate accesses. Let's call scalar tags and aggregate tags.
>>>>>> Each tag can be a sequence of nodes in the type DAG.
>>>>>> !C::b2.a.x := [ "tbaa.path", !C,
"C::b2", !B, "B::a", !A, "A::x", !int ]
>>>>>> 
>>>>> 
>>>>> This can get quite deep quite quickly.
>>>>> Are there actually cases where knowing the outermost
aggregate type +
>>>>> {byte offset, size} of field access does not give you the
exact same
>>>>> disambiguation capability?
>>>> The answer is no. We should be able to deduce the access path
from {byte offset, size}.
>>>> However, I don't know an easy way to check alias(x,y) given
{byte offset, size} for x and y.
>>> 
>>> Well, that part is easy, it's just overlap, once you know they
are
>>> both contained in the same outermost type.
>> 
>> How do you figure out the outermost common type of two access
'x' and 'y'?
> 
> How are you annotating each access with a path as you go right now?
> It should be the same thing to just annotate it then.
I thought your suggestion was to replace
!C::b2.a.x := [ "tbaa.path", !C, "C::b2", !B,
"B::a", !A, "A::x", !int ]
with
!C::b2.a.x := [ "tbaa.offset", !C, offset within C, size ]

!B::a := [ "tbaa.path", !B, "B:a", !A ]
with
!B::a := [ "tbaa.offset", !B, offset within B, size]

Then we already lost the access path at IR level.
> Without thinking too hard:
> If it's all direct aggregate access in the original source, whatever
> you see as the leftmost part should work.
> 
> I haven't thought hard about it, but I believe it should work.

At llvm IR level, it is not that straight-forward to figure out the outermost
common type of C::b2.a.x and B::a with tbaa.offset.
To me, the outermost common type should be "B", then we need to check
the relative offset from B for both accesses.

-Manman
> 
> 
> Note that again, neither works for pointers unless you can guarantee
> you have the entire access path, which is hard:
> 
> union u {
> struct A a;
> struct B b;
> };
> 
> int foo(struct A *a, struct B *b) {
> ....
> 
> }
> 
> void bar() {
> union u whoops;
> foo(&whoops.a, &whoops.b);
> }
> 
> If you annotated foo with a partial access path, you may or may not
> get the right answer.
> You would think your type dag solves this, because you'll see they
> share a supertype.
> 
> But it's actually worse than that:
> 
> file: bar.h
> union u {
> struct A a;
> struct B b;
> };
> 
> int foo(struct A *, struct B*);
> 
> file: foo.c
> 
> int foo(struct A *a, struct B *b) {
> ....
> 
> }
> 
> file: bar.c
> 
> #include "bar.h"
> 
> void bar() {
> union u whoops;
> foo(&whoops.a, &whoops.b);
> }
> 
> 
> You will never see compiling  foo.c that tells you both types are in a
> union together, or how to annotate these access paths.
> 
> Hence the "you need to punt on pointers without more analysis"
claim I made :)
> 
> 
>> Once that is figured out, it should be easy to check alias('x',
'y').
>>> 
>>> You have to track what the types are in the frontend to generate
the
>>> access path anyway, so you can compute the offsets + sizes into
those
>>> types at the same time.
>> Generating the offsets+sizes are not hard.
>>> 
>>> Essentially, it it translating your representation into something
that
>>> can be intersected in constant time, unless i'm missing
something
>>> about how you are annotating the access paths.
>> If we can query alias('x','y') in constant time,
irrelevant to the depth of each access, that will be great.
>> 
>> -Manman
>>

Manman Ren

2013-Mar-12 17:02 UTC

head link

[LLVMdev] PROPOSAL: struct-access-path aware TBAA

On Mar 11, 2013, at 4:23 PM, Daniel Berlin wrote:
> On Mon, Mar 11, 2013 at 3:45 PM, Manman Ren <mren at apple.com>
wrote:
>> 
>> On Mar 11, 2013, at 2:37 PM, Daniel Berlin <dberlin at
dberlin.org> wrote:
>> 
>>> On Mon, Mar 11, 2013 at 2:06 PM, Manman Ren <mren at
apple.com> wrote:
>>>> 
>>>> On Mar 11, 2013, at 1:17 PM, Daniel Berlin <dberlin at
dberlin.org> wrote:
>>>> 
>>>>> On Mon, Mar 11, 2013 at 11:41 AM, Manman Ren <mren at
apple.com> wrote:
>>>>>> Based on discussions with John McCall
>>>>>> 
>>>>>> We currently focus on field accesses of structs, more
specifically, on fields that are scalars or structs.
>>>>>> 
>>>>>> Fundamental rules from C11
>>>>>> --------------------------
>>>>>> An object shall have its stored value accessed only by
an lvalue expression that has one of the following types: [footnote: The intent
of this list is to specify those circumstances in which an object may or may not
be aliased.]
>>>>>> 1. a type compatible with the effective type of the
object,
>>>>>> 2. a qualified version of a type compatible with the
effective type of the object,
>>>>>> 3. a type that is the signed or unsigned type
corresponding to the effective type of the object,
>>>>>> 4. a type that is the signed or unsigned type
corresponding to a qualified version of the effective type of the object,
>>>>>> 5. an aggregate or union type that includes one of the
aforementioned types among its members (including, recursively, a member of a
subaggregate or contained union), or
>>>>>> 6. a character type.
>>>>>> 
>>>>> 
>>>>>> Example
>>>>>> -------
>>>>>> struct A {
>>>>>>  int x;
>>>>>>  int y;
>>>>>> };
>>>>>> struct B {
>>>>>>  A a;
>>>>>>  int z;
>>>>>> };
>>>>>> struct C {
>>>>>>  B b1;
>>>>>>  B b2;
>>>>>>  int *p;
>>>>>> };
>>>>>> 
>>>>>> Type DAG:
>>>>>> int <- A::x <- A
>>>>>> int <- A::y <- A <- B::a <- B <- C::b1
<- C
>>>>>> int <----------------- B::z <- B <- C::b2
<- C
>>>>>> any pointer <--------------------- C::p  <- C
>>>>>> 
>>>>>> The type DAG has two types of TBAA nodes:
>>>>>> 1> the existing scalar nodes
>>>>>> 2> the struct nodes (this is different from the
current tbaa.struct)
>>>>>> A struct node has a unique name plus a list of pairs
(field name, field type).
>>>>>> For example, struct node for "C" should look
like
>>>>>> !4 = metadata !{"C", "C::b1",
metadata !3, "C::b2", metadata !3, "C::p", metadata !2}
>>>>>> where !3 is the struct node for "B", !2 is
pointer type.
>>>>>> 
>>>>>> Given a field access
>>>>>> struct B *bp = ...;
>>>>>> bp->a.x = 5;
>>>>>> we annotate it as B::a.x.
>>>>> 
>>>>> In the case of multiple structures containing
substructures, how are
>>>>> you differentiating?
>>>>> 
>>>>> IE given
>>>>> 
>>>>> struct A {
>>>>> struct B b;
>>>>> }
>>>>> struct C {
>>>>> struct B b;
>>>>> }
>>>>> 
>>>>> How do you know the above struct B *bp =...; is B::b from C
and not B::b from A?
>>>>> 
>>>>> (I agree you can know in the case of direct aggregates, but
I argue
>>>>> you have no way to know in the case of pointer arguments
without
>>>>> interprocedural analysis)
>>>>> It gets worse the more levels you have.
>>>>> 
>>>>> ie if you add
>>>>> struct B {
>>>>> struct E e;
>>>>> }
>>>>> 
>>>>> and have struct E *e = ...
>>>>> how do you know it's the B::e contained in struct C, or
the B::e
>>>>> contained in struct A?
>>>>> 
>>>>> 
>>>>> Again, i agree you can do both scalar and direct
aggregates, but not
>>>>> aggregates and scalars through pointers.
>>>>> 
>>>> 
>>>> I don't have immediate plan of pointer analysis. For the
given example, we will treat field accesses from bp (pointer to struct B) as
B::x.x, bp can be from either C or A.
>>>> 
>>>>> 
>>>>>> Implementing the Hierarchy
>>>>>> --------------------------
>>>>>> We can attach metadata to both scalar accesses and
aggregate accesses. Let's call scalar tags and aggregate tags.
>>>>>> Each tag can be a sequence of nodes in the type DAG.
>>>>>> !C::b2.a.x := [ "tbaa.path", !C,
"C::b2", !B, "B::a", !A, "A::x", !int ]
>>>>>> 
>>>>> 
>>>>> This can get quite deep quite quickly.
>>>>> Are there actually cases where knowing the outermost
aggregate type +
>>>>> {byte offset, size} of field access does not give you the
exact same
>>>>> disambiguation capability?
>>>> The answer is no. We should be able to deduce the access path
from {byte offset, size}.
>>>> However, I don't know an easy way to check alias(x,y) given
{byte offset, size} for x and y.
>>> 
>>> Well, that part is easy, it's just overlap, once you know they
are
>>> both contained in the same outermost type.
>> 
>> How do you figure out the outermost common type of two access
'x' and 'y'?
> 
> How are you annotating each access with a path as you go right now?
> It should be the same thing to just annotate it then.
> Without thinking too hard:
> If it's all direct aggregate access in the original source, whatever
> you see as the leftmost part should work.
> 
> I haven't thought hard about it, but I believe it should wrok.
> 
> 
> Note that again, neither works for pointers unless you can guarantee
> you have the entire access path, which is hard:
> 
> union u {
> struct A a;
> struct B b;
> };
> 
> int foo(struct A *a, struct B *b) {
> ....
> 
> }
> 
> void bar() {
> union u whoops;
> foo(&whoops.a, &whoops.b);
> }
We should have the same issue with the current TBAA if replacing struct A with
int and struct B with float.
Inside foo, we will assume access to *a will not alias with access to *b.
This issue is specific to TBAA, not struct-access-path aware TBAA.
Correct me if I am wrong :]

Manman> 
> If you annotated foo with a partial access path, you may or may not
> get the right answer.
> You would think your type dag solves this, because you'll see they
> share a supertype.
> 
> But it's actually worse than that:
> 
> file: bar.h
> union u {
> struct A a;
> struct B b;
> };
> 
> int foo(struct A *, struct B*);
> 
> file: foo.c
> 
> int foo(struct A *a, struct B *b) {
> ....
> 
> }
> 
> file: bar.c
> 
> #include "bar.h"
> 
> void bar() {
> union u whoops;
> foo(&whoops.a, &whoops.b);
> }
> 
> 
> You will never see compiling  foo.c that tells you both types are in a
> union together, or how to annotate these access paths.
> 
> Hence the "you need to punt on pointers without more analysis"
claim I made :)
> 
> 
>> Once that is figured out, it should be easy to check alias('x',
'y').
>>> 
>>> You have to track what the types are in the frontend to generate
the
>>> access path anyway, so you can compute the offsets + sizes into
those
>>> types at the same time.
>> Generating the offsets+sizes are not hard.
>>> 
>>> Essentially, it it translating your representation into something
that
>>> can be intersected in constant time, unless i'm missing
something
>>> about how you are annotating the access paths.
>> If we can query alias('x','y') in constant time,
irrelevant to the depth of each access, that will be great.
>> 
>> -Manman
>>

Shuxin Yang

2013-Mar-12 22:25 UTC

head link

[LLVMdev] PROPOSAL: struct-access-path aware TBAA

Type-based alias analysis is intrinsically unsafe. I don't think it make 
big sense to see the whole program
being compiled and stitch together all type-include-graph together just 
for this optimization. We would otherwise
have to implement this optimization in interprocedural analysis phase 
(with whole program mode).

That said, if would be nice if front-end could figure out if a type is 
only accessiable inside a translation
unit; TBAA can take advantage this info to make it is unsafer.


On 3/11/13, 4:23 PM, Daniel Berlin wrote:> On Mon, Mar 11, 2013 at 3:45 PM, Manman Ren <mren at apple.com>
wrote:
>> On Mar 11, 2013, at 2:37 PM, Daniel Berlin <dberlin at
dberlin.org> wrote:
>>
>>> On Mon, Mar 11, 2013 at 2:06 PM, Manman Ren <mren at
apple.com> wrote:
>>>> On Mar 11, 2013, at 1:17 PM, Daniel Berlin <dberlin at
dberlin.org> wrote:
>>>>
>>>>> On Mon, Mar 11, 2013 at 11:41 AM, Manman Ren <mren at
apple.com> wrote:
>>>>>> Based on discussions with John McCall
>>>>>>
>>>>>> We currently focus on field accesses of structs, more
specifically, on fields that are scalars or structs.
>>>>>>
>>>>>> Fundamental rules from C11
>>>>>> --------------------------
>>>>>> An object shall have its stored value accessed only by
an lvalue expression that has one of the following types: [footnote: The intent
of this list is to specify those circumstances in which an object may or may not
be aliased.]
>>>>>> 1. a type compatible with the effective type of the
object,
>>>>>> 2. a qualified version of a type compatible with the
effective type of the object,
>>>>>> 3. a type that is the signed or unsigned type
corresponding to the effective type of the object,
>>>>>> 4. a type that is the signed or unsigned type
corresponding to a qualified version of the effective type of the object,
>>>>>> 5. an aggregate or union type that includes one of the
aforementioned types among its members (including, recursively, a member of a
subaggregate or contained union), or
>>>>>> 6. a character type.
>>>>>>
>>>>>> Example
>>>>>> -------
>>>>>> struct A {
>>>>>>    int x;
>>>>>>    int y;
>>>>>> };
>>>>>> struct B {
>>>>>>    A a;
>>>>>>    int z;
>>>>>> };
>>>>>> struct C {
>>>>>>    B b1;
>>>>>>    B b2;
>>>>>>    int *p;
>>>>>> };
>>>>>>
>>>>>> Type DAG:
>>>>>> int <- A::x <- A
>>>>>> int <- A::y <- A <- B::a <- B <- C::b1
<- C
>>>>>> int <----------------- B::z <- B <- C::b2
<- C
>>>>>> any pointer <--------------------- C::p  <- C
>>>>>>
>>>>>> The type DAG has two types of TBAA nodes:
>>>>>> 1> the existing scalar nodes
>>>>>> 2> the struct nodes (this is different from the
current tbaa.struct)
>>>>>>   A struct node has a unique name plus a list of pairs
(field name, field type).
>>>>>>   For example, struct node for "C" should
look like
>>>>>>   !4 = metadata !{"C", "C::b1",
metadata !3, "C::b2", metadata !3, "C::p", metadata !2}
>>>>>>   where !3 is the struct node for "B", !2 is
pointer type.
>>>>>>
>>>>>> Given a field access
>>>>>> struct B *bp = ...;
>>>>>> bp->a.x = 5;
>>>>>> we annotate it as B::a.x.
>>>>> In the case of multiple structures containing
substructures, how are
>>>>> you differentiating?
>>>>>
>>>>> IE given
>>>>>
>>>>> struct A {
>>>>> struct B b;
>>>>> }
>>>>> struct C {
>>>>> struct B b;
>>>>> }
>>>>>
>>>>> How do you know the above struct B *bp =...; is B::b from C
and not B::b from A?
>>>>>
>>>>> (I agree you can know in the case of direct aggregates, but
I argue
>>>>> you have no way to know in the case of pointer arguments
without
>>>>> interprocedural analysis)
>>>>> It gets worse the more levels you have.
>>>>>
>>>>> ie if you add
>>>>> struct B {
>>>>> struct E e;
>>>>> }
>>>>>
>>>>> and have struct E *e = ...
>>>>> how do you know it's the B::e contained in struct C, or
the B::e
>>>>> contained in struct A?
>>>>>
>>>>>
>>>>> Again, i agree you can do both scalar and direct
aggregates, but not
>>>>> aggregates and scalars through pointers.
>>>>>
>>>> I don't have immediate plan of pointer analysis. For the
given example, we will treat field accesses from bp (pointer to struct B) as
B::x.x, bp can be from either C or A.
>>>>
>>>>>> Implementing the Hierarchy
>>>>>> --------------------------
>>>>>> We can attach metadata to both scalar accesses and
aggregate accesses. Let's call scalar tags and aggregate tags.
>>>>>> Each tag can be a sequence of nodes in the type DAG.
>>>>>> !C::b2.a.x := [ "tbaa.path", !C,
"C::b2", !B, "B::a", !A, "A::x", !int ]
>>>>>>
>>>>> This can get quite deep quite quickly.
>>>>> Are there actually cases where knowing the outermost
aggregate type +
>>>>> {byte offset, size} of field access does not give you the
exact same
>>>>> disambiguation capability?
>>>> The answer is no. We should be able to deduce the access path
from {byte offset, size}.
>>>> However, I don't know an easy way to check alias(x,y) given
{byte offset, size} for x and y.
>>> Well, that part is easy, it's just overlap, once you know they
are
>>> both contained in the same outermost type.
>> How do you figure out the outermost common type of two access
'x' and 'y'?
> How are you annotating each access with a path as you go right now?
> It should be the same thing to just annotate it then.
> Without thinking too hard:
> If it's all direct aggregate access in the original source, whatever
> you see as the leftmost part should work.
>
> I haven't thought hard about it, but I believe it should wrok.
>
>
> Note that again, neither works for pointers unless you can guarantee
> you have the entire access path, which is hard:
>
> union u {
> struct A a;
> struct B b;
> };
>
> int foo(struct A *a, struct B *b) {
> ....
>
> }
>
> void bar() {
> union u whoops;
> foo(&whoops.a, &whoops.b);
> }
>
> If you annotated foo with a partial access path, you may or may not
> get the right answer.
> You would think your type dag solves this, because you'll see they
> share a supertype.
>
> But it's actually worse than that:
>
> file: bar.h
>   union u {
> struct A a;
> struct B b;
> };
>
> int foo(struct A *, struct B*);
>
> file: foo.c
>
> int foo(struct A *a, struct B *b) {
> ....
>
> }
>
> file: bar.c
>
> #include "bar.h"
>
> void bar() {
> union u whoops;
> foo(&whoops.a, &whoops.b);
> }
>
>
> You will never see compiling  foo.c that tells you both types are in a
> union together, or how to annotate these access paths.
>
> Hence the "you need to punt on pointers without more analysis"
claim I made :)
>
>
>> Once that is figured out, it should be easy to check alias('x',
'y').
>>> You have to track what the types are in the frontend to generate
the
>>> access path anyway, so you can compute the offsets + sizes into
those
>>> types at the same time.
>> Generating the offsets+sizes are not hard.
>>> Essentially, it it translating your representation into something
that
>>> can be intersected in constant time, unless i'm missing
something
>>> about how you are annotating the access paths.
>> If we can query alias('x','y') in constant time,
irrelevant to the depth of each access, that will be great.
>>
>> -Manman
>>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Seemingly Similar Threads

Search for more apparently analagous threads

llvm dev - Mar 2013 - [LLVMdev] PROPOSAL: struct-access-path aware TBAA

[LLVMdev] PROPOSAL: struct-access-path aware TBAA

[LLVMdev] PROPOSAL: struct-access-path aware TBAA

[LLVMdev] PROPOSAL: struct-access-path aware TBAA

[LLVMdev] PROPOSAL: struct-access-path aware TBAA

[LLVMdev] PROPOSAL: struct-access-path aware TBAA

Seemingly Similar Threads