Based on discussions with John McCall We currently focus on field accesses of structs, more specifically, on fields that are scalars or structs. Fundamental rules from C11 -------------------------- An object shall have its stored value accessed only by an lvalue expression that has one of the following types: [footnote: The intent of this list is to specify those circumstances in which an object may or may not be aliased.] 1. a type compatible with the effective type of the object, 2. a qualified version of a type compatible with the effective type of the object, 3. a type that is the signed or unsigned type corresponding to the effective type of the object, 4. a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object, 5. an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or 6. a character type. Example ------- struct A { int x; int y; }; struct B { A a; int z; }; struct C { B b1; B b2; int *p; }; Type DAG: int <- A::x <- A int <- A::y <- A <- B::a <- B <- C::b1 <- C int <----------------- B::z <- B <- C::b2 <- C any pointer <--------------------- C::p <- C The type DAG has two types of TBAA nodes: 1> the existing scalar nodes 2> the struct nodes (this is different from the current tbaa.struct) A struct node has a unique name plus a list of pairs (field name, field type). For example, struct node for "C" should look like !4 = metadata !{"C", "C::b1", metadata !3, "C::b2", metadata !3, "C::p", metadata !2} where !3 is the struct node for "B", !2 is pointer type. Given a field access struct B *bp = ...; bp->a.x = 5; we annotate it as B::a.x. This creates an obvious hierarchy of specificity: 'B::a.x' |-> 'A::x' |-> 'int' Each step outwards adds extra contextual information for an access to an 'int' object. We call each step inwards the super-type relation. It peels a field access from the left. There exists a second hierarchy, that of containment: 'B::a.x' |-> 'B::a' |-> 'B' Each step outwards "extends" the object into a subobject. Each step inwards "retracts" the object into an container. "retraction" peels a field access from the right; "extension" appends a field access at the right. Aliasing Rules -------------- alias(x,y) = rule(x,y) or rule(y,x) rule(x,y) has 3 cases: I> 'x' and 'y' do not have a common struct type rule_case1(x,y) returns true if the last element of 'x' can reach 'y' via type DAG | x | extends | z | super-type | y | II> 'x' and 'y' have a common struct type The field access starting from the first common type should match until the end of 'x' or 'y'. case a: | x | super-type |z'| extends | y | case b: | x | retracts | z | super-type | y | Combine the three cases to get rule(x,y): Check the first element of 'y', if it is not in 'x', return rule_case1(x,y) Check the next element of 'y' with the next element of 'x', if not the same, return false. When we reach the end of either 'x' or 'y', return true. If we only care about alias between scalar accesses, the above rules can be simplified. alias(scalar access x, scalar access y) = super_type(x,y) or super_type(y,x) where super_type(x,y) is true if 'x' can reach 'y' via super-type relation. Implementing the Hierarchy -------------------------- We can attach metadata to both scalar accesses and aggregate accesses. Let's call scalar tags and aggregate tags. Each tag can be a sequence of nodes in the type DAG. !C::b2.a.x := [ "tbaa.path", !C, "C::b2", !B, "B::a", !A, "A::x", !int ] Comments and questions are welcome. Manman
Krzysztof Parzyszek
2013-Mar-11 19:14 UTC
[LLVMdev] PROPOSAL: struct-access-path aware TBAA
On 3/11/2013 1:41 PM, Manman Ren wrote:> > | x | > extends > | z | > super-type > | y |What does this mean? "The type of z extends the type of x, and the type of z is a super-type of y"? -Krzysztof -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
On Mon, Mar 11, 2013 at 11:41 AM, Manman Ren <mren at apple.com> wrote:> Based on discussions with John McCall > > We currently focus on field accesses of structs, more specifically, on fields that are scalars or structs. > > Fundamental rules from C11 > -------------------------- > An object shall have its stored value accessed only by an lvalue expression that has one of the following types: [footnote: The intent of this list is to specify those circumstances in which an object may or may not be aliased.] > 1. a type compatible with the effective type of the object, > 2. a qualified version of a type compatible with the effective type of the object, > 3. a type that is the signed or unsigned type corresponding to the effective type of the object, > 4. a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object, > 5. an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or > 6. a character type. >> Example > ------- > struct A { > int x; > int y; > }; > struct B { > A a; > int z; > }; > struct C { > B b1; > B b2; > int *p; > }; > > Type DAG: > int <- A::x <- A > int <- A::y <- A <- B::a <- B <- C::b1 <- C > int <----------------- B::z <- B <- C::b2 <- C > any pointer <--------------------- C::p <- C > > The type DAG has two types of TBAA nodes: > 1> the existing scalar nodes > 2> the struct nodes (this is different from the current tbaa.struct) > A struct node has a unique name plus a list of pairs (field name, field type). > For example, struct node for "C" should look like > !4 = metadata !{"C", "C::b1", metadata !3, "C::b2", metadata !3, "C::p", metadata !2} > where !3 is the struct node for "B", !2 is pointer type. > > Given a field access > struct B *bp = ...; > bp->a.x = 5; > we annotate it as B::a.x.In the case of multiple structures containing substructures, how are you differentiating? IE given struct A { struct B b; } struct C { struct B b; } How do you know the above struct B *bp =...; is B::b from C and not B::b from A? (I agree you can know in the case of direct aggregates, but I argue you have no way to know in the case of pointer arguments without interprocedural analysis) It gets worse the more levels you have. ie if you add struct B { struct E e; } and have struct E *e = ... how do you know it's the B::e contained in struct C, or the B::e contained in struct A? Again, i agree you can do both scalar and direct aggregates, but not aggregates and scalars through pointers.> Implementing the Hierarchy > -------------------------- > We can attach metadata to both scalar accesses and aggregate accesses. Let's call scalar tags and aggregate tags. > Each tag can be a sequence of nodes in the type DAG. > !C::b2.a.x := [ "tbaa.path", !C, "C::b2", !B, "B::a", !A, "A::x", !int ] >This can get quite deep quite quickly. Are there actually cases where knowing the outermost aggregate type + {byte offset, size} of field access does not give you the exact same disambiguation capability? I ask because in GCC, we determined the answer was "no", since pointers require special analysis anyway.
On Mar 11, 2013, at 12:14 PM, Krzysztof Parzyszek wrote:> On 3/11/2013 1:41 PM, Manman Ren wrote: >> >> | x | >> extends >> | z | >> super-type >> | y | > > What does this mean? "The type of z extends the type of x, and the type of z is a super-type of y"?Extends x into a subobject z, and z is a super-type of y. -Manman> > -Krzysztof > > -- > Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
On Mar 11, 2013, at 1:17 PM, Daniel Berlin <dberlin at dberlin.org> wrote:> On Mon, Mar 11, 2013 at 11:41 AM, Manman Ren <mren at apple.com> wrote: >> Based on discussions with John McCall >> >> We currently focus on field accesses of structs, more specifically, on fields that are scalars or structs. >> >> Fundamental rules from C11 >> -------------------------- >> An object shall have its stored value accessed only by an lvalue expression that has one of the following types: [footnote: The intent of this list is to specify those circumstances in which an object may or may not be aliased.] >> 1. a type compatible with the effective type of the object, >> 2. a qualified version of a type compatible with the effective type of the object, >> 3. a type that is the signed or unsigned type corresponding to the effective type of the object, >> 4. a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object, >> 5. an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or >> 6. a character type. >> > >> Example >> ------- >> struct A { >> int x; >> int y; >> }; >> struct B { >> A a; >> int z; >> }; >> struct C { >> B b1; >> B b2; >> int *p; >> }; >> >> Type DAG: >> int <- A::x <- A >> int <- A::y <- A <- B::a <- B <- C::b1 <- C >> int <----------------- B::z <- B <- C::b2 <- C >> any pointer <--------------------- C::p <- C >> >> The type DAG has two types of TBAA nodes: >> 1> the existing scalar nodes >> 2> the struct nodes (this is different from the current tbaa.struct) >> A struct node has a unique name plus a list of pairs (field name, field type). >> For example, struct node for "C" should look like >> !4 = metadata !{"C", "C::b1", metadata !3, "C::b2", metadata !3, "C::p", metadata !2} >> where !3 is the struct node for "B", !2 is pointer type. >> >> Given a field access >> struct B *bp = ...; >> bp->a.x = 5; >> we annotate it as B::a.x. > > In the case of multiple structures containing substructures, how are > you differentiating? > > IE given > > struct A { > struct B b; > } > struct C { > struct B b; > } > > How do you know the above struct B *bp =...; is B::b from C and not B::b from A? > > (I agree you can know in the case of direct aggregates, but I argue > you have no way to know in the case of pointer arguments without > interprocedural analysis) > It gets worse the more levels you have. > > ie if you add > struct B { > struct E e; > } > > and have struct E *e = ... > how do you know it's the B::e contained in struct C, or the B::e > contained in struct A? > > > Again, i agree you can do both scalar and direct aggregates, but not > aggregates and scalars through pointers. >I don't have immediate plan of pointer analysis. For the given example, we will treat field accesses from bp (pointer to struct B) as B::x.x, bp can be from either C or A.> >> Implementing the Hierarchy >> -------------------------- >> We can attach metadata to both scalar accesses and aggregate accesses. Let's call scalar tags and aggregate tags. >> Each tag can be a sequence of nodes in the type DAG. >> !C::b2.a.x := [ "tbaa.path", !C, "C::b2", !B, "B::a", !A, "A::x", !int ] >> > > This can get quite deep quite quickly. > Are there actually cases where knowing the outermost aggregate type + > {byte offset, size} of field access does not give you the exact same > disambiguation capability?The answer is no. We should be able to deduce the access path from {byte offset, size}. However, I don't know an easy way to check alias(x,y) given {byte offset, size} for x and y. Thanks, Manman> > I ask because in GCC, we determined the answer was "no", since > pointers require special analysis anyway.
On 3/11/13 1:17 PM, Daniel Berlin wrote:> On Mon, Mar 11, 2013 at 11:41 AM, Manman Ren <mren at apple.com> wrote: >> Based on discussions with John McCall >> >> We currently focus on field accesses of structs, more specifically, on fields that are scalars or structs. >> >> Fundamental rules from C11 >> -------------------------- >> An object shall have its stored value accessed only by an lvalue expression that has one of the following types: [footnote: The intent of this list is to specify those circumstances in which an object may or may not be aliased.] >> 1. a type compatible with the effective type of the object, >> 2. a qualified version of a type compatible with the effective type of the object, >> 3. a type that is the signed or unsigned type corresponding to the effective type of the object, >> 4. a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object, >> 5. an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or >> 6. a character type. >> >> Example >> ------- >> struct A { >> int x; >> int y; >> }; >> struct B { >> A a; >> int z; >> }; >> struct C { >> B b1; >> B b2; >> int *p; >> }; >> >> Type DAG: >> int <- A::x <- A >> int <- A::y <- A <- B::a <- B <- C::b1 <- C >> int <----------------- B::z <- B <- C::b2 <- C >> any pointer <--------------------- C::p <- C >> >> The type DAG has two types of TBAA nodes: >> 1> the existing scalar nodes >> 2> the struct nodes (this is different from the current tbaa.struct) >> A struct node has a unique name plus a list of pairs (field name, field type). >> For example, struct node for "C" should look like >> !4 = metadata !{"C", "C::b1", metadata !3, "C::b2", metadata !3, "C::p", metadata !2} >> where !3 is the struct node for "B", !2 is pointer type. >> >> Given a field access >> struct B *bp = ...; >> bp->a.x = 5; >> we annotate it as B::a.x. > In the case of multiple structures containing substructures, how are > you differentiating? > > IE given > > struct A { > struct B b; > } > struct C { > struct B b; > } > > How do you know the above struct B *bp =...; is B::b from C and not B::b from A?If I understand correct, the proposed graph is DAG, not tree, and it should be able to tackling the case where a type is included more than once. On the other hand, the info which is annotated to the memory access is kind of "immediate enclosing aggregate type", which should be unique.> > (I agree you can know in the case of direct aggregates, but I argue > you have no way to know in the case of pointer arguments without > interprocedural analysis) > It gets worse the more levels you have. > > ie if you add > struct B { > struct E e; > } > > and have struct E *e = ... > how do you know it's the B::e contained in struct C, or the B::e > contained in struct A? > > > Again, i agree you can do both scalar and direct aggregates, but not > aggregates and scalars through pointers. > >That is exactly the power of TBAA. If the both memory accesses are direct load/store, it dose not even need TBAA, check their base/offset/size is sufficient for disambiguation.