Jonas Paulsson via llvm-dev
2019-Jan-14 13:57 UTC
[llvm-dev] Aliasing rules difference between GCC and Clang
Hi, It was a while now since I first asked about TBAA, and I am now getting back to this, trying to understand TBAA as it seems that the issue I described earlier originally in this thread is still one of many "TODO"s in CGExpr.cpp (http://lists.llvm.org/pipermail/llvm-dev/2018-September/126290.html). I would like to help on this to the best of my abilities, but I'm not quite sure if there is a current plan of incremental changes, or if it would be possible to handle the array member of a struct case specifically at this point? I am tempted to believe that the extension to handle my particular case is not too far-fetched, as Eli indicated earlier: This program: struct A { int i; }; struct B { int i; }; void h(struct A *a, struct B *b, int *prim_i) { a->i = 0; b->i = 0; *prim_i = 0; } translates to %i = getelementptr inbounds %struct.A, %struct.A* %a, i32 0, i32 0 store i32 0, i32* %i, align 4, !tbaa !2 %i1 = getelementptr inbounds %struct.B, %struct.B* %b, i32 0, i32 0 store i32 0, i32* %i1, align 4, !tbaa !7 store i32 0, i32* %prim_i, align 4, !tbaa !9 with this type DAG: !2 = !{!3, !4, i64 0} !3 = !{!"A", !4, i64 0} !4 = !{!"int", !5, i64 0} !5 = !{!"omnipotent char", !6, i64 0} !6 = !{!"Simple C/C++ TBAA"} !7 = !{!8, !4, i64 0} !8 = !{!"B", !4, i64 0} !9 = !{!4, !4, i64 0} Just as one would hope for, TBAA produces a NoAlias result between the stores to %A and %B: NoAlias: store i32 0, i32* %i1, align 4, !tbaa !7 <-> store i32 0, i32* %i, align 4, !tbaa !2 MayAlias: store i32 0, i32* %prim_i, align 4, !tbaa !9 <-> store i32 0, i32* %i, align 4, !tbaa !2 MayAlias: store i32 0, i32* %prim_i, align 4, !tbaa !9 <-> store i32 0, i32* %i1, align 4, !tbaa !7 The above indicates that TBAA is capable of handling structs with scalar members, but if I change the program to look more like the benchmark, like: struct A { int i[2]; }; struct B { int i[2]; }; void h(struct A *a, struct B *b, int *prim_i) { a->i[0] = 0; b->i[0] = 0; *prim_i = 0; } , which translates to %i = getelementptr inbounds %struct.A, %struct.A* %a, i32 0, i32 0 %arrayidx = getelementptr inbounds [2 x i32], [2 x i32]* %i, i64 0, i64 0 store i32 0, i32* %arrayidx, align 4, !tbaa !2 %i1 = getelementptr inbounds %struct.B, %struct.B* %b, i32 0, i32 0 %arrayidx2 = getelementptr inbounds [2 x i32], [2 x i32]* %i1, i64 0, i64 0 store i32 0, i32* %arrayidx2, align 4, !tbaa !2 store i32 0, i32* %prim_i, align 4, !tbaa !2 , the type DAG is generated as !2 = !{!3, !3, i64 0} !3 = !{!"int", !4, i64 0} !4 = !{!"omnipotent char", !5, i64 0} !5 = !{!"Simple C/C++ TBAA"} This makes all the stores alias, which is overly conservative, as %struct.A and %struct.B do not alias. It seems that when the struct member is an array, the base type becomes the element type. This is simply unhandled still in CodeGenTBAA ("TODO"). My question then is if it would be possible to instead create the DAG nodes !"A" and !"B" (as in the first type DAG above) and use them as the base types and then have the access type to be !"int", with the right offset into the struct node? A struct like struct S { int i[2]; long l[2]; }; would then be represented with a base node, something like !3 = !{!"S", !4, i64 0, !6, 8} !4 = !{!"int", !5, i64 0} !5 = !{!"omnipotent char", !6, i64 0} !6 = !{!"long", !5, i64 0} , giving !"int" as base at offset 0 and !"long" as base at offset 8 (given that i[2] is 8 bytes). An access like s->i[0], would then get a node like !2 = !{!3, %4, i64 0} , and this would give a NoAlias above, right? Did I understand this correctly, or did I miss anything relevant? What method(s) would need to be modified to handle this? Is it feasible to do this now, or are there other planned steps that somebody is working on? Would this improvement require the NewStructPathTBAA flag to be on, and if so, what is the outlook on this? @Ivan: For what I can see, you had a list of planned improvements. What are your thoughts on this? Thanks for any help, Jonas
Ivan Kosarev via llvm-dev
2019-Jan-17 17:09 UTC
[llvm-dev] Aliasing rules difference between GCC and Clang
Hello, Jonas, > It seems that when the struct member is an array, the base type > becomes the element type. This is simply unhandled still in > CodeGenTBAA ("TODO"). My question then is if it would be > possible to instead create the DAG nodes !"A" and !"B" (as in > the first type DAG above) and use them as the base types and > then have the access type to be !"int", with the right offset > into the struct node? Correct, the current default TBAA info is not capable of representing accesses to (ranges of) array elements, hence the decay to element-type access. Still, for elements indexed with constants. such as a->i[0] in your example, it is indeed possible to generate proper TBAA tags. Such tags would be no different from tags generated for accesses to non-array fields that reside at the same offset within the same base type. If there are real life cases where adding the support for constant-indexed accesses would make a difference, then I think that would be an improvement. > struct S { int i[2]; long l[2]; }; > ... > An access like s->i[0], would then get a node like > and this would give a NoAlias above, right? Did I understand > this correctly, or did I miss anything relevant? Yes, TBAA assumes that accesses to int's and long's do not alias. > Would this improvement require the NewStructPathTBAA flag to be > on, and if so, what is the outlook on this? It depends on whether handling of constant-indexed accesses would be enough for your use cases. One of the goals behind the new TBAA is to be able to accurately represent various kinds accesses to subobjects. This includes non-constant-indexed accesses to array elements. More info on how such accesses are supposed to be represented can be found here: https://reviews.llvm.org/D40975 If that's what you need, then the new TBAA info seems to be the most promising option. But please note that it's still a work in progress and is not mature enough for production use. Hope this helps. Regards, On 14/01/2019 15:57, Jonas Paulsson wrote:> CAUTION: This email originated from outside of the organization. Do > not click links or open attachments unless you recognize the sender > and know the content is safe. If you suspect potential phishing or > spam email, report it to ReportSpam at accesssoftek.com > > Hi, > > It was a while now since I first asked about TBAA, and I am now getting > back to this, trying to understand TBAA as it seems that the issue I > described earlier originally in this thread is still one of many "TODO"s > in CGExpr.cpp > (http://lists.llvm.org/pipermail/llvm-dev/2018-September/126290.html). > > I would like to help on this to the best of my abilities, but I'm not > quite sure if there is a current plan of incremental changes, or if it > would be possible to handle the array member of a struct case > specifically at this point? > > I am tempted to believe that the extension to handle my particular case > is not too far-fetched, as Eli indicated earlier: > > This program: > > struct A { int i; }; > struct B { int i; }; > > void h(struct A *a, struct B *b, int *prim_i) { > a->i = 0; > b->i = 0; > *prim_i = 0; > } > > translates to > > %i = getelementptr inbounds %struct.A, %struct.A* %a, i32 0, i32 0 > store i32 0, i32* %i, align 4, !tbaa !2 > %i1 = getelementptr inbounds %struct.B, %struct.B* %b, i32 0, i32 0 > store i32 0, i32* %i1, align 4, !tbaa !7 > store i32 0, i32* %prim_i, align 4, !tbaa !9 > > with this type DAG: > > !2 = !{!3, !4, i64 0} > !3 = !{!"A", !4, i64 0} > !4 = !{!"int", !5, i64 0} > !5 = !{!"omnipotent char", !6, i64 0} > !6 = !{!"Simple C/C++ TBAA"} > !7 = !{!8, !4, i64 0} > !8 = !{!"B", !4, i64 0} > !9 = !{!4, !4, i64 0} > > Just as one would hope for, TBAA produces a NoAlias result between the > stores to %A and %B: > > NoAlias: store i32 0, i32* %i1, align 4, !tbaa !7 <-> store i32 0, > i32* %i, align 4, !tbaa !2 > MayAlias: store i32 0, i32* %prim_i, align 4, !tbaa !9 <-> store i32 > 0, i32* %i, align 4, !tbaa !2 > MayAlias: store i32 0, i32* %prim_i, align 4, !tbaa !9 <-> store i32 > 0, i32* %i1, align 4, !tbaa !7 > > The above indicates that TBAA is capable of handling structs with scalar > members, but if I change the program to look more like the benchmark, > like: > > struct A { int i[2]; }; > struct B { int i[2]; }; > > void h(struct A *a, struct B *b, int *prim_i) { > a->i[0] = 0; > b->i[0] = 0; > *prim_i = 0; > } > > , which translates to > > %i = getelementptr inbounds %struct.A, %struct.A* %a, i32 0, i32 0 > %arrayidx = getelementptr inbounds [2 x i32], [2 x i32]* %i, i64 0, i64 0 > store i32 0, i32* %arrayidx, align 4, !tbaa !2 > %i1 = getelementptr inbounds %struct.B, %struct.B* %b, i32 0, i32 0 > %arrayidx2 = getelementptr inbounds [2 x i32], [2 x i32]* %i1, i64 0, > i64 0 > store i32 0, i32* %arrayidx2, align 4, !tbaa !2 > store i32 0, i32* %prim_i, align 4, !tbaa !2 > > , the type DAG is generated as > > !2 = !{!3, !3, i64 0} > !3 = !{!"int", !4, i64 0} > !4 = !{!"omnipotent char", !5, i64 0} > !5 = !{!"Simple C/C++ TBAA"} > > This makes all the stores alias, which is overly conservative, as > %struct.A and %struct.B do not alias. > > It seems that when the struct member is an array, the base type becomes > the element type. This is simply unhandled still in CodeGenTBAA > ("TODO"). My question then is if it would be possible to instead create > the DAG nodes !"A" and !"B" (as in the first type DAG above) and use > them as the base types and then have the access type to be !"int", with > the right offset into the struct node? > > A struct like > > struct S { int i[2]; long l[2]; }; > > would then be represented with a base node, something like > > !3 = !{!"S", !4, i64 0, !6, 8} > !4 = !{!"int", !5, i64 0} > !5 = !{!"omnipotent char", !6, i64 0} > !6 = !{!"long", !5, i64 0} > > , giving !"int" as base at offset 0 and !"long" as base at offset 8 > (given that i[2] is 8 bytes). > > An access like s->i[0], would then get a node like > > !2 = !{!3, %4, i64 0} > > , and this would give a NoAlias above, right? Did I understand this > correctly, or did I miss anything relevant? > > What method(s) would need to be modified to handle this? Is it feasible > to do this now, or are there other planned steps that somebody is > working on? > > Would this improvement require the NewStructPathTBAA flag to be on, and > if so, what is the outlook on this? > > @Ivan: For what I can see, you had a list of planned improvements. What > are your thoughts on this? > > Thanks for any help, > > Jonas >
Jonas Paulsson via llvm-dev
2019-Jan-18 07:52 UTC
[llvm-dev] Aliasing rules difference between GCC and Clang
Hi Ivan, On 2019-01-17 18:09, Ivan Kosarev wrote:> Hello, Jonas, > > > It seems that when the struct member is an array, the base type > > becomes the element type. This is simply unhandled still in > > CodeGenTBAA ("TODO"). My question then is if it would be > > possible to instead create the DAG nodes !"A" and !"B" (as in > > the first type DAG above) and use them as the base types and > > then have the access type to be !"int", with the right offset > > into the struct node? > > Correct, the current default TBAA info is not capable of representing > accesses to (ranges of) array elements, hence the decay to > element-type access. > > Still, for elements indexed with constants. such as a->i[0] in your > example, it is indeed possible to generate proper TBAA tags. Such tags > would be no different from tags generated for accesses to non-array > fields that reside at the same offset within the same base type. If > there are real life cases where adding the support for > constant-indexed accesses would make a difference, then I think that > would be an improvement. > > > struct S { int i[2]; long l[2]; }; > > ... > > An access like s->i[0], would then get a node like > > and this would give a NoAlias above, right? Did I understand > > this correctly, or did I miss anything relevant? > > Yes, TBAA assumes that accesses to int's and long's do not alias. >What I was actually hoping for was that by keeping the base type ('S') in the type DAG instead of just 'int', two accesses into two different access (struct) types both accessing an 'int' element would be 'NoAlias' by TBAA. I don't think handling just constant indexes would help (me). I was hoping that perhaps we could keep the current TBAA algorithm and just extend it by also adding the struct base type in the type DAG. So instead of letting an array (member) access become just 'int', it could become 'A' -> 'int', which would not alias with 'B' -> 'int'. Would this be a possible incremental improvement while working towards the new TBAA in your opinion? To me, it seems like a first good step to just handle different struct types, which does not require any size awareness. I am not sure however how simple this would be, and I would not want it to get in your way with the new TBAA. /Jonas> > Would this improvement require the NewStructPathTBAA flag to be > > on, and if so, what is the outlook on this? > > It depends on whether handling of constant-indexed accesses would be > enough for your use cases. > > One of the goals behind the new TBAA is to be able to accurately > represent various kinds accesses to subobjects. This includes > non-constant-indexed accesses to array elements. More info on how such > accesses are supposed to be represented can be found here: > > https://reviews.llvm.org/D40975 > > If that's what you need, then the new TBAA info seems to be the most > promising option. But please note that it's still a work in progress > and is not mature enough for production use. > > Hope this helps. > > Regards, > > > On 14/01/2019 15:57, Jonas Paulsson wrote: >> CAUTION: This email originated from outside of the organization. Do >> not click links or open attachments unless you recognize the sender >> and know the content is safe. If you suspect potential phishing or >> spam email, report it to ReportSpam at accesssoftek.com >> >> Hi, >> >> It was a while now since I first asked about TBAA, and I am now getting >> back to this, trying to understand TBAA as it seems that the issue I >> described earlier originally in this thread is still one of many "TODO"s >> in CGExpr.cpp >> (http://lists.llvm.org/pipermail/llvm-dev/2018-September/126290.html). >> >> I would like to help on this to the best of my abilities, but I'm not >> quite sure if there is a current plan of incremental changes, or if it >> would be possible to handle the array member of a struct case >> specifically at this point? >> >> I am tempted to believe that the extension to handle my particular case >> is not too far-fetched, as Eli indicated earlier: >> >> This program: >> >> struct A { int i; }; >> struct B { int i; }; >> >> void h(struct A *a, struct B *b, int *prim_i) { >> a->i = 0; >> b->i = 0; >> *prim_i = 0; >> } >> >> translates to >> >> %i = getelementptr inbounds %struct.A, %struct.A* %a, i32 0, i32 0 >> store i32 0, i32* %i, align 4, !tbaa !2 >> %i1 = getelementptr inbounds %struct.B, %struct.B* %b, i32 0, i32 0 >> store i32 0, i32* %i1, align 4, !tbaa !7 >> store i32 0, i32* %prim_i, align 4, !tbaa !9 >> >> with this type DAG: >> >> !2 = !{!3, !4, i64 0} >> !3 = !{!"A", !4, i64 0} >> !4 = !{!"int", !5, i64 0} >> !5 = !{!"omnipotent char", !6, i64 0} >> !6 = !{!"Simple C/C++ TBAA"} >> !7 = !{!8, !4, i64 0} >> !8 = !{!"B", !4, i64 0} >> !9 = !{!4, !4, i64 0} >> >> Just as one would hope for, TBAA produces a NoAlias result between the >> stores to %A and %B: >> >> NoAlias: store i32 0, i32* %i1, align 4, !tbaa !7 <-> store i32 0, >> i32* %i, align 4, !tbaa !2 >> MayAlias: store i32 0, i32* %prim_i, align 4, !tbaa !9 <-> store i32 >> 0, i32* %i, align 4, !tbaa !2 >> MayAlias: store i32 0, i32* %prim_i, align 4, !tbaa !9 <-> store i32 >> 0, i32* %i1, align 4, !tbaa !7 >> >> The above indicates that TBAA is capable of handling structs with scalar >> members, but if I change the program to look more like the benchmark, >> like: >> >> struct A { int i[2]; }; >> struct B { int i[2]; }; >> >> void h(struct A *a, struct B *b, int *prim_i) { >> a->i[0] = 0; >> b->i[0] = 0; >> *prim_i = 0; >> } >> >> , which translates to >> >> %i = getelementptr inbounds %struct.A, %struct.A* %a, i32 0, i32 0 >> %arrayidx = getelementptr inbounds [2 x i32], [2 x i32]* %i, i64 0, >> i64 0 >> store i32 0, i32* %arrayidx, align 4, !tbaa !2 >> %i1 = getelementptr inbounds %struct.B, %struct.B* %b, i32 0, i32 0 >> %arrayidx2 = getelementptr inbounds [2 x i32], [2 x i32]* %i1, i64 0, >> i64 0 >> store i32 0, i32* %arrayidx2, align 4, !tbaa !2 >> store i32 0, i32* %prim_i, align 4, !tbaa !2 >> >> , the type DAG is generated as >> >> !2 = !{!3, !3, i64 0} >> !3 = !{!"int", !4, i64 0} >> !4 = !{!"omnipotent char", !5, i64 0} >> !5 = !{!"Simple C/C++ TBAA"} >> >> This makes all the stores alias, which is overly conservative, as >> %struct.A and %struct.B do not alias. >> >> It seems that when the struct member is an array, the base type becomes >> the element type. This is simply unhandled still in CodeGenTBAA >> ("TODO"). My question then is if it would be possible to instead create >> the DAG nodes !"A" and !"B" (as in the first type DAG above) and use >> them as the base types and then have the access type to be !"int", with >> the right offset into the struct node? >> >> A struct like >> >> struct S { int i[2]; long l[2]; }; >> >> would then be represented with a base node, something like >> >> !3 = !{!"S", !4, i64 0, !6, 8} >> !4 = !{!"int", !5, i64 0} >> !5 = !{!"omnipotent char", !6, i64 0} >> !6 = !{!"long", !5, i64 0} >> >> , giving !"int" as base at offset 0 and !"long" as base at offset 8 >> (given that i[2] is 8 bytes). >> >> An access like s->i[0], would then get a node like >> >> !2 = !{!3, %4, i64 0} >> >> , and this would give a NoAlias above, right? Did I understand this >> correctly, or did I miss anything relevant? >> >> What method(s) would need to be modified to handle this? Is it feasible >> to do this now, or are there other planned steps that somebody is >> working on? >> >> Would this improvement require the NewStructPathTBAA flag to be on, and >> if so, what is the outlook on this? >> >> @Ivan: For what I can see, you had a list of planned improvements. What >> are your thoughts on this? >> >> Thanks for any help, >> >> Jonas >> >