Finkel, Hal J. via llvm-dev
2016-Jul-25 23:02 UTC
[llvm-dev] Alias Analysis with inbound GEPs
Sent from my Verizon Wireless 4G LTE DROID On Jul 25, 2016 6:10 PM, Eli Friedman <eli.friedman at gmail.com<mailto:eli.friedman at gmail.com>> wrote:> > On Mon, Jul 25, 2016 at 2:06 PM, Hal Finkel via llvm-dev <llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>> wrote: >> >> >> ________________________________ >>> >>> From: "Elena Demikhovsky" <elena.demikhovsky at intel.com<mailto:elena.demikhovsky at intel.com>> >>> To: "Hal Finkel" <hfinkel at anl.gov<mailto:hfinkel at anl.gov>> >>> Cc: "llvm-dev" <llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>> >>> Sent: Monday, July 25, 2016 2:46:34 PM >>> Subject: RE: [llvm-dev] Alias Analysis with inbound GEPs >>> >>> >>>> >>>> I’m checking aliasing of two pointers: >>>> >>>> >>>> >>>> %GEP1 = getelementptr inbounds %struct.s, %struct.s* %0, i64 0, i32 1, i64 %indvars.iv41, i64 %indvars.iv39 >>>> >>>> %GEP2 = getelementptr inbounds %struct.s, %struct.s* %0, i64 0, i32 16 >>>> >>>> >>>> >>>> The result I got is “PartialAlias” because the indices of the GEP1 are variable. >>> >>> That seems like a bug. PartialAlias should only be returned when we can prove a partial overlap. Otherwise, MayAlias should be returned. >>> >>> [Demikhovsky, Elena] There are some comments inside: >>> >>> // Statically, we can see that the base objects are the same, but the >>> >>> // pointers have dynamic offsets which we can't resolve. And none of our >>> >>> // little tricks above worked. >>> >>> // >>> >>> // TODO: Returning PartialAlias instead of MayAlias is a mild hack; the >>> >>> // practical effect of this is protecting TBAA in the case of dynamic >>> >>> // indices into arrays of unions or malloc'd memory. >>> >>> return PartialAlias; >> >> Ah, thanks! That, unfortunately, makes sense. >>> >>> >>>> >>>> Shouldn’t the “inbounds” keyword mean that the access to sub-array is also in-bounds? >>> >>> No. inbounds applies only to the whole object. >>>> >>>> I’m trying to reach “NoAlias” consensus between GEP1 and GEP2. >>> >>> Did the original code come from C or C++? What are we modeling here? >>> >>> [Demikhovsky, Elena] C-code: >>> >>> for (m=0; m < params->num; m++) { >>> >>> params->a[i][m] = expr; >>> >>> } >>> >>> %GEP1 is the store for params->a[i][m] >>> >>> %GEP2 is the load for params->num. >>> >>> The loop is not vectorized due to a possible collision between params->num and params->a[i][m]. If I take loading of params->num outside the loop, it is vectorized. >>> >>> Bounds of array “a” are known at compile time. Limit of “i” and “m” are runtime variables. >> >> The problem is, IIRC, it is not undefined behavior to access one structure field by over-indexing an earlier array member. C++ has rules for "safely-derived pointers", and I think they include all pointer arithmetic on addresses from subobjects. If array access is just pointer arithmetic, I'm not sure that helps you as much as you'd like. cc'ing Richard to correct me if necessary. >> > > It is actually undefined behavior. This is explicitly called out in Annex J.2: "An array subscript is out of range, even if an object is apparently accessible with the given subscript (as in the lvalue expression a[1][7] given the declaration int a[4][5]) ". If you break it apart into separate steps, the interesting bit is that the implicit array-to-pointer conversion is not equivalent to a cast; "int* b = (int*)a;" is not equivalent to "int* b = *a;", even though the expressions have the same type and value. > > There currently isn't any way to model the aliasing behavior of the address-of operator or array-to-pointer decay in LLVM IR. See also http://lists.llvm.org/pipermail/llvm-dev/2016-July/102472.html .It seems like we might we able to use TBAA metadata with struct field information to get this then. -Hal> > -Eli-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160725/22fe4309/attachment.html>
Daniel Berlin via llvm-dev
2016-Jul-25 23:58 UTC
[llvm-dev] Alias Analysis with inbound GEPs
> > > > > It is actually undefined behavior. This is explicitly called out in > Annex J.2: "An array subscript is out of range, even if an object is > apparently accessible with the given subscript (as in the lvalue expression > a[1][7] given the declaration int a[4][5]) ". If you break it apart into > separate steps, the interesting bit is that the implicit array-to-pointer > conversion is not equivalent to a cast; "int* b = (int*)a;" is not > equivalent to "int* b = *a;", even though the expressions have the same > type and value. > > > > There currently isn't any way to model the aliasing behavior of the > address-of operator or array-to-pointer decay in LLVM IR. See also > http://lists.llvm.org/pipermail/llvm-dev/2016-July/102472.html . > > It seems like we might we able to use TBAA metadata with struct field > information to get this then. > >Kinda, but the validity of this info does not (and should not) depend on TBAA being enabled or not.> -Hal > > > > > -Eli > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160725/dd304690/attachment.html>
Demikhovsky, Elena via llvm-dev
2016-Jul-26 11:49 UTC
[llvm-dev] Alias Analysis with inbound GEPs
> It seems like we might we able to use TBAA metadata with struct field information to get this then.TBAA does not support struct fields yet. It shows what type will be stored, but does not have structure info.> See also http://lists.llvm.org/pipermail/llvm-dev/2016-July/102472.html .It’s not, probably, the case. I need such “inbounds” for any sub-structure. - Elena From: Finkel, Hal J. [mailto:hfinkel at anl.gov] Sent: Tuesday, July 26, 2016 02:03 To: Eli Friedman <eli.friedman at gmail.com> Cc: llvm-dev <llvm-dev at lists.llvm.org>; Demikhovsky, Elena <elena.demikhovsky at intel.com>; Richard Smith <richard-llvm at metafoo.co.uk> Subject: Re: [llvm-dev] Alias Analysis with inbound GEPs Sent from my Verizon Wireless 4G LTE DROID On Jul 25, 2016 6:10 PM, Eli Friedman <eli.friedman at gmail.com<mailto:eli.friedman at gmail.com>> wrote:> > On Mon, Jul 25, 2016 at 2:06 PM, Hal Finkel via llvm-dev <llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>> wrote: >> >> >> ________________________________ >>> >>> From: "Elena Demikhovsky" <elena.demikhovsky at intel.com<mailto:elena.demikhovsky at intel.com>> >>> To: "Hal Finkel" <hfinkel at anl.gov<mailto:hfinkel at anl.gov>> >>> Cc: "llvm-dev" <llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>> >>> Sent: Monday, July 25, 2016 2:46:34 PM >>> Subject: RE: [llvm-dev] Alias Analysis with inbound GEPs >>> >>> >>>> >>>> I’m checking aliasing of two pointers: >>>> >>>> >>>> >>>> %GEP1 = getelementptr inbounds %struct.s, %struct.s* %0, i64 0, i32 1, i64 %indvars.iv41, i64 %indvars.iv39 >>>> >>>> %GEP2 = getelementptr inbounds %struct.s, %struct.s* %0, i64 0, i32 16 >>>> >>>> >>>> >>>> The result I got is “PartialAlias” because the indices of the GEP1 are variable. >>> >>> That seems like a bug. PartialAlias should only be returned when we can prove a partial overlap. Otherwise, MayAlias should be returned. >>> >>> [Demikhovsky, Elena] There are some comments inside: >>> >>> // Statically, we can see that the base objects are the same, but the >>> >>> // pointers have dynamic offsets which we can't resolve. And none of our >>> >>> // little tricks above worked. >>> >>> // >>> >>> // TODO: Returning PartialAlias instead of MayAlias is a mild hack; the >>> >>> // practical effect of this is protecting TBAA in the case of dynamic >>> >>> // indices into arrays of unions or malloc'd memory. >>> >>> return PartialAlias; >> >> Ah, thanks! That, unfortunately, makes sense. >>> >>> >>>> >>>> Shouldn’t the “inbounds” keyword mean that the access to sub-array is also in-bounds? >>> >>> No. inbounds applies only to the whole object. >>>> >>>> I’m trying to reach “NoAlias” consensus between GEP1 and GEP2. >>> >>> Did the original code come from C or C++? What are we modeling here? >>> >>> [Demikhovsky, Elena] C-code: >>> >>> for (m=0; m < params->num; m++) { >>> >>> params->a[i][m] = expr; >>> >>> } >>> >>> %GEP1 is the store for params->a[i][m] >>> >>> %GEP2 is the load for params->num. >>> >>> The loop is not vectorized due to a possible collision between params->num and params->a[i][m]. If I take loading of params->num outside the loop, it is vectorized. >>> >>> Bounds of array “a” are known at compile time. Limit of “i” and “m” are runtime variables. >> >> The problem is, IIRC, it is not undefined behavior to access one structure field by over-indexing an earlier array member. C++ has rules for "safely-derived pointers", and I think they include all pointer arithmetic on addresses from subobjects. If array access is just pointer arithmetic, I'm not sure that helps you as much as you'd like. cc'ing Richard to correct me if necessary. >> > > It is actually undefined behavior. This is explicitly called out in Annex J.2: "An array subscript is out of range, even if an object is apparently accessible with the given subscript (as in the lvalue expression a[1][7] given the declaration int a[4][5]) ". If you break it apart into separate steps, the interesting bit is that the implicit array-to-pointer conversion is not equivalent to a cast; "int* b = (int*)a;" is not equivalent to "int* b = *a;", even though the expressions have the same type and value. > > There currently isn't any way to model the aliasing behavior of the address-of operator or array-to-pointer decay in LLVM IR. See also http://lists.llvm.org/pipermail/llvm-dev/2016-July/102472.html .It seems like we might we able to use TBAA metadata with struct field information to get this then. -Hal> > -Eli--------------------------------------------------------------------- Intel Israel (74) Limited This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160726/bb2065bb/attachment.html>
----- Original Message -----> From: "Daniel Berlin" <dberlin at dberlin.org> > To: "Hal J. Finkel" <hfinkel at anl.gov> > Cc: "Eli Friedman" <eli.friedman at gmail.com>, "llvm-dev" > <llvm-dev at lists.llvm.org>, "Richard Smith" > <richard-llvm at metafoo.co.uk> > Sent: Monday, July 25, 2016 6:58:27 PM > Subject: Re: [llvm-dev] Alias Analysis with inbound GEPs> > > > > > > It is actually undefined behavior. This is explicitly called out > > > in > > > Annex J.2: "An array subscript is out of range, even if an object > > > is apparently accessible with the given subscript (as in the > > > lvalue expression a[1][7] given the declaration int a[4][5]) ". > > > If > > > you break it apart into separate steps, the interesting bit is > > > that the implicit array-to-pointer conversion is not equivalent > > > to > > > a cast; "int* b = (int*)a;" is not equivalent to "int* b = *a;", > > > even though the expressions have the same type and value. > > > > > > > > There currently isn't any way to model the aliasing behavior of > > > the > > > address-of operator or array-to-pointer decay in LLVM IR. See > > > also > > > http://lists.llvm.org/pipermail/llvm-dev/2016-July/102472.html . >> > It seems like we might we able to use TBAA metadata with struct > > field > > information to get this then. >> Kinda, but the validity of this info does not (and should not) depend > on TBAA being enabled or not.Makes sense. For whatever metadata we use to indicate this, ideally its emission should not be tied to whether -fno-strict-aliasing is passed (even if it does use the TBAA metadata infrastructure). -Hal> > -Hal >> > > > > > > -Eli > > > _______________________________________________ > > > LLVM Developers mailing list > > > llvm-dev at lists.llvm.org > > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >-- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160726/c40b1548/attachment.html>
----- Original Message -----> From: "Elena Demikhovsky" <elena.demikhovsky at intel.com> > To: "Hal J. Finkel" <hfinkel at anl.gov>, "Eli Friedman" > <eli.friedman at gmail.com> > Cc: "llvm-dev" <llvm-dev at lists.llvm.org>, "Richard Smith" > <richard-llvm at metafoo.co.uk> > Sent: Tuesday, July 26, 2016 6:49:16 AM > Subject: RE: [llvm-dev] Alias Analysis with inbound GEPs> > It seems like we might we able to use TBAA metadata with struct > > field information to get this then. > TBAA does not support struct fields yet. > It shows what type will be stored, but does not have structure info.No, although, unfortunately, the documentation here seems like it is not up to date. I agree, however, that it might need some extension in this case. For example, $ cat /tmp/f.c struct S { int x; int y; }; int foo(struct S *s) { return s->x + s->y; }; $ clang -O3 -S -emit-llvm -o - /tmp/f.c ; ModuleID = '/tmp/f.c' source_filename = "/tmp/f.c" target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-unknown-linux-gnu" %struct.S = type { i32, i32 } ; Function Attrs: norecurse nounwind readonly uwtable define i32 @foo(%struct.S* nocapture readonly) local_unnamed_addr #0 { %2 = getelementptr inbounds %struct.S, %struct.S* %0, i64 0, i32 0 %3 = load i32, i32* %2, align 4, !tbaa !1 %4 = getelementptr inbounds %struct.S, %struct.S* %0, i64 0, i32 1 %5 = load i32, i32* %4, align 4, !tbaa !6 %6 = add nsw i32 %5, %3 ret i32 %6 } attributes #0 = { ... } !llvm.ident = !{!0} !0 = !{!"...."} !1 = !{!2, !3, i64 0} !2 = !{!"S", !3, i64 0, !3, i64 4} !3 = !{!"int", !4, i64 0} !4 = !{!"omnipotent char", !5, i64 0} !5 = !{!"Simple C/C++ TBAA"} !6 = !{!2, !3, i64 4} Note here that the two access to the structure have metadata nodes !1 and !6. If you look at those nodes, you'll see that (third argument), they identify the relevant field offset (0 bytes and 4 bytes in this case). -Hal> > See also > > http://lists.llvm.org/pipermail/llvm-dev/2016-July/102472.html . > It’s not, probably, the case. I need such “inbounds” for any > sub-structure.> - Elena> From: Finkel, Hal J. [mailto:hfinkel at anl.gov] > Sent: Tuesday, July 26, 2016 02:03 > To: Eli Friedman <eli.friedman at gmail.com> > Cc: llvm-dev <llvm-dev at lists.llvm.org>; Demikhovsky, Elena > <elena.demikhovsky at intel.com>; Richard Smith > <richard-llvm at metafoo.co.uk> > Subject: Re: [llvm-dev] Alias Analysis with inbound GEPs> Sent from my Verizon Wireless 4G LTE DROID> On Jul 25, 2016 6:10 PM, Eli Friedman < eli.friedman at gmail.com > > wrote:> >> > On Mon, Jul 25, 2016 at 2:06 PM, Hal Finkel via llvm-dev < > > llvm-dev at lists.llvm.org > wrote:> >>> >>> >> ________________________________> >>>> >>> From: "Elena Demikhovsky" < elena.demikhovsky at intel.com >> >>> To: "Hal Finkel" < hfinkel at anl.gov >> >>> Cc: "llvm-dev" < llvm-dev at lists.llvm.org >> >>> Sent: Monday, July 25, 2016 2:46:34 PM> >>> Subject: RE: [llvm-dev] Alias Analysis with inbound GEPs> >>>> >>>> >>>>> >>>> I’m checking aliasing of two pointers:> >>>>> >>>>> >>>>> >>>> %GEP1 = getelementptr inbounds %struct.s, %struct.s* %0, i64 0, > >>>> i32 1, i64 %indvars.iv41, i64 %indvars.iv39> >>>>> >>>> %GEP2 = getelementptr inbounds %struct.s, %struct.s* %0, i64 0, > >>>> i32 16> >>>>> >>>>> >>>>> >>>> The result I got is “PartialAlias” because the indices of the > >>>> GEP1 are variable.> >>>> >>> That seems like a bug. PartialAlias should only be returned when > >>> we can prove a partial overlap. Otherwise, MayAlias should be > >>> returned.> >>>> >>> [Demikhovsky, Elena] There are some comments inside:> >>>> >>> // Statically, we can see that the base objects are the same, but > >>> the> >>>> >>> // pointers have dynamic offsets which we can't resolve. And none > >>> of our> >>>> >>> // little tricks above worked.> >>>> >>> //> >>>> >>> // TODO: Returning PartialAlias instead of MayAlias is a mild > >>> hack; the> >>>> >>> // practical effect of this is protecting TBAA in the case of > >>> dynamic> >>>> >>> // indices into arrays of unions or malloc'd memory.> >>>> >>> return PartialAlias;> >>> >> Ah, thanks! That, unfortunately, makes sense.> >>>> >>>> >>>>> >>>> Shouldn’t the “inbounds” keyword mean that the access to > >>>> sub-array is also in-bounds?> >>>> >>> No. inbounds applies only to the whole object.> >>>>> >>>> I’m trying to reach “NoAlias” consensus between GEP1 and GEP2.> >>>> >>> Did the original code come from C or C++? What are we modeling > >>> here?> >>>> >>> [Demikhovsky, Elena] C-code:> >>>> >>> for (m=0; m < params->num; m++) {> >>>> >>> params->a[i][m] = expr;> >>>> >>> }> >>>> >>> %GEP1 is the store for params->a[i][m]> >>>> >>> %GEP2 is the load for params->num.> >>>> >>> The loop is not vectorized due to a possible collision between > >>> params->num and params->a[i][m]. If I take loading of > >>> params->num outside the loop, it is vectorized.> >>>> >>> Bounds of array “a” are known at compile time. Limit of “i” and > >>> “m” are runtime variables.> >>> >> The problem is, IIRC, it is not undefined behavior to access one > >> structure field by over-indexing an earlier array member. C++ has > >> rules for "safely-derived pointers", and I think they include all > >> pointer arithmetic on addresses from subobjects. If array access > >> is just pointer arithmetic, I'm not sure that helps you as much > >> as you'd like. cc'ing Richard to correct me if necessary.> >>> >> > It is actually undefined behavior. This is explicitly called out in > > Annex J.2: "An array subscript is out of range, even if an object > > is apparently accessible with the given subscript (as in the > > lvalue expression a[1][7] given the declaration int a[4][5]) ". If > > you break it apart into separate steps, the interesting bit is > > that the implicit array-to-pointer conversion is not equivalent to > > a cast; "int* b = (int*)a;" is not equivalent to "int* b = *a;", > > even though the expressions have the same type and value.> >> > There currently isn't any way to model the aliasing behavior of the > > address-of operator or array-to-pointer decay in LLVM IR. See also > > http://lists.llvm.org/pipermail/llvm-dev/2016-July/102472.html .> It seems like we might we able to use TBAA metadata with struct field > information to get this then.> -Hal> >> > -Eli > --------------------------------------------------------------------- > Intel Israel (74) Limited > This e-mail and any attachments may contain confidential material for > the sole use of the intended recipient(s). Any review or distribution > by others is strictly prohibited. If you are not the intended > recipient, please contact the sender and delete all copies.-- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160726/564d9f55/attachment-0001.html>