Reid Kleckner
2014-Aug-11 20:13 UTC
[LLVMdev] [cfe-dev] For alias analysis, It's gcc too aggressive or LLVM need to improve?
+aliasing people I *think* this is valid, because the rules have always been described to me in terms of underlying storage type, and not access path. These are all ints, so all loads and stores can alias. On Sat, Aug 9, 2014 at 3:07 PM, Hal Finkel <hfinkel at anl.gov> wrote:> ----- Original Message ----- > > From: "Tim Northover" <t.p.northover at gmail.com> > > To: "Jonas Wagner" <jonas.wagner at epfl.ch> > > Cc: "cfe-dev Developers" <cfe-dev at cs.uiuc.edu>, "LLVM Developers > Mailing List" <llvmdev at cs.uiuc.edu> > > Sent: Friday, August 8, 2014 6:54:50 AM > > Subject: Re: [cfe-dev] [LLVMdev] For alias analysis, It's gcc too > aggressive or LLVM need to improve? > > > > > your C program invokes undefined behavior when it dereferences > > > pointers that > > > have been converted to other types. See for example > > > > stackoverflow.com/questions/4810417/c-when-is-casting-between-pointer-types-not-undefined-behavior > > > > I don't think it's quite that simple.The type-based aliasing rules > > come from 6.5p7 of C11, I think. That says: > > > > "An object shall have its stored value accessed only by an lvalue > > expression that has one of > > the following types: > > + a type compatible with the effective type of the object, > > [...] > > + an aggregate or union type that includes one of the > > aforementioned > > types among its members [...]" > > > > That would seem to allow this usage: aa (effective type "int") is > > being accessed via an lvalue "ptr[i]->index" of type "int". > > > > The second point would even seem to allow something like "ptr[i] > > ..." if aa was declared "int aa[2];", though that seems to be going > > too far. It also seems to be very difficult to pin down a meaning > > (from the standard) for "a->b" if a is not a pointer to an object > > with > > the correct effective type. So the entire area is probably one that's > > open to interpretation. > > > > I've added cfe-dev to the list; they're the *professional* language > > lawyers. > > Coincidentally, this also seems to be PR20585 (adding Jiangning Liu, the > reporter of that bug, to this thread too). > > -Hal > > > > > Cheers. > > > > Tim. > > _______________________________________________ > > cfe-dev mailing list > > cfe-dev at cs.uiuc.edu > > lists.cs.uiuc.edu/mailman/listinfo/cfe-dev > > > > -- > Hal Finkel > Assistant Computational Scientist > Leadership Computing Facility > Argonne National Laboratory > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu llvm.cs.uiuc.edu > lists.cs.uiuc.edu/mailman/listinfo/llvmdev >-------------- next part -------------- An HTML attachment was scrubbed... URL: <lists.llvm.org/pipermail/llvm-dev/attachments/20140811/d50aefc0/attachment.html>
Daniel Berlin
2014-Aug-11 21:09 UTC
[LLVMdev] [cfe-dev] For alias analysis, It's gcc too aggressive or LLVM need to improve?
The access path matters (in some sense), but this is, AFIAK, valid no matter how you look at it. Let's take a look line by line #include <stdio.h> struct heap {int index; int b;}; struct heap **ptr; int aa; int main() { struct heap element; struct heap *array[2]; array[0] = (struct heap *)&aa; <- Okay so far. array[1] = &element; <- Clearly okay ptr = array; <- still okay so far aa = 1; <- not pointer related. int i; <- not pointer related for (i =0; i< 2; i++) { <- not pointer related printf("i is %d, aa is %d\n", i, aa); <- not pointer related ptr[i]->index = 0; <- Here is where it gets wonky. <rest of codeis irrelevan> First, ptr[i] is an lvalue, of type struct heap *, and ptr[i]-> is an lvalue of type struct heap (in C++03, this is 5.2.5 paragraph 3, check footnote 59). (I'm too lazy to parse the rules for whether E1.E2 is an lvalue, because it doesn't end up making a difference) Let's assume, for the sake of argument, the actual access to aa occurs through an lvalue of type "struct heap" rather than "int" In C++03 and C++11, it says: An object shall have its stored value accessed only by an lvalue expression that has one of the following types: a type compatible with the effective type of the object, a qualified version of a type compatible with the effective type of the object, a type that is the signed or unsigned type corresponding to the effective type of the object, a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object, an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or a character type. (C++11 adds something about dynamic type here) struct heap is "an aggregate or union type that includes one of the aforementioned types among it's members". Thus, this is legal to access this int through an lvalue expression that has a type of struct heap. Whether the actual store is legal for other reasons, i don't know. There are all kinds of rules about object alignment and value representation that aren't my baliwick. I leave it to another language lawyer to say whether it's okay to do a store to something that is essentially, a partial object. Note that GCC actually knows this is legal to alias, at least at the tree level. I debugged it there, and it definitely isn't eliminating it at a high level. It also completely understands the call to ->index = 0 affects "aa", and has a reload for aa before the printf call. I don't know what is eliminating this at the RTL level, but i can't see why it's illegal from *aliasing rules*. Maybe this is invalid for some other reason. } return 0; } ptr[i]->index = 0; On Mon, Aug 11, 2014 at 1:13 PM, Reid Kleckner <rnk at google.com> wrote:> +aliasing people > > I *think* this is valid, because the rules have always been described to me > in terms of underlying storage type, and not access path. These are all > ints, so all loads and stores can alias. > > > On Sat, Aug 9, 2014 at 3:07 PM, Hal Finkel <hfinkel at anl.gov> wrote: >> >> ----- Original Message ----- >> > From: "Tim Northover" <t.p.northover at gmail.com> >> > To: "Jonas Wagner" <jonas.wagner at epfl.ch> >> > Cc: "cfe-dev Developers" <cfe-dev at cs.uiuc.edu>, "LLVM Developers Mailing >> > List" <llvmdev at cs.uiuc.edu> >> > Sent: Friday, August 8, 2014 6:54:50 AM >> > Subject: Re: [cfe-dev] [LLVMdev] For alias analysis, It's gcc too >> > aggressive or LLVM need to improve? >> > >> > > your C program invokes undefined behavior when it dereferences >> > > pointers that >> > > have been converted to other types. See for example >> > > >> > > stackoverflow.com/questions/4810417/c-when-is-casting-between-pointer-types-not-undefined-behavior >> > >> > I don't think it's quite that simple.The type-based aliasing rules >> > come from 6.5p7 of C11, I think. That says: >> > >> > "An object shall have its stored value accessed only by an lvalue >> > expression that has one of >> > the following types: >> > + a type compatible with the effective type of the object, >> > [...] >> > + an aggregate or union type that includes one of the >> > aforementioned >> > types among its members [...]" >> > >> > That would seem to allow this usage: aa (effective type "int") is >> > being accessed via an lvalue "ptr[i]->index" of type "int". >> > >> > The second point would even seem to allow something like "ptr[i] >> > ..." if aa was declared "int aa[2];", though that seems to be going >> > too far. It also seems to be very difficult to pin down a meaning >> > (from the standard) for "a->b" if a is not a pointer to an object >> > with >> > the correct effective type. So the entire area is probably one that's >> > open to interpretation. >> > >> > I've added cfe-dev to the list; they're the *professional* language >> > lawyers. >> >> Coincidentally, this also seems to be PR20585 (adding Jiangning Liu, the >> reporter of that bug, to this thread too). >> >> -Hal >> >> > >> > Cheers. >> > >> > Tim. >> > _______________________________________________ >> > cfe-dev mailing list >> > cfe-dev at cs.uiuc.edu >> > lists.cs.uiuc.edu/mailman/listinfo/cfe-dev >> > >> >> -- >> Hal Finkel >> Assistant Computational Scientist >> Leadership Computing Facility >> Argonne National Laboratory >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu llvm.cs.uiuc.edu >> lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu llvm.cs.uiuc.edu > lists.cs.uiuc.edu/mailman/listinfo/llvmdev >
Richard Smith
2014-Aug-12 02:31 UTC
[LLVMdev] [cfe-dev] For alias analysis, It's gcc too aggressive or LLVM need to improve?
I'll take this from the C++ angle; the C rules are not the same, and I'm not confident they give the same answer. On Mon, Aug 11, 2014 at 2:09 PM, Daniel Berlin <dberlin at dberlin.org> wrote:> The access path matters (in some sense), but this is, AFIAK, valid no > matter how you look at it. > > Let's take a look line by line > > #include <stdio.h> > struct heap {int index; int b;}; > struct heap **ptr; > int aa; > > int main() { > struct heap element; > struct heap *array[2]; > array[0] = (struct heap *)&aa; <- Okay so far. > array[1] = &element; <- Clearly okay > ptr = array; <- still okay so far > aa = 1; <- not pointer related. > int i; <- not pointer related > for (i =0; i< 2; i++) { <- not pointer related > printf("i is %d, aa is %d\n", i, aa); <- not pointer related > ptr[i]->index = 0; <- Here is where it gets wonky. > > <rest of codeis irrelevan> > > First, ptr[i] is an lvalue, of type struct heap *, and ptr[i]-> is an > lvalue of type struct heap (in C++03, this is 5.2.5 paragraph 3, check > footnote 59). >This is where we get the undefined behavior. 3.8/6: "[If you have a glvalue referring to storage but where there is no corresponding object, the] program has undefined behavior if: [...] the glvalue is used to access a non-static data member". There is no object of type 'heap' denoted by *ptr[0] (by 1.8/1, we can only create objects through definitions, new-expressions, and by creating temporary objects). So the behavior is undefined when we evaluate ptr[0]->index. (I'm too lazy to parse the rules for whether E1.E2 is an lvalue,> because it doesn't end up making a difference) > > Let's assume, for the sake of argument, the actual access to aa > occurs through an lvalue of type "struct heap" rather than "int" > In C++03 and C++11, it says: > > An object shall have its stored value accessed only by an lvalue > expression that has one of the following types: > > a type compatible with the effective type of the object, > a qualified version of a type compatible with the effective type of the > object, > a type that is the signed or unsigned type corresponding to the > effective type of the object, > a type that is the signed or unsigned type corresponding to a > qualified version of the effective type of the object, > an aggregate or union type that includes one of the aforementioned > types among its members (including, recursively, a member of a > subaggregate or contained union), or > a character type. > (C++11 adds something about dynamic type here) > > struct heap is "an aggregate or union type that includes one of the > aforementioned types among it's members". > > Thus, this is legal to access this int through an lvalue expression > that has a type of struct heap. > Whether the actual store is legal for other reasons, i don't know. > There are all kinds of rules about object alignment and value > representation that aren't my baliwick. I leave it to another > language lawyer to say whether it's okay to do a store to something > that is essentially, a partial object. > > Note that GCC actually knows this is legal to alias, at least at the > tree level. I debugged it there, and it definitely isn't eliminating > it at a high level. It also completely understands the call to ->index > = 0 affects "aa", and has a reload for aa before the printf call. > > I don't know what is eliminating this at the RTL level, but i can't > see why it's illegal from *aliasing rules*. Maybe this is invalid for > some other reason. > > > > > > > > } > return 0; > } > > ptr[i]->index = 0; > > On Mon, Aug 11, 2014 at 1:13 PM, Reid Kleckner <rnk at google.com> wrote: > > +aliasing people > > > > I *think* this is valid, because the rules have always been described to > me > > in terms of underlying storage type, and not access path. These are all > > ints, so all loads and stores can alias. > > > > > > On Sat, Aug 9, 2014 at 3:07 PM, Hal Finkel <hfinkel at anl.gov> wrote: > >> > >> ----- Original Message ----- > >> > From: "Tim Northover" <t.p.northover at gmail.com> > >> > To: "Jonas Wagner" <jonas.wagner at epfl.ch> > >> > Cc: "cfe-dev Developers" <cfe-dev at cs.uiuc.edu>, "LLVM Developers > Mailing > >> > List" <llvmdev at cs.uiuc.edu> > >> > Sent: Friday, August 8, 2014 6:54:50 AM > >> > Subject: Re: [cfe-dev] [LLVMdev] For alias analysis, It's gcc too > >> > aggressive or LLVM need to improve? > >> > > >> > > your C program invokes undefined behavior when it dereferences > >> > > pointers that > >> > > have been converted to other types. See for example > >> > > > >> > > > stackoverflow.com/questions/4810417/c-when-is-casting-between-pointer-types-not-undefined-behavior > >> > > >> > I don't think it's quite that simple.The type-based aliasing rules > >> > come from 6.5p7 of C11, I think. That says: > >> > > >> > "An object shall have its stored value accessed only by an lvalue > >> > expression that has one of > >> > the following types: > >> > + a type compatible with the effective type of the object, > >> > [...] > >> > + an aggregate or union type that includes one of the > >> > aforementioned > >> > types among its members [...]" > >> > > >> > That would seem to allow this usage: aa (effective type "int") is > >> > being accessed via an lvalue "ptr[i]->index" of type "int". > >> > > >> > The second point would even seem to allow something like "ptr[i] > >> > ..." if aa was declared "int aa[2];", though that seems to be going > >> > too far. It also seems to be very difficult to pin down a meaning > >> > (from the standard) for "a->b" if a is not a pointer to an object > >> > with > >> > the correct effective type. So the entire area is probably one that's > >> > open to interpretation. > >> > > >> > I've added cfe-dev to the list; they're the *professional* language > >> > lawyers. > >> > >> Coincidentally, this also seems to be PR20585 (adding Jiangning Liu, the > >> reporter of that bug, to this thread too). > >> > >> -Hal > >> > >> > > >> > Cheers. > >> > > >> > Tim. > >> > _______________________________________________ > >> > cfe-dev mailing list > >> > cfe-dev at cs.uiuc.edu > >> > lists.cs.uiuc.edu/mailman/listinfo/cfe-dev > >> > > >> > >> -- > >> Hal Finkel > >> Assistant Computational Scientist > >> Leadership Computing Facility > >> Argonne National Laboratory > >> _______________________________________________ > >> LLVM Developers mailing list > >> LLVMdev at cs.uiuc.edu llvm.cs.uiuc.edu > >> lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > > > > > > _______________________________________________ > > LLVM Developers mailing list > > LLVMdev at cs.uiuc.edu llvm.cs.uiuc.edu > > lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <lists.llvm.org/pipermail/llvm-dev/attachments/20140811/59c6cad3/attachment.html>
Reasonably Related Threads
- [LLVMdev] [cfe-dev] For alias analysis, It's gcc too aggressive or LLVM need to improve?
- [LLVMdev] For alias analysis, It's gcc too aggressive or LLVM need to improve?
- [LLVMdev] For alias analysis, It's gcc too aggressive or LLVM need to improve?
- RFC: Resolving TBAA issues
- CFLAA