> I do not believe the current proposal will solve all of those cases,
> particularly when the fields are the same type and structures are
> compatible but they cannot overlap in C/C++ anyway.
>
> One of the threads is titled "[PATCH] D20665: Claim NoAlias if two GEPs
> index different fields of the same struct"

For example, given

  struct {
    int arr_a[2];
    int arr_b[2];
  };

assume you cannot see the original allocation site.
In LLVM IR, gep(arr_b, -1) is legally an access to arr_a[1].
You can use -1 even though it's going to be a pointer to [2 x i32].
Thus, you can't even tell that gep(arr_a, 0) and gep(arr_b, -1) do not
overlap without being able to know *something* about the layout of fields
in the structure you are talking about.

I'd start with: it should not require TBAA to determine that loads from
GEPs of arr_a and arr_b cannot overlap. It is true regardless of the
types involved.

In terms of "who cares", Google definitely compiles with
-fno-strict-aliasing (because third-party packages are still not clean
enough), and last I looked, Apple did the same (but I admittedly have not
kept up).

GCC can definitely disambiguate field accesses (through points-to and
otherwise) better than LLVM in a situation where strict aliasing is off.

As an aside, I also can't build a sane field-sensitive points-to on our
current type system, because the types and structures are already
meaningless (and we are busy making it weaker, too).
I don't think we are going to want to tie field-sensitive points-to to
TBAA (you definitely want to be able to run the former without the
latter), but right now that is the only metadata you can use.

Finally, the merging of TBAA is definitely going to be more conservative
than the merging of field offset info: if we merge a load of an int and a
float, we will, IIRC, go to the nearest common ancestor in TBAA. The field
offset info may actually still be identical between the two, but we will
lose it by creating or going to the common ancestor.
-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170820/f8e5c0dd/attachment.html>
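Editorial sketch (not code from the thread): the layout arithmetic behind the arr_a/arr_b example in the message above can be written out with byte offsets. The struct name S and the helpers ByteRange, element_access, and overlaps are hypothetical, introduced only for illustration. The sketch assumes there is no padding between the two arrays. It works on offsets rather than real pointers, because forming &arr_b[-1] directly in C or C++ is not something the language rules allow.

```cpp
#include <cassert>
#include <cstddef>

// The struct from the message, given a hypothetical name (S) so that
// offsetof can be applied to it.
struct S {
  int arr_a[2];
  int arr_b[2];
};

// Assumption: no padding between the two arrays (true on mainstream ABIs).
static_assert(offsetof(S, arr_b) == 2 * sizeof(int),
              "arr_b is assumed to start right after arr_a");

// Hypothetical helper type: the bytes touched by one int-sized access,
// as a half-open range [begin, end) measured from the start of S.
struct ByteRange {
  std::ptrdiff_t begin;
  std::ptrdiff_t end;
};

// Bytes touched by field[index], where the field starts at field_offset.
// A negative index models gep(field, -1).
ByteRange element_access(std::size_t field_offset, std::ptrdiff_t index) {
  const std::ptrdiff_t elem = static_cast<std::ptrdiff_t>(sizeof(int));
  const std::ptrdiff_t begin =
      static_cast<std::ptrdiff_t>(field_offset) + index * elem;
  return ByteRange{begin, begin + elem};
}

// Two half-open ranges overlap if each starts before the other ends.
bool overlaps(ByteRange a, ByteRange b) {
  return a.begin < b.end && b.begin < a.end;
}

// Small self-check of the two cases discussed in the message.
void check_examples() {
  // gep(arr_b, -1) lands on the same bytes as arr_a[1].
  assert(overlaps(element_access(offsetof(S, arr_a), 1),
                  element_access(offsetof(S, arr_b), -1)));
  // gep(arr_a, 0) and gep(arr_b, -1) are disjoint, but only the layout says so.
  assert(!overlaps(element_access(offsetof(S, arr_a), 0),
                   element_access(offsetof(S, arr_b), -1)));
}
```

Both answers come purely from field offsets and sizes, and no type information is consulted, which is the point being made about not needing TBAA for this.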
> Finally, the merging of TBAA is definitely going to be more conservative
> than the merging of field offset info: if we merge a load of an int and a
> float, we will, IIRC, go to the nearest common ancestor in TBAA. The
> field offset info may actually still be identical between the two, but we
> will lose it by creating or going to the common ancestor.

Imagine

  int   - offset 4
  float - offset 4
  int   - offset 12

  merge(first int field, float) => mergeintfloat - no offset info

You can no longer disambiguate this against the second int field, even
though it can't possibly overlap, not for type reasons, but for offset
reasons.
On 08/20/2017 12:02 PM, Daniel Berlin wrote:
>> I do not believe the current proposal will solve all of those cases,
>> particularly when the fields are the same type and structures are
>> compatible but they cannot overlap in C/C++ anyway.
>>
>> One of the threads is titled "[PATCH] D20665: Claim NoAlias if two
>> GEPs index different fields of the same struct"
>
> For example, given
>
>   struct {
>     int arr_a[2];
>     int arr_b[2];
>   };
>
> assume you cannot see the original allocation site.
> In LLVM IR, gep(arr_b, -1) is legally an access to arr_a[1].
> You can use -1 even though it's going to be a pointer to [2 x i32].
> Thus, you can't even tell that gep(arr_a, 0) and gep(arr_b, -1) do
> not overlap without being able to know *something* about the layout of
> fields in the structure you are talking about.
>
> I'd start with: it should not require TBAA to determine that loads
> from GEPs of arr_a and arr_b cannot overlap. It is true regardless
> of the types involved.
>
> In terms of "who cares", Google definitely compiles with
> -fno-strict-aliasing (because third-party packages are still not clean
> enough), and last I looked, Apple did the same (but I admittedly have
> not kept up).

We definitely have code that we compile that way as well. As it turns
out, this is my motivation for developing the type sanitizer (so we have
some tool that users can employ to clean up this kind of code). Patches
have been posted for review.

> GCC can definitely disambiguate field accesses (through points-to and
> otherwise) better than LLVM in a situation where strict aliasing is off.
>
> As an aside, I also can't build a sane field-sensitive points-to on
> our current type system, because the types and structures are already
> meaningless (and we are busy making it weaker, too).
> I don't think we are going to want to tie field-sensitive points-to to
> TBAA (you definitely want to be able to run the former without the
> latter), but right now that is the only metadata you can use.

This also brings up a good point. Even if we use the same metadata for
both type and field analysis, I don't see why we can't disable the type
portions without disabling the field analysis (essentially by emitting
everything as one universally-aliasing type). Maybe we should do that for
-fno-strict-aliasing?

Thanks again,
Hal

> Finally, the merging of TBAA is definitely going to be more
> conservative than the merging of field offset info: if we merge a load
> of an int and a float, we will, IIRC, go to the nearest common
> ancestor in TBAA. The field offset info may actually still be
> identical between the two, but we will lose it by creating or going to
> the common ancestor.

--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
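Editorial sketch (not code from the thread): one way to picture the suggestion of disabling the type portion while keeping the field portion is a toy alias query with two independent checks. Everything here is hypothetical: the Access record, the function names, and the simplified type rule (distinct scalar type names never alias, with no special case for char) are illustrative assumptions, not LLVM's TBAA implementation. The sketch also assumes both accesses are already known to be relative to the same base object.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical description of one memory access: a scalar type name plus
// the bytes [offset, offset + size) it touches within a common base object.
struct Access {
  std::string type;
  std::size_t offset;
  std::size_t size;
};

// Field portion: purely about byte ranges, independent of types.
bool ranges_overlap(const Access& a, const Access& b) {
  return a.offset < b.offset + b.size && b.offset < a.offset + a.size;
}

// Type portion (toy rule). With strict aliasing off, every access behaves
// as if it had one universally-aliasing type, so this check never helps.
bool types_may_alias(const Access& a, const Access& b, bool strict_aliasing) {
  if (!strict_aliasing) return true;
  return a.type == b.type;
}

// Two accesses may alias only if both portions say they might.
bool may_alias(const Access& a, const Access& b, bool strict_aliasing) {
  return ranges_overlap(a, b) && types_may_alias(a, b, strict_aliasing);
}
```

Under these assumptions, an int at offset 0 and a float at offset 4 are still disambiguated with strict aliasing off, because only the offsets are consulted. An int and a float at the same offset are disambiguated only when strict aliasing is on.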
On 08/20/2017 12:11 PM, Daniel Berlin wrote:
>> Finally, the merging of TBAA is definitely going to be more
>> conservative than the merging of field offset info: if we merge a
>> load of an int and a float, we will, IIRC, go to the nearest
>> common ancestor in TBAA. The field offset info may actually
>> still be identical between the two, but we will lose it by
>> creating or going to the common ancestor.
>
> Imagine
>
>   int   - offset 4
>   float - offset 4
>   int   - offset 12
>
>   merge(first int field, float) => mergeintfloat - no offset info
>
> You can no longer disambiguate this against the second int field, even
> though it can't possibly overlap, not for type reasons, but for offset
> reasons.

Okay, but I don't see why we have to do that. Could we not do

  merge(first int field, float) == mergeintfloat @ offset 4

(where mergeintfloat is probably char or similar), where we keep matching
offsets? Or we encode some kind of disjunction directly (which certainly
seems reasonable to me for access merging).

Thanks again,
Hal
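Editorial sketch (not code from the thread): the two merging behaviours discussed above can be contrasted in a toy model. The AccessTag record and all function names are hypothetical, and common_ancestor is a deliberately crude stand-in (identical types stay as they are, anything else becomes "char") rather than a real TBAA tree walk.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical access tag: a type name plus an optional byte offset.
struct AccessTag {
  std::string type;
  bool has_offset;     // false means the offset is unknown
  std::size_t offset;  // meaningful only when has_offset is true
  std::size_t size;
};

// Toy stand-in for "nearest common ancestor in TBAA".
std::string common_ancestor(const std::string& a, const std::string& b) {
  return a == b ? a : "char";
}

// The conservative merge described by Daniel: the offset info is lost.
AccessTag merge_dropping_offset(const AccessTag& a, const AccessTag& b) {
  const std::size_t size = a.size > b.size ? a.size : b.size;
  return AccessTag{common_ancestor(a.type, b.type), false, 0, size};
}

// The alternative Hal asks about: keep the offset when both sides agree.
AccessTag merge_keeping_offset(const AccessTag& a, const AccessTag& b) {
  AccessTag merged = merge_dropping_offset(a, b);
  if (a.has_offset && b.has_offset && a.offset == b.offset &&
      a.size == b.size) {
    merged.has_offset = true;
    merged.offset = a.offset;
  }
  return merged;
}

// Offset-only disambiguation: an unknown offset forces a conservative answer.
bool may_overlap(const AccessTag& a, const AccessTag& b) {
  if (!a.has_offset || !b.has_offset) return true;
  return a.offset < b.offset + b.size && b.offset < a.offset + a.size;
}
```

With the int at offset 4, the float at offset 4, and the second int at offset 12, the offset-dropping merge can no longer be separated from the second int, while the offset-keeping merge still can. The "disjunction" alternative mentioned in the message is not modelled here.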
On Sun, Aug 20, 2017 at 11:22 AM, Hal Finkel <hfinkel at anl.gov> wrote:
> On 08/20/2017 12:02 PM, Daniel Berlin wrote:
>>> I do not believe the current proposal will solve all of those cases,
>>> particularly when the fields are the same type and structures are
>>> compatible but they cannot overlap in C/C++ anyway.
>>>
>>> One of the threads is titled "[PATCH] D20665: Claim NoAlias if two
>>> GEPs index different fields of the same struct"
>>
>> For example, given
>>
>>   struct {
>>     int arr_a[2];
>>     int arr_b[2];
>>   };
>>
>> assume you cannot see the original allocation site.
>> In LLVM IR, gep(arr_b, -1) is legally an access to arr_a[1].
>> You can use -1 even though it's going to be a pointer to [2 x i32].
>> Thus, you can't even tell that gep(arr_a, 0) and gep(arr_b, -1) do not
>> overlap without being able to know *something* about the layout of
>> fields in the structure you are talking about.
>>
>> I'd start with: it should not require TBAA to determine that loads from
>> GEPs of arr_a and arr_b cannot overlap. It is true regardless of the
>> types involved.
>>
>> In terms of "who cares", Google definitely compiles with
>> -fno-strict-aliasing (because third-party packages are still not clean
>> enough), and last I looked, Apple did the same (but I admittedly have
>> not kept up).
>
> We definitely have code that we compile that way as well. As it turns
> out, this is my motivation for developing the type sanitizer (so we have
> some tool that users can employ to clean up this kind of code). Patches
> have been posted for review.

(and we're looking into using it to do just that :P)

>> GCC can definitely disambiguate field accesses (through points-to and
>> otherwise) better than LLVM in a situation where strict aliasing is off.
>>
>> As an aside, I also can't build a sane field-sensitive points-to on our
>> current type system, because the types and structures are already
>> meaningless (and we are busy making it weaker, too).
>> I don't think we are going to want to tie field-sensitive points-to to
>> TBAA (you definitely want to be able to run the former without the
>> latter), but right now that is the only metadata you can use.
>
> This also brings up a good point. Even if we use the same metadata for
> both type and field analysis, I don't see why we can't disable the type
> portions without disabling the field analysis (essentially by emitting
> everything as one universally-aliasing type). Maybe we should do that for
> -fno-strict-aliasing?

That actually sounds very reasonable to me, if we can make it work.
On 08/20/2017 12:02 PM, Daniel Berlin wrote:
>> I do not believe the current proposal will solve all of those cases,
>> particularly when the fields are the same type and structures are
>> compatible but they cannot overlap in C/C++ anyway.
>>
>> One of the threads is titled "[PATCH] D20665: Claim NoAlias if two
>> GEPs index different fields of the same struct"
>
> For example, given
>
>   struct {
>     int arr_a[2];
>     int arr_b[2];
>   };
>
> assume you cannot see the original allocation site.
> In LLVM IR, gep(arr_b, -1) is legally an access to arr_a[1].
> You can use -1 even though it's going to be a pointer to [2 x i32].
> Thus, you can't even tell that gep(arr_a, 0) and gep(arr_b, -1) do
> not overlap without being able to know *something* about the layout of
> fields in the structure you are talking about.

Agreed (and this certainly does motivate keeping both size and offset
information for the fields). The other thing that I think is important to
do in this respect is to record whether or not it's legal to do this kind
of inter-field indexing. In C, I believe you can always legally do this.
In C++, it is always true for standard-layout types, but otherwise it is
up to the implementation (i.e., it depends on which types the
implementation allows the offsetof macro to be applied to). In saying
this, I'm strengthening the wording in the standard in the following
sense: the C++ rules for pointer arithmetic and safely-derived pointer
values, at least for implementations with strict pointer safety, disallow
this kind of inter-field addressing for everything, except perhaps in the
case of two adjacent variables in standard-layout classes. However, it's
also clear that whenever you can apply the offsetof macro, all of the
relative offsets are part of the semantic model of the abstract machine,
and due to practical considerations if nothing else, I suspect we can't
reasonably restrict this behavior for standard-layout classes.

Thanks again,
Hal

> I'd start with: it should not require TBAA to determine that loads
> from GEPs of arr_a and arr_b cannot overlap. It is true regardless
> of the types involved.
>
> In terms of "who cares", Google definitely compiles with
> -fno-strict-aliasing (because third-party packages are still not clean
> enough), and last I looked, Apple did the same (but I admittedly have
> not kept up).
>
> GCC can definitely disambiguate field accesses (through points-to and
> otherwise) better than LLVM in a situation where strict aliasing is off.
>
> As an aside, I also can't build a sane field-sensitive points-to on
> our current type system, because the types and structures are already
> meaningless (and we are busy making it weaker, too).
> I don't think we are going to want to tie field-sensitive points-to to
> TBAA (you definitely want to be able to run the former without the
> latter), but right now that is the only metadata you can use.
>
> Finally, the merging of TBAA is definitely going to be more
> conservative than the merging of field offset info: if we merge a load
> of an int and a float, we will, IIRC, go to the nearest common
> ancestor in TBAA. The field offset info may actually still be
> identical between the two, but we will lose it by creating or going to
> the common ancestor.
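Editorial sketch (not code from the thread): the idea of recording whether inter-field indexing is legal, as raised in Hal's reply above, could use the standard-layout property as a rough proxy on the C++ side. The struct names Plain and WithVirtual and the function interfield_indexing_assumed_legal are hypothetical. Treating "standard-layout" as the criterion is an assumption drawn from the message, not an established front-end rule.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Standard-layout: offsetof is guaranteed to work, so relative field
// offsets are part of the program's observable model.
struct Plain {
  int arr_a[2];
  int arr_b[2];
};

// Not standard-layout (it has a virtual function): offsetof is only
// conditionally supported here, so layout guarantees are weaker.
struct WithVirtual {
  virtual ~WithVirtual() = default;
  int arr_a[2];
  int arr_b[2];
};

// Hypothetical predicate a front end might record per struct type:
// true means gep(arr_b, -1)-style indexing must be assumed possible.
template <typename T>
constexpr bool interfield_indexing_assumed_legal() {
  return std::is_standard_layout<T>::value;
}

static_assert(interfield_indexing_assumed_legal<Plain>(),
              "Plain is standard-layout");
static_assert(!interfield_indexing_assumed_legal<WithVirtual>(),
              "WithVirtual is not standard-layout");
```

For C, the message suggests the answer would always be "legal", so a flag like this would matter only for C++ types that fall outside the standard-layout rules.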