On 08/20/2017 11:22 AM, Daniel Berlin via llvm-dev wrote:
> Sorry, hit send early.
>
> On Sun, Aug 20, 2017 at 9:16 AM, Daniel Berlin <dberlin at dberlin.org> wrote:
>
>> On Sun, Aug 20, 2017 at 8:54 AM, Ivan A. Kosarev via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>>
>>> Hello Daniel,
>>>
>>> > The problem with the way you are trying to show this is that
>>> > there are many ways to prove no-alias, and TBAA is one of them.
>>> > The reason i stare at dump files and debug info is precisely to
>>> > separate the TBAA portion from the rest.
>>>
>>> Makes sense to me. However, for a translation unit like this:
>>>
>>> struct BUF1 { ... };
>>> struct BUF2 { ... };
>>>
>>> int foo(int n, struct BUF1* p, struct BUF2* q) {
>>>   for (int i = 0; i < n; i++)
>>>     p->b1 += q->b2;
>>>   return 0;
>>> }
>>>
>>> I think we can be sure there are no ways for the compiler to
>>> know that these two accesses do not overlap, except TBAA.
>>
>> This is definitely false in general.
>> Again, speaking about GCC, the logic for whether fields can be
>> accessed is separate from the logic about whether TBAA says fields
>> can be accessed.
>> In some cases the flags to control the logic are both controlled
>> by fstrict-aliasing, but are unrelated to tbaa.

Our current TBAA combines these two things (field-offset-based
determinations and strictly-type-based rules) into what we call TBAA.
This proposal does likewise. Are there advantages to splitting them
that we should consider?

Thanks again,
Hal

>> Even if you have tried to place the fields at the same offset, as
>> you have, whether it can disambiguate the accesses can depend on
>> more than just TBAA, including alignment rules, etc.
>
> You definitely may be able to come up with examples where only tbaa
> *should* be active, but i don't think it's really a safe way to go
> about testing assumptions about TBAA.
> For example, it also assumes no bugs in the other methods of analysis,
> which is definitely not a safe assumption :)
>
> If you only care about the *end result* (IE whether it's allowed to say
> the accesses overlap) it is generally going to be okay, but again,
> this assumes no bug in any implementation.
>
> If you want to test tbaa specific things for real, you'd have to print
> the tbaa trees and results as gcc sees them, for example.
> That's really the only way to be sure.

--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

On Sun, Aug 20, 2017 at 9:27 AM, Hal Finkel <hfinkel at anl.gov> wrote:
> Our current TBAA combines these two things (field-offset-based
> determinations and strictly-type-based rules) into what we call TBAA. This
> proposal does likewise. Are there advantages to splitting them that we
> should consider?

Yes.

GEP has no relation to original field accesses, as you know (i.e., we
allow them to access negative offsets, etc.).
For a lot of these languages, more than just the TBAA rules say that you
can't just go marching through structures, etc.

We also have cases that TBAA cannot disambiguate but where C/C++ says the
fields can't overlap. We lack the metadata to correctly say they cannot,
because it cannot be inferred from GEPs.
(If you search for threads with taewook a while back, you will find some
patches and discussion.)

I do not believe the current proposal will solve all of those cases,
particularly when the fields are the same type and structures are
compatible but they cannot overlap in C/C++ anyway.

> I do not believe the current proposal will solve all of those cases,
> particularly when the fields are the same type and structures are
> compatible but they cannot overlap in C/C++ anyway.

One of the threads is titled "[PATCH] D20665: Claim NoAlias if two GEPs
index different fields of the same struct".

For example, given

struct {
  int arr_a[2];
  int arr_b[2];
};

assume you cannot see the original allocation site.

In LLVM IR, gep(arr_b, -1) is legally an access to arr_a[1].
You can use -1 even though it's going to be a pointer to [2 x i32].

Thus, you can't even tell that gep(arr_a, 0) and gep(arr_b, -1) do not
overlap without being able to know *something* about the layout of fields
in the structure you are talking about.

I'd start with: it should not require TBAA to determine that loads from
GEPs of arr_a and arr_b cannot overlap. That is true regardless of the
types involved.

In terms of "who cares", Google definitely compiles with
-fno-strict-aliasing (because third party packages are still not clean
enough), and last I looked, Apple did the same (but I admittedly have not
kept up).

GCC can definitely disambiguate field accesses (through points-to and
otherwise) better than LLVM in a situation where strict aliasing is off.

As an aside, I also can't build a sane field-sensitive points-to on our
current type system, because the types and structures are already
meaningless (and we are busy making it weaker, too).
I don't think we are going to want to tie field-sensitive points-to to
TBAA (you definitely want to be able to run the former without the
latter), but right now that is the only metadata you can use.

Finally, the merging of TBAA is definitely going to be more conservative
than the merging of field offset info:
If we merge a load of an int and a float, we will, IIRC, go to the
nearest common ancestor in TBAA.
The field offset info may actually still be identical between the two,
but we will lose it by creating or going to the common ancestor.

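To make the arr_a/arr_b point concrete, here is a minimal sketch in C that works purely with byte offsets, so it never forms an out-of-bounds pointer. The names two_arrays, byte_range, int_access and ranges_overlap are hypothetical, introduced only for illustration. It assumes there is no padding between the two int arrays, which holds on common ABIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The struct from the message, given a hypothetical name. */
struct two_arrays {
    int arr_a[2];
    int arr_b[2];
};

/* Half-open byte range [begin, end), relative to the start of the struct. */
struct byte_range {
    ptrdiff_t begin;
    ptrdiff_t end;
};

/* Byte range touched by an int-sized access at `index` from a field that
   starts at `field_offset`. The index may be negative, as a GEP index may. */
static struct byte_range int_access(size_t field_offset, ptrdiff_t index)
{
    struct byte_range r;
    r.begin = (ptrdiff_t)field_offset + index * (ptrdiff_t)sizeof(int);
    r.end = r.begin + (ptrdiff_t)sizeof(int);
    return r;
}

/* Two half-open ranges overlap iff each one starts before the other ends. */
static bool ranges_overlap(struct byte_range x, struct byte_range y)
{
    return x.begin < y.end && y.begin < x.end;
}
```

Under that layout assumption, int_access(offsetof(struct two_arrays, arr_b), -1) covers the same bytes as arr_a[1] and is disjoint from arr_a[0]. Neither fact is derivable from the index values alone; both need the field offsets, which is the layout knowledge the message says is missing.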
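The remark about merging can be illustrated with a toy model. This is only a sketch under stated assumptions: type_node, access_tag, merged, merge_combined and merge_split are hypothetical names, and the tree is a simplified stand-in for a TBAA type tree, not LLVM's actual metadata or API. It contrasts a combined tag, where the offset is only meaningful together with the original type, with a split scheme where types and offsets are merged independently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified TBAA-like type node; parent is NULL at the root. */
struct type_node {
    const char *name;
    const struct type_node *parent;
};

/* Hypothetical access tag: a type node plus a field offset. */
struct access_tag {
    const struct type_node *type;
    size_t offset;
};

/* Result of merging two tags. */
struct merged {
    const struct type_node *type;
    bool offset_known;
    size_t offset;
};

static int depth(const struct type_node *n)
{
    int d = 0;
    while (n->parent) { n = n->parent; d++; }
    return d;
}

/* Walk both nodes up to equal depth, then upward in lockstep until they meet. */
static const struct type_node *nearest_common_ancestor(
    const struct type_node *a, const struct type_node *b)
{
    int da = depth(a), db = depth(b);
    while (da > db) { a = a->parent; da--; }
    while (db > da) { b = b->parent; db--; }
    while (a != b) { a = a->parent; b = b->parent; }
    return a;
}

/* Combined scheme (assumed model): the offset survives only if the type does. */
static struct merged merge_combined(struct access_tag x, struct access_tag y)
{
    struct merged m;
    m.type = nearest_common_ancestor(x.type, y.type);
    m.offset_known = (x.type == y.type && x.offset == y.offset);
    m.offset = m.offset_known ? x.offset : 0;
    return m;
}

/* Split scheme (assumed model): types and offsets are merged independently. */
static struct merged merge_split(struct access_tag x, struct access_tag y)
{
    struct merged m;
    m.type = nearest_common_ancestor(x.type, y.type);
    m.offset_known = (x.offset == y.offset);
    m.offset = m.offset_known ? x.offset : 0;
    return m;
}
```

In this model, merging an int access and a float access at the same offset yields the common ancestor in both schemes, but only the split scheme still knows the offset.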
Daniel,

> GEP has no relation to original field accesses, as you know (IE
> we allow them to access negative offsets, etc)
> For a lot of these languages, more than the TBAA rules say that
> you can't just go marching through structures, etc.

So with the current approach we mix two different things: alias rules for
types, and information about specific accesses, such as offsets. This
means that whatever we can conclude from a pair of accesses described
with such a mix can never extend beyond the scope of what Clang treats as
a single access, that is, an expression of the form 'p->a.b.c'. The same
expression split into parts, e.g., 'p2 = &p->a.b; p2->c', results in a
less specific description of the access and, as a consequence, in a
greater number of potential false positives. In turn, proving that 'p2'
relates to 'p' is up to analyses that deal with memory locations and not
memory accesses. It looks like the current approach leads us nowhere in
the long term.

If I understand correctly, stripping offsets out of the TBAA information
means we end up with a sort of alias sets. The offsets then go into
another metadata tag that encodes accesses in terms of constraint
expressions. These tags are supposed to be processed by what should
eventually become an implementation of field-sensitive points-to
analysis. This would also resolve the BasicAA vs. TBAA responses issue.

I wonder whether !tbaa tags for loads and stores, reworked to refer to
both alias sets and constraint expressions, would work as a transitional
format while we grope our way toward full field-sensitive analysis.

Thanks,

--
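The difference between 'p->a.b.c' and 'p2 = &p->a.b; p2->c' can be sketched in C. The struct names S, A and B and all four helpers are hypothetical, chosen only to give the access path some shape. Both forms reach the same address, but the split form, looked at in isolation, only describes an offset within struct B.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical nesting that gives the path p->a.b.c something to walk. */
struct B { int x; int c; };
struct A { int y; struct B b; };
struct S { int z; struct A a; };

/* Single access expression: the whole path from S is visible at once. */
static int *direct(struct S *p)
{
    return &p->a.b.c;
}

/* Same access split in two: the final step only sees a struct B pointer. */
static int *split(struct S *p)
{
    struct B *p2 = &p->a.b;
    return &p2->c;
}

/* Offset of c relative to S, as the full path describes it. */
static size_t full_path_offset(void)
{
    return offsetof(struct S, a) + offsetof(struct A, b) + offsetof(struct B, c);
}

/* Offset of c as the split form's last step describes it: relative to B only. */
static size_t split_path_offset(void)
{
    return offsetof(struct B, c);
}
```

Relating split_path_offset() back to full_path_offset() requires knowing where p2 came from, which is a question about memory locations rather than about the access itself.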