On Oct 7, 2013, at 11:49 PM, Daniel Berlin <dberlin at dberlin.org> wrote:

> > Hence it’s more meaningful to reason about TBAA in terms of its semantics rather than hypothesizing about how and why someone would produce it.
>
> That would be great, but it's not what the langref says, nor does it match up with the name of the thing you are creating, nor does it necessarily have optimal semantics, nor would it be great for future producers or the ability to do other analyses in those producers.

Hey Daniel,

Can you be more specific about your concerns here? It's true that we describe the TBAA nodes in terms of expressing C-like type constraints, but the intention of the design has always been for it to be more general.

Specifically, partitioning the heap for use-cases like what Phil is doing with JavaScript was factored into the original design. We have even talked about adding type tags to represent C++ vtables (for example), since language loads and stores can't touch them (not even through char*).

The data structures and algorithms we have are powerful enough to express these sorts of things, and so long as a frontend abides by the rules, there shouldn't be a problem.

-Chris

On Thu, Oct 10, 2013 at 10:34 AM, Chris Lattner <clattner at apple.com> wrote:

> Hey Daniel,
>
> Can you be more specific about your concerns here? It's true that we describe the TBAA nodes in terms of expression C-like type constraints, but the intention of the design has always been for it to be more general.
>
> Specifically, partitioning the heap for use-cases like what Phil is doing with Javascript was factored into the original design. We have even talked about adding type tags to represent C++ vtables (for example) since language loads and stores can't touch them (not even through char*).
>
> The datastructures and algorithms we have are powerful enough to express these sorts of things, and so long as a frontend abided by the rules, there shouldn't be a problem.

My concerns are simply that, whether designed this way or not, it ends up fairly inconsistent.

For example, what would the plan be when a frontend does something like clang does now for C/C++ (generate type-based TBAA), and also wants to do something like Filip is suggesting (which is also doable on C/C++ with simple frontend analysis)?

Generate a split tree of some sort, one side of which represents TBAA info, and the other side of which represents partitioned abstract heaps? [1] It seems like that would be awfully confusing.

However, it would now be necessary in order to use the new tbaa.read/tbaa.write metadata, since they will only reference tbaa tags. But they only make a lot of sense on tbaa tags that point to partitioned heaps like Filip's, so if you did want to actually make use of them, you now have to put both the type info and the heap info in the same tree.

You also run into issues with the existing metadata on loads/stores in this situation. It's, again, entirely possible for a load to conflict with both a tbaa type and a partitioned heap. In Filip's scheme, there is no way to represent this.

Because of this, essentially the only thing I asked Filip to do was not place it in the exact same tree as we are currently putting type info into.

Then your heap.read/heap.write metadata works with the heap tree (and you annotate loads/stores with your heap attributes), and your tbaa attributes work on the tbaa tree. You can tag a load/store with both a heap tag and a tbaa tag, and disambiguate based on both of them.

Now, if the consensus is this is still a good idea, great. My suggestion would then be to update langref, rename the attributes, etc., to make this all more clear.

--Dan

[1] The other option of trying to generate some fake set of heaps that accurately represent the conflicts in both is, well, difficult :)

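The separate-trees arrangement described in this message can be illustrated with a minimal sketch, written in Python rather than LLVM IR. It assumes each load or store carries at most one tag per tree, and that two accesses can be treated as non-aliasing as soon as either tree proves them disjoint. `Node`, `Access`, `tree_may_alias`, and `accesses_may_alias` are hypothetical names used only for illustration; none of them are LLVM APIs.

```python
from dataclasses import dataclass
from typing import Optional


class Node:
    """One node in a tag tree; parent is None for the root."""

    def __init__(self, name: str, parent: Optional["Node"] = None):
        self.name = name
        self.parent = parent


def is_ancestor_or_self(a: Node, b: Node) -> bool:
    """True if a is b, or a lies on the path from b up to its root."""
    node: Optional[Node] = b
    while node is not None:
        if node is a:
            return True
        node = node.parent
    return False


def tree_may_alias(a: Optional[Node], b: Optional[Node]) -> bool:
    """Compare two tags drawn from the SAME tree (an assumption of this sketch).

    A missing tag carries no information, so the answer is conservatively True.
    """
    if a is None or b is None:
        return True
    return is_ancestor_or_self(a, b) or is_ancestor_or_self(b, a)


@dataclass
class Access:
    """A load or store carrying one tag per tree (hypothetical layout)."""
    tbaa: Optional[Node] = None
    heap: Optional[Node] = None


def accesses_may_alias(x: Access, y: Access) -> bool:
    """Two accesses may alias only if neither tree rules it out."""
    return tree_may_alias(x.tbaa, y.tbaa) and tree_may_alias(x.heap, y.heap)


# Type tree, in the style a C/C++ frontend might build.
type_root = Node("type root")
char_ty = Node("char", type_root)
int_ty = Node("int", char_ty)
float_ty = Node("float", char_ty)

# Separate tree of partitioned abstract heaps.
heap_root = Node("heap root")
heap_a = Node("heap A", heap_root)
heap_b = Node("heap B", heap_root)

load_int_a = Access(tbaa=int_ty, heap=heap_a)
store_int_b = Access(tbaa=int_ty, heap=heap_b)
store_float_a = Access(tbaa=float_ty, heap=heap_a)
store_char_a = Access(tbaa=char_ty, heap=heap_a)
```

Here `load_int_a` and `store_int_b` are separated by the heap tree alone, `load_int_a` and `store_float_a` by the type tree alone, and `load_int_a` and `store_char_a` by neither.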
On Oct 10, 2013, at 8:53 PM, Daniel Berlin <dberlin at dberlin.org> wrote:

> My concerns are simply that whether designed this way or not, it ends up fairly inconsistent.
>
> For example, what would the plan be when a frontend does something like clang does now for C/C++ (generate type based TBAA), and also wants to do something like Filip is suggesting (which is also doable on C/C++ with simple frontend analysis)?

Are you worried about clang doing this, or are you worried about WebKit doing this? Or are you worried about some other front end doing it?

WebKit won't do it because we only have abstract heaps. We have no notion of types that originate from the source language.

I've also considered - as a thought experiment - front ends for other languages. For example, in Java you would probably use TBAA just to express the space of fields (i.e. Field name at Class name at ClassLoader).

Broadly, I believe that using TBAA to literally express the types of the source language is more of a C-ism than a general use case. I think that higher-level type-safe languages have a more-or-less obvious mapping to abstract heaps, and they tend to be *mostly* just disjoint sets. This mapping usually doesn't involve describing the source language's type hierarchy, as much as it involves describing the proofs about aliasing that arise from that language's type system (and whatever analyses the frontend is able to perform). TBAA's ability to express a hierarchy, as opposed to just disjoint sets, is kind of an escape mechanism for the cases where disjoint sets are too constraining.

> Generate a split tree of some sort, one side of which represents TBAA info, and the other side which represents partitioned abstract heaps?[1]
> It seems like that would be awfully confusing.

Let's take the Java example, since that's sort of a great example of a practical sound type system. Heck, JS VMs try to *at best* infer types that are Java-esque. I've given an example above of abstract heaps that I would construct using TBAA. In your split tree world, what would the *other* TBAA data be? Why would you want to use TBAA for anything other than field names? (And yes, I understand you'd use TBAA differently for array element types - but those would be disjoint from field types anyway. And having a hierarchy there is slightly useful.)

I think it would be useful to get an example of what you're worried about and how it would manifest in IR and attributes generated from some concrete frontend.

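The Java thought experiment in this message can be sketched roughly as follows, under the assumption that every abstract heap is named by its path from the root and that two heaps overlap only when one path is a prefix of the other. The helpers `field_heap`, `array_heap`, and `heaps_overlap`, the `"fields"`/`"arrays"` subtree names, and the `Point` class are all hypothetical and chosen only for illustration.

```python
from typing import Tuple

# An abstract heap is identified by its path from the root of the heap tree.
Heap = Tuple[str, ...]


def field_heap(field: str, cls: str, loader: str) -> Heap:
    """One abstract heap per (field, class, class loader) triple.

    All field heaps are siblings, so distinct fields form disjoint sets.
    """
    return ("fields", f"{field}@{cls}@{loader}")


def array_heap(element_type: str = "") -> Heap:
    """Array elements get their own subtree, optionally refined by element type.

    This is the one place the sketch uses a real hierarchy.
    """
    return ("arrays", element_type) if element_type else ("arrays",)


def heaps_overlap(a: Heap, b: Heap) -> bool:
    """Two heaps overlap only if one path is a prefix of the other."""
    n = min(len(a), len(b))
    return a[:n] == b[:n]


point_x = field_heap("x", "Point", "app")
point_y = field_heap("y", "Point", "app")
int_elements = array_heap("int")
double_elements = array_heap("double")
any_elements = array_heap()
```

With these definitions, `point_x` and `point_y` are disjoint, fields never overlap array elements, and `any_elements` overlaps both `int_elements` and `double_elements`, which is the kind of escape hatch the hierarchy provides.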
On Oct 10, 2013, at 8:53 PM, Daniel Berlin <dberlin at dberlin.org> wrote:

> > The datastructures and algorithms we have are powerful enough to express these sorts of things, and so long as a frontend abided by the rules, there shouldn't be a problem.
>
> My concerns are simply that whether designed this way or not, it ends up fairly inconsistent.
>
> For example, what would the plan be when a frontend does something like clang does now for C/C++ (generate type based TBAA), and also wants to do something like Filip is suggesting (which is also doable on C/C++ with simple frontend analysis)?

There are two possible answers here, depending on the constraints:

1. The frontend author could unify them into one grand tree, like struct field TBAA does.
2. The frontend author could model them as two separate TBAA trees, which the TBAA machinery in LLVM handles conservatively.

The conservative handling of different TBAA trees is critical, because you can LTO (for example) Javascript into C++ code in principle, each using their own TBAA structure. LLVM is already well set for this, and it isn't an accident :-)

> You also run into issues with the existing metadata on loads/stores in this situation. It's again, entirely possible for a load to conflict with both a tbaa type, and a partitioned heap. In Filip's scheme, there is no way to represent this.

I'm not sure what you mean. The compiler will handle this conservatively. If you have two different schemas existing in the same application (e.g. due to LTO or due to a language implementing two different non-unified models for some weird reason) then the compiler just doesn't draw any aliasing implications from references using two different schemas.

It is possible in principle to allow a load (for example) to have an arbitrary number of TBAA tags on it, which would solve this (allowing a single load to participate in multiple non-overlapping schemas) but I don't think it is worth the complexity at all.

-Chris

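The conservative treatment of separate trees described in this message can be sketched as below. This is a simplified Python model, not LLVM's implementation: it assumes scalar-style tags with a single parent pointer, and the names `TypeNode`, `root_of`, and `may_alias`, along with the example tree contents, are hypothetical.

```python
from typing import Optional


class TypeNode:
    """A tag in a TBAA-style tree; a node with no parent is a root."""

    def __init__(self, name: str, parent: Optional["TypeNode"] = None):
        self.name = name
        self.parent = parent


def root_of(node: TypeNode) -> TypeNode:
    """Walk parent links up to the root of the tree containing node."""
    while node.parent is not None:
        node = node.parent
    return node


def on_path_to_root(a: TypeNode, b: TypeNode) -> bool:
    """True if a is b, or a is an ancestor of b."""
    node: Optional[TypeNode] = b
    while node is not None:
        if node is a:
            return True
        node = node.parent
    return False


def may_alias(a: Optional[TypeNode], b: Optional[TypeNode]) -> bool:
    # No tag on either side: nothing can be concluded.
    if a is None or b is None:
        return True
    # Different roots mean different schemas, so no aliasing
    # conclusion is drawn and the answer stays conservative.
    if root_of(a) is not root_of(b):
        return True
    # Same schema: the accesses may alias only along an ancestor chain.
    return on_path_to_root(a, b) or on_path_to_root(b, a)


# One tree as a C++ frontend might build it.
cxx_root = TypeNode("C++ TBAA")
cxx_char = TypeNode("char", cxx_root)
cxx_int = TypeNode("int", cxx_char)
cxx_float = TypeNode("float", cxx_char)

# A second, unrelated tree of abstract heaps from another frontend.
js_root = TypeNode("JS heaps")
js_heap_a = TypeNode("heap A", js_root)
js_heap_b = TypeNode("heap B", js_root)
```

Within one tree the query can return `False` (for example `cxx_int` against `cxx_float`), while any query that mixes the two trees returns `True`, which is what keeps linking both kinds of code together safe.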
On Oct 10, 2013, at 8:53 PM, Daniel Berlin <dberlin at dberlin.org> wrote:

> My concerns are simply that whether designed this way or not, it ends up fairly inconsistent.
>
> For example, what would the plan be when a frontend does something like clang does now for C/C++ (generate type based TBAA), and also wants to do something like Filip is suggesting (which is also doable on C/C++ with simple frontend analysis)?
>
> Generate a split tree of some sort, one side of which represents TBAA info, and the other side which represents partitioned abstract heaps?[1]
> It seems like that would be awfully confusing.
>
> [...]

JSC is able to make better use of the original TBAA design by encoding the access path within the type itself. Roughly:

JSCType->Class->Field/Index

This is what we're doing with C++ struct-path TBAA now, but simpler because only one access path is legal. JSC does need a hierarchical representation, which makes TBAA a perfect fit.

The only decision point to make now is whether to allow current TBAA to apply to calls. That seems entirely consistent with LLVM's approach to memory disambiguation.

I do think this discussion is interesting and useful with respect to adding different styles of alias analysis in the future, but we are way beyond the scope of the RFC.

-Andy
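As a rough illustration of what applying TBAA tags to calls could buy, the sketch below models a call that lists the access paths it may read and write, and asks whether a given load could be clobbered by it. This is an assumed model and not the encoding proposed in the RFC: `CallEffects`, `call_may_clobber`, `paths_overlap`, and the `ClassA`/`ClassB` paths are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# An access path in the spirit of JSCType->Class->Field/Index.
Path = Tuple[str, ...]


def paths_overlap(a: Path, b: Path) -> bool:
    """Two paths overlap only if one is a prefix of the other."""
    n = min(len(a), len(b))
    return a[:n] == b[:n]


@dataclass
class CallEffects:
    """Memory effects attached to a call site.

    None means "unknown", which has to be treated as touching anything.
    """
    reads: Optional[List[Path]] = None
    writes: Optional[List[Path]] = None


def call_may_clobber(call: CallEffects, load_path: Optional[Path]) -> bool:
    """Could the call write memory that a load with this path reads?"""
    if call.writes is None or load_path is None:
        return True  # no information, stay conservative
    return any(paths_overlap(w, load_path) for w in call.writes)


class_a_x: Path = ("JSCType", "ClassA", "x")
class_b_y: Path = ("JSCType", "ClassB", "y")

read_only_call = CallEffects(reads=[class_a_x], writes=[])
writes_class_a = CallEffects(reads=[], writes=[("JSCType", "ClassA")])
unknown_call = CallEffects()
```

Under this model a load of `class_b_y` could be moved across `writes_class_a`, a load of `class_a_x` could not, and nothing can be moved across `unknown_call`.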