Hi all,

TBAA on loads and stores is super useful for conveying high-level type knowledge to LLVM transformations. But often translating a high-level language into LLVM IR will result in runtime calls that are known at the high level to only affect (either by reading or by writing) some restricted subset of the heap. These aren't read-only calls; that is often too strong a guarantee. A great example would be a GC barrier or GC allocation that may arise if LLVM is used as a backend for a compiler for a higher-level language. There are probably other examples that arise in high-level languages, but let's use allocation as an example for now. I can envision two ways of representing allocation in LLVM IR:

Method #1:

    %object = call @runtime_allocate(... /* pass size and/or other data */)

Method #2:

    SomeBlock:
      ... ; code preceding the allocation
      %objectFast = ... ; IR for the inlined allocation fast path, which produces %objectFast but may yield null if the fast path fails and we need to take the slow path
      %didFail = icmp eq %objectFast, null
      br i1 %didFail, label %AllocationSlowPath, label %Continue
    AllocationSlowPath:
      %objectSlow = call @runtime_allocate_slow_path(... /* pass size and/or other data */)
      br label %Continue
    Continue:
      %object = phi [%objectFast, %SomeBlock], [%objectSlow, %AllocationSlowPath]

In both of these cases, the call instruction will appear to clobber the world, leading to more pessimistic optimization. Moreover, it's not always correct to mark the allocation as readonly; it may read or write observable state. But if you don't like the allocation example, you can imagine that a GC store barrier would have an almost identical pattern to #2, and it would definitely not be readonly. Anyway, such a pattern would lead to fewer loads being eliminated or hoisted, fewer stores being eliminated or sunk, etc. - all due to those pesky calls.
But the high-level code generator that produces this IR will know for sure that @runtime_allocate or @runtime_allocate_slow_path can only affect a very narrow subset of the heap: namely, it will only read and write the GC's data structures. So most of the loads and stores surrounding the allocation, which arose from source-level loads and stores, would definitely not have any interference with the call, and this would be easy to convey using TBAA.

Hence if we could decorate the calls to these runtime functions as only reading or writing certain TBAA types, then LLVM would be able to perform more CSE/GVN/LICM and probably other goodness as well.

Perhaps there are alternate ways of representing allocation or write barriers - but in my experience, when LLVM is integrated into a larger system, the above IR appears to be optimal for other reasons; and anyway, this issue arises in a bunch of other situations. I'm currently working on an LLVM-based compiler backend for a JavaScript VM, and in this project the following cases have already arisen:

- GC allocation slow paths
- GC write barrier slow paths
- Object reshaping slow paths (JS VMs make objects fast by sometimes changing their shape, which isn't observable to the user and only affects certain well-defined fields)
- Array access slow paths (if JS type inference tells me that an array really is an array but I cannot prove that the access is in-bounds, then I still have some work to do for the out-of-bounds case, but I can prove that it will only read/write some things rather than all of the things)

There are likely to be more such cases. We out-of-line a lot of gnarly code, and that code is often compiled by a compiler other than LLVM, so we won't necessarily be able to feed the IR of those runtime functions to LLVM. Sometimes that code is hand-written assembly, so having LLVM analyze it is pretty much a non-starter. Much of that out-of-lined code has a narrow set of effects.
We can certainly summarize what those effects are using TBAA, so that seems like the most pragmatic way of doing it.

So here's a strawman for how to represent TBAA on calls.

Ideally, a call instruction would list what it reads and what it writes. In most cases this will be overkill, so if we see:

    call @foo(...) !tbaa !42

then this ought to be interpreted as both reading and writing !42. We could then also have new kinds of metadata references:

    !tbaa.read
    !tbaa.write

so that saying:

    call @foo(...) !tbaa.read !42 !tbaa.write !84

indicates that this call to @foo reads whatever !42 is and writes whatever !84 is.

Calls that are not decorated with !tbaa, !tbaa.read, or !tbaa.write must be interpreted as if they could read or write anything, subject of course to function attributes (like readonly). If you ignore the TBAA metadata on a call, then you should also assume that the function may read or write anything, again subject to function attributes. If a function call has both a readonly attribute and some TBAA metadata, then it should be safe to use either the attribute or the metadata for the purpose of dependency analysis. If you use both sources of information, then the set of possible memory locations that are written is the intersection of the set allowed by the attributes and the set allowed by the TBAA metadata; for example, if @foo (in the above example) were marked readonly, then the !tbaa.write !84 could be ignored.

I'm not entirely convinced that this representation really is totally general; for example, you can imagine wanting to list a set of things read or written, and I gather that this would be hard using my suggested syntax. But it's just a strawman - the bigger question is: can anyone see show-stopper issues that would prevent TBAA on calls from ever being a thing?

Also, this would be a sizable change - a bunch of dependency analysis stuff would have to be taught to look at calls in addition to loads and stores.
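As a concrete sketch of the strawman (all metadata names and numbers here are invented for illustration, written in the old scalar-format TBAA and 2013-era IR syntax):

```llvm
; Hypothetical abstract-heap nodes, all descending from a common root:
!0 = metadata !{metadata !"tbaa_root"}
!42 = metadata !{metadata !"gc_data_structures", metadata !0}
!84 = metadata !{metadata !"gc_mark_bits", metadata !0}
!7 = metadata !{metadata !"js_object_fields", metadata !0}

; Under the strawman, this call reads only locations tagged !42 and
; writes only locations tagged !84 ...
call void @runtime_allocate_slow_path(i64 %size) !tbaa.read !42 !tbaa.write !84

; ... so an access tagged with an unrelated abstract heap could be
; hoisted, sunk, or CSE'd across the call:
%f = load i64* %fieldPtr, !tbaa !7
```

The intersection rule then says that if @runtime_allocate_slow_path also carried the readonly attribute, a pass could discard the !tbaa.write !84 annotation entirely.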
But I think it could be broadly useful, maybe even beyond the high-level language compilers that I'm familiar with.

Thoughts?

-Filip
On Sep 25, 2013, at 5:50 PM, Filip Pizlo <fpizlo at apple.com> wrote:

> [original proposal quoted in full; snipped]
> Thoughts?

The general idea seems right to me. I can't think of a better way to express the side effects of type-checks and special helpers provided by the runtime. I don't have much opinion on the metadata syntax, other than that it should have room for extensions.

I was expecting a little more discussion on this one. But FWIW I think it's a good idea.

-Andy
On Wed, Sep 25, 2013 at 5:50 PM, Filip Pizlo <fpizlo at apple.com> wrote:

> [original proposal quoted in full; snipped]
> Hence if we could decorate the calls to these runtime functions as only
> reading or writing certain TBAA types,

Out of curiosity, why is this tied to TBAA types, rather than variables or something else? I.e., what analysis are you performing that gives you type answers instead of answers in terms of local/global variables? The information you are talking about here is traditional mod/ref info that is completely independent of TBAA. The more specific the info you can get from the metadata, the better off you are at eliminating loads/stores.

Otherwise, yes, it's certainly incredibly helpful to the other optimizations.

However, unless there is a good reason, I'd suggest you just make it generic call read/write metadata that talks about values (be they incoming arguments, alloca'd memory, or whatever else). It's actually a lot easier to reason about individual heap values than entire types - unless, again, you only have an analysis that provides answers about types (or your plan was simply to use the existing TBAA info and do a bottom-up closure over the entire callgraph, which would provide some value, but lose a lot of info due to external function calls).
On Oct 7, 2013, at 3:49 PM, Daniel Berlin <dberlin at dberlin.org> wrote:

> [quoted proposal snipped]
> Out of curiosity, why is this tied to TBAA types, rather than variables or something else?
> IE what analysis are you performing that gives you type answers instead of answers in terms of local/global variables?

TBAA's utility for non-C languages (like JavaScript, in my case) isn't for reasoning about user-facing types, but for creating what you could alternatively think of as "abstract heaps". TBAA is perfect for this. For example, I might know that a certain function clobbers a field called 'foo'. In my frontend I create an "abstract heap" for each field name, and this gets lowered to LLVM TBAA. I already use this for telling LLVM about dependencies between loads and stores: if I emit a store as a result of lowering a JS "o.foo = ..." operation, then I will ascribe to it a TBAA node for the "foo" abstract heap. This is just one example of the many different abstract heaps (i.e. "TBAA nodes") that my compiler uses. Note that most of the loads and stores that I'm concerned with do not involve allocas (i.e. locals) or "global variables" in any kind of way that LLVM could reason about.
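The per-field abstract heaps described above might be emitted like this (a hypothetical sketch; the node names are invented, and the syntax is the 2013-era scalar TBAA format):

```llvm
; One abstract-heap node per JS field name, all under a common root:
!0 = metadata !{metadata !"js_abstract_heap"}
!1 = metadata !{metadata !"field_foo", metadata !0}
!2 = metadata !{metadata !"field_bar", metadata !0}

; Lowering of "o.foo = v" tags the store with the foo heap, so it
; cannot interfere with a load tagged with the bar heap, regardless
; of how %fooSlot and %barSlot were computed:
store i64 %v, i64* %fooSlot, !tbaa !1
%x = load i64* %barSlot, !tbaa !2
```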
A mechanism that only works for locals and globals would be of zero use to me, since all of my allocas are trivially killed by mem2reg and I don't have any globals.

> The information you are talking about here is traditional mod/ref info that is completely independent of TBAA.

You lost me. I'm writing a compiler that generates LLVM IR. I have high-level knowledge that allows me to prove the set of abstract heap locations that may be read or written by any call, load, or store. What do you suggest as a mechanism by which I can communicate that information to LLVM? Are you saying that such a mechanism already exists and that I should use it instead? As far as I can tell, TBAA is the cleanest such mechanism I have found thus far, and its only shortcoming is that I cannot use it to annotate calls.

> The more specific the info you can get from the metadata, the better off you are eliminating loads/stores.
>
> Otherwise, yes, it's certainly incredibly helpful to the other optimizations.
>
> However, unless there is a good reason, i'd suggest you just make it generic call read/write metadata that talks about values (be they incoming arguments, alloca'd memory, or whatever else).

You lost me. How does this help a high-level compiler convey to LLVM that a particular call and a particular load or store don't have any dependency? My proposal achieves this, albeit at a coarse-grained level: you have to express your knowledge of dependencies using a hierarchy of abstract sets of heap locations (i.e.
exactly what TBAA gives you).

> It's actually a lot easier to reason about individual heap values than entire types, unless, again, you only have an analysis that is providing answers about types (or your plan was simply to use the existing TBAA info and do a bottom-up closure over the entire callgraph, which would provide some value, but lose a lot of info due to external function calls)

The goal is to allow a front-end that already has knowledge about some high-level types to convey that knowledge to LLVM when generating LLVM IR. In my front-end, I actually don't do much analysis to produce the TBAA information - it mostly just falls out naturally from my own IR.

I'm not sure what you mean by "individual heap values". If you mean global variables, then that's of no use to me: I don't have any global variables. It's interesting that using TBAA for this completely subsumes any approach based on globals: you could ascribe a distinct TBAA node to each global and get the same effect. But TBAA is more powerful, because you can annotate something that you've already proven about a memory location, and this information can be orthogonal to the IR used to compute that memory location. This makes TBAA ideal for conveying dependency information from a high-level compiler to LLVM at the time of LLVM IR generation.

-Filip
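The subsumption argument above can be sketched as follows (hypothetical names, scalar TBAA format): giving each global its own TBAA node recovers the same disambiguation a globals-based mod/ref scheme would provide, while also continuing to work when the address is computed rather than a literal global:

```llvm
@a = global i32 0
@b = global i32 0

!0 = metadata !{metadata !"root"}
!1 = metadata !{metadata !"heap_for_a", metadata !0}
!2 = metadata !{metadata !"heap_for_b", metadata !0}

; These accesses are disjoint both by globals-based reasoning and by
; their TBAA tags; only the TBAA tags would survive if %p were an
; arbitrary pointer proven by the frontend to point into heap_for_a:
store i32 1, i32* @a, !tbaa !1
%x = load i32* @b, !tbaa !2
```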