Sanjoy Das via llvm-dev
2016-Feb-17 02:06 UTC
[llvm-dev] RFC: Add guard intrinsics to LLVM
This is a proposal to add guard intrinsics to LLVM.

A couple of glossary items: when I say "interpreter" I mean "the most
conservative tier in the compilation stack", which can be an actual
interpreter, a "splat compiler" or even a regular JIT that doesn't
make optimistic assumptions. By "bailing out to the interpreter" I
mean "side exit" as defined by Philip in
http://www.philipreames.com/Blog/2015/05/20/deoptimization-terminology/


# high level summary

Guard intrinsics are intended to represent "if (!cond) leave();"-like
control flow in a structured manner. This kind of control flow is
abundant in IR coming from safe languages due to things like range
checks and null checks (and overflow checks, in some cases). From a
high level, there are two distinct motivations for introducing guard
intrinsics:

 - To keep control flow as simple and "straight line"-like as is
   reasonable

 - To allow certain kinds of "widening" transforms that cannot be
   soundly done in an explicit "check-and-branch" control flow
   representation

## straw man specification

Signature:

```
declare void @llvm.guard_on(i1 %predicate) ;; requires [ "deopt"(...) ]
```

Semantics:

`@llvm.guard_on` bails out to the interpreter (i.e. "deoptimizes" the
currently executing function) if `%predicate` is `false`, meaning that
after `@llvm.guard_on(i1 %t)` has executed, `%t` can be assumed to be
true. In this way, it is close to `@llvm.assume` or an `assert`, with
one very important difference -- `@llvm.guard_on(i1 <false>)` is well
defined (and not UB). `@llvm.guard_on` on a false predicate bails to
the interpreter and that is always safe (but slow), and so
`@llvm.guard_on(i1 false)` is basically a `noreturn` call that
unconditionally transitions the current compilation to the
interpreter.

Bailing out to the interpreter involves re-creating the state of the
interpreter frames as-if the compilee had been executing in the
interpreter all along. This state is represented and maintained using
a `"deopt"` operand bundle attached to the call to `@llvm.guard_on`.
The verifier will reject calls to `@llvm.guard_on` without a `"deopt"`
operand bundle. `@llvm.guard_on` cannot be `invoke`ed (that would be
meaningless anyway, since the method it would've "thrown" into is
about to go away).


Example:

```
  ...
  %rangeCheck0 = icmp ult i32 %arg0, 10
  call void @llvm.guard_on(i1 %rangeCheck0) [ "deopt"(/* deopt state 0 */) ]
  %rangeCheck1 = icmp ult i32 %arg0, 12
  call void @llvm.guard_on(i1 %rangeCheck1) [ "deopt"(/* deopt state 1 */) ]
  ...
```


# details: motivation & alternatives

As specified in the high level summary, there are two key motivations.

The first, more obvious one is that we want the CFG to be less
complex, even if that involves "hiding" control flow in guard
intrinsics. I expect this to benefit both compile time and
within-a-block optimization.

The second, somewhat less obvious motivation to use guard intrinsics
instead of explicit control flow is to allow check widening.

## check widening

Consider:

```
  ...
  %rangeCheck0 = icmp ult i32 6, %len  ;; for a[6]
  call void @llvm.guard_on(i1 %rangeCheck0) [ "deopt"(/* deopt state 0 */) ]
  call void @printf("hello world")
  %rangeCheck1 = icmp ult i32 7, %len  ;; for a[7]
  call void @llvm.guard_on(i1 %rangeCheck1) [ "deopt"(/* deopt state 1 */) ]
  access a[6] and a[7]
  ...
```

we'd like to optimize it to
```
  ...
  %rangeCheckWide = icmp ult i32 7, %len
  call void @llvm.guard_on(i1 %rangeCheckWide) [ "deopt"(/* deopt state 0 */) ]
  call void @printf("hello world")
  ;; %rangeCheck1 = icmp ult i32 7, %len  ;; not needed anymore
  ;; call void @llvm.guard_on(i1 %rangeCheck1) [ "deopt"(/* deopt state 1 */) ]
  access a[6] and a[7]
  ...
```

This way we do a range check only on `7` -- if `7` is within bounds,
then we know `6` is too. This transform is sound only because we know
that the guard on `7 ult %len` will not simply throw an exception if
the said predicate fails, but will bail out to the interpreter with
the abstract state `/* deopt state 0 */`. In fact, if `%len` is `7`,
the pre-transform program is supposed to print `"hello world"` and
*then* throw an exception, and bailing out to the interpreter with
`/* deopt state 0 */` will do exactly that.

In other words, we're allowed to do speculative and aggressive
transforms that make a guard fail that wouldn't have in the initial
program. This is okay because a failed guard only bails to the
interpreter, and the interpreter always Does The Right Thing(TM). In
fact, it is correct (though unwise) to replace every guard's predicate
with `false`.

## the problem with check widening and explicit control flow

Widening is difficult to get right in an explicit "check-and-branch"
representation. For instance, the above example in a
"check-and-branch" representation would be (in pseudo C, and excluding
the printf):

```
  ...
  if (!(6 < %len)) { call @side_exit() [ "deopt"(P) ]; unreachable; }
  if (!(7 < %len)) { call @side_exit() [ "deopt"(Q) ]; unreachable; }
  ...
```

The following transform is invalid:

```
  ...
  if (!(7 < %len)) { call @side_exit() [ "deopt"(P) ]; unreachable; }
  if (!(true)) { call @side_exit() [ "deopt"(Q) ]; unreachable; }
  ...
```

since we do not know if the first side exit had been optimized under
the assumption `!(6 < %len)` (via JumpThreading etc.). E.g. the
"original" IR could have been

```
  ...
  if (!(6 < %len)) { call @side_exit() [ "deopt"(!(6 < %len)) ]; unreachable; }
  if (!(7 < %len)) { call @side_exit() [ "deopt"(Q) ]; unreachable; }
  ...
```

which got optimized to

```
  ...
  if (!(6 < %len)) { call @side_exit() [ "deopt"(true) ]; unreachable; }
  if (!(7 < %len)) { call @side_exit() [ "deopt"(Q) ]; unreachable; }
  ...
```

before the widening transform. The widening transform will now
effectively pass in an incorrect value for `!(6 < %len)`.

This isn't to say it is impossible to do check widening in an explicit
control flow representation, just that it is more natural to do it
with guards.


# details: semantics

## as-if control flow

The observable behavior of `@llvm.guard_on` is specified as:

```
void @llvm.guard_on(i1 %pred) {
 entry:
  %unknown_cond = < unknown source >
  %cond = and i1 %unknown_cond, %pred
  br i1 %cond, label %left, label %right

 left:
  call void @bail_to_interpreter() [ "deopt"() ] noreturn
  unreachable

 right:
  ret void
}
```

So, precisely speaking, `@llvm.guard_on` is guaranteed to bail to the
interpreter if `%pred` is false, but it **may** bail to the
interpreter if `%pred` is true. It is this bit that lets us soundly
widen `%pred`, since all we're doing is "refining" `< unknown source >`.

`@bail_to_interpreter` does not return to the current compilation, but
it returns to the `"deopt"` continuation that it has been given (once
inlined, the empty "deopt"() continuation will be fixed up to have the
right continuation).
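To make the "refining" step concrete, here is a sketch of the earlier
widening example viewed through this as-if model (deopt states
abridged):

```
  ;; before widening:
  %c0 = icmp ult i32 6, %len
  call void @llvm.guard_on(i1 %c0) [ "deopt"(/* deopt state 0 */) ]

  ;; after widening:
  %c1 = icmp ult i32 7, %len
  call void @llvm.guard_on(i1 %c1) [ "deopt"(/* deopt state 0 */) ]

  ;; Since %c1 implies %c0, the widened form is exactly the original
  ;; guard with %unknown_cond chosen to be %c1 -- a legal refinement
  ;; of < unknown source >; we've only made the guard fail more often.
```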
## applicable optimizations

Apart from widening, most of the optimizations we're interested in are
what's allowed by an equivalent `@llvm.assume`. Any conditional
branches dominated by a guard on the same condition can be folded,
multiple guards on the same condition can be CSE'ed, loop invariant
guards can be hoisted out of loops, etc.

Ultimately, we'd like to recover the same quality of optimization as
we currently get from the "check-and-branch" representation. With the
"check-and-branch" representation, the optimizer is able to sink
stores and computation into the slow path. This is something it cannot
do in the guard_on representation, and we'll have to lower the
guard_on representation to the "check-and-branch" representation at a
suitable point in the pipeline to get this kind of optimization.

## lowering

At some point in the compilation pipeline, we will have to lower
`@llvm.guard_on` into explicit control flow, by "inlining" "an
implementation" of `@llvm.guard_on` (or by some other means). I don't
have a good sense of when in the pipeline this should be done -- the
answer will depend on what we find as we make LLVM more aggressive
around optimizing guards.

## environments without support for bailing to the interpreter

Our VM has deeply integrated support for deoptimization, but not all
language runtimes do. If there is a need for non-deoptimizing guards,
we can consider introducing a variant of `@llvm.guard_on`:

```
declare void @llvm.exception_on(i1 %predicate, i32 %exceptionKind)
```

with this one having the semantics that it always throws an exception
if `%predicate` fails. Only the non-widening optimizations for
`@llvm.guard_on` will apply to `@llvm.exception_on`.

## memory effects (unresolved)

[I haven't come up with a good model for the memory effects of
`@llvm.guard_on`; suggestions are very welcome.]

I'd really like to model `@llvm.guard_on` as a readonly function,
since it does not write to memory if it returns; and e.g. forwarding
loads across a call to `@llvm.guard_on` should be legal.

However, I'm in a quandary around representing the "may never return"
aspect of `@llvm.guard_on`: I have to make it illegal to, say, hoist a
load from `%ptr` across a guard on `%ptr != null`. There are a couple
of ways I can think of dealing with this, none of them both easy and
neat:

 - State that since `@llvm.guard_on` could have had an infinite loop
   in it, it may never return. Unfortunately, LLVM IR has some rough
   edges around readonly infinite loops (since C++ disallows them), so
   LLVM will require some preparatory work before we can do this
   soundly.

 - State that `@llvm.guard_on` can unwind, and thus has non-local
   control flow. This can actually work (and is pretty close to the
   actual semantics), but is somewhat of a hack since
   `@llvm.guard_on` doesn't _really_ throw an exception.

 - Special-case `@llvm.guard_on` (yuck!).

What do you think?

-- Sanjoy
Philip Reames via llvm-dev
2016-Feb-17 23:40 UTC
[llvm-dev] RFC: Add guard intrinsics to LLVM
On 02/16/2016 06:06 PM, Sanjoy Das wrote:
> This is a proposal to add guard intrinsics to LLVM.

A couple of inline comments building on Sanjoy's points follow.

> In this way, it is close to `@llvm.assume` or an `assert`, with
> one very important difference -- `@llvm.guard_on(i1 <false>)` is well
> defined (and not UB). `@llvm.guard_on` on a false predicate bails to
> the interpreter and that is always safe (but slow), and so
> `@llvm.guard_on(i1 false)` is basically a `noreturn` call that
> unconditionally transitions the current compilation to the
> interpreter.

It's also worth noting that @llvm.guard_on(i1 true) is useful and well
defined as well. When lowering, such a guard simply disappears, but it
can be useful to keep around in the IR during optimization. It gives a
well defined point for widening transforms to apply, with a well known
deoptimization state already available.

I'd suggest a small change to Sanjoy's declaration: I think we should
allow additional arguments to the guard, not just the condition. What
exactly those arguments mean would be up to the runtime, but various
runtimes might want to provide additional arguments to the OSR
mechanism.

> Bailing out to the interpreter involves re-creating the state of the
> interpreter frames as-if the compilee had been executing in the
> interpreter all along. This state is represented and maintained using
> a `"deopt"` operand bundle attached to the call to `@llvm.guard_on`.
> The verifier will reject calls to `@llvm.guard_on` without a `"deopt"`
> operand bundle.

This introduces a very subtle point. The side exit only affects the
*function* which contains the guard. A caller of that function in the
same module may be returned to by either the function itself, or the
interpreter after running the continuation implied by the guard. This
introduces a complication for IPA/IPO: any guard (really, any side
exit, of which guards are one form) has to be treated as a possible
return point from the callee, with an unknowable return value and
memory state.
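To illustrate the hazard (function names hypothetical, IR abridged):

```
define i32 @callee(i32* %p) {
  %c = icmp ne i32* %p, null
  call void @llvm.guard_on(i1 %c) [ "deopt"(/* state */) ]
  %v = load i32, i32* %p
  ret i32 %v        ;; an IPO pass sees only this return...
}

define i32 @caller(i32* %p) {
  ;; ...but @caller may instead be "returned to" by the interpreter
  ;; after it runs the continuation implied by the guard, with a
  ;; return value and heap state this module never computes.
  %r = call i32 @callee(i32* %p)
  ret i32 %r
}
```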
In our prototype implementation, we've implemented this by doing a
linear scan of the function in each IPO pass and bailing if we see a
side exit. One approach we could consider upstream is to introduce a
function attribute which indicates "can exit via side exit or guard".
Any function containing a guard would be required to have the
attribute. InlineFunction could propagate it to the caller when
inlining a callee with it. FunctionAttrs could remove it if no guards
(or callees with guards) were found. This would make the
implementation of the bail in our IPO passes much cheaper, but it
wouldn't remove the need for it.

> `@llvm.guard_on` cannot be `invoke`ed (that would be
> meaningless anyway, since the method it would've "thrown" into is
> about to go away).

I disagree with this bit. It needlessly complicates the inliner.
Allowing an invoke of a guard, which SimplifyCFG converts to a call
just like it would for a nothrow function, seems much cleaner.

> The first, more obvious one is that we want the CFG to be less
> complex, even if that involves "hiding" control flow in guard
> intrinsics. I expect this to benefit both compile time and
> within-a-block optimization.

I want to really emphasize this point. I expect that using guard
intrinsics would be a major compile time win for higher level
languages. (I can't say for sure; we currently use the explicit
control model.) Given many of our transforms are block-at-a-time,
using guards might also lead to better optimization. I'm particularly
interested in seeing how this influences instruction selection.
> In other words, we're allowed to do speculative and aggressive
> transforms that make a guard fail that wouldn't have in the initial
> program. This is okay because a failed guard only bails to the
> interpreter, and the interpreter always Does The Right Thing(TM). In
> fact, it is correct (though unwise) to replace every guard's
> predicate with `false`.

Another optimization this allows is a more code-size-friendly version
of IRCE. Today, we have to generate pre- and post-loops when removing
bounds checks from an inner loop unless we can *prove* that no
iterations would run in those loops. With a guard construct, we can
find a guard before the loop and "widen" it to account for the pre and
post iteration spaces. This transformation is sometimes known as
"loop predication".

> ## as-if control flow
>
> The observable behavior of `@llvm.guard_on` is specified as:
>
> ```
> void @llvm.guard_on(i1 %pred) {
>  entry:
>   %unknown_cond = < unknown source >
>   %cond = and i1 %unknown_cond, %pred
>   br i1 %cond, label %left, label %right
>
>  left:
>   call void @bail_to_interpreter() [ "deopt"() ] noreturn
>   unreachable
>
>  right:
>   ret void
> }
> ```
>
> So, precisely speaking, `@llvm.guard_on` is guaranteed to bail to the
> interpreter if `%pred` is false, but it **may** bail to the
> interpreter if `%pred` is true. It is this bit that lets us soundly
> widen `%pred`, since all we're doing is "refining" `< unknown source >`.

Unless I'm misreading this, it looks like Sanjoy got the IR in the
specification wrong. The intrinsic is specified to side exit if the
condition is false (so that it's true in the caller after the guard),
not the other way around. The text description appears correct.
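For concreteness, I believe the intended expansion is the same body
with the branch targets swapped:

```
void @llvm.guard_on(i1 %pred) {
 entry:
  %unknown_cond = < unknown source >
  %cond = and i1 %unknown_cond, %pred
  br i1 %cond, label %right, label %left

 left:
  call void @bail_to_interpreter() [ "deopt"() ] noreturn
  unreachable

 right:
  ret void
}
```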
> `@bail_to_interpreter` does not return to the current compilation,
> but it returns to the `"deopt"` continuation that it has been given
> (once inlined, the empty "deopt"() continuation will be fixed up to
> have the right continuation).

This "bail_to_interpreter" is the more general form of side exit I
mentioned above.

> ## lowering
>
> At some point in the compilation pipeline, we will have to lower
> `@llvm.guard_on` into explicit control flow, by "inlining" "an
> implementation" of `@llvm.guard_on` (or by some other means). I don't
> have a good sense of when in the pipeline this should be done [...]

As a starting point, I'd likely do this just before code gen prep,
with some custom sinking logic to pull instructions only used on the
failing path into the newly introduced block. Long term, we'd probably
want to do the same thing over MI, but mucking with control flow at
that layer is a bit more complicated.

Rather than framing this as inlining, I'd frame it as expansion to a
well known body (pretty much the one Sanjoy gives above). The
@bail_to_interpreter construct could be lowered directly to a function
call to some well known symbol name (which JIT users can intercept and
bind to whatever they want). Something like __llvm_side_exit seems
like a reasonable choice.

> ## environments without support for bailing to the interpreter
>
> [...] If there is a need for non-deoptimizing guards, we can
> consider introducing a variant of `@llvm.guard_on`:
>
> ```
> declare void @llvm.exception_on(i1 %predicate, i32 %exceptionKind)
> ```
>
> with this one having the semantics that it always throws an exception
> if `%predicate` fails. Only the non-widening optimizations for
> `@llvm.guard_on` will apply to `@llvm.exception_on`.

Not sure we actually need this. A valid implementation of the side
exit handler (which would do the OSR for us) is to call a runtime
routine which generates and throws the exception. The only bit we
might need is a distinction between widenable and non-widenable
guards.

> ## memory effects (unresolved)
>
> I'd really like to model `@llvm.guard_on` as a readonly function,
> since it does not write to memory if it returns; and e.g. forwarding
> loads across a call to `@llvm.guard_on` should be legal.
>
> However, I'm in a quandary around representing the "may never return"
> aspect of `@llvm.guard_on`: I have to make it illegal to, say, hoist
> a load from `%ptr` across a guard on `%ptr != null`.

Modeling this as memory dependence just seems wrong. We already have
to model control dependence on functions which may throw. I don't
think there's anything new here. The only unusual bit is that we're
going to want to teach AliasAnalysis that the guard doesn't write to
any memory location (to allow forwarding) while still preserving the
control dependence.
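Concretely, that combination licenses the first rewrite below and
forbids the second (sketch, names hypothetical):

```
  ;; legal: the guard clobbers nothing, so %v1 can be replaced by %v0
  %v0 = load i32, i32* %p
  call void @llvm.guard_on(i1 %c) [ "deopt"(/* state */) ]
  %v1 = load i32, i32* %p

  ;; illegal: the load is control dependent on the guard, so it must
  ;; not be hoisted above it
  %nonnull = icmp ne i32* %p, null
  call void @llvm.guard_on(i1 %nonnull) [ "deopt"(/* state */) ]
  %v = load i32, i32* %p
```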
> There are a couple of ways I can think of dealing with this, none of
> them both easy and neat:
>
>  - State that since `@llvm.guard_on` could have had an infinite loop
>    in it, it may never return. [...]
>
>  - State that `@llvm.guard_on` can unwind, and thus has non-local
>    control flow. This can actually work (and is pretty close to the
>    actual semantics), but is somewhat of a hack since
>    `@llvm.guard_on` doesn't _really_ throw an exception.

Er, careful. Semantically, the guard *might* throw an exception. It
could be that's what the interpreter does when evaluating the
continuation implied by the guard, and any of our callers have to
account for the fact that the function which contains the guard might
throw. The easiest way to ensure that is to model the guard call as
possibly throwing.
Sanjoy Das via llvm-dev
2016-Feb-18 00:41 UTC
[llvm-dev] RFC: Add guard intrinsics to LLVM
Replies inline. At a high level, it feels like we'll eventually need a
new instruction to represent the kind of control flow a guard entails
(to be clear: we should probably still start with an intrinsic) --
they are fairly well-behaved, i.e. readonly, nounwind etc. as far as
the immediate "physical" caller is concerned, but not so as far as its
caller's callers are concerned.

On Wed, Feb 17, 2016 at 3:40 PM, Philip Reames
<listmail at philipreames.com> wrote:
>> one very important difference -- `@llvm.guard_on(i1 <false>)` is well
>> defined (and not UB). `@llvm.guard_on` on a false predicate bails to
>> the interpreter and that is always safe (but slow), and so
>> `@llvm.guard_on(i1 false)` is basically a `noreturn` call that
>> unconditionally transitions the current compilation to the
>> interpreter.
>
> It's also worth noting that @llvm.guard_on(i1 true) is useful and well
> defined as well. When lowering, such a guard simply disappears, but it
> can be useful to keep around in the IR during optimization. It gives a
> well defined point for widening transforms to apply, with a well known
> deoptimization state already available.

Yes! Actually, I had exactly this in an earlier version of this
writeup, which I removed to make the (already long) RFC shorter.

> I'd suggest a small change to Sanjoy's declaration: I think we should
> allow additional arguments to the guard, not just the condition. What
> exactly those arguments mean would be up to the runtime, but various
> runtimes might want to provide additional arguments to the OSR
> mechanism.

We'll still have to make a call on the signature of the intrinsic (or
are you suggesting a varargs intrinsic?). I suppose we could also have
a family of intrinsics that take one argument of variable type.

>> Bailing out to the interpreter involves re-creating the state of the
>> interpreter frames as-if the compilee had been executing in the
>> interpreter all along. [...] The verifier will reject calls to
>> `@llvm.guard_on` without a `"deopt"` operand bundle.
>
> This introduces a very subtle point. The side exit only affects the
> *function* which contains the guard. A caller of that function in the
> same module may be returned to by either the function itself, or the
> interpreter after running the continuation implied by the guard. This
> introduces a complication for IPA/IPO: any guard (really, any side
> exit, of which guards are one form) has to be treated as a possible
> return point from the callee, with an unknowable return value and
> memory state.

This is a really good point. This has strong implications for the
guard's memory effects as well -- even though a guard can be seen as
readonly in its containing function, things that call the containing
function have to see the guard as read-write. IOW @foo below is
read/write, even though v0 can be forwarded to v1 (an IR sketch of the
caller's view follows below):

```
int @foo(int cond) {
  int v0 = this->field;
  guard_on(cond);
  int v1 = this->field;
  return v0 + v1;
}
```

As you point out, we're also introducing a newish kind of control flow
here. It is not fundamentally new, since longjmp does something
similar (but not quite the same). I hate to say this, but perhaps
we're really looking at (eventually) a new instruction here, and not
just a new intrinsic.
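Here is that caller-side sketch in IR (names hypothetical; @foo is the
pseudo-Java function from above, @bar its physical caller):

```
define i32 @foo(i32* %field_addr, i1 %cond) {
  %v0 = load i32, i32* %field_addr
  call void @llvm.guard_on(i1 %cond) [ "deopt"(/* state */) ]
  %v1 = load i32, i32* %field_addr  ;; within @foo, forwarding %v0 to
  ...                               ;; %v1 is fine
}

define void @bar(i32* %field_addr, i1 %cond) {
  store i32 42, i32* %field_addr
  %r = call i32 @foo(i32* %field_addr, i1 %cond)
  %v2 = load i32, i32* %field_addr  ;; NOT forwardable from the store:
  ...                               ;; if the guard in @foo fired, the
}                                   ;; interpreter may have written to
                                    ;; the field before we resume here
```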
>> `@llvm.guard_on` cannot be `invoke`ed (that would be
>> meaningless anyway, since the method it would've "thrown" into is
>> about to go away).
>
> I disagree with this bit. It needlessly complicates the inliner.
> Allowing an invoke of a guard, which SimplifyCFG converts to a call
> just like it would for a nothrow function, seems much cleaner.

SGTM.

>> The observable behavior of `@llvm.guard_on` is specified as:
>>
>> ```
>> void @llvm.guard_on(i1 %pred) {
>>  entry:
>>   %unknown_cond = < unknown source >
>>   %cond = and i1 %unknown_cond, %pred
>>   br i1 %cond, label %left, label %right
>> [...]
>> ```
>
> Unless I'm misreading this, it looks like Sanjoy got the IR in the
> specification wrong. The intrinsic is specified to side exit if the
> condition is false (so that it's true in the caller after the guard),
> not the other way around. The text description appears correct.

Yes, and thanks for catching that. The branch should have been "br i1
%cond, label %right, label %left".

>> `@bail_to_interpreter` does not return to the current compilation,
>> but it returns to the `"deopt"` continuation that it has been given
>> (once inlined, the empty "deopt"() continuation will be fixed up to
>> have the right continuation).
>
> This "bail_to_interpreter" is the more general form of side exit I
> mentioned above.

How is it more general?

> As a starting point, I'd likely do this just before code gen prep,
> with some custom sinking logic to pull instructions only used on the
> failing path into the newly introduced block. Long term, we'd
> probably want to do the same thing over MI, but mucking with control
> flow at that layer is a bit more complicated.
>
> Rather than framing this as inlining, I'd frame it as expansion to a
> well known body (pretty much the one Sanjoy gives above). The
> @bail_to_interpreter construct could be lowered directly to a
> function call to some well known symbol name (which JIT users can
> intercept and bind to whatever they want). Something like
> __llvm_side_exit seems like a reasonable choice.

SGTM.

>> with this one having the semantics that it always throws an exception
>> if `%predicate` fails. Only the non-widening optimizations for
>> `@llvm.guard_on` will apply to `@llvm.exception_on`.
>
> Not sure we actually need this. A valid implementation of the side
> exit handler (which would do the OSR for us) is to call a runtime
> routine which generates and throws the exception. The only bit we
> might need is a distinction between widenable and non-widenable
> guards.

Yes, and that's the only distinction I'm trying to make here.
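To spell out why the distinction matters, here is the earlier widening
example transcribed onto `@llvm.exception_on` (sketch; the exception
kind value is made up):

```
  %c0 = icmp ult i32 6, %len
  call void @llvm.exception_on(i1 %c0, i32 /* kind */ 0)
  call void @printf("hello world")
  %c1 = icmp ult i32 7, %len
  call void @llvm.exception_on(i1 %c1, i32 /* kind */ 0)

  ;; widening the first check to %c1 would be a miscompile here: with
  ;; %len == 7 the original program prints before throwing, while the
  ;; widened one throws without printing.
```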
>> ## memory effects (unresolved)
>>
>> [...] I'd really like to model `@llvm.guard_on` as a readonly
>> function, since it does not write to memory if it returns; and
>> e.g. forwarding loads across a call to `@llvm.guard_on` should be
>> legal.
>>
>> However, I'm in a quandary around representing the "may never return"
>> aspect of `@llvm.guard_on`: I have to make it illegal to, say, hoist
>> a load from `%ptr` across a guard on `%ptr != null`.
>
> Modeling this as memory dependence just seems wrong. We already have
> to model control dependence on functions which may throw. I don't
> think there's anything new here.

I am trying to model this as control dependence, but the difficult bit
is to do that while still maintaining that the call does not clobber
any memory. I'm worried that there may be reasons (practical or
theoretical) why readonly functions always have to terminate and be
nothrow.

> The only unusual bit is that we're going to want to teach
> AliasAnalysis that the guard doesn't write to any memory location (to
> allow forwarding) while still preserving the control dependence.

So you're saying that we model the guard as otherwise read/write (thus
sidestepping the readonly non-returning quandary) but teach
AliasAnalysis that it doesn't clobber any memory? That would work.

We can also use the same tool to solve the "may return to its caller's
caller with arbitrary heap state" issue by teaching AA that a guard
does not alias with reads in its own (physical) function, but clobbers
the heap for other (physical) functions.

Notation: I'm differentiating between physical functions == functions
that create actual stack frames, and inlined functions == logical Java
functions that don't create separate physical frames. Inlining IR from
one Java-level function into another usually creates a physical
function that contains more than one logical function.

>> - State that since `@llvm.guard_on` could have had an infinite loop
>>   in it, it may never return. [...]
>>
>> - State that `@llvm.guard_on` can unwind, and thus has non-local
>>   control flow. This can actually work (and is pretty close to the
>>   actual semantics), but is somewhat of a hack since
>>   `@llvm.guard_on` doesn't _really_ throw an exception.
>
> Er, careful. Semantically, the guard *might* throw an exception. It
> could be that's what the interpreter does when evaluating the
> continuation implied by the guard, and any of our callers have to
> account for the fact that the function which contains the guard might
> throw. The easiest way to ensure that is to model the guard call as
> possibly throwing.

Yes, it does not throw an exception into its own caller, but may throw
into its caller's caller.

-- Sanjoy
Eric Christopher via llvm-dev
2016-Feb-18 19:43 UTC
[llvm-dev] RFC: Add guard intrinsics to LLVM
Hi Sanjoy,

A quick question here. With the bailing-to-the-interpreter support
that you're envisioning ("deopt" operand bundles), it appears to
overlap quite a bit with the existing stack maps. What's the
story/interaction/etc. here? I agree that simpler control flow is
great when bailing to the interpreter -- doing it with phi nodes is a
recipe for pain and long compile times.

Thanks!

-eric

On Tue, Feb 16, 2016 at 6:06 PM Sanjoy Das via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> This is a proposal to add guard intrinsics to LLVM.
> [...]
Sanjoy Das via llvm-dev
2016-Feb-18 20:23 UTC
[llvm-dev] RFC: Add guard intrinsics to LLVM
Hi Eric,

(That's a very valid question, thanks for asking!)

The key difference between stackmaps and operand bundles is that it is
more natural to teach the optimizer to manipulate deopt operand
bundles, as opposed to values live into a stackmap. This becomes a
requirement[0] around inlining, since (say) we have:

```
void foo() {
  if (ptr == null) bail_to_interpreter() [ "deopt"(i32 101) ];
}

void bar() {
  foo() [ "deopt"(i32 237) ];
}
```

and we wish to inline foo into bar. Then the inlined body has to look
like

```
void bar() {
  if (ptr == null) bail_to_interpreter() [ "deopt"(i32 237, i32 101) ];
}

// definition of foo() is elided
```

This is because if ptr is null, and we do have to bail to the
interpreter, we cannot just create one frame, but have to create *two*
interpreter frames: the younger of which is in state "i32 101"
(obviously a real deopt state will be more complex) and the older of
which is in state "i32 237". Once these frames have been created,
execution will continue in the interpreter. (Just to be very explicit
about it -- the deopt state "i32 237", when resumed to, will re-start
interpreted execution in the method @bar() "as if" foo() had just
returned.)

We want the inliner to do this kind of adjustment for us [1], and
doing this over an intrinsic is awkward. For instance, an intrinsified
version of the above will look like:

```
void foo() {
  if (ptr == null) call @call_with_deopt_state(bail_to_interpreter, i32 101);
}

void bar() {
  call @call_with_deopt_state(foo, i32 237);
}
```

and the inliner and other IPO passes will have to have special
knowledge of the @call_with_deopt_state intrinsic. Given how different
the semantics of @call_with_deopt_state are from a normal call, it
seemed more natural to make deoptimization state a first class citizen
of LLVM IR.

Secondly, the issues with deopt bundles and IPO this discussion
uncovered are also applicable to stackmaps; and since deopt bundles
are a distinct IR construct, phrasing the IPO / IPA restrictions will
be easier.

Does this answer your question?

[0]: This isn't a problem for clients who don't do much optimization
within LLVM, but it is for us.

[1]: https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/Utils/InlineFunction.cpp#L1486

-- Sanjoy
Philip Reames via llvm-dev
2016-Feb-18 20:29 UTC
[llvm-dev] RFC: Add guard intrinsics to LLVM
Sanjoy gave the long answer; let me give the short one. :)

"deopt" argument bundles are used in the middle end; they are lowered
into a statepoint, which in turn generates the existing stackmap
format. I.e. one builds on the other.
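A schematic of that layering (abridged; the statepoint operands beyond
the target are the usual id / patch-bytes / arg-count / flags
bookkeeping):

```
  ;; middle end: the deopt state rides on the call as an operand bundle
  call void @foo() [ "deopt"(i32 101) ]

  ;; late in the pipeline the call becomes a statepoint; the deopt
  ;; arguments are what end up in the generated stackmap entry
  call token (i64, i32, void ()*, i32, i32, ...)
    @llvm.experimental.gc.statepoint.p0f_isVoidf(
      i64 0,          ;; statepoint id
      i32 0,          ;; num patch bytes
      void ()* @foo,  ;; call target
      i32 0, i32 0,   ;; num call args, flags
      i32 0,          ;; num transition args
      i32 1, i32 101) ;; num deopt args, the deopt state itself
```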
> ``` > > > # details: motivation & alternatives > > As specified in the high level summary, there are two key motivations. > > The first, more obvious one is that we want the CFG to be less > complex, even if that involves "hiding" control flow in guard > intrinsics. I expect this to benefit both compile time and > within-a-block optimization. > > The second, somewhat less obvious motivation to use guard intrinsics > instead of explicit control flow is to allow check widening. > > ## check widening > > Consider: > > ``` > ... > %rangeCheck0 = icmp ult i32 6, %len ;; for a[6] > call void @llvm.guard_on(i1 %rangeCheck0) [ "deopt"(/* deopt > state 0 */) ] > call void @printf("hello world") > %rangeCheck1 = icmp ult i32 7, %len ;; for a[7] > call void @llvm.guard_on(i1 %rangeCheck1) [ "deopt"(/* deopt > state 1 */) ] > access a[6] and a[7] > ... > ``` > > we'd like to optimize it to > > ``` > ... > %rangeCheckWide = icmp ult i32 7, %len > call void @llvm.guard_on(i1 %rangeCheckWide) [ "deopt"(/* deopt > state 0 */) ] > call void @printf("hello world") > ;; %rangeCheck1 = icmp ult i32 7, %len ;; not needed anymore > ;; call void @llvm.guard_on(i1 %rangeCheck1) [ "deopt"(/* deopt > state 1 */) ] > access a[6] and a[7] > ... > ``` > > This way we do a range check only on `7` -- if `7` is within bounds, > then we know `6` is too. This transform is sound only because we know > that the guard on `7 ult %len` will not simply throw an exception if > the said predicate fails, but will bail out to the interpreter with > the abstract state `/* deopt state 0 */`. In fact, if `%len` is `7`, > the pre-transform program is supposed to print `"hello world"` and > *then* throw an exception, and bailing out to the interpreter with `/* > deopt state 0 */` will do exactly that. > > In other words, we're allowed to do speculative and aggressive > transforms that make a guard fail that wouldn't have in the initial > program. This is okay because a failed guard only bails to the > interpreter, and the interpreter always Does The Right Thing(TM). In > fact, it is correct (though unwise) to replace every guard's predicate > with `false`. > > ## the problem with check widening and explicit control flow > > Widening is difficult to get right in an explicit "check-and-branch" > representation. For instance, the above example in a > "check-and-branch" representation would be (in pseudo C, and excluding > the printf): > > ``` > ... > if (!(6 < %len)) { call @side_exit() [ "deopt"(P) ]; unreachable; } > if (!(7 < %len)) { call @side_exit() [ "deopt"(Q) ]; unreachable; } > ... > ``` > > The following transform is invalid: > > ``` > ... > if (!(7 < %len)) { call @side_exit() [ "deopt"(P) ]; unreachable; } > if (!(true)) { call @side_exit() [ "deopt"(Q) ]; unreachable; } > ... > ``` > > since we do not know if the first side exit had been optimized under > the assumption `!(6 < %len)` (via JumpThreading etc.). E.g. the > "original" IR could have been > > ``` > ... > if (!(6 < %len)) { call @side_exit() [ "deopt"(!(6 < %len)) ]; > unreachable; } > if (!(7 < %len)) { call @side_exit() [ "deopt"(Q) ]; unreachable; } > ... > ``` > > which got optimized to > > ``` > ... > if (!(6 < %len)) { call @side_exit() [ "deopt"(true) ]; > unreachable; } > if (!(7 < %len)) { call @side_exit() [ "deopt"(Q) ]; unreachable; } > ... > ``` > > before the widening transform. The widening transform will now > effectively pass in an incorrect value for `!(6 < %len)`. 
> > This isn't to say it is impossible to do check widening in a explicit > control flow representation, just that is more natural to do it with > guards. > > > # details: semantics > > ## as-if control flow > > The observable behavior of `@llvm.guard_on` is specified as: > > ``` > void @llvm.guard_on(i1 %pred) { > entry: > %unknown_cond = < unknown source > > %cond = and i1 %unknown_cond, %pred > br i1 %cond, label %left, label %right > > left: > call void @bail_to_interpreter() [ "deopt"() ] noreturn > unreachable > > right: > ret void > } > ``` > > So, precisely speaking, `@llvm.guard_on` is guaranteed to bail to the > interpreter if `%pred` is false, but it **may** bail to the > interpreter if `%pred` is true. It is this bit that lets us soundly > widen `%pred`, since all we're doing is "refining" `< unknown > source >`. > > `@bail_to_interpreter` does not return to the current compilation, but > it returns to the `"deopt"` continuation that is has been given (once > inlined, the empty "deopt"() continuation will be fixed up to have > the right > continuation). > > > ## applicable optimizations > > Apart from widening, most of the optimizations we're interested in are > what's allowed by an equivalent `@llvm.assume`. Any conditional > branches dominated by a guard on the same condition can be folded, > multiple guards on the same condition can be CSE'ed, loop invariant > guards can be hoisted out of loops etc. > > Ultimately, we'd like to recover the same quality of optimization as > we currently get from the "check-and-branch" representation. With the > "check-and-branch" representation, the optimizer is able to sink > stores and computation into the slow path. This is something it cannot > do in the guard_on representation, and we'll have to lower the > guard_on representation to the "check-and-branch" representation at a > suitable point in the pipeline to get this kind of optimization. > > ## lowering > > At some point in the compilation pipeline, we will have to lower > `@llvm.guard_on` into explicit control flow, by "inlining" "an > implementation" of `@llvm.guard_on` (or by some other means). I don't > have a good sense on when in the pipeline this should be done -- the > answer will depend on what we find as we make LLVM more aggressive > around optimizing guards. > > ## environments without support for bailing to the interpreter > > Our VM has deeply integrated support for deoptimizations, but not all > language runtimes do. If there is a need for non deoptimizing > guards, we > can consider introducing a variant of `@llvm.guard_on`: > > ``` > declare void @llvm.exception_on(i1 %predicate, i32 %exceptionKind) > ``` > > with this one having the semantics that it always throws an exception > if `%predicate` fails. Only the non-widening optimizations for > `@llvm.guard_on` will apply to `@llvm.exception_on`. > > ## memory effects (unresolved) > > [I haven't come up with a good model for the memory effects of > `@llvm.guard_on`, suggestions are very welcome.] > > I'd really like to model `@llvm.guard_on` as a readonly function, > since it does not write to memory if it returns; and e.g. forwarding > loads across a call to `@llvm.guard_on` should be legal. > > However, I'm in a quandary around representing the "may never return" > aspect of `@llvm.guard_on`: I have to make it illegal to, say, hoist a > load form `%ptr` across a guard on `%ptr != null`. 
There are a couple of ways I can think of dealing with this, none of
them are both easy and neat:

- State that since `@llvm.guard_on` could have had an infinite loop
  in it, it may never return. Unfortunately, LLVM IR has some rough
  edges around readonly infinite loops (since C++ disallows them), so
  LLVM will require some preparatory work before we can do this
  soundly.

- State that `@llvm.guard_on` can unwind, and thus has non-local
  control flow. This can actually work (and is pretty close to the
  actual semantics), but is somewhat of a hack since `@llvm.guard_on`
  doesn't _really_ throw an exception.

- Special-case `@llvm.guard_on` (yuck!).

What do you think?

-- Sanjoy
Andrew Trick via llvm-dev
2016-Feb-19 19:12 UTC
[llvm-dev] RFC: Add guard intrinsics to LLVM
> On Feb 16, 2016, at 6:06 PM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote:
>
> This is a proposal to add guard intrinsics to LLVM.

[snip]

Thanks Sanjoy. Sorry it took a couple of days for me to read this.

I like this approach a lot. I can envision multiple intrinsics
leveraging the basic framework of operand bundles, but each with
different semantics and lowering. I'm no longer considering extending
the patchpoint design to cover these cases.

Your semantics for @llvm.guard_on are perfect for managed
runtimes. Just to give you an idea of a similar intrinsic we may want
to introduce: I would also like to add a @llvm.trap_on(i1 %pred)
intrinsic, with side effects and lowering as such:

```
  br i1 %pred, label %succeed, label %fail

fail:
  call void @llvm.trap()
  unreachable
```

This clearly doesn't need operand bundles, but using an intrinsic
would permit special code-motion semantics. It could be hoisted and
merged with other traps, but the condition could never be widened
beyond the union of the original conditions. Unlike deoptimizing
guards, it would need to be sequenced with memory barriers, but could
otherwise be hoisted as readnone.

The semantics of "non-deoptimizing guards" are different enough that
they should be a different intrinsic. @trap_on would essentially be
halfway between @guard_on and @exception_on.

> ## memory effects (unresolved)
>
> [I haven't come up with a good model for the memory effects of
> `@llvm.guard_on`; suggestions are very welcome.]
>
> I'd really like to model `@llvm.guard_on` as a readonly function,
> since it does not write to memory if it returns; e.g. forwarding
> loads across a call to `@llvm.guard_on` should be legal.

Of course it would be nice to consider these intrinsics
readonly. (One thing we never fixed with patchpoints is the memory
effects, and that was probably hurting performance.) But has this bug
been fixed yet: http://llvm.org/pr18912 ("Optimizer should consider
'maythrow' calls (those without 'nounwind') as having side effects")?
I vaguely remember some work being done to improve the situation, but
last I checked LICM was still violating it, and who knows what else?
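To make the hazard concrete, a hypothetical sketch (@may_side_exit,
@f, %c, and %p are all invented names):

```
;; hypothetical: readonly, but deliberately *not* nounwind
declare void @may_side_exit(i1) readonly

define i32 @f(i1 %c, i32* %p) {
entry:
  call void @may_side_exit(i1 %c)  ;; may unwind when %c is false
  %v = load i32, i32* %p           ;; only dereferenceable when %c is true
  ret i32 %v
}

;; A pass that assumes "readonly implies no observable effects" might
;; hoist the load above the call, or drop the call outright,
;; introducing a fault on the %c == false path.
```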
BTW, if you do want readonly semantics, why would you want readonly
to be implicit instead of explicit?

> However, I'm in a quandary around representing the "may never return"
> aspect of `@llvm.guard_on`: I have to make it illegal to, say, hoist a
> load from `%ptr` across a guard on `%ptr != null`. There are a couple
> of ways I can think of dealing with this, none of them are both easy
> and neat:
>
> - State that since `@llvm.guard_on` could have had an infinite loop
>   in it, it may never return. Unfortunately, LLVM IR has some rough
>   edges around readonly infinite loops (since C++ disallows them), so
>   LLVM will require some preparatory work before we can do this
>   soundly.
>
> - State that `@llvm.guard_on` can unwind, and thus has non-local
>   control flow. This can actually work (and is pretty close to the
>   actual semantics), but is somewhat of a hack since `@llvm.guard_on`
>   doesn't _really_ throw an exception.
>
> - Special-case `@llvm.guard_on` (yuck!).
>
> What do you think?

I think you would need to mark @guard_on as may-unwind AND fix any
lingering assumptions in LLVM that readonly calls are nounwind. (Loads
can be CSE'd across unwind calls, but can't be hoisted across them.)

An alternative is to *not* mark them readonly but rely on alias
analysis. I think doing that would properly model the "unwind" path
for the purpose of IPA/inlining. Introducing TBAA on calls should be
easy and sufficient to unblock the memory optimization you need. Of
course, IPA still needs to know about side exits, which it really
should with any may-unwind call. Other than that, I don't see the
problem.

Another good point mentioned later in the thread is that a readonly
callee should *not* imply a readonly @guard_on or other "deopt" call
site. The simple solution for this is to disallow "deopt" of readonly
calls.

-Andy
Sanjoy Das via llvm-dev
2016-Feb-21 20:24 UTC
[llvm-dev] RFC: Add guard intrinsics to LLVM
Hi Andy,

Thanks for replying; responses inline below.

On Fri, Feb 19, 2016 at 11:12 AM, Andrew Trick <atrick at apple.com> wrote:
> This clearly doesn't need operand bundles, but using an intrinsic
> would permit special code-motion semantics. It could be hoisted and
> merged with other traps, but the condition could never be widened
> beyond the union of the original conditions. Unlike deoptimizing
> guards, it would need to be sequenced with memory barriers, but could

By memory barrier, do you mean things like fences?

> otherwise be hoisted as readnone.

Can you clarify this a little bit? Are you talking about things like:

```
  *ptr = 42
  @trap_on(%foo)

    =>

  @trap_on(%foo)
  *ptr = 42
```

? I.e. if a failing guard postdominates a point "P" in the program,
then the guard can be failed early at P. (We'd want a stronger
condition than postdomination, since we won't want to hoist:

```
  while (!*cond) { }
  @trap_on(*cond)

    =>

  @trap_on(*cond)
  while (!*cond) { }
```

)

[snip]

> Of course it would be nice to consider these intrinsics
> readonly. (One thing we never fixed with patchpoints is the memory
> effects, and that was probably hurting performance.) But has this bug
> been fixed yet: http://llvm.org/pr18912 ("Optimizer should consider
> 'maythrow' calls (those without 'nounwind') as having side effects")?
> I vaguely remember some work being done to improve the situation, but
> last I checked LICM was still violating it, and who knows what else?

The specific case in the bug was fixed by David in
http://reviews.llvm.org/rL256728. But I agree with your concern that
this notion of "readnone calls are always okay to remove" may have
leaked into other parts of LLVM.

> BTW, if you do want readonly semantics, why would you want readonly
> to be implicit instead of explicit?

I don't understand the question. :) What's explicit vs. implicit here?

> I think you would need to mark @guard_on as may-unwind AND fix any
> lingering assumptions in LLVM that readonly calls are nounwind. (Loads
> can be CSE'd across unwind calls, but can't be hoisted across them.)

Yes, but a failed guard doesn't exactly "unwind". I.e. if A invokes B
and B fails a guard, then the guard may not always unwind to the A->B
callsite's landingpad, but can also return to its normal no-exception
successor; unless you inline B into A, in which case B's guard (now
physically present in the A+B function) will return from A if it
fails. In other words, we'll have to re-purpose the term "unwind" to
mean either "throwing an exception" or "failing a guard". I'm fine
with this, but it is a choice we need to make explicitly.

> Another good point mentioned later in the thread is that a readonly
> callee should *not* imply a readonly @guard_on or other "deopt" call
> site. The simple solution for this is to disallow "deopt" of readonly
> calls.

Assuming we're talking about the same thing, the concern was more
along the lines of: you cannot do IPA on a callsite that calls
something that contains a potential deoptimization point, be it a
@guard_on call or a "normal" deoptimization safepoint. This is
because if you have:

```
void @something()

void callee() {
  call @something() [ "deopt"(state0) ]
  *ptr = 100;
}

void caller() {
  call @callee()  // whether this is a "deopt" call site or a
                  // normal call site doesn't matter
  int k = *ptr;
}
```

you cannot, say, fold `k` to `100` in @caller(), because the deopt
state, state0, if resumed to, may actually write something else to
*ptr.
Marking the call to @something() as read/write/may-unwind does not
solve the problem -- even if the call to @something() wrote to *ptr
(or threw an exception), we could still fold k to 100. The novel
control flow here is that @callee, if deoptimized with state0, will
execute some arbitrary code, and **return** to @caller at the callsite
to @callee. (Btw: things are a little different inside our JVM because
of the way we register dependencies, but I'm trying to make an
LLVM-centric argument here.)

At this point, I think it is most straightforward to do a slight
variant of what Philip suggested to me offline: if a function can
deoptimize, it needs to be marked as "mayBeOverridden", preferably by
introducing a new function attribute or by setting the linkage type to
one of those checked by GlobalValue::mayBeOverridden. This is the
responsibility of the frontend, and resuming to the "deopt"
continuation in a physical frame where the enclosing function was not
marked mayBeOverridden is UB.

-- Sanjoy