Sanjoy Das via llvm-dev
2016-Feb-17 02:06 UTC
[llvm-dev] RFC: Add guard intrinsics to LLVM
This is a proposal to add guard intrinsics to LLVM.

A couple of glossary items: when I say "interpreter" I mean "the most
conservative tier in the compilation stack", which can be an actual
interpreter, a "splat compiler" or even a regular JIT that doesn't
make optimistic assumptions. By "bailing out to the interpreter" I
mean "side exit" as defined by Philip in
http://www.philipreames.com/Blog/2015/05/20/deoptimization-terminology/


# high level summary

Guard intrinsics are intended to represent "if (!cond) leave();"-like
control flow in a structured manner. This kind of control flow is
abundant in IR coming from safe languages due to things like range
checks and null checks (and overflow checks, in some cases). From a
high level, there are two distinct motivations for introducing guard
intrinsics:

 - To keep control flow as simple and "straight line"-like as is
   reasonable

 - To allow certain kinds of "widening" transforms that cannot be
   soundly done in an explicit "check-and-branch" control flow
   representation

## straw man specification

Signature:

```
declare void @llvm.guard_on(i1 %predicate) ;; requires [ "deopt"(...) ]
```

Semantics:

`@llvm.guard_on` bails out to the interpreter (i.e. "deoptimizes" the
currently executing function) if `%predicate` is `false`, meaning that
after `@llvm.guard_on(i1 %t)` has executed, `%t` can be assumed to be
true. In this way, it is close to `@llvm.assume` or an `assert`, with
one very important difference -- `@llvm.guard_on(i1 <false>)` is well
defined (and not UB). `@llvm.guard_on` on a false predicate bails to
the interpreter and that is always safe (but slow), and so
`@llvm.guard_on(i1 false)` is basically a `noreturn` call that
unconditionally transitions the current compilation to the
interpreter.

Bailing out to the interpreter involves re-creating the state of the
interpreter frames as-if the compilee had been executing in the
interpreter all along. This state is represented and maintained using
a `"deopt"` operand bundle attached to the call to `@llvm.guard_on`.
The verifier will reject calls to `@llvm.guard_on` without a `"deopt"`
operand bundle. `@llvm.guard_on` cannot be `invoke`ed (that would be
meaningless anyway, since the method it would've "thrown" into is
about to go away).


Example:

```
  ...
  %rangeCheck0 = icmp ult i32 %arg0, 10
  call void @llvm.guard_on(i1 %rangeCheck0) [ "deopt"(/* deopt state 0 */) ]
  %rangeCheck1 = icmp ult i32 %arg0, 12
  call void @llvm.guard_on(i1 %rangeCheck1) [ "deopt"(/* deopt state 1 */) ]
  ...
```


# details: motivation & alternatives

As specified in the high level summary, there are two key motivations.

The first, more obvious one is that we want the CFG to be less
complex, even if that involves "hiding" control flow in guard
intrinsics. I expect this to benefit both compile time and
within-a-block optimization.

The second, somewhat less obvious motivation to use guard intrinsics
instead of explicit control flow is to allow check widening.

## check widening

Consider:

```
  ...
  %rangeCheck0 = icmp ult i32 6, %len  ;; for a[6]
  call void @llvm.guard_on(i1 %rangeCheck0) [ "deopt"(/* deopt state 0 */) ]
  call void @printf("hello world")
  %rangeCheck1 = icmp ult i32 7, %len  ;; for a[7]
  call void @llvm.guard_on(i1 %rangeCheck1) [ "deopt"(/* deopt state 1 */) ]
  access a[6] and a[7]
  ...
```

we'd like to optimize it to
```
  ...
  %rangeCheckWide = icmp ult i32 7, %len
  call void @llvm.guard_on(i1 %rangeCheckWide) [ "deopt"(/* deopt state 0 */) ]
  call void @printf("hello world")
  ;; %rangeCheck1 = icmp ult i32 7, %len  ;; not needed anymore
  ;; call void @llvm.guard_on(i1 %rangeCheck1) [ "deopt"(/* deopt state 1 */) ]
  access a[6] and a[7]
  ...
```

This way we do a range check only on `7` -- if `7` is within bounds,
then we know `6` is too. This transform is sound only because we know
that the guard on `7 ult %len` will not simply throw an exception if
the said predicate fails, but will bail out to the interpreter with
the abstract state `/* deopt state 0 */`. In fact, if `%len` is `7`,
the pre-transform program is supposed to print `"hello world"` and
*then* throw an exception, and bailing out to the interpreter with
`/* deopt state 0 */` will do exactly that.

In other words, we're allowed to do speculative and aggressive
transforms that make a guard fail that wouldn't have in the initial
program. This is okay because a failed guard only bails to the
interpreter, and the interpreter always Does The Right Thing(TM). In
fact, it is correct (though unwise) to replace every guard's predicate
with `false`.

## the problem with check widening and explicit control flow

Widening is difficult to get right in an explicit "check-and-branch"
representation. For instance, the above example in a
"check-and-branch" representation would be (in pseudo C, and excluding
the printf):

```
  ...
  if (!(6 < %len)) { call @side_exit() [ "deopt"(P) ]; unreachable; }
  if (!(7 < %len)) { call @side_exit() [ "deopt"(Q) ]; unreachable; }
  ...
```

The following transform is invalid:

```
  ...
  if (!(7 < %len)) { call @side_exit() [ "deopt"(P) ]; unreachable; }
  if (!(true)) { call @side_exit() [ "deopt"(Q) ]; unreachable; }
  ...
```

since we do not know if the first side exit had been optimized under
the assumption `!(6 < %len)` (via JumpThreading etc.). E.g. the
"original" IR could have been

```
  ...
  if (!(6 < %len)) { call @side_exit() [ "deopt"(!(6 < %len)) ]; unreachable; }
  if (!(7 < %len)) { call @side_exit() [ "deopt"(Q) ]; unreachable; }
  ...
```

which got optimized to

```
  ...
  if (!(6 < %len)) { call @side_exit() [ "deopt"(true) ]; unreachable; }
  if (!(7 < %len)) { call @side_exit() [ "deopt"(Q) ]; unreachable; }
  ...
```

before the widening transform. The widening transform will now
effectively pass in an incorrect value for `!(6 < %len)`.

This isn't to say it is impossible to do check widening in an explicit
control flow representation, just that it is more natural to do it
with guards.


# details: semantics

## as-if control flow

The observable behavior of `@llvm.guard_on` is specified as:

```
void @llvm.guard_on(i1 %pred) {
 entry:
  %unknown_cond = < unknown source >
  %cond = and i1 %unknown_cond, %pred
  br i1 %cond, label %left, label %right

 left:
  call void @bail_to_interpreter() [ "deopt"() ] noreturn
  unreachable

 right:
  ret void
}
```

So, precisely speaking, `@llvm.guard_on` is guaranteed to bail to the
interpreter if `%pred` is false, but it **may** bail to the
interpreter if `%pred` is true. It is this bit that lets us soundly
widen `%pred`, since all we're doing is "refining" `< unknown source >`.

`@bail_to_interpreter` does not return to the current compilation, but
it returns to the `"deopt"` continuation that it has been given (once
inlined, the empty "deopt"() continuation will be fixed up to have the
right continuation).
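To make the "refining" step concrete, here is a sketch of the earlier
widening example viewed through this as-if model (deopt states
abridged):

```
  ;; before widening:
  %c0 = icmp ult i32 6, %len
  call void @llvm.guard_on(i1 %c0) [ "deopt"(/* deopt state 0 */) ]

  ;; after widening:
  %c1 = icmp ult i32 7, %len
  call void @llvm.guard_on(i1 %c1) [ "deopt"(/* deopt state 0 */) ]

  ;; Since %c1 implies %c0, the widened form is exactly the original
  ;; guard with %unknown_cond chosen to be %c1 -- a legal refinement
  ;; of < unknown source >; we've only made the guard fail more often.
```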
## applicable optimizations

Apart from widening, most of the optimizations we're interested in are
what's allowed by an equivalent `@llvm.assume`. Any conditional
branches dominated by a guard on the same condition can be folded,
multiple guards on the same condition can be CSE'ed, loop invariant
guards can be hoisted out of loops, etc.

Ultimately, we'd like to recover the same quality of optimization as
we currently get from the "check-and-branch" representation. With the
"check-and-branch" representation, the optimizer is able to sink
stores and computation into the slow path. This is something it cannot
do in the guard_on representation, and we'll have to lower the
guard_on representation to the "check-and-branch" representation at a
suitable point in the pipeline to get this kind of optimization.

## lowering

At some point in the compilation pipeline, we will have to lower
`@llvm.guard_on` into explicit control flow, by "inlining" "an
implementation" of `@llvm.guard_on` (or by some other means). I don't
have a good sense of when in the pipeline this should be done -- the
answer will depend on what we find as we make LLVM more aggressive
around optimizing guards.

## environments without support for bailing to the interpreter

Our VM has deeply integrated support for deoptimization, but not all
language runtimes do. If there is a need for non-deoptimizing guards,
we can consider introducing a variant of `@llvm.guard_on`:

```
declare void @llvm.exception_on(i1 %predicate, i32 %exceptionKind)
```

with this one having the semantics that it always throws an exception
if `%predicate` fails. Only the non-widening optimizations for
`@llvm.guard_on` will apply to `@llvm.exception_on`.

## memory effects (unresolved)

[I haven't come up with a good model for the memory effects of
`@llvm.guard_on`; suggestions are very welcome.]

I'd really like to model `@llvm.guard_on` as a readonly function,
since it does not write to memory if it returns; and e.g. forwarding
loads across a call to `@llvm.guard_on` should be legal.

However, I'm in a quandary around representing the "may never return"
aspect of `@llvm.guard_on`: I have to make it illegal to, say, hoist a
load from `%ptr` across a guard on `%ptr != null`. There are a couple
of ways I can think of dealing with this, none of them both easy and
neat:

 - State that since `@llvm.guard_on` could have had an infinite loop
   in it, it may never return. Unfortunately, LLVM IR has some rough
   edges around readonly infinite loops (since C++ disallows them), so
   LLVM will require some preparatory work before we can do this
   soundly.

 - State that `@llvm.guard_on` can unwind, and thus has non-local
   control flow. This can actually work (and is pretty close to the
   actual semantics), but is somewhat of a hack since
   `@llvm.guard_on` doesn't _really_ throw an exception.

 - Special-case `@llvm.guard_on` (yuck!).

What do you think?

-- Sanjoy
Philip Reames via llvm-dev
2016-Feb-17 23:40 UTC
[llvm-dev] RFC: Add guard intrinsics to LLVM
On 02/16/2016 06:06 PM, Sanjoy Das wrote:
> This is a proposal to add guard intrinsics to LLVM.

A couple of inline comments building on Sanjoy's points follow.

> In this way, it is close to `@llvm.assume` or an `assert`, with
> one very important difference -- `@llvm.guard_on(i1 <false>)` is well
> defined (and not UB). `@llvm.guard_on` on a false predicate bails to
> the interpreter and that is always safe (but slow), and so
> `@llvm.guard_on(i1 false)` is basically a `noreturn` call that
> unconditionally transitions the current compilation to the
> interpreter.

It's also worth noting that @llvm.guard_on(i1 true) is useful and well
defined as well. When lowering, such a guard simply disappears, but it
can be useful to keep around in the IR during optimization. It gives a
well defined point for widening transforms to apply, with a well known
deoptimization state already available.

I'd suggest a small change to Sanjoy's declaration: I think we should
allow additional arguments to the guard, not just the condition. What
exactly those arguments mean would be up to the runtime, but various
runtimes might want to provide additional arguments to the OSR
mechanism.

> Bailing out to the interpreter involves re-creating the state of the
> interpreter frames as-if the compilee had been executing in the
> interpreter all along. This state is represented and maintained using
> a `"deopt"` operand bundle attached to the call to `@llvm.guard_on`.
> The verifier will reject calls to `@llvm.guard_on` without a `"deopt"`
> operand bundle.

This introduces a very subtle point. The side exit only affects the
*function* which contains the guard. A caller of that function in the
same module may be returned to by either the function itself, or the
interpreter after running the continuation implied by the guard. This
introduces a complication for IPA/IPO: any guard (really, any side
exit, of which guards are one form) has to be treated as a possible
return point from the callee, with an unknowable return value and
memory state.
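To illustrate the hazard (function names hypothetical, IR abridged):

```
define i32 @callee(i32* %p) {
  %c = icmp ne i32* %p, null
  call void @llvm.guard_on(i1 %c) [ "deopt"(/* state */) ]
  %v = load i32, i32* %p
  ret i32 %v        ;; an IPO pass sees only this return...
}

define i32 @caller(i32* %p) {
  ;; ...but @caller may instead be "returned to" by the interpreter
  ;; after it runs the continuation implied by the guard, with a
  ;; return value and heap state this module never computes.
  %r = call i32 @callee(i32* %p)
  ret i32 %r
}
```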
In our prototype implementation, we've implemented this by doing a
linear scan of the function in each IPO pass and bailing if we see a
side exit. One approach we could consider upstream is to introduce a
function attribute which indicates "can exit via side exit or guard".
Any function containing a guard would be required to have the
attribute. InlineFunction could propagate it to the caller when
inlining a callee with it. FunctionAttrs could remove it if no guards
(or callees with guards) were found. This would make the
implementation of the bail in our IPO passes much cheaper, but it
wouldn't remove the need for it.

> `@llvm.guard_on` cannot be `invoke`ed (that would be
> meaningless anyway, since the method it would've "thrown" into is
> about to go away).

I disagree with this bit. It needlessly complicates the inliner.
Allowing an invoke of a guard, which SimplifyCFG converts to a call
just like it would for a nothrow function, seems much cleaner.

> The first, more obvious one is that we want the CFG to be less
> complex, even if that involves "hiding" control flow in guard
> intrinsics. I expect this to benefit both compile time and
> within-a-block optimization.

I want to really emphasize this point. I expect that using guard
intrinsics would be a major compile time win for higher level
languages. (I can't say for sure; we currently use the explicit
control model.) Given many of our transforms are block-at-a-time,
using guards might also lead to better optimization. I'm particularly
interested in seeing how this influences instruction selection.
> In other words, we're allowed to do speculative and aggressive
> transforms that make a guard fail that wouldn't have in the initial
> program. This is okay because a failed guard only bails to the
> interpreter, and the interpreter always Does The Right Thing(TM). In
> fact, it is correct (though unwise) to replace every guard's
> predicate with `false`.

Another optimization this allows is a more code-size-friendly version
of IRCE. Today, we have to generate pre- and post-loops when removing
bounds checks from an inner loop unless we can *prove* that no
iterations would run in those loops. With a guard construct, we can
find a guard before the loop and "widen" it to account for the pre and
post iteration spaces. This transformation is sometimes known as
"loop predication".

> ## as-if control flow
>
> The observable behavior of `@llvm.guard_on` is specified as:
>
> ```
> void @llvm.guard_on(i1 %pred) {
>  entry:
>   %unknown_cond = < unknown source >
>   %cond = and i1 %unknown_cond, %pred
>   br i1 %cond, label %left, label %right
>
>  left:
>   call void @bail_to_interpreter() [ "deopt"() ] noreturn
>   unreachable
>
>  right:
>   ret void
> }
> ```
>
> So, precisely speaking, `@llvm.guard_on` is guaranteed to bail to the
> interpreter if `%pred` is false, but it **may** bail to the
> interpreter if `%pred` is true. It is this bit that lets us soundly
> widen `%pred`, since all we're doing is "refining" `< unknown source >`.

Unless I'm misreading this, it looks like Sanjoy got the IR in the
specification wrong. The intrinsic is specified to side exit if the
condition is false (so that it's true in the caller after the guard),
not the other way around. The text description appears correct.
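For concreteness, I believe the intended expansion is the same body
with the branch targets swapped:

```
void @llvm.guard_on(i1 %pred) {
 entry:
  %unknown_cond = < unknown source >
  %cond = and i1 %unknown_cond, %pred
  br i1 %cond, label %right, label %left

 left:
  call void @bail_to_interpreter() [ "deopt"() ] noreturn
  unreachable

 right:
  ret void
}
```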
> `@bail_to_interpreter` does not return to the current compilation,
> but it returns to the `"deopt"` continuation that it has been given
> (once inlined, the empty "deopt"() continuation will be fixed up to
> have the right continuation).

This "bail_to_interpreter" is the more general form of side exit I
mentioned above.

> ## lowering
>
> At some point in the compilation pipeline, we will have to lower
> `@llvm.guard_on` into explicit control flow, by "inlining" "an
> implementation" of `@llvm.guard_on` (or by some other means). I don't
> have a good sense of when in the pipeline this should be done [...]

As a starting point, I'd likely do this just before code gen prep,
with some custom sinking logic to pull instructions only used on the
failing path into the newly introduced block. Long term, we'd probably
want to do the same thing over MI, but mucking with control flow at
that layer is a bit more complicated.

Rather than framing this as inlining, I'd frame it as expansion to a
well known body (pretty much the one Sanjoy gives above). The
@bail_to_interpreter construct could be lowered directly to a function
call to some well known symbol name (which JIT users can intercept and
bind to whatever they want). Something like __llvm_side_exit seems
like a reasonable choice.

> ## environments without support for bailing to the interpreter
>
> [...] If there is a need for non-deoptimizing guards, we can
> consider introducing a variant of `@llvm.guard_on`:
>
> ```
> declare void @llvm.exception_on(i1 %predicate, i32 %exceptionKind)
> ```
>
> with this one having the semantics that it always throws an exception
> if `%predicate` fails. Only the non-widening optimizations for
> `@llvm.guard_on` will apply to `@llvm.exception_on`.

Not sure we actually need this. A valid implementation of the side
exit handler (which would do the OSR for us) is to call a runtime
routine which generates and throws the exception. The only bit we
might need is a distinction between widenable and non-widenable
guards.

> ## memory effects (unresolved)
>
> I'd really like to model `@llvm.guard_on` as a readonly function,
> since it does not write to memory if it returns; and e.g. forwarding
> loads across a call to `@llvm.guard_on` should be legal.
>
> However, I'm in a quandary around representing the "may never return"
> aspect of `@llvm.guard_on`: I have to make it illegal to, say, hoist
> a load from `%ptr` across a guard on `%ptr != null`.

Modeling this as memory dependence just seems wrong. We already have
to model control dependence on functions which may throw. I don't
think there's anything new here. The only unusual bit is that we're
going to want to teach AliasAnalysis that the guard doesn't write to
any memory location (to allow forwarding) while still preserving the
control dependence.
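Concretely, that combination licenses the first rewrite below and
forbids the second (sketch, names hypothetical):

```
  ;; legal: the guard clobbers nothing, so %v1 can be replaced by %v0
  %v0 = load i32, i32* %p
  call void @llvm.guard_on(i1 %c) [ "deopt"(/* state */) ]
  %v1 = load i32, i32* %p

  ;; illegal: the load is control dependent on the guard, so it must
  ;; not be hoisted above it
  %nonnull = icmp ne i32* %p, null
  call void @llvm.guard_on(i1 %nonnull) [ "deopt"(/* state */) ]
  %v = load i32, i32* %p
```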
> There are a couple of ways I can think of dealing with this, none of
> them both easy and neat:
>
>  - State that since `@llvm.guard_on` could have had an infinite loop
>    in it, it may never return. [...]
>
>  - State that `@llvm.guard_on` can unwind, and thus has non-local
>    control flow. This can actually work (and is pretty close to the
>    actual semantics), but is somewhat of a hack since
>    `@llvm.guard_on` doesn't _really_ throw an exception.

Er, careful. Semantically, the guard *might* throw an exception. It
could be that's what the interpreter does when evaluating the
continuation implied by the guard, and any of our callers have to
account for the fact that the function which contains the guard might
throw. The easiest way to ensure that is to model the guard call as
possibly throwing.
Sanjoy Das via llvm-dev
2016-Feb-18 00:41 UTC
[llvm-dev] RFC: Add guard intrinsics to LLVM
Replies inline. At a high level, it feels like we'll eventually need a
new instruction to represent the kind of control flow a guard entails
(to be clear: we should probably still start with an intrinsic) --
they are fairly well-behaved, i.e. readonly, nounwind etc. as far as
the immediate "physical" caller is concerned, but not so as far as its
caller's callers are concerned.

On Wed, Feb 17, 2016 at 3:40 PM, Philip Reames
<listmail at philipreames.com> wrote:
>> one very important difference -- `@llvm.guard_on(i1 <false>)` is well
>> defined (and not UB). `@llvm.guard_on` on a false predicate bails to
>> the interpreter and that is always safe (but slow), and so
>> `@llvm.guard_on(i1 false)` is basically a `noreturn` call that
>> unconditionally transitions the current compilation to the
>> interpreter.
>
> It's also worth noting that @llvm.guard_on(i1 true) is useful and well
> defined as well. When lowering, such a guard simply disappears, but it
> can be useful to keep around in the IR during optimization. It gives a
> well defined point for widening transforms to apply, with a well known
> deoptimization state already available.

Yes! Actually, I had exactly this in an earlier version of this
writeup, which I removed to make the (already long) RFC shorter.

> I'd suggest a small change to Sanjoy's declaration: I think we should
> allow additional arguments to the guard, not just the condition. What
> exactly those arguments mean would be up to the runtime, but various
> runtimes might want to provide additional arguments to the OSR
> mechanism.

We'll still have to make a call on the signature of the intrinsic (or
are you suggesting a varargs intrinsic?). I suppose we could also have
a family of intrinsics that take one argument of variable type.

>> Bailing out to the interpreter involves re-creating the state of the
>> interpreter frames as-if the compilee had been executing in the
>> interpreter all along. [...] The verifier will reject calls to
>> `@llvm.guard_on` without a `"deopt"` operand bundle.
>
> This introduces a very subtle point. The side exit only affects the
> *function* which contains the guard. A caller of that function in the
> same module may be returned to by either the function itself, or the
> interpreter after running the continuation implied by the guard. This
> introduces a complication for IPA/IPO: any guard (really, any side
> exit, of which guards are one form) has to be treated as a possible
> return point from the callee, with an unknowable return value and
> memory state.

This is a really good point. This has strong implications for the
guard's memory effects as well -- even though a guard can be seen as
readonly in its containing function, things that call the containing
function have to see the guard as read-write. IOW @foo below is
read/write, even though v0 can be forwarded to v1 (an IR sketch of the
caller's view follows below):

```
int @foo(int cond) {
  int v0 = this->field;
  guard_on(cond);
  int v1 = this->field;
  return v0 + v1;
}
```

As you point out, we're also introducing a newish kind of control flow
here. It is not fundamentally new, since longjmp does something
similar (but not quite the same). I hate to say this, but perhaps
we're really looking at (eventually) a new instruction here, and not
just a new intrinsic.
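Here is that caller-side sketch in IR (names hypothetical; @foo is the
pseudo-Java function from above, @bar its physical caller):

```
define i32 @foo(i32* %field_addr, i1 %cond) {
  %v0 = load i32, i32* %field_addr
  call void @llvm.guard_on(i1 %cond) [ "deopt"(/* state */) ]
  %v1 = load i32, i32* %field_addr  ;; within @foo, forwarding %v0 to
  ...                               ;; %v1 is fine
}

define void @bar(i32* %field_addr, i1 %cond) {
  store i32 42, i32* %field_addr
  %r = call i32 @foo(i32* %field_addr, i1 %cond)
  %v2 = load i32, i32* %field_addr  ;; NOT forwardable from the store:
  ...                               ;; if the guard in @foo fired, the
}                                   ;; interpreter may have written to
                                    ;; the field before we resume here
```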
>> `@llvm.guard_on` cannot be `invoke`ed (that would be
>> meaningless anyway, since the method it would've "thrown" into is
>> about to go away).
>
> I disagree with this bit. It needlessly complicates the inliner.
> Allowing an invoke of a guard, which SimplifyCFG converts to a call
> just like it would for a nothrow function, seems much cleaner.

SGTM.

>> The observable behavior of `@llvm.guard_on` is specified as:
>>
>> ```
>> void @llvm.guard_on(i1 %pred) {
>>  entry:
>>   %unknown_cond = < unknown source >
>>   %cond = and i1 %unknown_cond, %pred
>>   br i1 %cond, label %left, label %right
>> [...]
>> ```
>
> Unless I'm misreading this, it looks like Sanjoy got the IR in the
> specification wrong. The intrinsic is specified to side exit if the
> condition is false (so that it's true in the caller after the guard),
> not the other way around. The text description appears correct.

Yes, and thanks for catching that. The branch should have been "br i1
%cond, label %right, label %left".

>> `@bail_to_interpreter` does not return to the current compilation,
>> but it returns to the `"deopt"` continuation that it has been given
>> (once inlined, the empty "deopt"() continuation will be fixed up to
>> have the right continuation).
>
> This "bail_to_interpreter" is the more general form of side exit I
> mentioned above.

How is it more general?

> As a starting point, I'd likely do this just before code gen prep,
> with some custom sinking logic to pull instructions only used on the
> failing path into the newly introduced block. Long term, we'd
> probably want to do the same thing over MI, but mucking with control
> flow at that layer is a bit more complicated.
>
> Rather than framing this as inlining, I'd frame it as expansion to a
> well known body (pretty much the one Sanjoy gives above). The
> @bail_to_interpreter construct could be lowered directly to a
> function call to some well known symbol name (which JIT users can
> intercept and bind to whatever they want). Something like
> __llvm_side_exit seems like a reasonable choice.

SGTM.

>> with this one having the semantics that it always throws an exception
>> if `%predicate` fails. Only the non-widening optimizations for
>> `@llvm.guard_on` will apply to `@llvm.exception_on`.
>
> Not sure we actually need this. A valid implementation of the side
> exit handler (which would do the OSR for us) is to call a runtime
> routine which generates and throws the exception. The only bit we
> might need is a distinction between widenable and non-widenable
> guards.

Yes, and that's the only distinction I'm trying to make here.
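To spell out why the distinction matters, here is the earlier widening
example transcribed onto `@llvm.exception_on` (sketch; the exception
kind value is made up):

```
  %c0 = icmp ult i32 6, %len
  call void @llvm.exception_on(i1 %c0, i32 /* kind */ 0)
  call void @printf("hello world")
  %c1 = icmp ult i32 7, %len
  call void @llvm.exception_on(i1 %c1, i32 /* kind */ 0)

  ;; widening the first check to %c1 would be a miscompile here: with
  ;; %len == 7 the original program prints before throwing, while the
  ;; widened one throws without printing.
```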
>> ## memory effects (unresolved)
>>
>> [...] I'd really like to model `@llvm.guard_on` as a readonly
>> function, since it does not write to memory if it returns; and
>> e.g. forwarding loads across a call to `@llvm.guard_on` should be
>> legal.
>>
>> However, I'm in a quandary around representing the "may never return"
>> aspect of `@llvm.guard_on`: I have to make it illegal to, say, hoist
>> a load from `%ptr` across a guard on `%ptr != null`.
>
> Modeling this as memory dependence just seems wrong. We already have
> to model control dependence on functions which may throw. I don't
> think there's anything new here.

I am trying to model this as control dependence, but the difficult bit
is to do that while still maintaining that the call does not clobber
any memory. I'm worried that there may be reasons (practical or
theoretical) why readonly functions always have to terminate and be
nothrow.

> The only unusual bit is that we're going to want to teach
> AliasAnalysis that the guard doesn't write to any memory location (to
> allow forwarding) while still preserving the control dependence.

So you're saying that we model the guard as otherwise read/write (thus
sidestepping the readonly non-returning quandary) but teach
AliasAnalysis that it doesn't clobber any memory? That would work.

We can also use the same tool to solve the "may return to its caller's
caller with arbitrary heap state" issue by teaching AA that a guard
does not alias with reads in its own (physical) function, but clobbers
the heap for other (physical) functions.

Notation: I'm differentiating between physical functions == functions
that create actual stack frames, and inlined functions == logical Java
functions that don't create separate physical frames. Inlining IR from
one Java-level function into another usually creates a physical
function that contains more than one logical function.

>> - State that since `@llvm.guard_on` could have had an infinite loop
>>   in it, it may never return. [...]
>>
>> - State that `@llvm.guard_on` can unwind, and thus has non-local
>>   control flow. This can actually work (and is pretty close to the
>>   actual semantics), but is somewhat of a hack since
>>   `@llvm.guard_on` doesn't _really_ throw an exception.
>
> Er, careful. Semantically, the guard *might* throw an exception. It
> could be that's what the interpreter does when evaluating the
> continuation implied by the guard, and any of our callers have to
> account for the fact that the function which contains the guard might
> throw. The easiest way to ensure that is to model the guard call as
> possibly throwing.

Yes, it does not throw an exception into its own caller, but may throw
into its caller's caller.

-- Sanjoy
Eric Christopher via llvm-dev
2016-Feb-18 19:43 UTC
[llvm-dev] RFC: Add guard intrinsics to LLVM
Hi Sanjoy,

A quick question here. With the bailing-to-the-interpreter support
that you're envisioning ("deopt" operand bundles), it appears to
overlap quite a bit with the existing stack maps. What's the
story/interaction/etc. here? I agree that simpler control flow is
great when bailing to the interpreter -- doing it with phi nodes is a
recipe for pain and long compile times.

Thanks!

-eric

On Tue, Feb 16, 2016 at 6:06 PM Sanjoy Das via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> This is a proposal to add guard intrinsics to LLVM.
> [...]
Sanjoy Das via llvm-dev
2016-Feb-18 20:23 UTC
[llvm-dev] RFC: Add guard intrinsics to LLVM
Hi Eric,

(That's a very valid question, thanks for asking!)

The key difference between stackmaps and operand bundles is that it is
more natural to teach the optimizer to manipulate deopt operand
bundles, as opposed to values live into a stackmap. This becomes a
requirement[0] around inlining, since (say) we have:

```
void foo() {
  if (ptr == null) bail_to_interpreter() [ "deopt"(i32 101) ];
}

void bar() {
  foo() [ "deopt"(i32 237) ];
}
```

and we wish to inline foo into bar. Then the inlined body has to look
like

```
void bar() {
  if (ptr == null) bail_to_interpreter() [ "deopt"(i32 237, i32 101) ];
}

// definition of foo() is elided
```

This is because if ptr is null, and we do have to bail to the
interpreter, we cannot just create one frame, but have to create *two*
interpreter frames: the younger of which is in state "i32 101"
(obviously a real deopt state will be more complex) and the older of
which is in state "i32 237". Once these frames have been created,
execution will continue in the interpreter. (Just to be very explicit
about it -- the deopt state "i32 237", when resumed to, will re-start
interpreted execution in the method @bar() "as if" foo() had just
returned.)

We want the inliner to do this kind of adjustment for us [1], and
doing this over an intrinsic is awkward. For instance, an intrinsified
version of the above will look like:

```
void foo() {
  if (ptr == null) call @call_with_deopt_state(bail_to_interpreter, i32 101);
}

void bar() {
  call @call_with_deopt_state(foo, i32 237);
}
```

and the inliner and other IPO passes will have to have special
knowledge of the @call_with_deopt_state intrinsic. Given how different
the semantics of @call_with_deopt_state are from a normal call, it
seemed more natural to make deoptimization state a first class citizen
of LLVM IR.

Secondly, the issues with deopt bundles and IPO this discussion
uncovered are also applicable to stackmaps; and since deopt bundles
are a distinct IR construct, phrasing the IPO / IPA restrictions will
be easier.

Does this answer your question?

[0]: This isn't a problem for clients who don't do much optimization
within LLVM, but it is for us.

[1]: https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/Utils/InlineFunction.cpp#L1486

-- Sanjoy
Philip Reames via llvm-dev
2016-Feb-18 20:29 UTC
[llvm-dev] RFC: Add guard intrinsics to LLVM
Sanjoy gave the long answer; let me give the short one. :)

"deopt" argument bundles are used in the middle end; they are lowered
into a statepoint, which in turn generates the existing stackmap
format. I.e. one builds on the other.
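A schematic of that layering (abridged; the statepoint operands beyond
the target are the usual id / patch-bytes / arg-count / flags
bookkeeping):

```
  ;; middle end: the deopt state rides on the call as an operand bundle
  call void @foo() [ "deopt"(i32 101) ]

  ;; late in the pipeline the call becomes a statepoint; the deopt
  ;; arguments are what end up in the generated stackmap entry
  call token (i64, i32, void ()*, i32, i32, ...)
    @llvm.experimental.gc.statepoint.p0f_isVoidf(
      i64 0,          ;; statepoint id
      i32 0,          ;; num patch bytes
      void ()* @foo,  ;; call target
      i32 0, i32 0,   ;; num call args, flags
      i32 0,          ;; num transition args
      i32 1, i32 101) ;; num deopt args, the deopt state itself
```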
> ``` > > > # details: motivation & alternatives > > As specified in the high level summary, there are two key motivations. > > The first, more obvious one is that we want the CFG to be less > complex, even if that involves "hiding" control flow in guard > intrinsics. I expect this to benefit both compile time and > within-a-block optimization. > > The second, somewhat less obvious motivation to use guard intrinsics > instead of explicit control flow is to allow check widening. > > ## check widening > > Consider: > > ``` > ... > %rangeCheck0 = icmp ult i32 6, %len ;; for a[6] > call void @llvm.guard_on(i1 %rangeCheck0) [ "deopt"(/* deopt > state 0 */) ] > call void @printf("hello world") > %rangeCheck1 = icmp ult i32 7, %len ;; for a[7] > call void @llvm.guard_on(i1 %rangeCheck1) [ "deopt"(/* deopt > state 1 */) ] > access a[6] and a[7] > ... > ``` > > we'd like to optimize it to > > ``` > ... > %rangeCheckWide = icmp ult i32 7, %len > call void @llvm.guard_on(i1 %rangeCheckWide) [ "deopt"(/* deopt > state 0 */) ] > call void @printf("hello world") > ;; %rangeCheck1 = icmp ult i32 7, %len ;; not needed anymore > ;; call void @llvm.guard_on(i1 %rangeCheck1) [ "deopt"(/* deopt > state 1 */) ] > access a[6] and a[7] > ... > ``` > > This way we do a range check only on `7` -- if `7` is within bounds, > then we know `6` is too. This transform is sound only because we know > that the guard on `7 ult %len` will not simply throw an exception if > the said predicate fails, but will bail out to the interpreter with > the abstract state `/* deopt state 0 */`. In fact, if `%len` is `7`, > the pre-transform program is supposed to print `"hello world"` and > *then* throw an exception, and bailing out to the interpreter with `/* > deopt state 0 */` will do exactly that. > > In other words, we're allowed to do speculative and aggressive > transforms that make a guard fail that wouldn't have in the initial > program. This is okay because a failed guard only bails to the > interpreter, and the interpreter always Does The Right Thing(TM). In > fact, it is correct (though unwise) to replace every guard's predicate > with `false`. > > ## the problem with check widening and explicit control flow > > Widening is difficult to get right in an explicit "check-and-branch" > representation. For instance, the above example in a > "check-and-branch" representation would be (in pseudo C, and excluding > the printf): > > ``` > ... > if (!(6 < %len)) { call @side_exit() [ "deopt"(P) ]; unreachable; } > if (!(7 < %len)) { call @side_exit() [ "deopt"(Q) ]; unreachable; } > ... > ``` > > The following transform is invalid: > > ``` > ... > if (!(7 < %len)) { call @side_exit() [ "deopt"(P) ]; unreachable; } > if (!(true)) { call @side_exit() [ "deopt"(Q) ]; unreachable; } > ... > ``` > > since we do not know if the first side exit had been optimized under > the assumption `!(6 < %len)` (via JumpThreading etc.). E.g. the > "original" IR could have been > > ``` > ... > if (!(6 < %len)) { call @side_exit() [ "deopt"(!(6 < %len)) ]; > unreachable; } > if (!(7 < %len)) { call @side_exit() [ "deopt"(Q) ]; unreachable; } > ... > ``` > > which got optimized to > > ``` > ... > if (!(6 < %len)) { call @side_exit() [ "deopt"(true) ]; > unreachable; } > if (!(7 < %len)) { call @side_exit() [ "deopt"(Q) ]; unreachable; } > ... > ``` > > before the widening transform. The widening transform will now > effectively pass in an incorrect value for `!(6 < %len)`. 
> > This isn't to say it is impossible to do check widening in a explicit > control flow representation, just that is more natural to do it with > guards. > > > # details: semantics > > ## as-if control flow > > The observable behavior of `@llvm.guard_on` is specified as: > > ``` > void @llvm.guard_on(i1 %pred) { > entry: > %unknown_cond = < unknown source > > %cond = and i1 %unknown_cond, %pred > br i1 %cond, label %left, label %right > > left: > call void @bail_to_interpreter() [ "deopt"() ] noreturn > unreachable > > right: > ret void > } > ``` > > So, precisely speaking, `@llvm.guard_on` is guaranteed to bail to the > interpreter if `%pred` is false, but it **may** bail to the > interpreter if `%pred` is true. It is this bit that lets us soundly > widen `%pred`, since all we're doing is "refining" `< unknown > source >`. > > `@bail_to_interpreter` does not return to the current compilation, but > it returns to the `"deopt"` continuation that is has been given (once > inlined, the empty "deopt"() continuation will be fixed up to have > the right > continuation). > > > ## applicable optimizations > > Apart from widening, most of the optimizations we're interested in are > what's allowed by an equivalent `@llvm.assume`. Any conditional > branches dominated by a guard on the same condition can be folded, > multiple guards on the same condition can be CSE'ed, loop invariant > guards can be hoisted out of loops etc. > > Ultimately, we'd like to recover the same quality of optimization as > we currently get from the "check-and-branch" representation. With the > "check-and-branch" representation, the optimizer is able to sink > stores and computation into the slow path. This is something it cannot > do in the guard_on representation, and we'll have to lower the > guard_on representation to the "check-and-branch" representation at a > suitable point in the pipeline to get this kind of optimization. > > ## lowering > > At some point in the compilation pipeline, we will have to lower > `@llvm.guard_on` into explicit control flow, by "inlining" "an > implementation" of `@llvm.guard_on` (or by some other means). I don't > have a good sense on when in the pipeline this should be done -- the > answer will depend on what we find as we make LLVM more aggressive > around optimizing guards. > > ## environments without support for bailing to the interpreter > > Our VM has deeply integrated support for deoptimizations, but not all > language runtimes do. If there is a need for non deoptimizing > guards, we > can consider introducing a variant of `@llvm.guard_on`: > > ``` > declare void @llvm.exception_on(i1 %predicate, i32 %exceptionKind) > ``` > > with this one having the semantics that it always throws an exception > if `%predicate` fails. Only the non-widening optimizations for > `@llvm.guard_on` will apply to `@llvm.exception_on`. > > ## memory effects (unresolved) > > [I haven't come up with a good model for the memory effects of > `@llvm.guard_on`, suggestions are very welcome.] > > I'd really like to model `@llvm.guard_on` as a readonly function, > since it does not write to memory if it returns; and e.g. forwarding > loads across a call to `@llvm.guard_on` should be legal. > > However, I'm in a quandary around representing the "may never return" > aspect of `@llvm.guard_on`: I have to make it illegal to, say, hoist a > load form `%ptr` across a guard on `%ptr != null`. 
There are a couple of ways I can think of dealing with this, none of
them are both easy and neat:

- State that since `@llvm.guard_on` could have had an infinite loop
  in it, it may never return. Unfortunately, LLVM IR has some rough
  edges around readonly infinite loops (since C++ disallows them), so
  LLVM will require some preparatory work before we can do this
  soundly.

- State that `@llvm.guard_on` can unwind, and thus has non-local
  control flow. This can actually work (and is pretty close to the
  actual semantics), but is somewhat of a hack since `@llvm.guard_on`
  doesn't _really_ throw an exception.

- Special-case `@llvm.guard_on` (yuck!).

What do you think?

-- Sanjoy
Andrew Trick via llvm-dev
2016-Feb-19 19:12 UTC
[llvm-dev] RFC: Add guard intrinsics to LLVM
> On Feb 16, 2016, at 6:06 PM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote:
>
> This is a proposal to add guard intrinsics to LLVM.

[snip]

Thanks Sanjoy. Sorry it took a couple of days for me to read this.

I like this approach a lot. I can envision multiple intrinsics
leveraging the basic framework of operand bundles, but each with
different semantics and lowering. I'm no longer considering extending
the patchpoint design to cover these cases.

Your semantics for @llvm.guard_on are perfect for managed
runtimes. Just to give you an idea of a similar intrinsic we may want
to introduce: I would also like to add a @llvm.trap_on(i1 %pred)
intrinsic, with side effects and lowering as such:

```
  br i1 %pred, label %succeed, label %fail

fail:
  call void @llvm.trap()
  unreachable
```

This clearly doesn't need operand bundles, but using an intrinsic
would permit special code-motion semantics. It could be hoisted and
merged with other traps, but the condition could never be widened
beyond the union of the original conditions. Unlike deoptimizing
guards, it would need to be sequenced with memory barriers, but could
otherwise be hoisted as readnone.

The semantics of "non-deoptimizing guards" are different enough that
they should be a different intrinsic. @trap_on would essentially be
halfway between @guard_on and @exception_on.

> ## memory effects (unresolved)
>
> [I haven't come up with a good model for the memory effects of
> `@llvm.guard_on`; suggestions are very welcome.]
>
> I'd really like to model `@llvm.guard_on` as a readonly function,
> since it does not write to memory if it returns; e.g. forwarding
> loads across a call to `@llvm.guard_on` should be legal.

Of course it would be nice to consider these intrinsics
readonly. (One thing we never fixed with patchpoints is the memory
effects, and that was probably hurting performance.) But has this bug
been fixed yet: http://llvm.org/pr18912 ("Optimizer should consider
'maythrow' calls (those without 'nounwind') as having side effects")?
I vaguely remember some work being done to improve the situation, but
last I checked LICM was still violating it, and who knows what else?
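To make the hazard concrete, a hypothetical sketch (@may_side_exit,
@f, %c, and %p are all invented names):

```
;; hypothetical: readonly, but deliberately *not* nounwind
declare void @may_side_exit(i1) readonly

define i32 @f(i1 %c, i32* %p) {
entry:
  call void @may_side_exit(i1 %c)  ;; may unwind when %c is false
  %v = load i32, i32* %p           ;; only dereferenceable when %c is true
  ret i32 %v
}

;; A pass that assumes "readonly implies no observable effects" might
;; hoist the load above the call, or drop the call outright,
;; introducing a fault on the %c == false path.
```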
BTW, if you do want readonly semantics, why would you want readonly
to be implicit instead of explicit?

> However, I'm in a quandary around representing the "may never return"
> aspect of `@llvm.guard_on`: I have to make it illegal to, say, hoist a
> load from `%ptr` across a guard on `%ptr != null`. There are a couple
> of ways I can think of dealing with this, none of them are both easy
> and neat:
>
> - State that since `@llvm.guard_on` could have had an infinite loop
>   in it, it may never return. Unfortunately, LLVM IR has some rough
>   edges around readonly infinite loops (since C++ disallows them), so
>   LLVM will require some preparatory work before we can do this
>   soundly.
>
> - State that `@llvm.guard_on` can unwind, and thus has non-local
>   control flow. This can actually work (and is pretty close to the
>   actual semantics), but is somewhat of a hack since `@llvm.guard_on`
>   doesn't _really_ throw an exception.
>
> - Special-case `@llvm.guard_on` (yuck!).
>
> What do you think?

I think you would need to mark @guard_on as may-unwind AND fix any
lingering assumptions in LLVM that readonly calls are nounwind. (Loads
can be CSE'd across unwind calls, but can't be hoisted across them.)

An alternative is to *not* mark them readonly but rely on alias
analysis. I think doing that would properly model the "unwind" path
for the purpose of IPA/inlining. Introducing TBAA on calls should be
easy and sufficient to unblock the memory optimization you need. Of
course, IPA still needs to know about side exits, which it really
should with any may-unwind call. Other than that, I don't see the
problem.

Another good point mentioned later in the thread is that a readonly
callee should *not* imply a readonly @guard_on or other "deopt" call
site. The simple solution for this is to disallow "deopt" of readonly
calls.

-Andy
Sanjoy Das via llvm-dev
2016-Feb-21 20:24 UTC
[llvm-dev] RFC: Add guard intrinsics to LLVM
Hi Andy,

Thanks for replying; responses inline below.

On Fri, Feb 19, 2016 at 11:12 AM, Andrew Trick <atrick at apple.com> wrote:
> This clearly doesn't need operand bundles, but using an intrinsic
> would permit special code-motion semantics. It could be hoisted and
> merged with other traps, but the condition could never be widened
> beyond the union of the original conditions. Unlike deoptimizing
> guards, it would need to be sequenced with memory barriers, but could

By memory barrier, do you mean things like fences?

> otherwise be hoisted as readnone.

Can you clarify this a little bit? Are you talking about things like:

```
  *ptr = 42
  @trap_on(%foo)

    =>

  @trap_on(%foo)
  *ptr = 42
```

? I.e. if a failing guard postdominates a point "P" in the program,
then the guard can be failed early at P. (We'd want a stronger
condition than postdomination, since we won't want to hoist:

```
  while (!*cond) { }
  @trap_on(*cond)

    =>

  @trap_on(*cond)
  while (!*cond) { }
```

)

[snip]

> Of course it would be nice to consider these intrinsics
> readonly. (One thing we never fixed with patchpoints is the memory
> effects, and that was probably hurting performance.) But has this bug
> been fixed yet: http://llvm.org/pr18912 ("Optimizer should consider
> 'maythrow' calls (those without 'nounwind') as having side effects")?
> I vaguely remember some work being done to improve the situation, but
> last I checked LICM was still violating it, and who knows what else?

The specific case in the bug was fixed by David in
http://reviews.llvm.org/rL256728. But I agree with your concern that
this notion of "readnone calls are always okay to remove" may have
leaked into other parts of LLVM.

> BTW, if you do want readonly semantics, why would you want readonly
> to be implicit instead of explicit?

I don't understand the question. :) What's explicit vs. implicit here?

> I think you would need to mark @guard_on as may-unwind AND fix any
> lingering assumptions in LLVM that readonly calls are nounwind. (Loads
> can be CSE'd across unwind calls, but can't be hoisted across them.)

Yes, but a failed guard doesn't exactly "unwind". I.e. if A invokes B
and B fails a guard, then the guard may not always unwind to the A->B
callsite's landingpad, but can also return to its normal no-exception
successor; unless you inline B into A, in which case B's guard (now
physically present in the A+B function) will return from A if it
fails. In other words, we'll have to re-purpose the term "unwind" to
mean either "throwing an exception" or "failing a guard". I'm fine
with this, but it is a choice we need to make explicitly.

> Another good point mentioned later in the thread is that a readonly
> callee should *not* imply a readonly @guard_on or other "deopt" call
> site. The simple solution for this is to disallow "deopt" of readonly
> calls.

Assuming we're talking about the same thing, the concern was more
along the lines of: you cannot do IPA on a callsite that calls
something that contains a potential deoptimization point, be it a
@guard_on call or a "normal" deoptimization safepoint. This is
because if you have:

```
void @something()

void callee() {
  call @something() [ "deopt"(state0) ]
  *ptr = 100;
}

void caller() {
  call @callee()  // whether this is a "deopt" call site or a
                  // normal call site doesn't matter
  int k = *ptr;
}
```

you cannot, say, fold `k` to `100` in @caller(), because the deopt
state, state0, if resumed to, may actually write something else to
*ptr.
Marking the call to @something() as read/write/may-unwind does not
solve the problem -- even if the call to @something() wrote to *ptr
(or threw an exception), we could still fold k to 100. The novel
control flow here is that @callee, if deoptimized with state0, will
execute some arbitrary code, and **return** to @caller at the callsite
to @callee. (Btw: things are a little different inside our JVM because
of the way we register dependencies, but I'm trying to make an
LLVM-centric argument here.)

At this point, I think it is most straightforward to do a slight
variant of what Philip suggested to me offline: if a function can
deoptimize, it needs to be marked as "mayBeOverridden", preferably by
introducing a new function attribute or by setting the linkage type to
one of those checked by GlobalValue::mayBeOverridden. This is the
responsibility of the frontend, and resuming to the "deopt"
continuation in a physical frame where the enclosing function was not
marked mayBeOverridden is UB.

-- Sanjoy