Philip Reames via llvm-dev
2016-Jan-22 20:35 UTC
[llvm-dev] FYI: gc relocations on exception path w/RS4GC currently broken
For anyone following along on ToT using the gc.statepoint mechanism, you should know that ToT is currently not able to express arbitrary exceptional control flow and relocations along exceptional edges. This is a direct result of moving the gc.statepoint representation to using a token type landingpad. Essentially, we have a design inconsistency where we expect to be able to "resume" a phi of arbitrary landing pads, but we expect relocations to be tied specifically to a particular invoke.

Chen, Joseph, and I have spent some time talking about how to resolve this. All of the schemes we've come up with for representing relocations using gc.relocates on the exceptional path require either a change to how we define an invoke instruction (something we'd really like to avoid) or a new intrinsic with special treatment in the optimizer so that it basically "becomes part of" the landing pad without actually being the landing pad. None of us were particularly thrilled by the changes involved.

Given exceptional paths are nearly by definition cold, we're currently exploring another option. We're considering having RS4GC insert explicit spill slots at the IR level (via allocas) for values live along exceptional paths, and leaving all of the normal path values represented as gc.relocates. This avoids the need for another IR extension, makes it slightly easier to meet an ABI requirement Joseph has, and provides a better platform for lowering experimentation. Joseph is working on implementing this and will probably have something up for review next week or the week after. Once that's in, we're going to run some performance experiments to see if it's a viable lowering strategy even without Joseph's particular ABI requirement, and if so, make that the standard way of representing relocations on exceptional edges.

Assuming this approach works, we're going to defer solving the problem of how to cleanly represent explicit relocations along the exceptional path until a later point in time. In particular, the value of the explicit relocations comes mainly from being able to lower them efficiently to register uses. Since the work to integrate relocations with the register allocator hasn't happened and doesn't look like it's going to happen in the near term (*), this seems like a reasonable compromise.

Philip

(*) To give some context on this, it turns out one of our initial starting assumptions was wrong in practice. We expected the quality of lowering for the gc arguments at statepoint/safepoint to be very important for overall code quality. While this may some day become true, we've found that whenever we encounter a hot safepoint, the problem is usually that we didn't inline appropriately. As a result, we've ended up fixing (out of tree) inlining or devirtualization bugs rather than working on the lowering itself. For us, a truly hot megamorphic call site has turned out to be a very rare beast. Worth noting is that this is only true because we're a high tier JIT with good profiling information. It's likely that other users who don't have the same design point may find the lowering far more problematic; in fact, we have some evidence this may already be true.
Reid Kleckner via llvm-dev
2016-Jan-22 21:38 UTC
[llvm-dev] FYI: gc relocations on exception path w/RS4GC currently broken
So, here's a crazy idea. What if we change the definition of dominance for
invokes that produce tokens so that the token return value is live out the
exceptional edge?
If that's too crazy, what if we used operand bundles to make a new token
that "forward declares" the statepoint token:
  %exceptional_token = call token @llvm.gc.exceptional.token()
  %normal_token = invoke token @llvm.experimental.gc.statepoint(....)
          [ "eh_token"(token %exceptional_token) ]
          to label %normal_dest unwind label %lpad_dest
  ...
lpad_dest:
  %ehvals = landingpad { i8*, i32 } ... like usual
  %p1 = call <ty> @llvm.experimental.gc.relocate(token %exceptional_token, ...)
A given exceptional token can be used with exactly one or zero GC
statepoint calls, so a late pass can map from one to the other and insert
reloads in the usual way.
That said, I imagine you want to use this exceptional token with more than
one invoke, so that you don't end up needing a landingpad per potentially
throwing call site. I think this design can be extended to handle that,
though.
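
The "exactly one or zero statepoints per exceptional token" rule above is what would let a late pass pair each token with its statepoint. A minimal sketch of that pairing step in Python, assuming a toy instruction list rather than real LLVM IR; the `Instr` class, the opcode strings, and `map_eh_tokens` are hypothetical names invented for illustration and are not part of any LLVM API:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Instr:
    """Toy instruction: a result name, an opcode, and an optional bundle operand."""
    name: str
    opcode: str                      # e.g. "eh_token" or "statepoint" (hypothetical)
    eh_token: Optional[str] = None   # token named in the "eh_token" bundle, if any


def map_eh_tokens(instrs: List[Instr]) -> Dict[str, Optional[str]]:
    """Map each exceptional token to the single statepoint that names it.

    Returns {token_name: statepoint_name or None}. Raises ValueError if a
    statepoint names an undefined token, or if two statepoints share a token.
    """
    # Every defined token starts out unclaimed (the "zero statepoints" case).
    owner: Dict[str, Optional[str]] = {
        i.name: None for i in instrs if i.opcode == "eh_token"
    }
    for i in instrs:
        if i.opcode != "statepoint" or i.eh_token is None:
            continue
        if i.eh_token not in owner:
            raise ValueError(f"{i.name} names unknown token {i.eh_token}")
        if owner[i.eh_token] is not None:
            raise ValueError(
                f"token {i.eh_token} is shared by {owner[i.eh_token]} and {i.name}"
            )
        owner[i.eh_token] = i.name
    return owner
```

A pass that duplicates code (jump threading, loop unswitching) would copy the statepoint without copying the token definition, which is the case this check rejects.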
Sanjoy Das via llvm-dev
2016-Jan-22 22:03 UTC
[llvm-dev] FYI: gc relocations on exception path w/RS4GC currently broken
On Fri, Jan 22, 2016 at 1:38 PM, Reid Kleckner <rnk at google.com> wrote:
> So, here's a crazy idea. What if we change the definition of dominance for
> invokes that produce tokens so that the token return value is live out the
> exceptional edge?

We'd then need some way to prevent a gc.result(%token) from appearing on the landingpad basic block. Not that it can't be done, but we'd have to say something like "even though the token dominates both branches, only one of them can have gc.result uses of the token type".

Now if LLVM allowed instructions to have multiple defs, we could have gc.statepoint produce two tokens, and have one of them dominate the exception path (the one that you couldn't have gc.result hang off of) and the other one dominate the normal path; but that'd be a very disruptive change I think.

> If that's too crazy, what if we used operand bundles to make a new token
> that "forward declares" the statepoint token:
>   %exceptional_token = call token @llvm.gc.exceptional.token()
>   %normal_token = invoke token @llvm.experimental.gc.statepoint(....)
>           [ "eh_token"(token %exceptional_token) ]
>           to label %normal_dest unwind label %lpad_dest
>   ...
> lpad_dest:
>   %ehvals = landingpad { i8*, i32 } ... like usual
>   %p1 = call <ty> @llvm.experimental.gc.relocate(token %exceptional_token, ...)
>
> A given exceptional token can be used with exactly one or zero GC statepoint
> calls, so a late pass can map from one to the other and insert reloads in
> the usual way.

Then we'll be stuck with gc.relocate being at least readonly, since we don't want to allow the above to transform into this:

  %exceptional_token = call token @llvm.gc.exceptional.token()
  %p1 = call <ty> @llvm.experimental.gc.relocate(token %exceptional_token, ...)
  %normal_token = invoke token @llvm.experimental.gc.statepoint(....)
          [ "eh_token"(token %exceptional_token) ]
          to label %normal_dest unwind label %lpad_dest
  ...
lpad_dest:
  %ehvals = landingpad { i8*, i32 } ... like usual

-- Sanjoy
Joseph Tremoulet via llvm-dev
2016-Jan-24 02:06 UTC
[llvm-dev] FYI: gc relocations on exception path w/RS4GC currently broken
[speaking to your 2nd idea, because the “change dominance” plan doesn’t
accommodate “you want to use this exceptional token with more than one invoke”
without token PHIs]
It’s an interesting idea, and some of the ideas we discussed and abandoned were
similar to pieces of it. I think my objections to doing this as stated would
be:
1. The lifetime of the token is no longer the lifetime where you’d want to
enregister the pointer that gets relocated, so it moves us away from the goal of
explicit relocate instructions in the first place – putting an alloca up where
you’d put the gc.exceptional.token call and putting stores and loads around the
EH edge uses IR artifacts in roughly the same places but without introducing new
IR constructs.
2. The “A given exceptional token can be used with exactly one or zero GC
statepoint calls” part places an awkward constraint on static code duplication
(by which I mean something like jump threading, loop unswitching, etc). I
suppose we already have a notion of “not duplicatable” that applies to
thusly-attributed calls and token defs, so code-wise it wouldn’t be invasive to
represent that constraint, but it feels like an unnatural constraint.
3. You’re right about “you want to use this exceptional token with more than one
invoke”. We pushed on some similar ideas a bit and realized we were basically
working on introducing a back-door way to have token PHIs (that’s arguably
exactly what you’ve done, as the uses are correlated with and occur in the
predecessor blocks by virtue of appearing on the terminators, and are wedded to
the join point by virtue of the exception edge constraints requiring the
landingpad to be the immediate successor of the invoke), which seemed to be an
indication that it might not be the best-suited abstraction for the job.
4. With unsplittable catchswitches in the mix, the invoke : splittable pad
relation is many : many. I’d rather have one gc.relocate per relocated value
after each catchpad, and I think this is going in the direction of requiring
(token PHIs or) one gc.relocate per relocated value per predecessor invoke
(including transitive predecessors through catchswitches) after each catchpad.
Or I guess gc.relocate could be variadic and just take all the predecessor
invokes’ tokens as arguments, but that still seems like more IR than should be
needed.
Thanks
-Joseph
Philip Reames via llvm-dev
2016-Jan-24 18:31 UTC
[llvm-dev] FYI: gc relocations on exception path w/RS4GC currently broken
These are essentially the same options we came up with.

We didn't like the option of changing the dominance rules since it seemed like it could lead to illegal transforms. Consider a code sinker which uses dominance to tell when defs are available. If we had the exceptional path dominated by the invoke's result, we could have a merge point which combined the normal and exceptional edges. Now, data dependence would prevent the sink (unless the use was entirely dead), but that seems unintuitive at best.

Your second scheme was close to one we came up with, but we hadn't thought to use the operand bundles. This was the least ugly option, but why handle the additional complexity if the early spilling works cleanly in this case?

Philip
Joseph Tremoulet via llvm-dev
2016-Feb-05 19:13 UTC
[llvm-dev] gc relocations on exception path w/RS4GC currently broken
Working on this, I've run into a couple potential issues regarding which
I'd like to solicit feedback.
To give a concrete example, we're talking about having RS4GC see a
GC-safepoint call like so:
  %a = _ ; gc pointer
  %b = _ ; gc pointer
  ...
  invoke void @callee()
          to label %cont unwind label %pad
cont:
  _ = %a
  ...
pad:
  landingpad _
  _ = %b
  ...
and transform it into:
  %b.gc_spill = alloca <ty>
  ...
  %a = _
  %b = _
  ...
  store <ty> %b, <ty>* %b.gc_spill
  %sp = invoke token @llvm.experimental.gc.statepoint(<arg list that
            indicates %a is a gc pointer and %b.gc_spill holds a gc pointer>)
          to label %cont unwind label %pad
cont:
  %a.reloc = call <ty> @llvm.experimental.gc.relocate(token %sp,
                 <index of %a>, <index of %a>)
  _ = %a.reloc
  ...
pad:
  landingpad _
  %b.gc_reload = load <ty>, <ty>* %b.gc_spill
  _ = %b.gc_reload
  ...
which would then get lowered to a call with a stack map reporting %a (or the
slot that lowering spills %a to) and %b.gc_spill as holding live gc pointers.
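
To make the exceptional-edge half of that transform concrete, here is a minimal sketch in Python of what the spill slot buys: the stack map reports the slot, a moving collector rewrites the slot in place, and the reload in the pad observes the new address. This is a toy model under stated assumptions, not LLVM code; `collect`, `unwind_path`, and the integer "addresses" are hypothetical stand-ins.

```python
from typing import Dict, List


def collect(frame: Dict[str, int], stack_map: List[str],
            forwarding: Dict[int, int]) -> None:
    """Toy moving collector: rewrite every reported stack slot in place.

    frame       maps slot names to the object address each slot holds
    stack_map   lists the slots reported as holding live gc pointers
    forwarding  maps old object addresses to their new addresses
    """
    for slot in stack_map:
        frame[slot] = forwarding.get(frame[slot], frame[slot])


def unwind_path(b: int, forwarding: Dict[int, int]) -> int:
    """Model the exceptional edge only: spill %b, safepoint, reload in the pad."""
    frame = {"b.gc_spill": 0}                    # %b.gc_spill = alloca <ty>
    frame["b.gc_spill"] = b                      # store <ty> %b, <ty>* %b.gc_spill
    collect(frame, ["b.gc_spill"], forwarding)   # GC runs during the invoke
    return frame["b.gc_spill"]                   # %b.gc_reload = load ...
```

For example, `unwind_path(0x1000, {0x1000: 0x2000})` returns `0x2000`: the pad never needs a gc.relocate because the collector updated the slot itself.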
Issue #1: obscurability of the %b.gc_spill use on the gc.statepoint invoke
Some target runtimes/GCs (CoreCLR included) need to have stack slots reported
directly by offset. If code runs between RS4GC and lowering that somehow
rewrites the argument on the statepoint corresponding to b's spill to be
anything other than a direct use of the static alloca that RS4GC allocated to
hold the spill, the best we could do is have the lowering introduce another
layer of indirection.
E.g., continuing the above example, if something after RS4GC obscures
%b.gc_spill on the statepoint:
  %b.gc_spill = alloca <ty>
  ...
  %a = _
  %b = _
  ...
  %p = _ ; like maybe a PHI that has %b.gc_spill as an incoming value
  ...
  store <ty> %b, <ty>* %b.gc_spill ; may or may not have been
                                   ; rewritten as store into %p
  %sp = invoke token @llvm.experimental.gc.statepoint(<arg list that
            indicates %a is a gc pointer and %p holds a gc pointer>)
          to label %cont unwind label %pad
cont:
  %a.reloc = call <ty> @llvm.experimental.gc.relocate(token %sp,
                 <index of %a>, <index of %a>)
  _ = %a.reloc
  ...
pad:
  landingpad _
  %b.gc_reload = load <ty>, <ty>* %b.gc_spill
  _ = %b.gc_reload
  ...
then lowering would effectively have to insert another indirection:
  %b.gc_spill = alloca <ty>
  ...
  %a = _
  %b = _
  ...
  %p = _ ; like maybe a PHI that has %b.gc_spill as an incoming value
  ...
  store <ty> %b, <ty>* %b.gc_spill ; may or may not have been
                                   ; rewritten as store into %p
  %p.deref = load <ty>, <ty>* %p
  %sp = invoke token @llvm.experimental.gc.statepoint(<arg list that
            indicates %a and %p.deref are gc pointers>)
          to label %cont unwind label %pad
cont:
  store <ty> %p.deref, <ty>* %p
  %a.reloc = call <ty> @llvm.experimental.gc.relocate(token %sp,
                 <index of %a>, <index of %a>)
  _ = %a.reloc
  ...
pad:
  landingpad _
  store <ty> %p.deref, <ty>* %p
  %b.gc_reload = load <ty>, <ty>* %b.gc_spill
  _ = %b.gc_reload
  ...
(and the code to insert the %p <-> %p.deref loads/stores would have to be
something in CodeGenPrep or [ugh] directly in StatepointLowering).
I'm curious to know if others think this is problematic or not. I know that
for LLILC we intend to run RS4GC late in the pass list and could probably just
discount the possibility of the spill slot allocas on the gc.statepoint invoke
getting obscured (or at least could be ok with having the bail-out lowering/CGP
code that patches things up with extra stores/loads, on the assumption that
it's rare in practice to hit these cases), but I'm not sure how
representative LLILC is of the community in that regard. Similarly, I have the
impression that we're moving generally toward wedding RS4GC more with CGP,
but I would be interested to know if I'm off the mark there.
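
The bail-out decision described for Issue #1 comes down to one question per spill argument: is it still a direct use of a static alloca, or has it been obscured? A minimal sketch of that classification in Python, assuming values are identified by name only; `classify_spill_args` and the two labels are hypothetical and do not correspond to anything in CodeGenPrep or StatepointLowering:

```python
from typing import Dict, List, Set


def classify_spill_args(static_allocas: Set[str],
                        statepoint_args: List[str]) -> Dict[str, str]:
    """Decide, per statepoint spill argument, how lowering could report it.

    "direct"             the argument is one of the static allocas, so its
                         frame offset can be reported to the runtime as-is
    "needs-indirection"  anything else (e.g. a PHI of allocas), which would
                         need the extra load/store patch-up around the call
    """
    return {
        arg: "direct" if arg in static_allocas else "needs-indirection"
        for arg in statepoint_args
    }
```

With `%b.gc_spill` as the only static alloca, the argument `%b.gc_spill` classifies as "direct" and the PHI `%p` from the example above as "needs-indirection".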
Issue #2: Relocating derived pointers by IR injection
I know there's been some discussion about runtimes which require the
pointers reported directly to them to be base object pointers. So e.g. with
code like this:
  %p = _ ; <some base object pointer>
  %q = <getelementptr getting a pointer at some offset from %p>
  ...
  call void @callee()
  _ = %q
then RS4GC will generate
  %p = _ ; <some base object pointer>
  %q = <getelementptr getting a pointer at some offset from %p>
  ...
  %sp = call token @llvm.experimental.gc.statepoint(<args indicating %p and
            %q are gc pointers, with %p as %q's base>)
  %p.reloc = call <ty> @llvm.experimental.gc.relocate(token %sp,
                 <index of %p>, <index of %p>)
  %q.reloc = call <ty> @llvm.experimental.gc.relocate(token %sp,
                 <index of %p>, <index of %q>)
  _ = %q.reloc
The default lowering of that is to spill %p and %q to the stack just before the
call, and lower the gc.relocate calls as loads from those slots after the calls,
with the understanding that the stack map will communicate to the GC that
q's slot is derived from p's slot and the GC will update both pointers
appropriately. However, for targets where the interface with the runtime only
allows reporting base pointers, lowering would have to effect something like the
following transformation:
  %p = _ ; <some base object pointer>
  %q = <getelementptr getting a pointer at some offset from %p>
  ...
  %d = <compute %q - %p>
  %sp = call token @llvm.experimental.gc.statepoint(<args indicating %p is
            a gc pointer>)
  %p.reloc = call <ty> @llvm.experimental.gc.relocate(token %sp,
                 <index of %p>, <index of %p>) ; lower to load
  %q.reloc = call <ty> @llvm.experimental.gc.relocate(token %sp,
                 <index of %p>, <index of %q>) ; lower to %p.reloc + %d
  _ = %q.reloc
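
The arithmetic behind that base-only lowering is small enough to check directly. A minimal sketch in Python, assuming integer addresses and a forwarding table standing in for the collector; `relocate_base_only` is a hypothetical name, not an LLVM or runtime function:

```python
from typing import Dict, Tuple


def relocate_base_only(p: int, q: int,
                       forwarding: Dict[int, int]) -> Tuple[int, int]:
    """Relocate a derived pointer when the runtime only knows about bases.

    p is the base object pointer, q a pointer derived from it, and
    forwarding maps old base addresses to new ones.
    """
    d = q - p                        # %d = <compute %q - %p>, before the call
    p_reloc = forwarding.get(p, p)   # only the base is reported and updated
    q_reloc = p_reloc + d            # %q.reloc lowers to %p.reloc + %d
    return p_reloc, q_reloc
```

For example, `relocate_base_only(0x1000, 0x1010, {0x1000: 0x8000})` returns `(0x8000, 0x8010)`: the offset of 0x10 survives the move even though the collector never saw `q`.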
Now, if we consider the same situation but on an exception path where there are
no explicit gc.relocate calls because RS4GC spilled along the exception path:
  %p.gc_spill = alloca _
  %q.gc_spill = alloca _
  %p = _ ; <some base object pointer>
  %q = <getelementptr getting a pointer at some offset from %p>
  ...
  store <ty> %p, <ty>* %p.gc_spill
  store <ty> %q, <ty>* %q.gc_spill
  %sp = invoke token @llvm.experimental.gc.statepoint(<args indicating
            %p.gc_spill and %q.gc_spill hold gc pointers, with %p.gc_spill as
            %q.gc_spill's base>)
          to label _ unwind label %pad
pad:
  landingpad _
  %p.reload = load <ty>, <ty>* %p.gc_spill
  %q.reload = load <ty>, <ty>* %q.gc_spill
  _ = %q.reload
then the best you could do is something like this:
  %p.gc_spill = alloca _
  %q.gc_spill = alloca _
  %p = _ ; <some base object pointer>
  %q = <getelementptr getting a pointer at some offset from %p>
  ...
  store <ty> %p, <ty>* %p.gc_spill
  store <ty> %q, <ty>* %q.gc_spill
  ; compute difference
  %p.to_compute_d = load <ty>, <ty>* %p.gc_spill
  %q.to_compute_d = load <ty>, <ty>* %q.gc_spill
  %d = <compute %q.to_compute_d - %p.to_compute_d>
  %sp = invoke token @llvm.experimental.gc.statepoint(<args indicating
            %p.gc_spill and %q.gc_spill hold gc pointers, with %p.gc_spill as
            %q.gc_spill's base>)
          to label _ unwind label %pad
pad:
  landingpad _
  %p.to_compute_q = load <ty>, <ty>* %p.gc_spill
  %q.recomputed = <compute %p.to_compute_q + %d>
  store <ty> %q.recomputed, <ty>* %q.gc_spill
  ; continue as normal
  %p.reload = load <ty>, <ty>* %p.gc_spill
  %q.reload = load <ty>, <ty>* %q.gc_spill
  _ = %q.reload
There are a number of redundant loads/stores in that sequence, and I'm not
sure it's reasonable to generate them and expect them to get cleaned up
later (especially since "later" is post-optimizer).
So it seems like, for that use case, explicit spilling in RS4GC gets in the way
more than it helps. But I'm not sure how important that use case is to
anybody (Manuel, is this the approach PyPy is taking?), or what would be
preferable to people for whom it is important: simply to continue using
gc.relocates on the exceptional path and live with non-token linkage between
your landingpads and gc.relocates? A different scheme where RS4GC doesn't
generate `alloca`s and `store`s and `load`s directly, but rather some family of
intrinsics like `gc.spill.alloca`/`gc.spill.store`/`gc.spill.load` where the
conceptual slot's type would be token rather than pointer, so as to be
unobscurable until lowering? Something else entirely?
I'm interested to hear what others think, about both issues above.
Thanks
-Joseph
-----Original Message-----
From: Philip Reames [mailto:listmail at philipreames.com]
Sent: Friday, January 22, 2016 3:36 PM
To: llvm-dev <llvm-dev at lists.llvm.org>; Joseph Tremoulet <jotrem at
microsoft.com>; Manuel Jacob <me at manueljacob.de>; chenli@
<"azulsystems chenli"@azulsystems.com>; Sanjoy Das <sanjoy at
playingwithpointers.com>
Subject: FYI: gc relocations on exception path w/RS4GC currently broken
For anyone following along on ToT using the gc.statepoint mechanism, you should
know that ToT is currently not able to express arbitrary exceptional control
flow and relocations along exceptional edges. This is a direct result of moving
the gc.statepoint representation to using a token type landingpad. Essentially,
we have a design inconsistency where we expect to be able to "resume"
a phi of arbitrary landing pads, but we expect relocations to be tied
specifically to a particular invoke.
Chen, Joseph, and I have spent some time talking about how to resolve this. All
of the schemes we've come up with representing relocations using
gc.relocates on the exceptional path require either a change to how we define an
invoke instruction (something we'd really like to
avoid) or a new intrinsic with special treatment in the optimizer so that it
basically "becomes part of" the landing pad without actually being the
landing pad. None of us were particularly thrilled by the changes involved.
Given exceptional paths are nearly by definition cold, we're currently
exploring another option. We're considering having RS4GC insert explicit
spill slots at the IR level (via allocas) for values live along exceptional
paths, and leaving all of the normal path values represented as gc.relocates.
This avoids the need for another IR extension, makes it slightly easier to meet
an ABI requirement Joseph has, and provides a better platform for lowering
experimentation. Joseph is working on implementing this and will probably have
something up for review next week or the week after. Once that's in,
we're going to run some performance experiments to see if it's a viable
lowering strategy even without Joseph's particular ABI requirement, and if
so, make that the standard way of representing relocations on exceptional edges.
Assuming this approach works, we're going to defer solving the problem of
how to cleanly represent explicit relocations along the exceptional path until a
later point in time. In particular, the value of the explicit relocations comes
mainly from being able to lower them efficiently to register uses. Since the
work to integrate relocations with the register allocator hasn't happened
and doesn't look like it's going to happen in the near term (*), this
seems like a reasonable compromise.
Philip
(*) To give some context on this, it turns out one of our initial
starting assumptions was wrong in practice. We expected the quality of
lowering for the gc arguments at statepoint/safepoint to be very
important for overall code quality. While this may some day become
true, we've found that whenever we encounter a hot safepoint, the
problem is usually that we didn't inline appropriately. As a result,
we've ended up fixing (out of tree) inlining or devirtualization bugs
rather than working on the lowering itself. For us, a truly hot
megamorphic call site has turned out to be a very rare beast. Worth
noting is that this is only true because we're a high tier JIT with good
profiling information. It's likely that other users who don't have the
same design point may find the lowering far more problematic; in fact,
we have some evidence this may already be true.
Joseph Tremoulet via llvm-dev
2016-Feb-05 20:22 UTC
[llvm-dev] gc relocations on exception path w/RS4GC currently broken
Sorry to reply to myself here, but I had an idea regarding "issue #2"
-- possibly what makes the most sense for those clients/targets is to pull the
pointer difference computation/reapplication into RS4GC itself -- it could have
a pass just before or after rematerialization, which runs based on a
configuration flag (eventually to be driven by GCStrategy), which performs
rewrites like below to ensure that only base pointers are live across
statepoints when it's done (plus a bit of bookkeeping w.r.t.
recomputeliveness and/or the ssa update at the end to make sure the rewritten
pointers don't get reported and do get uses of the original value replaced
with them).
Thanks
-Joseph
-----Original Message-----
Subject: Re: [llvm-dev] gc relocations on exception path w/RS4GC currently
broken
Working on this, I've run into a couple potential issues regarding which
I'd like to solicit feedback.
To give a concrete example, we're talking about having RS4GC see a
GC-safepoint call like so:
%a = _ ; gc pointer
%b = _ ; gc pointer
...
invoke void @callee()
to label %cont unwind label %pad
cont:
_ = %a
...
pad:
landingpad _
_ = %b
...
and transform it into:
%b.gc_spill = alloca <ty>
...
%a = _
%b = _
...
store <ty> %b, <ty>* %b.gc_spill
%sp = invoke token @llvm.experimental.gc.statepoint(<arg list that
indicates %a is a gc pointer and %b.gc_spill holds a gc pointer>)
to label %cont unwind label %pad
cont:
%a.reloc = call <ty> @llvm.experimental.gc.relocate(token %sp,
<index of %a>, <index of %a>)
_ = %a.reloc
...
pad:
landingpad _
%b.gc_reload = load <ty>, <ty>* %b.gc_spill
_ = %b.gc_reload
...
which would then get lowered to a call with a stack map reporting %a (or the
slot that lowering spills %a to) and %b.gc_spill as holding live gc pointers.
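As a rough illustration of the split described above, here is a toy Python model (not LLVM API code). The function name `plan_relocations` and the set-based inputs are hypothetical, and the handling of a value that is live on both paths (it simply appears in both results) is an assumption the message above does not address.

```python
# Toy model (not LLVM API): decide how each live gc pointer at one invoke
# safepoint is represented under the scheme described above.

def plan_relocations(live_normal, live_unwind):
    """Return (relocates, spills) for one invoke safepoint.

    live_normal: names of gc pointers live into the normal destination
    live_unwind: names of gc pointers live into the landing pad
    """
    # Normal-path values keep the explicit gc.relocate representation.
    relocates = sorted(live_normal)
    # Exceptional-path values get an alloca-backed slot: store before the
    # invoke, load after the landingpad.
    # Assumption: a value live on both paths shows up in both results.
    spills = {name: name + ".gc_spill" for name in sorted(live_unwind)}
    return relocates, spills


relocates, spills = plan_relocations({"%a"}, {"%b"})
print(relocates)  # ['%a']
print(spills)     # {'%b': '%b.gc_spill'}
```

This mirrors the example: %a is relocated through gc.relocate in cont, while %b is stored to %b.gc_spill before the invoke and reloaded in pad.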
Issue #1: obscurability of the %b.gc_spill use on the gc.statepoint invoke
Some target runtimes/GCs (CoreCLR included) need to have stack slots reported
directly by offset. If code runs between RS4GC and lowering that somehow
rewrites the argument on the statepoint corresponding to b's spill to be
anything other than a direct use of the static alloca that RS4GC allocated to
hold the spill, the best we could do is have the lowering introduce another
layer of indirection.
E.g., continuing the above example, if something after RS4GC obscures
%b.gc_spill on the statepoint:
%b.gc_spill = alloca <ty>
...
%a = _
%b = _
...
%p = _ ; like maybe a PHI that has %b.gc_spill as an incoming value
...
store <ty> %b, <ty>* %b.gc_spill ; may or may not have been rewritten as store into %p
%sp = invoke token @llvm.experimental.gc.statepoint(<arg list that
indicates %a is a gc pointer and %p holds a gc pointer>)
to label %cont unwind label %pad
cont:
%a.reloc = call <ty> @llvm.experimental.gc.relocate(token %sp,
<index of %a>, <index of %a>)
_ = %a.reloc
...
pad:
landingpad _
%b.gc_reload = load <ty>, <ty>* %b.gc_spill
_ = %b.gc_reload
...
then lowering would effectively have to insert another indirection:
%b.gc_spill = alloca <ty>
...
%a = _
%b = _
...
%p = _ ; like maybe a PHI that has %b.gc_spill as an incoming value
...
store <ty> %b, <ty>* %b.gc_spill ; may or may not have been rewritten as store into %p
%p.deref = load <ty>, <ty>* %p
%sp = invoke token @llvm.experimental.gc.statepoint(<arg list that
indicates %a and %p.deref are gc pointers>)
to label %cont unwind label %pad
cont:
store <ty> %p.deref, <ty>* %p
%a.reloc = call <ty> @llvm.experimental.gc.relocate(token %sp,
<index of %a>, <index of %a>)
_ = %a.reloc
...
pad:
landingpad _
store <ty> %p.deref, <ty>* %p
%b.gc_reload = load <ty>, <ty>* %b.gc_spill
_ = %b.gc_reload
...
(and the code to insert the %p <-> %p.deref loads/stores would have to be
something in CodeGenPrep or [ugh] directly in StatepointLowering).
I'm curious to know if others think this is problematic or not. I know that
for LLILC we intend to run RS4GC late in the pass list and could probably just
discount the possibility of the spill slot allocas on the gc.statepoint invoke
getting obscured (or at least could be ok with having the bail-out lowering/CGP
code that patches things up with extra stores/loads, on the assumption that
it's rare in practice to hit these cases), but I'm not sure how
representative LLILC is of the community in that regard. Similarly, I have the
impression that we're moving generally toward wedding RS4GC more with CGP,
but I would be interested to know if I'm off the mark there.
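The decision that lowering faces in Issue #1 can be sketched with a small Python model. Everything here is hypothetical and for illustration only: the `DEFS` table, the `slot_report` helper, the `%other` value, and the two result strings are not part of LLVM.

```python
# Toy model (not LLVM API): each value is described by the kind of
# instruction that defines it.
DEFS = {
    "%b.gc_spill": {"kind": "alloca", "static": True},
    # A PHI with the spill slot as one incoming value; "%other" is made up.
    "%p": {"kind": "phi", "incoming": ["%b.gc_spill", "%other"]},
}


def slot_report(operand, defs):
    """Say how lowering could report a spill-slot operand of a statepoint."""
    d = defs.get(operand)
    if d is not None and d["kind"] == "alloca" and d.get("static", False):
        # Direct use of a static alloca: the frame offset is known, so the
        # slot can be reported by offset.
        return "report-by-offset"
    # Anything else hides which slot is meant, so the fallback is the extra
    # indirection: load through the pointer, report the loaded value, and
    # store it back after the safepoint.
    return "load-report-store-back"


print(slot_report("%b.gc_spill", DEFS))  # report-by-offset
print(slot_report("%p", DEFS))           # load-report-store-back
```

The second case corresponds to the %p/%p.deref patch-up shown above, which is the path one would hope is rare in practice.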
Issue #2: Relocating derived pointers by IR injection
I know there's been some discussion about runtimes which require the
pointers reported directly to them to be base object pointers. So e.g. with
code like this:
%p = _ ; <some base object pointer>
%q = <getelementptr getting a pointer at some offset from %p>
...
call @callee()
_ = %q
then RS4GC will generate
%p = _ ; <some base object pointer>
%q = <getelementptr getting a pointer at some offset from %p>
...
%sp = call token @llvm.experimental.gc.statepoint(<args indicating %p and
%q are gc pointers, with %p as %q's base>)
%p.reloc = call @llvm.experimental.gc.relocate(token %sp, <index of
p>, <index of p>)
%q.reloc = call @llvm.experimental.gc.relocate(token %sp, <index of
p>, <index of q>)
_ = %q.reloc
The default lowering of that is to spill %p and %q to the stack just before the
call, and lower the gc.relocate calls as loads from those slots after the calls,
with the understanding that the stack map will communicate to the GC that
q's slot is derived from p's slot and the GC will update both pointers
appropriately. However, for targets where the interface with the runtime only
allows reporting base pointers, lowering would have to effect something like the
following transformation:
%p = _ ; <some base object pointer>
%q = <getelementptr getting a pointer at some offset from %p>
...
%d = <compute %q - %p>
%sp = call token @llvm.experimental.gc.statepoint(<args indicating %p is
a gc pointer>)
%p.reloc = call @llvm.experimental.gc.relocate(token %sp, <index of
p>, <index of p>) ; lower to load
%q.reloc = call @llvm.experimental.gc.relocate(token %sp, <index of
p>, <index of q>) ; lower to %p.reloc + %d
_ = %q.reloc
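The arithmetic behind that base-only lowering can be checked with a short Python sketch. Addresses are modeled as plain integers, and `relocate_base_only` and the `move` callback (standing in for the collector) are hypothetical names used only for this illustration.

```python
def relocate_base_only(p, q, move):
    """Model a safepoint where only the base pointer p is reported.

    p, q: integer addresses, with q derived from p (an interior pointer)
    move: hypothetical stand-in for the collector; maps the old base
          address to its new address
    """
    d = q - p              # %d = <compute %q - %p>, before the safepoint
    p_reloc = move(p)      # the GC updates only the reported base
    q_reloc = p_reloc + d  # %q.reloc lowered to %p.reloc + %d
    return p_reloc, q_reloc


# The object at 0x1000 moves to 0x8000; q pointed 0x18 bytes into it.
p_new, q_new = relocate_base_only(0x1000, 0x1018, lambda a: a + 0x7000)
print(hex(p_new), hex(q_new))  # 0x8000 0x8018
```

The offset %d is an ordinary integer, so it needs no reporting to the GC; only %p crosses the safepoint as a gc pointer.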
Now, if we consider the same situation but on an exception path where there are
no explicit gc.relocate calls because RS4GC spilled along the exception path:
%p.gc_spill = alloca _
%q.gc_spill = alloca _
%p = _ ; <some base object pointer>
%q = <getelementptr getting a pointer at some offset from %p>
...
store <ty> %p, <ty>* %p.gc_spill
store <ty> %q, <ty>* %q.gc_spill
%sp = invoke token @llvm.experimental.gc.statepoint(<args indicating
%p.gc_spill and %q.gc_spill hold gc pointers, with %p.gc_spill as
%q.gc_spill's base>)
to label _, unwind label %pad
pad:
landingpad _
%p.reload = load <ty>, <ty>* %p.gc_spill
%q.reload = load <ty>, <ty>* %q.gc_spill
_ = %q.reload
then the best you could do is something like this:
%p.gc_spill = alloca _
%q.gc_spill = alloca _
%p = _ ; <some base object pointer>
%q = <getelementptr getting a pointer at some offset from %p>
...
store <ty> %p, <ty>* %p.gc_spill
store <ty> %q, <ty>* %q.gc_spill
; compute difference
%p.to_compute_d = load <ty>, <ty>* %p.gc_spill
%q.to_compute_d = load <ty>, <ty>* %q.gc_spill
%d = <compute %q.to_compute_d - %p.to_compute_d>
%sp = invoke token @llvm.experimental.gc.statepoint(<args indicating
%p.gc_spill and %q.gc_spill hold gc pointers, with %p.gc_spill as
%q.gc_spill's base>)
to label _, unwind label %pad
pad:
landingpad _
%p.to_compute_q = load <ty>, <ty>* %p.gc_spill
%q.recomputed = <compute %p.to_compute_q + %d>
store <ty> %q.recomputed, <ty>* %q.gc_spill
; continue as normal
%p.reload = load <ty>, <ty>* %p.gc_spill
%q.reload = load <ty>, <ty>* %q.gc_spill
_ = %q.reload
There are a number of redundant loads/stores in that sequence, and I'm not
sure it's reasonable to generate them and expect them to get cleaned up
later (especially since "later" is post-optimizer).
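To put a number on the redundancy, here is a toy Python count of the loads in the sequence above that a simple store-to-load forwarding pass could remove. The `(op, slot)` encoding and the name `count_forwardable_loads` are hypothetical. It assumes the safepoint may rewrite every reported slot, and that nothing else aliases the slots.

```python
def count_forwardable_loads(seq):
    """Count loads whose value is already available in an SSA value."""
    known = set()   # slots whose current contents are held in some value
    redundant = 0
    for op, slot in seq:
        if op == "store":
            known.add(slot)
        elif op == "load":
            if slot in known:
                redundant += 1
            known.add(slot)
        elif op == "safepoint":
            # Assumption: the GC may rewrite every reported slot.
            known.clear()
    return redundant


SEQ = [
    ("store", "p"), ("store", "q"),
    ("load", "p"), ("load", "q"),   # loads feeding %d
    ("safepoint", None),
    ("load", "p"),                  # %p.to_compute_q
    ("store", "q"),                 # store of %q.recomputed
    ("load", "p"), ("load", "q"),   # %p.reload, %q.reload
]
print(count_forwardable_loads(SEQ))  # 4
```

Under these assumptions only the first load of the base after the safepoint is needed; the other four loads could be forwarded from an earlier store or load.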
So it seems like, for that use case, explicit spilling in RS4GC gets in the way
more than it helps. But I'm not sure how important that use case is to
anybody (Manuel, is this the approach PyPy is taking?), or what would be
preferable to people for whom it is important: simply to continue using
gc.relocates on the exceptional path and live with non-token linkage between
your landingpads and gc.relocates? A different scheme where RS4GC doesn't
generate `alloca`s and `store`s and `load`s directly, but rather some family of
intrinsics like `gc.spill.alloca`/`gc.spill.store`/`gc.spill.load` where the
conceptual slot's type would be token rather than pointer, so as to be
unobscurable until lowering? Something else entirely?
I'm interested to hear what others think, about both issues above.
Thanks
-Joseph