thr3ads.net - llvm dev - [LLVMdev] Stackmaps: caller-save-registers passed as deopt args [Nov 2014]

If this information is useful, please help other people find it:
Share via:

Andrew Trick

2014-Nov-05 08:33 UTC

[LLVMdev] Stackmaps: caller-save-registers passed as deopt args

> On Oct 31, 2014, at 5:28 PM, Sanjoy Das <sanjoy at
playingwithpointers.com> wrote:
> 
> Hi Kevin,
> 
> Thank you for starting this discussion!
Yes, sorry for being unresponsive for a few days. Sanjoy summarized the issues
perfectly.
> I think the distinction is really between whether the live values are
> "live on call" or "live on return".  Ideally we should
be able to mark
> values to be of either kind (liveoncall vs. liveonreturn) in the call
> to the patchpoint intrinsic.  This will make the semantics of
> patchpoint slightly odd though -- we'll end up with an SSA instruction
> where some inputs are expected to be live *after* the instruction has
> been retired.  Are there other SSA instructions that have this sort of
> semantics?
No, it’s a lowering issue. Machine instructions can declare that some registers
are early clobbered so that machine operands can’t use that register. In this
case it’s best just to think about it as a kind of calling convention though.
>  Nevertheless, I think splitting the inputs to a patchpoint
> into "function arguments", "values live on call" and
"values live on
> return" is a good idea.
I basically agree, but think we should have a new intrinsic that is even more
general. We should be able to tag any number of arguments as following some
convention/lowering rules. Those rules should not be baked into the intrinsic.

What Kevin is asking for is very reasonable and arguably a safer default, but we
need the current functionality as well.
As a short term fix, until a new intrinsic is ready, Kevin can add a string
attribute to the patchpoint call, say “live-on-return”.

Some work maybe needed to support this. One possibility is to add a machine
operand flag that is kind of the inverse of EarlyClobber, say LateUse - that’s
hard. An easier option would be to generate a second pseudo instruction
immediately after the patchpoint that simply uses all the live-on-return values.
A cop-out would be to give the call’s register mask operand a different flag so
that is is interpreted as a bunch of early-clobber registers. This last solution
would not allow both live-on-return and live-on-call operands in the same call.
Hopefully there’s something more clever that I haven’t thought of yet.
>> There was some discussion about what the behavior should be if you use
>> anyregcc, since in that case it seems reasonable to receive a
>> caller-save register if you are interested in using it inside the
>> patchpoint.  My thoughts are that in that case you might be better off
>> including the value as part of the function arguments instead of the
>> stackmap/deopt arguments.
> 
> I don't see why anything needs to be different for patchpoints running
> with anyregcc.  The call arguments always get lowered based on the
> calling convention, the "live on call" values get arbitrary stack
> slots / registers and the "live on return" values get assigned to
CSRs
> or stack slots.  If I've read X86RegisterInfo::getCalleeSavedRegs
> correctly, I think all registers are callee-saved for anyregcc so
> whatever you call using the anyregcc convention will have to respect
> that; but that's a different problem.
Yep.

-Andy
> 
> Thanks!
> -- Sanjoy

Sanjoy Das

2014-Nov-06 21:53 UTC

head link

[LLVMdev] Stackmaps: caller-save-registers passed as deopt args

> An easier option would be to generate a second
> pseudo instruction immediately after the patchpoint that simply uses
> all the live-on-return values.
Do you mean some variant of

PATCHPOINT ... (live_values = %vreg0, %vreg1)
FAKE-USE %vreg1 ;; %vreg1 needs to be live on return

For this kind of a solution, we'll have to prevent the register
allocator (and other lowering components, I'm not sure) playing tricks
with inserting fills and remats.  IOW, the above code should not
compile to

mov 16(%rsp), %r11
mov 24(%rsp), %r10
inc %r10

PATCHPOINT ... (live_values = %r11, %r10)
mov 24(%rsp), %r10
inc %r10
FAKE-USE %r10

Naively, both the register assignments to %vreg1, in PATCHPOINT and
FAKE-USE, are caller-saved and not live on return.  It may be possible
to reverse-engineer the computation that will compute the value of
%r10, but that's non-trivial.



One simple solution is to force spilling of live-on-return values at
the SelectionDAG layer (we do this for statepoints currently); but
that prevents us from using callee-saved registers and may be
suboptimal.



A completely orthogonal idea: it should also be possible to do
deopt-on-return only using live-on-call values if you're willing to
use invokes instead of calls.  Instead of

  call void @patchpoint(@callee_that_may_deopt, args, live_values)

emit

  invoke @callee_that_may_deopt(args)

  deopt_pad:
    ... landingpad ...
    call void @patchpoint(@deoptimize_me, ... live_values ...)


... live_values ... in deopt_pad now only needs to be live-on-call.  It
can actually even be function arguments to @deoptimize_me -- the
register / stack slot shuffling due to a fixed calling convention will
no longer happen on the (presumably hot) call, but only when
deoptimizing.  You'd have to deoptimize by "throwing" an exception
in
this case.

-- Sanjoy

Andrew Trick

2014-Nov-07 00:47 UTC

head link

[LLVMdev] Stackmaps: caller-save-registers passed as deopt args

> On Nov 6, 2014, at 1:53 PM, Sanjoy Das <sanjoy at
playingwithpointers.com> wrote:
> 
>> An easier option would be to generate a second
>> pseudo instruction immediately after the patchpoint that simply uses
>> all the live-on-return values.
> 
> Do you mean some variant of
> 
> PATCHPOINT ... (live_values = %vreg0, %vreg1)
> FAKE-USE %vreg1 ;; %vreg1 needs to be live on return
> 
> For this kind of a solution, we'll have to prevent the register
> allocator (and other lowering components, I'm not sure) playing tricks
> with inserting fills and remats.  IOW, the above code should not
> compile to
> 
> mov 16(%rsp), %r11
> mov 24(%rsp), %r10
> inc %r10
> 
> PATCHPOINT ... (live_values = %r11, %r10)
> mov 24(%rsp), %r10
> inc %r10
> FAKE-USE %r10
> 
> Naively, both the register assignments to %vreg1, in PATCHPOINT and
> FAKE-USE, are caller-saved and not live on return.  It may be possible
> to reverse-engineer the computation that will compute the value of
> %r10, but that's non-trivial.
I was thinking FAKE_USE would need special handling. It would be bundled before
and after register allocation, but unbundled during register allocation so the
live interval of the live values would span the call instruction. The problem is
that the register allocator itself is free do something like you’ve shown above,
so it’s not a very robust solution and I don’t particularly like it. I thought
it would be easier than adding a “LateUse” MachineOperand flag, but now I’m not
so sure. Any ideas Quentin?
> One simple solution is to force spilling of live-on-return values at
> the SelectionDAG layer (we do this for statepoints currently); but
> that prevents us from using callee-saved registers and may be
> suboptimal.
If you need to do that then the design is broken. But it is an option for Kevin
to get things working.
> A completely orthogonal idea: it should also be possible to do
> deopt-on-return only using live-on-call values if you're willing to
> use invokes instead of calls.  Instead of
> 
>  call void @patchpoint(@callee_that_may_deopt, args, live_values)
> 
> emit
> 
>  invoke @callee_that_may_deopt(args)
> 
>  deopt_pad:
>    ... landingpad ...
>    call void @patchpoint(@deoptimize_me, ... live_values ...)
> 
> 
> ... live_values ... in deopt_pad now only needs to be live-on-call.  It
> can actually even be function arguments to @deoptimize_me -- the
> register / stack slot shuffling due to a fixed calling convention will
> no longer happen on the (presumably hot) call, but only when
> deoptimizing.  You'd have to deoptimize by "throwing" an
exception in
> this case.
That’s an elegant approach to avoiding the backend problems. The IR
representation makes perfect sense. I think your deopt mechanism would need to
patch the return address as if an exception were thrown (I guess that’s what you
mean by “throwing an exception”). Also, you will end up with exception tables
that need to be parsed in addition to stackmaps, which seems fairly horrible.
Agree?

I assume that statepoints have the same issue with both deopt values and GC
roots once you stop spilling them. Do you have a preferred or tentative solution
for it?

-Andy
> 
> -- Sanjoy

llvm dev - Nov 2014 - [LLVMdev] Stackmaps: caller-save-registers passed as deopt args

[LLVMdev] Stackmaps: caller-save-registers passed as deopt args

[LLVMdev] Stackmaps: caller-save-registers passed as deopt args

[LLVMdev] Stackmaps: caller-save-registers passed as deopt args