Andrew Trick
2014-Nov-05 08:33 UTC
[LLVMdev] Stackmaps: caller-save-registers passed as deopt args
> On Oct 31, 2014, at 5:28 PM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote: > > Hi Kevin, > > Thank you for starting this discussion!Yes, sorry for being unresponsive for a few days. Sanjoy summarized the issues perfectly.> I think the distinction is really between whether the live values are > "live on call" or "live on return". Ideally we should be able to mark > values to be of either kind (liveoncall vs. liveonreturn) in the call > to the patchpoint intrinsic. This will make the semantics of > patchpoint slightly odd though -- we'll end up with an SSA instruction > where some inputs are expected to be live *after* the instruction has > been retired. Are there other SSA instructions that have this sort of > semantics?No, it’s a lowering issue. Machine instructions can declare that some registers are early clobbered so that machine operands can’t use that register. In this case it’s best just to think about it as a kind of calling convention though.> Nevertheless, I think splitting the inputs to a patchpoint > into "function arguments", "values live on call" and "values live on > return" is a good idea.I basically agree, but think we should have a new intrinsic that is even more general. We should be able to tag any number of arguments as following some convention/lowering rules. Those rules should not be baked into the intrinsic. What Kevin is asking for is very reasonable and arguably a safer default, but we need the current functionality as well. As a short term fix, until a new intrinsic is ready, Kevin can add a string attribute to the patchpoint call, say “live-on-return”. Some work maybe needed to support this. One possibility is to add a machine operand flag that is kind of the inverse of EarlyClobber, say LateUse - that’s hard. An easier option would be to generate a second pseudo instruction immediately after the patchpoint that simply uses all the live-on-return values. A cop-out would be to give the call’s register mask operand a different flag so that is is interpreted as a bunch of early-clobber registers. This last solution would not allow both live-on-return and live-on-call operands in the same call. Hopefully there’s something more clever that I haven’t thought of yet.>> There was some discussion about what the behavior should be if you use >> anyregcc, since in that case it seems reasonable to receive a >> caller-save register if you are interested in using it inside the >> patchpoint. My thoughts are that in that case you might be better off >> including the value as part of the function arguments instead of the >> stackmap/deopt arguments. > > I don't see why anything needs to be different for patchpoints running > with anyregcc. The call arguments always get lowered based on the > calling convention, the "live on call" values get arbitrary stack > slots / registers and the "live on return" values get assigned to CSRs > or stack slots. If I've read X86RegisterInfo::getCalleeSavedRegs > correctly, I think all registers are callee-saved for anyregcc so > whatever you call using the anyregcc convention will have to respect > that; but that's a different problem.Yep. -Andy> > Thanks! > -- Sanjoy
Sanjoy Das
2014-Nov-06 21:53 UTC
[LLVMdev] Stackmaps: caller-save-registers passed as deopt args
> An easier option would be to generate a second > pseudo instruction immediately after the patchpoint that simply uses > all the live-on-return values.Do you mean some variant of PATCHPOINT ... (live_values = %vreg0, %vreg1) FAKE-USE %vreg1 ;; %vreg1 needs to be live on return For this kind of a solution, we'll have to prevent the register allocator (and other lowering components, I'm not sure) playing tricks with inserting fills and remats. IOW, the above code should not compile to mov 16(%rsp), %r11 mov 24(%rsp), %r10 inc %r10 PATCHPOINT ... (live_values = %r11, %r10) mov 24(%rsp), %r10 inc %r10 FAKE-USE %r10 Naively, both the register assignments to %vreg1, in PATCHPOINT and FAKE-USE, are caller-saved and not live on return. It may be possible to reverse-engineer the computation that will compute the value of %r10, but that's non-trivial. One simple solution is to force spilling of live-on-return values at the SelectionDAG layer (we do this for statepoints currently); but that prevents us from using callee-saved registers and may be suboptimal. A completely orthogonal idea: it should also be possible to do deopt-on-return only using live-on-call values if you're willing to use invokes instead of calls. Instead of call void @patchpoint(@callee_that_may_deopt, args, live_values) emit invoke @callee_that_may_deopt(args) deopt_pad: ... landingpad ... call void @patchpoint(@deoptimize_me, ... live_values ...) ... live_values ... in deopt_pad now only needs to be live-on-call. It can actually even be function arguments to @deoptimize_me -- the register / stack slot shuffling due to a fixed calling convention will no longer happen on the (presumably hot) call, but only when deoptimizing. You'd have to deoptimize by "throwing" an exception in this case. -- Sanjoy
Andrew Trick
2014-Nov-07 00:47 UTC
[LLVMdev] Stackmaps: caller-save-registers passed as deopt args
> On Nov 6, 2014, at 1:53 PM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote: > >> An easier option would be to generate a second >> pseudo instruction immediately after the patchpoint that simply uses >> all the live-on-return values. > > Do you mean some variant of > > PATCHPOINT ... (live_values = %vreg0, %vreg1) > FAKE-USE %vreg1 ;; %vreg1 needs to be live on return > > For this kind of a solution, we'll have to prevent the register > allocator (and other lowering components, I'm not sure) playing tricks > with inserting fills and remats. IOW, the above code should not > compile to > > mov 16(%rsp), %r11 > mov 24(%rsp), %r10 > inc %r10 > > PATCHPOINT ... (live_values = %r11, %r10) > mov 24(%rsp), %r10 > inc %r10 > FAKE-USE %r10 > > Naively, both the register assignments to %vreg1, in PATCHPOINT and > FAKE-USE, are caller-saved and not live on return. It may be possible > to reverse-engineer the computation that will compute the value of > %r10, but that's non-trivial.I was thinking FAKE_USE would need special handling. It would be bundled before and after register allocation, but unbundled during register allocation so the live interval of the live values would span the call instruction. The problem is that the register allocator itself is free do something like you’ve shown above, so it’s not a very robust solution and I don’t particularly like it. I thought it would be easier than adding a “LateUse” MachineOperand flag, but now I’m not so sure. Any ideas Quentin?> One simple solution is to force spilling of live-on-return values at > the SelectionDAG layer (we do this for statepoints currently); but > that prevents us from using callee-saved registers and may be > suboptimal.If you need to do that then the design is broken. But it is an option for Kevin to get things working.> A completely orthogonal idea: it should also be possible to do > deopt-on-return only using live-on-call values if you're willing to > use invokes instead of calls. Instead of > > call void @patchpoint(@callee_that_may_deopt, args, live_values) > > emit > > invoke @callee_that_may_deopt(args) > > deopt_pad: > ... landingpad ... > call void @patchpoint(@deoptimize_me, ... live_values ...) > > > ... live_values ... in deopt_pad now only needs to be live-on-call. It > can actually even be function arguments to @deoptimize_me -- the > register / stack slot shuffling due to a fixed calling convention will > no longer happen on the (presumably hot) call, but only when > deoptimizing. You'd have to deoptimize by "throwing" an exception in > this case.That’s an elegant approach to avoiding the backend problems. The IR representation makes perfect sense. I think your deopt mechanism would need to patch the return address as if an exception were thrown (I guess that’s what you mean by “throwing an exception”). Also, you will end up with exception tables that need to be parsed in addition to stackmaps, which seems fairly horrible. Agree? I assume that statepoints have the same issue with both deopt values and GC roots once you stop spilling them. Do you have a preferred or tentative solution for it? -Andy> > -- Sanjoy