TLDR - I have a runtime which expects to be able to inspect certain arguments to a function even if that argument isn't used within the callee itself. DeadArgumentElimination doesn't respect this today. I want to add an attribute that records an argument to a call as live even if the value is known not to be used in the callee.

My use case
-----------------

What my runtime is doing is trying to resolve a symbolic reference to a function from a call site which has been devirtualized by the compiler.

Rather than saving what the devirtualized callee actually was, all the (LLVM-based) in-memory compiler does is save a bit indicating that it proved the given call site was monomorphic. In LLVM, the call is represented as a patchable call site using statepoints (it could also be a patchpoint). Before actually running the code in question, we patch over the generated code with a call to a helper routine which knows how to resolve the actual callee and patch the direct call target back into the patchable code section.

What's supposed to happen the first time this code is actually executed is that the running application thread calls into the helper routine, does a dynamic lookup of the callee (using the normal dynamic dispatch logic, including all corner cases), patches the actual callee's entry address back into the source of the call, and then tail calls into the actual callee.

However, there's a complication with the dynamic dispatch step. If the actual callee was visible to the LLVM compile, we might have proven that one of the arguments (say, the 'this' receiver pointer) was not used in the callee and replaced it with undef at the call site. This breaks the dynamic lookup.

(I really don't want to get into a discussion of whether this is the "right" way to implement such a thing. This approach has various advantages, but more importantly, it's a _reasonable_ runtime design.
In my view, LLVM should be able to support any reasonable design, regardless of whether it's the best one or not.)

The proposal
-----------------

We add a new parameter attribute which can be placed either on a call site (call, invoke) or on a function declaration. The exact semantics are that the parameter so tagged must be considered live up until the prologue of the callee actually starts executing. It is illegal to make any assumptions in the caller about whether the callee uses this value or not. This attribute does not inhibit inlining. The semantics only apply if a call must be emitted (including tail or sibling calls).

My tentative name is liveoncall, but I'm open to better names. Feel free to make suggestions.

Today, the actual implementation would be quite simple. It would basically consist of a single special case in DeadArgumentElimination. In the long run, we might have to extend this to other inter-procedural analysis and optimization passes, but I suspect the diff will remain small.

Comparables & Alternatives
-------------------------------------

Today, the "meta arguments" to a patchpoint have semantics similar to those proposed here. They have the "liveoncall" property, but they *also* have the freedom to be freely allocated by the register allocator. My proposed attribute does not allow this degree of freedom.

Similarly, statepoints support "deopt arguments", "transition arguments", and "gc arguments". All of them have the liveoncall property, but they also have additional restrictions on liveness (such as "live-during-call" or "live-on-return") and placement.

In DeadArgumentElimination, we already have support for interposable functions. The restrictions are similar, but apply to all arguments of a function rather than a subset. You could view my proposed attribute as allowing interposition of the callee, but with restricted semantics on the interposed implementation.
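As a sketch, a call site using the proposed attribute might look like the following. This is hypothetical syntax only: no liveoncall attribute exists in LLVM, and the names here are purely illustrative.

```llvm
; Hypothetical: 'liveoncall' is the attribute proposed in this thread,
; not one that exists in tree.
declare i32 @callee(i8* liveoncall %this, i32 %x)

define i32 @caller(i8* %obj) {
entry:
  ; Even if @callee is known not to use its first parameter,
  ; DeadArgumentElimination would be forbidden from rewriting this
  ; operand to undef, so a runtime resolver patched into the call
  ; site could still inspect the receiver.
  %r = call i32 @callee(i8* liveoncall %obj, i32 42)
  ret i32 %r
}
```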
An alternate approach would be to insert a dummy use into the callee, lower it to a no-op late in the backend, and teach the inliner to remove it after inlining. I suspect this would be both harder to implement and harder to optimize around.

Philip
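To make the failure mode from the use-case section concrete, here is a simplified sketch of what the optimizer may legally do today. The function names are illustrative, and the call is shown as a plain call rather than the gc.statepoint or patchpoint form the runtime would actually emit.

```llvm
; The compiler proved @callee never reads its %this parameter...
define internal i32 @callee(i8* %this) {
entry:
  ret i32 7
}

define i32 @caller(i8* %obj) {
entry:
  ; ...so DeadArgumentElimination may rewrite the operand to undef.
  ; A runtime resolver patched into this call site can then no longer
  ; recover the receiver needed to perform the dynamic lookup.
  %r = call i32 @callee(i8* undef)
  ret i32 %r
}
```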
Hey Philip,

I have no problem with this, but I'm also not experienced enough with patchpoints to give a formal LGTM or anything like that.

However, I do wonder about another pass impacting calls. I can't remember its name right now, but it's basically IPO SROA. It would be able to take an argument you've tagged here and, if it's a struct, split it in two. Can you imagine ever creating a call in your runtime where that would be a problem?

If you can't, then no worries, and I can't think of anything better named than liveoncall. However, if you can, then how about something simple like noopt on the argument, which just turns off any optimization on that argument?

Alternatively, the intrinsic equivalent to this is sideeffects, but that might be a bit of overkill as it could mean just about anything. I'd prefer to keep this quite specific.

Cheers,
Pete
> On Jun 1, 2015, at 4:28 PM, Philip Reames <listmail at philipreames.com> wrote:
>
> An alternate approach would be to insert a dummy use into the callee, lower it to a noop late in the backend, and teach the inliner to remove it after inlining. I suspect this would be both harder to implement and harder to optimize around.

Hi Philip,

Without knowing more, using an intrinsic for this seems like a better way to go: the intrinsic call will keep the value alive, and you can special-case the behavior in the inliner, which is apparently the only place where this is problematic. I suspect that this will have a lot less impact on the existing compiler, be much less likely to break going forward, and also trivially compose across other existing argument attributes.

What are the disadvantages of going with a new intrinsic?

-Chris
On 06/01/2015 08:13 PM, Pete Cooper wrote:
> However, I do wonder about another pass impacting calls. I can't remember its name right now, but it's basically IPO SROA. It would be able to take an argument you've tagged here and, if it's a struct, split it in two.
>
> Can you imagine ever creating a call in your runtime where that would be a problem?

In my specific case, not really. We break apart all structs into their component pieces for ABI reasons. But your point is a good one and is definitely worth considering in terms of general usage in LLVM.

The challenge here is that the splitting you describe might not be an optimization. I'm not sure, but I think there are calling conventions already in tree that require exactly this type of splitting. I think this is currently done in the frontend (clang), but if we wanted to change that, it would be problematic. I think we'd need to restrict this to ABI primitive types, but that's not unreasonable to do. Definitely something to document, though.
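A rough before/after sketch of the splitting Pete has in mind may help. The in-tree pass closest to "IPO SROA" is ArgumentPromotion; the IR below is illustrative of the transformation's shape rather than the pass's exact output.

```llvm
%pair = type { i32, i32 }

; Before: the callee takes a pointer to the aggregate.
define internal i32 @f(%pair* %p) {
entry:
  %a.ptr = getelementptr %pair, %pair* %p, i32 0, i32 0
  %b.ptr = getelementptr %pair, %pair* %p, i32 0, i32 1
  %a = load i32, i32* %a.ptr
  %b = load i32, i32* %b.ptr
  %s = add i32 %a, %b
  ret i32 %s
}

; After promotion: the single argument has been split into its two
; fields, so an attribute that was attached to %p no longer has an
; obvious operand to live on.
define internal i32 @f.promoted(i32 %p.0, i32 %p.1) {
entry:
  %s = add i32 %p.0, %p.1
  ret i32 %s
}
```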
On 06/04/2015 11:00 AM, Chris Lattner wrote:
> Without knowing more, using an intrinsic for this seems like a better way to go: the intrinsic call will keep the value alive, and you can special-case the behavior in the inliner, which is apparently the only place where this is problematic.
>
> What are the disadvantages of going with a new intrinsic?

The intrinsic I was thinking of would be something along the lines of:

  void @llvm.live_on_call(<type> val) readnone

One challenge would be that I'd want the intrinsic to have readnone memory semantics, but such an intrinsic with a void return would be trivially dropped. We already somewhat hack around this for things like assumes, so I could probably extend that logic. Places that would need to be extended include:

- InlineCostAnalysis: discount the cost of the call
- InlineFunction: remove the call after inlining
- AliasAnalysis/MDA/etc. (per the above comment)
- Verifier: must be in the entry block

You're right, that doesn't actually sound that bad. It's a bit more code, but not *that* much more. Semantically, it makes a bit more sense to me as a parameter attribute (it really is a requirement of the *call*, not of the *callee*), but I could see doing the intrinsic instead. It's not that confusing.

As it happens, my immediate motivation to implement this has disappeared.
It turned out that the symptom I was investigating when I came up with this proposal was merely a hint of a larger problem, which became apparent once we started really looking at what dead argument elimination was doing in the original case. We use a limited form of function interposition which replaces the implementation of the callee, and dead argument elimination is unsound in this context. (That is, just because we analyzed what the method did as if it were compiled doesn't mean it's safe to run that method in the interpreter. That unused pointer argument might actually be accessed, and it had better be valid and dereferenceable even if the result is provably irrelevant to the final result.) Any solution I've come up with to the underlying problems solves this somewhat by accident, so I'm probably going to set this aside for the moment and come back to it in the future if I need it again.

Philip
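Had the intrinsic route been pursued, the dummy use might have looked something like the following. llvm.live_on_call is the hypothetical intrinsic sketched in this thread, not one that exists in LLVM, and the overloaded-name mangling shown is an assumption.

```llvm
; Hypothetical intrinsic: keeps an otherwise-unused parameter formally
; live so interprocedural passes cannot delete or undef it.
declare void @llvm.live_on_call.p0i8(i8*) readnone

define i32 @callee(i8* %this, i32 %x) {
entry:
  ; Placed in the entry block, per the verifier rule sketched above.
  ; The inliner would drop this call after inlining, and the backend
  ; would lower it to nothing.
  call void @llvm.live_on_call.p0i8(i8* %this)
  ret i32 %x
}
```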