Kavon Farvardin via llvm-dev
2017-Apr-17 22:52 UTC
[llvm-dev] [RFC] Adding CPS call support
(Sorry for the 2nd email Eli, I forgot to reply-all).> I'm not following how explicitly representing the return address of a call in the IR before isel actually solves any relevant issue. We already pass the return address implicitly as an argument to every call; you can retrieve it with llvm.returnaddress if you need it.Unfortunately the @llvm.returnaddress intrinsic does not solve the problem, as it only reads the return address pushed onto the LLVM stack by a call. We would then need a way to move the LLVM stack pointer back to where it was before the call, because a CPS call _must_ not grow the LLVM stack (it will never be popped!), so a 'call' instruction will not do.> You can't branch across functions like you're proposing because the stack and callee-save registers won't be in the right state. LLVM will inevitably save values to the stack for calls which are not tail calls. Therefore, this proposal simply doesn't work.It might be helpful to think of the CPS transformation as turning the program into a form such that it always moves "forward", in the sense that the tail calls _never_ return. A "return" is simply another tail call (i.e., a jump) to a return address, which looks no different than a function call. Thus, we have no callee-saved registers to worry about since the tail calls never come back. In addition, there are no live values in the LLVM stack frame, since they are all passed in the CPS call (which uses a different stack). Thus, retpt receives all of its values from the struct it extracts from. While the 'cps' marked call appears to be non-tail, it will be lowered to a tail call. It's also worth noting that this proposal is based on something that already works in LLVM, using a special assembly routine. We currently this workaround in Manticore when a thread is preempted. But, it has too much overhead for GHC, since the workaround would penalize general function calls. This is why support is necessary for GHC (and related compilers) in order to use LLVM effectively. ~kavon> On Apr 17, 2017, at 7:47 PM, Friedman, Eli <efriedma at codeaurora.org> wrote: > > On 4/17/2017 8:30 AM, Kavon Farvardin via llvm-dev wrote: >> Summary >> ======>> >> There is a need for dedicated continuation-passing style (CPS) calls in LLVM to >> support functional languages. Herein I describe the problem and propose a >> solution. Feedback and/or tips are greatly appreciated, as our goal is to >> implement these changes so they can be merged into LLVM trunk. >> >> >> Problem >> ======>> >> Implementations of functional languages like Haskell and ML (e.g., GHC and >> Manticore) use a continuation-passing style (CPS) transformation in order to >> manage the call stack explicitly. This is done prior to generating LLVM IR, so >> the implicit call stack within LLVM is not used for call and return. >> >> When making a non-tail call while in CPS, we initialize a stack frame for the >> return through our own stack pointer, and then pass that stack pointer to the >> callee when we jump to it. It is here when we run into a problem in LLVM. >> >> Consider the following CPS call to @bar and how it will return: >> >> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; >> >> define void @foo (i8** %sp, ...) { >> someBlk: >> ; ... >> ; finish stack frame by writing return address >> %retAddr = blockaddress(@foo, %retpt) >> store i8* %retAddr, i8** %sp >> ; jump to @bar >> tail call void @bar(i8** %sp, ...) >> >> retpt: ; <- how can @bar "call" %retpt? >> %sp2 = ??? >> %val = ??? >> ; ... >> >> } >> >> define void @bar (i8** %sp, ...) { >> ; perform a return >> %retAddr0 = load i8*, i8** %sp >> %retAddr1 = bitcast i8* %retAddr0 to void (i8**, i64)* >> %val = bitcast i64 1 to i64 >> ; jump back to %retpt in @foo, passing %sp and %val >> tail call void %retAddr1(i8** %sp, i64 %val) >> } >> >> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; >> >> There is currently no way to jump back to %retpt from another function, as block >> addresses have restricted usage in LLVM [1]. Our main difficulty is that we >> cannot jump to a block address without knowing its calling convention, i.e., the >> particular machine registers (or memory locations) that the block expects >> incoming values to be passed in. >> >> The workaround we have been using in GHC for LLVM is to break apart every >> function, placing the code for the continuation of each call into a new >> function. We do this only so that we can store a function pointer instead of a >> block address to our stack. This particularly gross transformation inhibits >> optimizations in both GHC and LLVM, and we would like to remove the need for it. >> >> >> Proposal >> =======>> >> I believe the lowest-impact method of fixing this problem with LLVM is the >> following: >> >> First, we add a special 'cps' call instruction marker to be used on non-tail >> calls. Then, we use a specialized calling convention for these non-tail calls, >> which fix the returned values to specific locations in the machine code [2]. >> >> To help illustrate what's going on, let's rewrite the above example using the >> proposed 'cps' call: >> >> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; >> >> define { ... } @foo (i8** %sp, ...) { >> someBlk: >> ; ... >> ; finish stack frame by writing return address >> %retAddr = blockaddress(@foo, %retpt) >> store i8* %retAddr, i8** %sp >> ; jump to @bar >> %retVals = cps call ghccc {i8**, i64} @bar (i8** %sp, ...) >> br label %retpt > > I'm not following how explicitly representing the return address of a call in the IR before isel actually solves any relevant issue. We already pass the return address implicitly as an argument to every call; you can retrieve it with llvm.returnaddress if you need it. > > You can't branch across functions like you're proposing because the stack and callee-save registers won't be in the right state. LLVM will inevitably save values to the stack for calls which are not tail calls. Therefore, this proposal simply doesn't work. > > -Eli > > -- > Employee of Qualcomm Innovation Center, Inc. > Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project >
On 4/17/2017 3:52 PM, Kavon Farvardin wrote:> (Sorry for the 2nd email Eli, I forgot to reply-all). > >> I'm not following how explicitly representing the return address of a call in the IR before isel actually solves any relevant issue. We already pass the return address implicitly as an argument to every call; you can retrieve it with llvm.returnaddress if you need it. > > Unfortunately the @llvm.returnaddress intrinsic does not solve the problem, as it only reads the return address pushed onto the LLVM stack by a call. We would then need a way to move the LLVM stack pointer back to where it was before the call, because a CPS call _must_ not grow the LLVM stack (it will never be popped!), so a 'call' instruction will not do.Most architectures have a call instruction which does not push anything onto the stack; e.g. on ARM, the "BL" instruction saves the return address in LR. On other architectures, you can emulate this (for example, you could lower an IR "call" to LEA+JMP on x86-64).>> You can't branch across functions like you're proposing because the stack and callee-save registers won't be in the right state. LLVM will inevitably save values to the stack for calls which are not tail calls. Therefore, this proposal simply doesn't work. > It might be helpful to think of the CPS transformation as turning the program into a form such that it always moves "forward", in the sense that the tail calls _never_ return. A "return" is simply another tail call (i.e., a jump) to a return address, which looks no different than a function call. > > Thus, we have no callee-saved registers to worry about since the tail calls never come back. In addition, there are no live values in the LLVM stack frame, since they are all passed in the CPS call (which uses a different stack). Thus, retpt receives all of its values from the struct it extracts from. While the 'cps' marked call appears to be non-tail, it will be lowered to a tail call.The return address for the current function is fundamentally live across any non-tail call. And LLVM will hoist other operations across non-tail calls, and in the process introduce values which are live across calls. You need to save those values somewhere. The key question is where. Your proposal tries to address that by explicitly saving/restoring the return address onto the GHC stack, but you're working at the wrong level; you can't get a complete list of which values are live across calls until register allocation. -Eli -- Employee of Qualcomm Innovation Center, Inc. Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Kavon Farvardin via llvm-dev
2017-Apr-18 10:47 UTC
[llvm-dev] [RFC] Adding CPS call support
> Most architectures have a call instruction which does not push anything onto the stack; e.g. on ARM, the "BL" instruction saves the return address in LR. On other architectures, you can emulate this (for example, you could lower an IR "call" to LEA+JMP on x86-64).This is similar to what I was originally thinking, since the end goal of all of this is the following machine code for a call (I'll use an x86-64 example): leaq _retpt(%rip), %scratchReg movq %scratchReg, (%ghcSPReg) jmp _bar But, if we want to end up with the above machine code using a custom lowering of just the following IR instruction, %retVals = cps call ghccc @bar (... args ...) _without_ explicitly taking the block address and writing it to the GHC stack pointer beforehand, there are some things to consider: 1. How do we prevent IR optimizations from eliminating the %retpt block? If we do not explicitly take the address of the %retpt block, a pass such as simplifycfg has no reason to preserve the block, and may merge it with its only predecessor. 2. Where in the GHC stack frame do we write the block address? We could hardcode the fact that the stack pointer must be pointing to the field where the return address should be written to when making a 'cps' IR call. This is fine for GHC, but is a seemingly unnecessary restriction. 3. How do we know which argument is the GHC stack pointer? Similarly, we could hardcode the fact that, say, the first argument of every 'cps' IR call is the GHC stack pointer. Then we know from the calling convention which register it is in. This again fine for GHC, but reduces the generality of the feature.> The return address for the current function is fundamentally live across any non-tail call.I'm not quite sure what you mean by this. While there may be a return address handed to the main function (and thus passed to every other function) it's either unused or not known to LLVM.> And LLVM will hoist other operations across non-tail calls, and in the process introduce values which are live across calls. You need to save those values somewhere. The key question is where. Your proposal tries to address that by explicitly saving/restoring the return address onto the GHC stack, but you're working at the wrong level; you can't get a complete list of which values are live across calls until register allocation.The part I had omitted from the proposal is the initialization of the rest of the GHC stack frame, which is laid out in GHC by finding all of the values that are live across the call before CPS conversion. While the full list of values preserved by LLVM are not known until register allocation, there are some properties of the LLVM IR we will initially generate that are important to consider: 1. There are no IR values live across the non-tail 'cps' IR call, since they are all passed to the callee. 2. All IR values used after the 'cps' IR call come from the struct returned by that call. Thus, it should not be possible to hoist instructions above this particular non-tail call, as they are all derived from the values returned in the struct. Any register spills allocated to the LLVM stack before the non-tail call are dead, as they have been copied to the GHC stack frame if they were live. If you have a particular example in mind that would be great. ~kavon> On Apr 18, 2017, at 2:32 AM, Friedman, Eli <efriedma at codeaurora.org> wrote: > > On 4/17/2017 3:52 PM, Kavon Farvardin wrote: >> (Sorry for the 2nd email Eli, I forgot to reply-all). >> >>> I'm not following how explicitly representing the return address of a call in the IR before isel actually solves any relevant issue. We already pass the return address implicitly as an argument to every call; you can retrieve it with llvm.returnaddress if you need it. >> >> Unfortunately the @llvm.returnaddress intrinsic does not solve the problem, as it only reads the return address pushed onto the LLVM stack by a call. We would then need a way to move the LLVM stack pointer back to where it was before the call, because a CPS call _must_ not grow the LLVM stack (it will never be popped!), so a 'call' instruction will not do. > > Most architectures have a call instruction which does not push anything onto the stack; e.g. on ARM, the "BL" instruction saves the return address in LR. On other architectures, you can emulate this (for example, you could lower an IR "call" to LEA+JMP on x86-64). > >>> You can't branch across functions like you're proposing because the stack and callee-save registers won't be in the right state. LLVM will inevitably save values to the stack for calls which are not tail calls. Therefore, this proposal simply doesn't work. >> It might be helpful to think of the CPS transformation as turning the program into a form such that it always moves "forward", in the sense that the tail calls _never_ return. A "return" is simply another tail call (i.e., a jump) to a return address, which looks no different than a function call. >> >> Thus, we have no callee-saved registers to worry about since the tail calls never come back. In addition, there are no live values in the LLVM stack frame, since they are all passed in the CPS call (which uses a different stack). Thus, retpt receives all of its values from the struct it extracts from. While the 'cps' marked call appears to be non-tail, it will be lowered to a tail call. > > The return address for the current function is fundamentally live across any non-tail call. And LLVM will hoist other operations across non-tail calls, and in the process introduce values which are live across calls. You need to save those values somewhere. The key question is where. Your proposal tries to address that by explicitly saving/restoring the return address onto the GHC stack, but you're working at the wrong level; you can't get a complete list of which values are live across calls until register allocation. > > -Eli > > -- > Employee of Qualcomm Innovation Center, Inc. > Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project