Kavon Farvardin via llvm-dev
2017-Apr-19 17:38 UTC
[llvm-dev] [RFC] Adding CPS call support
> The semantics around inlining alone are problematic enough to warrant serious hesitation.There are nicer ways to embed CPS call/return into LLVM; I just figured that there would not be much support for adding a new terminator because it would change a lot of code. Ideally we would have a block terminator like: cps call ghccc @bar (.. args ..) returnsto label %retpt Where the "returnsto" is optional to cover both CPS calls and returns. Another alternative would be to emulate this terminator using an intrinsic. So, my question is: would there be more support for a version of this terminator (whether as an intrinsic or not) instead of what I was proposing earlier? ~kavon> On Apr 18, 2017, at 7:24 PM, Philip Reames via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > > > On 04/17/2017 08:30 AM, Kavon Farvardin via llvm-dev wrote: >> Summary >> ======>> >> There is a need for dedicated continuation-passing style (CPS) calls in LLVM to >> support functional languages. Herein I describe the problem and propose a >> solution. Feedback and/or tips are greatly appreciated, as our goal is to >> implement these changes so they can be merged into LLVM trunk. >> >> >> Problem >> ======>> >> Implementations of functional languages like Haskell and ML (e.g., GHC and >> Manticore) use a continuation-passing style (CPS) transformation in order to >> manage the call stack explicitly. This is done prior to generating LLVM IR, so >> the implicit call stack within LLVM is not used for call and return. >> >> When making a non-tail call while in CPS, we initialize a stack frame for the >> return through our own stack pointer, and then pass that stack pointer to the >> callee when we jump to it. It is here when we run into a problem in LLVM. >> >> Consider the following CPS call to @bar and how it will return: >> >> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; >> >> define void @foo (i8** %sp, ...) { >> someBlk: >> ; ... >> ; finish stack frame by writing return address >> %retAddr = blockaddress(@foo, %retpt) >> store i8* %retAddr, i8** %sp >> ; jump to @bar >> tail call void @bar(i8** %sp, ...) >> >> retpt: ; <- how can @bar "call" %retpt? >> %sp2 = ??? >> %val = ??? >> ; ... >> >> } >> >> define void @bar (i8** %sp, ...) { >> ; perform a return >> %retAddr0 = load i8*, i8** %sp >> %retAddr1 = bitcast i8* %retAddr0 to void (i8**, i64)* >> %val = bitcast i64 1 to i64 >> ; jump back to %retpt in @foo, passing %sp and %val >> tail call void %retAddr1(i8** %sp, i64 %val) >> } >> >> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; >> >> There is currently no way to jump back to %retpt from another function, as block >> addresses have restricted usage in LLVM [1]. Our main difficulty is that we >> cannot jump to a block address without knowing its calling convention, i.e., the >> particular machine registers (or memory locations) that the block expects >> incoming values to be passed in. >> >> The workaround we have been using in GHC for LLVM is to break apart every >> function, placing the code for the continuation of each call into a new >> function. We do this only so that we can store a function pointer instead of a >> block address to our stack. This particularly gross transformation inhibits >> optimizations in both GHC and LLVM, and we would like to remove the need for it. > Can you give a bit more information on what this transformation breaks optimization wise? You're essentially explicitly representing the continuation and converting all calls into tail calls with an explicit continuation argument. I would expect the LLVM optimizer to do reasonable well with such a representation. >> >> >> Proposal >> =======>> >> I believe the lowest-impact method of fixing this problem with LLVM is the >> following: >> >> First, we add a special 'cps' call instruction marker to be used on non-tail >> calls. Then, we use a specialized calling convention for these non-tail calls, >> which fix the returned values to specific locations in the machine code [2]. >> >> To help illustrate what's going on, let's rewrite the above example using the >> proposed 'cps' call: >> >> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; >> >> define { ... } @foo (i8** %sp, ...) { >> someBlk: >> ; ... >> ; finish stack frame by writing return address >> %retAddr = blockaddress(@foo, %retpt) >> store i8* %retAddr, i8** %sp >> ; jump to @bar >> %retVals = cps call ghccc {i8**, i64} @bar (i8** %sp, ...) >> br label %retpt >> >> retpt: >> %sp2 = extractvalue {i8**, i64} %retVals, 0 >> %val = extractvalue {i8**, i64} %retVals, 1 >> ; ... >> >> } >> >> define {i8**, i64} @bar (i8** %sp, ...) { >> ; perform a return >> %retAddr0 = load i8*, i8** %sp >> %retAddr1 = bitcast i8* %retAddr0 to {i8**, i64} (i8**, i64)* >> %val = bitcast i64 1 to i64 >> ; jump back to %retpt in @foo, passing %sp and %val >> tail call ghccc void %retAddr1(i8** %sp, i64 %val) >> >> unreachable ; <- ideally this would be our terminator, >> ; but 'unreachable' breaks TCO, so we will >> ; emit a ret of the struct "returned" by the call. >> } >> >> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; >> >> The important point here is that the 'cps' marked call will lower to a jump. The >> 'cps' call marker means that the callee knows how to return using the arguments >> explicitly passed to it, i.e., the stack pointer %sp. The callee cannot use a >> 'ret' instruction if it is 'cps' called. >> >> Either before or during 'cps' call lowering, any instructions following the >> 'cps' call to @bar are sunk into the the block %retpt, and the unconditional >> branch to %retpt is deleted/ignored. We include that branch to preserve >> control-flow information for LLVM IR optimization passes. >> >> The 'extractvalue' instructions are what ensure the calling convention of >> %retpt, since the fields of the struct %retVals are returned in physical >> registers dictated by the (modified) ghccc convention. Those same physical >> registers are used by the ghccc tail call in @bar when it jumps back to %retpt. >> So, the call & return convention of ghccc ensures that everything matches up. >> >> >> Interaction with LLVM >> ====================>> >> (1) Caller-saved Values >> >> One may wonder how this would work if there are caller-saved values of the 'cps' >> call. But, in our first example, which closely matches what CPS code looks like, >> the call to @bar was in tail position. Thus, in the second example, there are no >> caller-saved values for the 'cps' call to @bar, as all live values were passed >> as arguments in the call. >> >> This caller-saved part is a bit subtle. It works fine in my experience [2] when >> @bar is a function not visible to LLVM. My impression is that even if @bar is >> visible to LLVM, there is still no issue, but if you can think of any corner >> cases that would be great! >> >> (2) Inlining >> >> My gut feeling is that we cannot inline a 'cps' marked call-site without more >> effort. This is because we might end up with something odd like this once the >> dust settles: >> >> %retAddr = blockaddress(@foo, %retpt) >> %retAddr1 = bitcast i8* %retAddr to {i8**, i64} (i8**, i64)* >> tail call ghccc %retAddr1 ( %sp, ... ) >> >> We could add a pass that turns the above sequence into just an unconditional >> branch to %retpt, using a phi-node to replace each 'extractvalue' instruction in >> that block. >> >> I'm not sure whether inlining in LLVM is important for us yet, as we tend to >> inline quite a lot before generating LLVM IR. I don't think this additional fix- >> up pass would be too difficult to implement if it's desired. >> >> >> Implementation Sketch and Conclusion >> ===================================>> >> My current plan is to add this special lowering of 'cps' calls during the >> translation from LLVM IR to the SelectionDAG. I welcome any suggestions or tips >> on the best way to approach this. An important goal for us is to merge this into >> trunk since we do not want to bundle a special version of LLVM with GHC. >> >> Please let me know soon if you have any objections to this feature. >> >> Thanks for reading, >> Kavon > Honestly, this proposed design seems seriously problematic. The semantics around inlining alone are problematic enough to warrant serious hesitation. I think you need to take a step back, describe the original problem you're trying to solve and why the standard conversion approaches don't work out. You'd need to strongly motivate a change of this magnitude and I really don't feel like you've done so so far. >> >> >> References >> =========>> >> [1] http://llvm.org/docs/LangRef.html#blockaddress >> [2] http://kavon.farvard.in/papers/ml16-cwc-llvm.pdf >> >> >> _______________________________________________ >> LLVM Developers mailing list >> llvm-dev at lists.llvm.org >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
I don't think you need a terminator at the IR level, you need one at the MI level. You can lower a CPS-call to some call pseudo-instruction during ISel and then break the MBB in two at some later point. On Wed, Apr 19, 2017 at 10:38 AM, Kavon Farvardin <kavon at farvard.in> wrote:> > The semantics around inlining alone are problematic enough to warrant > serious hesitation. > > There are nicer ways to embed CPS call/return into LLVM; I just figured > that there would not be much support for adding a new terminator because it > would change a lot of code. Ideally we would have a block terminator like: > > cps call ghccc @bar (.. args ..) returnsto label %retpt > > Where the "returnsto" is optional to cover both CPS calls and returns. > Another alternative would be to emulate this terminator using an intrinsic. > > So, my question is: would there be more support for a version of this > terminator (whether as an intrinsic or not) instead of what I was proposing > earlier? > > ~kavon > > > On Apr 18, 2017, at 7:24 PM, Philip Reames via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > > > > > > > > On 04/17/2017 08:30 AM, Kavon Farvardin via llvm-dev wrote: > >> Summary > >> ======> >> > >> There is a need for dedicated continuation-passing style (CPS) calls in > LLVM to > >> support functional languages. Herein I describe the problem and propose > a > >> solution. Feedback and/or tips are greatly appreciated, as our goal is > to > >> implement these changes so they can be merged into LLVM trunk. > >> > >> > >> Problem > >> ======> >> > >> Implementations of functional languages like Haskell and ML (e.g., GHC > and > >> Manticore) use a continuation-passing style (CPS) transformation in > order to > >> manage the call stack explicitly. This is done prior to generating LLVM > IR, so > >> the implicit call stack within LLVM is not used for call and return. > >> > >> When making a non-tail call while in CPS, we initialize a stack frame > for the > >> return through our own stack pointer, and then pass that stack pointer > to the > >> callee when we jump to it. It is here when we run into a problem in > LLVM. > >> > >> Consider the following CPS call to @bar and how it will return: > >> > >> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; > >> > >> define void @foo (i8** %sp, ...) { > >> someBlk: > >> ; ... > >> ; finish stack frame by writing return address > >> %retAddr = blockaddress(@foo, %retpt) > >> store i8* %retAddr, i8** %sp > >> ; jump to @bar > >> tail call void @bar(i8** %sp, ...) > >> > >> retpt: ; <- how can @bar "call" %retpt? > >> %sp2 = ??? > >> %val = ??? > >> ; ... > >> > >> } > >> > >> define void @bar (i8** %sp, ...) { > >> ; perform a return > >> %retAddr0 = load i8*, i8** %sp > >> %retAddr1 = bitcast i8* %retAddr0 to void (i8**, i64)* > >> %val = bitcast i64 1 to i64 > >> ; jump back to %retpt in @foo, passing %sp and %val > >> tail call void %retAddr1(i8** %sp, i64 %val) > >> } > >> > >> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; > >> > >> There is currently no way to jump back to %retpt from another function, > as block > >> addresses have restricted usage in LLVM [1]. Our main difficulty is > that we > >> cannot jump to a block address without knowing its calling convention, > i.e., the > >> particular machine registers (or memory locations) that the block > expects > >> incoming values to be passed in. > >> > >> The workaround we have been using in GHC for LLVM is to break apart > every > >> function, placing the code for the continuation of each call into a new > >> function. We do this only so that we can store a function pointer > instead of a > >> block address to our stack. This particularly gross transformation > inhibits > >> optimizations in both GHC and LLVM, and we would like to remove the > need for it. > > Can you give a bit more information on what this transformation breaks > optimization wise? You're essentially explicitly representing the > continuation and converting all calls into tail calls with an explicit > continuation argument. I would expect the LLVM optimizer to do reasonable > well with such a representation. > >> > >> > >> Proposal > >> =======> >> > >> I believe the lowest-impact method of fixing this problem with LLVM is > the > >> following: > >> > >> First, we add a special 'cps' call instruction marker to be used on > non-tail > >> calls. Then, we use a specialized calling convention for these non-tail > calls, > >> which fix the returned values to specific locations in the machine code > [2]. > >> > >> To help illustrate what's going on, let's rewrite the above example > using the > >> proposed 'cps' call: > >> > >> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; > >> > >> define { ... } @foo (i8** %sp, ...) { > >> someBlk: > >> ; ... > >> ; finish stack frame by writing return address > >> %retAddr = blockaddress(@foo, %retpt) > >> store i8* %retAddr, i8** %sp > >> ; jump to @bar > >> %retVals = cps call ghccc {i8**, i64} @bar (i8** %sp, ...) > >> br label %retpt > >> > >> retpt: > >> %sp2 = extractvalue {i8**, i64} %retVals, 0 > >> %val = extractvalue {i8**, i64} %retVals, 1 > >> ; ... > >> > >> } > >> > >> define {i8**, i64} @bar (i8** %sp, ...) { > >> ; perform a return > >> %retAddr0 = load i8*, i8** %sp > >> %retAddr1 = bitcast i8* %retAddr0 to {i8**, i64} (i8**, i64)* > >> %val = bitcast i64 1 to i64 > >> ; jump back to %retpt in @foo, passing %sp and %val > >> tail call ghccc void %retAddr1(i8** %sp, i64 %val) > >> > >> unreachable ; <- ideally this would be our terminator, > >> ; but 'unreachable' breaks TCO, so we will > >> ; emit a ret of the struct "returned" by the call. > >> } > >> > >> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; > >> > >> The important point here is that the 'cps' marked call will lower to a > jump. The > >> 'cps' call marker means that the callee knows how to return using the > arguments > >> explicitly passed to it, i.e., the stack pointer %sp. The callee cannot > use a > >> 'ret' instruction if it is 'cps' called. > >> > >> Either before or during 'cps' call lowering, any instructions following > the > >> 'cps' call to @bar are sunk into the the block %retpt, and the > unconditional > >> branch to %retpt is deleted/ignored. We include that branch to preserve > >> control-flow information for LLVM IR optimization passes. > >> > >> The 'extractvalue' instructions are what ensure the calling convention > of > >> %retpt, since the fields of the struct %retVals are returned in physical > >> registers dictated by the (modified) ghccc convention. Those same > physical > >> registers are used by the ghccc tail call in @bar when it jumps back to > %retpt. > >> So, the call & return convention of ghccc ensures that everything > matches up. > >> > >> > >> Interaction with LLVM > >> ====================> >> > >> (1) Caller-saved Values > >> > >> One may wonder how this would work if there are caller-saved values of > the 'cps' > >> call. But, in our first example, which closely matches what CPS code > looks like, > >> the call to @bar was in tail position. Thus, in the second example, > there are no > >> caller-saved values for the 'cps' call to @bar, as all live values were > passed > >> as arguments in the call. > >> > >> This caller-saved part is a bit subtle. It works fine in my experience > [2] when > >> @bar is a function not visible to LLVM. My impression is that even if > @bar is > >> visible to LLVM, there is still no issue, but if you can think of any > corner > >> cases that would be great! > >> > >> (2) Inlining > >> > >> My gut feeling is that we cannot inline a 'cps' marked call-site > without more > >> effort. This is because we might end up with something odd like this > once the > >> dust settles: > >> > >> %retAddr = blockaddress(@foo, %retpt) > >> %retAddr1 = bitcast i8* %retAddr to {i8**, i64} (i8**, i64)* > >> tail call ghccc %retAddr1 ( %sp, ... ) > >> > >> We could add a pass that turns the above sequence into just an > unconditional > >> branch to %retpt, using a phi-node to replace each 'extractvalue' > instruction in > >> that block. > >> > >> I'm not sure whether inlining in LLVM is important for us yet, as we > tend to > >> inline quite a lot before generating LLVM IR. I don't think this > additional fix- > >> up pass would be too difficult to implement if it's desired. > >> > >> > >> Implementation Sketch and Conclusion > >> ===================================> >> > >> My current plan is to add this special lowering of 'cps' calls during > the > >> translation from LLVM IR to the SelectionDAG. I welcome any suggestions > or tips > >> on the best way to approach this. An important goal for us is to merge > this into > >> trunk since we do not want to bundle a special version of LLVM with GHC. > >> > >> Please let me know soon if you have any objections to this feature. > >> > >> Thanks for reading, > >> Kavon > > Honestly, this proposed design seems seriously problematic. The > semantics around inlining alone are problematic enough to warrant serious > hesitation. I think you need to take a step back, describe the original > problem you're trying to solve and why the standard conversion approaches > don't work out. You'd need to strongly motivate a change of this magnitude > and I really don't feel like you've done so so far. > >> > >> > >> References > >> =========> >> > >> [1] http://llvm.org/docs/LangRef.html#blockaddress > >> [2] http://kavon.farvard.in/papers/ml16-cwc-llvm.pdf > >> > >> > >> _______________________________________________ > >> LLVM Developers mailing list > >> llvm-dev at lists.llvm.org > >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > > > > _______________________________________________ > > LLVM Developers mailing list > > llvm-dev at lists.llvm.org > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170419/0fcbf034/attachment.html>
On Wed, Apr 19, 2017 at 10:38 AM, Kavon Farvardin via llvm-dev < llvm-dev at lists.llvm.org> wrote:> > The semantics around inlining alone are problematic enough to warrant > serious hesitation. > > There are nicer ways to embed CPS call/return into LLVM; I just figured > that there would not be much support for adding a new terminator because it > would change a lot of code. Ideally we would have a block terminator like: > > cps call ghccc @bar (.. args ..) returnsto label %retpt > > Where the "returnsto" is optional to cover both CPS calls and returns. > Another alternative would be to emulate this terminator using an intrinsic. > >It seems that what you need to support here is multi-prologue functions. Current coroutines assume a large shared prologue and switch on entry. I was speaking with Reid earlier today, and that is something that should be possible. You would still need to do something similar to the coroutine support to find out which values are live across a cps call and put them into some frame that gets restored in the prologue. Prologues for functional languages are likely to be short. If this turns out to be effective, it could be useful for C++ coroutines that have small prologues.> So, my question is: would there be more support for a version of this > terminator (whether as an intrinsic or not) instead of what I was proposing > earlier? > > ~kavon > > > On Apr 18, 2017, at 7:24 PM, Philip Reames via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > > > > > > > > On 04/17/2017 08:30 AM, Kavon Farvardin via llvm-dev wrote: > >> Summary > >> ======> >> > >> There is a need for dedicated continuation-passing style (CPS) calls in > LLVM to > >> support functional languages. Herein I describe the problem and propose > a > >> solution. Feedback and/or tips are greatly appreciated, as our goal is > to > >> implement these changes so they can be merged into LLVM trunk. > >> > >> > >> Problem > >> ======> >> > >> Implementations of functional languages like Haskell and ML (e.g., GHC > and > >> Manticore) use a continuation-passing style (CPS) transformation in > order to > >> manage the call stack explicitly. This is done prior to generating LLVM > IR, so > >> the implicit call stack within LLVM is not used for call and return. > >> > >> When making a non-tail call while in CPS, we initialize a stack frame > for the > >> return through our own stack pointer, and then pass that stack pointer > to the > >> callee when we jump to it. It is here when we run into a problem in > LLVM. > >> > >> Consider the following CPS call to @bar and how it will return: > >> > >> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; > >> > >> define void @foo (i8** %sp, ...) { > >> someBlk: > >> ; ... > >> ; finish stack frame by writing return address > >> %retAddr = blockaddress(@foo, %retpt) > >> store i8* %retAddr, i8** %sp > >> ; jump to @bar > >> tail call void @bar(i8** %sp, ...) > >> > >> retpt: ; <- how can @bar "call" %retpt? > >> %sp2 = ??? > >> %val = ??? > >> ; ... > >> > >> } > >> > >> define void @bar (i8** %sp, ...) { > >> ; perform a return > >> %retAddr0 = load i8*, i8** %sp > >> %retAddr1 = bitcast i8* %retAddr0 to void (i8**, i64)* > >> %val = bitcast i64 1 to i64 > >> ; jump back to %retpt in @foo, passing %sp and %val > >> tail call void %retAddr1(i8** %sp, i64 %val) > >> } > >> > >> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; > >> > >> There is currently no way to jump back to %retpt from another function, > as block > >> addresses have restricted usage in LLVM [1]. Our main difficulty is > that we > >> cannot jump to a block address without knowing its calling convention, > i.e., the > >> particular machine registers (or memory locations) that the block > expects > >> incoming values to be passed in. > >> > >> The workaround we have been using in GHC for LLVM is to break apart > every > >> function, placing the code for the continuation of each call into a new > >> function. We do this only so that we can store a function pointer > instead of a > >> block address to our stack. This particularly gross transformation > inhibits > >> optimizations in both GHC and LLVM, and we would like to remove the > need for it. > > Can you give a bit more information on what this transformation breaks > optimization wise? You're essentially explicitly representing the > continuation and converting all calls into tail calls with an explicit > continuation argument. I would expect the LLVM optimizer to do reasonable > well with such a representation. > >> > >> > >> Proposal > >> =======> >> > >> I believe the lowest-impact method of fixing this problem with LLVM is > the > >> following: > >> > >> First, we add a special 'cps' call instruction marker to be used on > non-tail > >> calls. Then, we use a specialized calling convention for these non-tail > calls, > >> which fix the returned values to specific locations in the machine code > [2]. > >> > >> To help illustrate what's going on, let's rewrite the above example > using the > >> proposed 'cps' call: > >> > >> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; > >> > >> define { ... } @foo (i8** %sp, ...) { > >> someBlk: > >> ; ... > >> ; finish stack frame by writing return address > >> %retAddr = blockaddress(@foo, %retpt) > >> store i8* %retAddr, i8** %sp > >> ; jump to @bar > >> %retVals = cps call ghccc {i8**, i64} @bar (i8** %sp, ...) > >> br label %retpt > >> > >> retpt: > >> %sp2 = extractvalue {i8**, i64} %retVals, 0 > >> %val = extractvalue {i8**, i64} %retVals, 1 > >> ; ... > >> > >> } > >> > >> define {i8**, i64} @bar (i8** %sp, ...) { > >> ; perform a return > >> %retAddr0 = load i8*, i8** %sp > >> %retAddr1 = bitcast i8* %retAddr0 to {i8**, i64} (i8**, i64)* > >> %val = bitcast i64 1 to i64 > >> ; jump back to %retpt in @foo, passing %sp and %val > >> tail call ghccc void %retAddr1(i8** %sp, i64 %val) > >> > >> unreachable ; <- ideally this would be our terminator, > >> ; but 'unreachable' breaks TCO, so we will > >> ; emit a ret of the struct "returned" by the call. > >> } > >> > >> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; > >> > >> The important point here is that the 'cps' marked call will lower to a > jump. The > >> 'cps' call marker means that the callee knows how to return using the > arguments > >> explicitly passed to it, i.e., the stack pointer %sp. The callee cannot > use a > >> 'ret' instruction if it is 'cps' called. > >> > >> Either before or during 'cps' call lowering, any instructions following > the > >> 'cps' call to @bar are sunk into the the block %retpt, and the > unconditional > >> branch to %retpt is deleted/ignored. We include that branch to preserve > >> control-flow information for LLVM IR optimization passes. > >> > >> The 'extractvalue' instructions are what ensure the calling convention > of > >> %retpt, since the fields of the struct %retVals are returned in physical > >> registers dictated by the (modified) ghccc convention. Those same > physical > >> registers are used by the ghccc tail call in @bar when it jumps back to > %retpt. > >> So, the call & return convention of ghccc ensures that everything > matches up. > >> > >> > >> Interaction with LLVM > >> ====================> >> > >> (1) Caller-saved Values > >> > >> One may wonder how this would work if there are caller-saved values of > the 'cps' > >> call. But, in our first example, which closely matches what CPS code > looks like, > >> the call to @bar was in tail position. Thus, in the second example, > there are no > >> caller-saved values for the 'cps' call to @bar, as all live values were > passed > >> as arguments in the call. > >> > >> This caller-saved part is a bit subtle. It works fine in my experience > [2] when > >> @bar is a function not visible to LLVM. My impression is that even if > @bar is > >> visible to LLVM, there is still no issue, but if you can think of any > corner > >> cases that would be great! > >> > >> (2) Inlining > >> > >> My gut feeling is that we cannot inline a 'cps' marked call-site > without more > >> effort. This is because we might end up with something odd like this > once the > >> dust settles: > >> > >> %retAddr = blockaddress(@foo, %retpt) > >> %retAddr1 = bitcast i8* %retAddr to {i8**, i64} (i8**, i64)* > >> tail call ghccc %retAddr1 ( %sp, ... ) > >> > >> We could add a pass that turns the above sequence into just an > unconditional > >> branch to %retpt, using a phi-node to replace each 'extractvalue' > instruction in > >> that block. > >> > >> I'm not sure whether inlining in LLVM is important for us yet, as we > tend to > >> inline quite a lot before generating LLVM IR. I don't think this > additional fix- > >> up pass would be too difficult to implement if it's desired. > >> > >> > >> Implementation Sketch and Conclusion > >> ===================================> >> > >> My current plan is to add this special lowering of 'cps' calls during > the > >> translation from LLVM IR to the SelectionDAG. I welcome any suggestions > or tips > >> on the best way to approach this. An important goal for us is to merge > this into > >> trunk since we do not want to bundle a special version of LLVM with GHC. > >> > >> Please let me know soon if you have any objections to this feature. > >> > >> Thanks for reading, > >> Kavon > > Honestly, this proposed design seems seriously problematic. The > semantics around inlining alone are problematic enough to warrant serious > hesitation. I think you need to take a step back, describe the original > problem you're trying to solve and why the standard conversion approaches > don't work out. You'd need to strongly motivate a change of this magnitude > and I really don't feel like you've done so so far. > >> > >> > >> References > >> =========> >> > >> [1] http://llvm.org/docs/LangRef.html#blockaddress > >> [2] http://kavon.farvard.in/papers/ml16-cwc-llvm.pdf > >> > >> > >> _______________________________________________ > >> LLVM Developers mailing list > >> llvm-dev at lists.llvm.org > >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > > > > _______________________________________________ > > LLVM Developers mailing list > > llvm-dev at lists.llvm.org > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170419/5982ebb7/attachment.html>
Kavon Farvardin via llvm-dev
2017-Apr-21 12:47 UTC
[llvm-dev] [RFC] Adding CPS call support
> I don't think you need a terminator at the IR level, you need one at the MI level. You can lower a CPS-call to some call pseudo-instruction during ISel and then break the MBB in two at some later point.Thanks for this suggestion, Reid! This has turned out to be a lot easier than doing it during isel. I'm currently working on modifying the MBB at the CPS pseudo-instruction during ExpandISelPseudos.> It seems that what you need to support here is multi-prologue functions.Basically, yes. I think of it more like needing functions that have multiple entry blocks. If we could attach a physical-register convention to an IR block, that would solve everything.> Current coroutines assume a large shared prologue and switch on entry. I was speaking with Reid earlier today, and that is something that should be possible.Yes, we could also manually generate, in each function, that large switch that dispatches to these return points, but of course that's a lot of overhead to pay for call/return.> You would still need to do something similar to the coroutine support to find out which values are live across a cps call and put them into some frame that gets restored in the prologue.Luckily, we've already done this in order to layout our stack (this is part of the CPS transformation in GHC), so, we don't have anything live across the call once we reach the point where we generate LLVM IR. We'd also want to avoid putting anything in LLVM coroutine frames anyway, since the garbage collector needs to know the layout of the frame to know which slots contain pointers. ~kavon> On Apr 19, 2017, at 11:33 PM, Kyle Butt <iteratee at google.com> wrote: > > > > On Wed, Apr 19, 2017 at 10:38 AM, Kavon Farvardin via llvm-dev <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote: > > The semantics around inlining alone are problematic enough to warrant serious hesitation. > > There are nicer ways to embed CPS call/return into LLVM; I just figured that there would not be much support for adding a new terminator because it would change a lot of code. Ideally we would have a block terminator like: > > cps call ghccc @bar (.. args ..) returnsto label %retpt > > Where the "returnsto" is optional to cover both CPS calls and returns. Another alternative would be to emulate this terminator using an intrinsic. > > > It seems that what you need to support here is multi-prologue functions. Current coroutines assume a large shared prologue and switch on entry. I was speaking with Reid earlier today, and that is something that should be possible. > You would still need to do something similar to the coroutine support to find out which values are live across a cps call and put them into some frame that gets restored in the prologue. > Prologues for functional languages are likely to be short. If this turns out to be effective, it could be useful for C++ coroutines that have small prologues. > > So, my question is: would there be more support for a version of this terminator (whether as an intrinsic or not) instead of what I was proposing earlier? > > ~kavon > > > On Apr 18, 2017, at 7:24 PM, Philip Reames via llvm-dev <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote: > > > > > > > > On 04/17/2017 08:30 AM, Kavon Farvardin via llvm-dev wrote: > >> Summary > >> ======> >> > >> There is a need for dedicated continuation-passing style (CPS) calls in LLVM to > >> support functional languages. Herein I describe the problem and propose a > >> solution. Feedback and/or tips are greatly appreciated, as our goal is to > >> implement these changes so they can be merged into LLVM trunk. > >> > >> > >> Problem > >> ======> >> > >> Implementations of functional languages like Haskell and ML (e.g., GHC and > >> Manticore) use a continuation-passing style (CPS) transformation in order to > >> manage the call stack explicitly. This is done prior to generating LLVM IR, so > >> the implicit call stack within LLVM is not used for call and return. > >> > >> When making a non-tail call while in CPS, we initialize a stack frame for the > >> return through our own stack pointer, and then pass that stack pointer to the > >> callee when we jump to it. It is here when we run into a problem in LLVM. > >> > >> Consider the following CPS call to @bar and how it will return: > >> > >> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; > >> > >> define void @foo (i8** %sp, ...) { > >> someBlk: > >> ; ... > >> ; finish stack frame by writing return address > >> %retAddr = blockaddress(@foo, %retpt) > >> store i8* %retAddr, i8** %sp > >> ; jump to @bar > >> tail call void @bar(i8** %sp, ...) > >> > >> retpt: ; <- how can @bar "call" %retpt? > >> %sp2 = ??? > >> %val = ??? > >> ; ... > >> > >> } > >> > >> define void @bar (i8** %sp, ...) { > >> ; perform a return > >> %retAddr0 = load i8*, i8** %sp > >> %retAddr1 = bitcast i8* %retAddr0 to void (i8**, i64)* > >> %val = bitcast i64 1 to i64 > >> ; jump back to %retpt in @foo, passing %sp and %val > >> tail call void %retAddr1(i8** %sp, i64 %val) > >> } > >> > >> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; > >> > >> There is currently no way to jump back to %retpt from another function, as block > >> addresses have restricted usage in LLVM [1]. Our main difficulty is that we > >> cannot jump to a block address without knowing its calling convention, i.e., the > >> particular machine registers (or memory locations) that the block expects > >> incoming values to be passed in. > >> > >> The workaround we have been using in GHC for LLVM is to break apart every > >> function, placing the code for the continuation of each call into a new > >> function. We do this only so that we can store a function pointer instead of a > >> block address to our stack. This particularly gross transformation inhibits > >> optimizations in both GHC and LLVM, and we would like to remove the need for it. > > Can you give a bit more information on what this transformation breaks optimization wise? You're essentially explicitly representing the continuation and converting all calls into tail calls with an explicit continuation argument. I would expect the LLVM optimizer to do reasonable well with such a representation. > >> > >> > >> Proposal > >> =======> >> > >> I believe the lowest-impact method of fixing this problem with LLVM is the > >> following: > >> > >> First, we add a special 'cps' call instruction marker to be used on non-tail > >> calls. Then, we use a specialized calling convention for these non-tail calls, > >> which fix the returned values to specific locations in the machine code [2]. > >> > >> To help illustrate what's going on, let's rewrite the above example using the > >> proposed 'cps' call: > >> > >> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; > >> > >> define { ... } @foo (i8** %sp, ...) { > >> someBlk: > >> ; ... > >> ; finish stack frame by writing return address > >> %retAddr = blockaddress(@foo, %retpt) > >> store i8* %retAddr, i8** %sp > >> ; jump to @bar > >> %retVals = cps call ghccc {i8**, i64} @bar (i8** %sp, ...) > >> br label %retpt > >> > >> retpt: > >> %sp2 = extractvalue {i8**, i64} %retVals, 0 > >> %val = extractvalue {i8**, i64} %retVals, 1 > >> ; ... > >> > >> } > >> > >> define {i8**, i64} @bar (i8** %sp, ...) { > >> ; perform a return > >> %retAddr0 = load i8*, i8** %sp > >> %retAddr1 = bitcast i8* %retAddr0 to {i8**, i64} (i8**, i64)* > >> %val = bitcast i64 1 to i64 > >> ; jump back to %retpt in @foo, passing %sp and %val > >> tail call ghccc void %retAddr1(i8** %sp, i64 %val) > >> > >> unreachable ; <- ideally this would be our terminator, > >> ; but 'unreachable' breaks TCO, so we will > >> ; emit a ret of the struct "returned" by the call. > >> } > >> > >> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; > >> > >> The important point here is that the 'cps' marked call will lower to a jump. The > >> 'cps' call marker means that the callee knows how to return using the arguments > >> explicitly passed to it, i.e., the stack pointer %sp. The callee cannot use a > >> 'ret' instruction if it is 'cps' called. > >> > >> Either before or during 'cps' call lowering, any instructions following the > >> 'cps' call to @bar are sunk into the the block %retpt, and the unconditional > >> branch to %retpt is deleted/ignored. We include that branch to preserve > >> control-flow information for LLVM IR optimization passes. > >> > >> The 'extractvalue' instructions are what ensure the calling convention of > >> %retpt, since the fields of the struct %retVals are returned in physical > >> registers dictated by the (modified) ghccc convention. Those same physical > >> registers are used by the ghccc tail call in @bar when it jumps back to %retpt. > >> So, the call & return convention of ghccc ensures that everything matches up. > >> > >> > >> Interaction with LLVM > >> ====================> >> > >> (1) Caller-saved Values > >> > >> One may wonder how this would work if there are caller-saved values of the 'cps' > >> call. But, in our first example, which closely matches what CPS code looks like, > >> the call to @bar was in tail position. Thus, in the second example, there are no > >> caller-saved values for the 'cps' call to @bar, as all live values were passed > >> as arguments in the call. > >> > >> This caller-saved part is a bit subtle. It works fine in my experience [2] when > >> @bar is a function not visible to LLVM. My impression is that even if @bar is > >> visible to LLVM, there is still no issue, but if you can think of any corner > >> cases that would be great! > >> > >> (2) Inlining > >> > >> My gut feeling is that we cannot inline a 'cps' marked call-site without more > >> effort. This is because we might end up with something odd like this once the > >> dust settles: > >> > >> %retAddr = blockaddress(@foo, %retpt) > >> %retAddr1 = bitcast i8* %retAddr to {i8**, i64} (i8**, i64)* > >> tail call ghccc %retAddr1 ( %sp, ... ) > >> > >> We could add a pass that turns the above sequence into just an unconditional > >> branch to %retpt, using a phi-node to replace each 'extractvalue' instruction in > >> that block. > >> > >> I'm not sure whether inlining in LLVM is important for us yet, as we tend to > >> inline quite a lot before generating LLVM IR. I don't think this additional fix- > >> up pass would be too difficult to implement if it's desired. > >> > >> > >> Implementation Sketch and Conclusion > >> ===================================> >> > >> My current plan is to add this special lowering of 'cps' calls during the > >> translation from LLVM IR to the SelectionDAG. I welcome any suggestions or tips > >> on the best way to approach this. An important goal for us is to merge this into > >> trunk since we do not want to bundle a special version of LLVM with GHC. > >> > >> Please let me know soon if you have any objections to this feature. > >> > >> Thanks for reading, > >> Kavon > > Honestly, this proposed design seems seriously problematic. The semantics around inlining alone are problematic enough to warrant serious hesitation. I think you need to take a step back, describe the original problem you're trying to solve and why the standard conversion approaches don't work out. You'd need to strongly motivate a change of this magnitude and I really don't feel like you've done so so far. > >> > >> > >> References > >> =========> >> > >> [1] http://llvm.org/docs/LangRef.html#blockaddress <http://llvm.org/docs/LangRef.html#blockaddress> > >> [2] http://kavon.farvard.in/papers/ml16-cwc-llvm.pdf <http://kavon.farvard.in/papers/ml16-cwc-llvm.pdf> > >> > >> > >> _______________________________________________ > >> LLVM Developers mailing list > >> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org> > >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev> > > > > _______________________________________________ > > LLVM Developers mailing list > > llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org> > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev> > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170421/a99efd04/attachment.html>