Kavon Farvardin via llvm-dev
2017-Apr-17 23:04 UTC
[llvm-dev] [RFC] Adding CPS call support
> Is there a reason you can't use the algorithm from the paper "A Correspondence between Continuation Passing Style and Static Single Assignment Form" to convert your IR to LLVM's SSA IR?

Yes, there are a few reasons.

Undoing the CPS transformation earlier in the pipeline would mean that we are using LLVM's built-in stack. The special layout and usage of the stack in GHC is achieved through CPS, so it is baked into the compiler and the garbage-collected runtime system.

~kavon

> On Apr 17, 2017, at 8:56 PM, Manuel Jacob <me at manueljacob.de> wrote:
>
> Hi Kavon,
>
> Is there a reason you can't use the algorithm from the paper "A Correspondence between Continuation Passing Style and Static Single Assignment Form" to convert your IR to LLVM's SSA IR?
>
> -Manuel
>
> On 2017-04-17 17:30, Kavon Farvardin via llvm-dev wrote:
>> Summary
>> =======
>> There is a need for dedicated continuation-passing style (CPS) calls in LLVM to
>> support functional languages. Herein I describe the problem and propose a
>> solution. Feedback and/or tips are greatly appreciated, as our goal is to
>> implement these changes so they can be merged into LLVM trunk.
>>
>> Problem
>> =======
>> Implementations of functional languages like Haskell and ML (e.g., GHC and
>> Manticore) use a continuation-passing style (CPS) transformation in order to
>> manage the call stack explicitly. This is done prior to generating LLVM IR, so
>> the implicit call stack within LLVM is not used for call and return.
>>
>> When making a non-tail call while in CPS, we initialize a stack frame for the
>> return through our own stack pointer, and then pass that stack pointer to the
>> callee when we jump to it. It is here that we run into a problem in LLVM.
>> Consider the following CPS call to @bar and how it will return:
>>
>> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
>> define void @foo (i8** %sp, ...) {
>> someBlk:
>>   ; ...
>>   ; finish the stack frame by writing the return address
>>   %retAddr = blockaddress(@foo, %retpt)
>>   store i8* %retAddr, i8** %sp
>>   ; jump to @bar
>>   tail call void @bar(i8** %sp, ...)
>>
>> retpt:                       ; <- how can @bar "call" %retpt?
>>   %sp2 = ???
>>   %val = ???
>>   ; ...
>> }
>>
>> define void @bar (i8** %sp, ...) {
>>   ; perform a return
>>   %retAddr0 = load i8*, i8** %sp
>>   %retAddr1 = bitcast i8* %retAddr0 to void (i8**, i64)*
>>   %val = bitcast i64 1 to i64
>>   ; jump back to %retpt in @foo, passing %sp and %val
>>   tail call void %retAddr1(i8** %sp, i64 %val)
>> }
>> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
>>
>> There is currently no way to jump back to %retpt from another function, as block
>> addresses have restricted usage in LLVM [1]. Our main difficulty is that we
>> cannot jump to a block address without knowing its calling convention, i.e., the
>> particular machine registers (or memory locations) in which the block expects
>> incoming values to be passed.
>>
>> The workaround we have been using in GHC for LLVM is to break apart every
>> function, placing the code for the continuation of each call into a new
>> function. We do this only so that we can store a function pointer instead of a
>> block address on our stack. This particularly gross transformation inhibits
>> optimizations in both GHC and LLVM, and we would like to remove the need for it.
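>>
>> To make that concrete, here is a rough sketch of what the split looks like for
>> the example above (the name @foo_retpt is made up for illustration):
>>
>> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
>> define void @foo (i8** %sp, ...) {
>> someBlk:
>>   ; ...
>>   ; instead of a block address, store the address of a top-level
>>   ; function that now holds the old %retpt code
>>   store i8* bitcast (void (i8**, i64)* @foo_retpt to i8*), i8** %sp
>>   ; jump to @bar
>>   tail call void @bar(i8** %sp, ...)
>>   ret void
>> }
>>
>> ; the continuation of the call, split out of @foo; any values live
>> ; across the call now have to reach it through our stack frame
>> define void @foo_retpt (i8** %sp2, i64 %val) {
>>   ; ... old code of %retpt ...
>>   ret void
>> }
>> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;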
>>
>> Proposal
>> ========
>> I believe the lowest-impact method of fixing this problem with LLVM is the
>> following:
>>
>> First, we add a special 'cps' call instruction marker to be used on non-tail
>> calls. Then, we use a specialized calling convention for these non-tail calls,
>> which fixes the returned values to specific locations in the machine code [2].
>>
>> To help illustrate what's going on, let's rewrite the above example using the
>> proposed 'cps' call:
>>
>> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
>> define { ... } @foo (i8** %sp, ...) {
>> someBlk:
>>   ; ...
>>   ; finish the stack frame by writing the return address
>>   %retAddr = blockaddress(@foo, %retpt)
>>   store i8* %retAddr, i8** %sp
>>   ; jump to @bar
>>   %retVals = cps call ghccc {i8**, i64} @bar (i8** %sp, ...)
>>   br label %retpt
>>
>> retpt:
>>   %sp2 = extractvalue {i8**, i64} %retVals, 0
>>   %val = extractvalue {i8**, i64} %retVals, 1
>>   ; ...
>> }
>>
>> define {i8**, i64} @bar (i8** %sp, ...) {
>>   ; perform a return
>>   %retAddr0 = load i8*, i8** %sp
>>   %retAddr1 = bitcast i8* %retAddr0 to {i8**, i64} (i8**, i64)*
>>   %val = bitcast i64 1 to i64
>>   ; jump back to %retpt in @foo, passing %sp and %val
>>   tail call ghccc void %retAddr1(i8** %sp, i64 %val)
>>   unreachable  ; <- ideally this would be our terminator,
>>                ;    but 'unreachable' breaks TCO, so we will
>>                ;    emit a ret of the struct "returned" by the call.
>> }
>> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
>>
>> The important point here is that the 'cps'-marked call will lower to a jump. The
>> 'cps' call marker means that the callee knows how to return using the arguments
>> explicitly passed to it, i.e., the stack pointer %sp. The callee cannot use a
>> 'ret' instruction if it is 'cps' called.
>>
>> Either before or during 'cps' call lowering, any instructions following the
>> 'cps' call to @bar are sunk into the block %retpt, and the unconditional
>> branch to %retpt is deleted/ignored. We include that branch to preserve
>> control-flow information for LLVM IR optimization passes.
>>
>> The 'extractvalue' instructions are what ensure the calling convention of
>> %retpt, since the fields of the struct %retVals are returned in physical
>> registers dictated by the (modified) ghccc convention. Those same physical
>> registers are used by the ghccc tail call in @bar when it jumps back to %retpt.
>> So, the call & return convention of ghccc ensures that everything matches up.
>>
>> Interaction with LLVM
>> =====================
>>
>> (1) Caller-saved Values
>>
>> One may wonder how this would work if there are caller-saved values of the 'cps'
>> call. But in our first example, which closely matches what CPS code looks like,
>> the call to @bar was in tail position. Thus, in the second example, there are no
>> caller-saved values for the 'cps' call to @bar, as all live values were passed
>> as arguments in the call.
>>
>> This caller-saved part is a bit subtle. It works fine in my experience [2] when
>> @bar is a function not visible to LLVM. My impression is that even if @bar is
>> visible to LLVM, there is still no issue, but if you can think of any corner
>> cases, that would be great!
>>
>> (2) Inlining
>>
>> My gut feeling is that we cannot inline a 'cps'-marked call site without more
>> effort. This is because we might end up with something odd like this once the
>> dust settles:
>>
>>   %retAddr = blockaddress(@foo, %retpt)
>>   %retAddr1 = bitcast i8* %retAddr to {i8**, i64} (i8**, i64)*
>>   tail call ghccc %retAddr1 ( %sp, ... )
>>
>> We could add a pass that turns the above sequence into just an unconditional
>> branch to %retpt, using a phi-node to replace each 'extractvalue' instruction in
>> that block.
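>>
>> Roughly, that fix-up would leave behind something like this (a sketch only; the
>> block name %bar.exit is made up):
>>
>> bar.exit:
>>   ; was: tail call ghccc %retAddr1 ( %sp, ... )
>>   br label %retpt
>>
>> retpt:
>>   ; phi-nodes standing in for the 'extractvalue's of %retVals
>>   %sp2 = phi i8** [ %sp, %bar.exit ]
>>   %val = phi i64 [ 1, %bar.exit ]
>>   ; ...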
>>
>> I'm not sure whether inlining in LLVM is important for us yet, as we tend to
>> inline quite a lot before generating LLVM IR. I don't think this additional
>> fix-up pass would be too difficult to implement if it's desired.
>>
>> Implementation Sketch and Conclusion
>> ====================================
>> My current plan is to add this special lowering of 'cps' calls during the
>> translation from LLVM IR to the SelectionDAG. I welcome any suggestions or tips
>> on the best way to approach this. An important goal for us is to merge this into
>> trunk, since we do not want to bundle a special version of LLVM with GHC.
>>
>> Please let me know soon if you have any objections to this feature.
>>
>> Thanks for reading,
>> Kavon
>>
>> References
>> ==========
>> [1] http://llvm.org/docs/LangRef.html#blockaddress
>> [2] http://kavon.farvard.in/papers/ml16-cwc-llvm.pdf
Philip Reames
2017-Apr-18
[llvm-dev] [RFC] Adding CPS call support
On 04/17/2017 04:04 PM, Kavon Farvardin via llvm-dev wrote:
>> Is there a reason you can't use the algorithm from the paper "A Correspondence between Continuation Passing Style and Static Single Assignment Form" to convert your IR to LLVM's SSA IR?
>
> Yes, there are a few reasons.
>
> Undoing the CPS transformation earlier in the pipeline would mean that we are using LLVM's built-in stack. The special layout and usage of the stack in GHC is achieved through CPS, so it is baked into the compiler and the garbage-collected runtime system.

Can you give a bit more detail here? LLVM does provide support for describing GC frame maps.
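For example, a function can name a GC strategy and register its roots with the gcroot intrinsic, and the backend then tracks which stack slots hold GC pointers; a minimal sketch (the function and variable names are invented, and the statepoint intrinsics are the newer way to get the same information):

declare void @llvm.gcroot(i8** %ptrloc, i8* %metadata)

define void @baz(i8* %obj) gc "shadow-stack" {
entry:
  %root = alloca i8*                            ; slot the collector will scan
  call void @llvm.gcroot(i8** %root, i8* null)  ; register it as a root
  store i8* %obj, i8** %root
  ; ... calls that may trigger a collection ...
  ret void
}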
p.s. You're going to have to justify the design of the runtime a bit here. Extending the IR to work around a buggy or poorly structured runtime is not going to be sufficient justification. *Why* does the runtime need the specific runtime stack structure used? What alternatives exist, and why should those be rejected?
Kavon Farvardin via llvm-dev
2017-Apr-18 20:08 UTC
[llvm-dev] [RFC] Adding CPS call support
Before I try to respond to all of these points, let's consider limiting the scope of the proposed changes: instead of adding a 'cps' call marker, what if I were to add a custom lowering (during isel) for calls marked with the 'ghccc' calling convention? There is already language-specific lowering in isel for Swift, so I imagine this would be more acceptable?
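With that narrower change, the second example from my original message would stay the same except there would be no new 'cps' marker; an ordinary non-tail ghccc call would be the cue for the special lowering, roughly:

  ; no 'cps' marker; the non-tail ghccc call itself requests the CPS lowering
  %retVals = call ghccc {i8**, i64} @bar (i8** %sp, ...)
  br label %retpt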
~kavon

> On Apr 18, 2017, at 7:27 PM, Philip Reames <listmail at philipreames.com> wrote:
>
> Can you give a bit more detail here? LLVM does provide support for describing GC frame maps.
>
> p.s. You're going to have to justify the design of the runtime a bit here. Extending the IR to work around a buggy or poorly structured runtime is not going to be sufficient justification. *Why* does the runtime need the specific runtime stack structure used? What alternatives exist, and why should those be rejected?