Kavon Farvardin via llvm-dev
2017-Apr-17 15:30 UTC
[llvm-dev] [RFC] Adding CPS call support
Summary
=======

There is a need for dedicated continuation-passing style (CPS) calls in LLVM to
support functional languages. Herein I describe the problem and propose a
solution. Feedback and/or tips are greatly appreciated, as our goal is to
implement these changes so they can be merged into LLVM trunk.


Problem
=======

Implementations of functional languages like Haskell and ML (e.g., GHC and
Manticore) use a continuation-passing style (CPS) transformation in order to
manage the call stack explicitly. This is done prior to generating LLVM IR, so
the implicit call stack within LLVM is not used for call and return.

When making a non-tail call while in CPS, we initialize a stack frame for the
return through our own stack pointer, and then pass that stack pointer to the
callee when we jump to it. It is here that we run into a problem in LLVM.

Consider the following CPS call to @bar and how it will return:

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

define void @foo (i8** %sp, ...) {
someBlk:
  ; ...
  ; finish stack frame by writing return address
  %retAddr = blockaddress(@foo, %retpt)
  store i8* %retAddr, i8** %sp
  ; jump to @bar
  tail call void @bar(i8** %sp, ...)

retpt:                        ; <- how can @bar "call" %retpt?
  %sp2 = ???
  %val = ???
  ; ...

}

define void @bar (i8** %sp, ...) {
  ; perform a return
  %retAddr0 = load i8*, i8** %sp
  %retAddr1 = bitcast i8* %retAddr0 to void (i8**, i64)*
  %val = bitcast i64 1 to i64
  ; jump back to %retpt in @foo, passing %sp and %val
  tail call void %retAddr1(i8** %sp, i64 %val)
}

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

There is currently no way to jump back to %retpt from another function, as block
addresses have restricted usage in LLVM [1]. Our main difficulty is that we
cannot jump to a block address without knowing its calling convention, i.e., the
particular machine registers (or memory locations) that the block expects
incoming values to be passed in.

The workaround we have been using in GHC for LLVM is to break apart every
function, placing the code for the continuation of each call into a new
function. We do this only so that we can store a function pointer instead of a
block address to our stack. This particularly gross transformation inhibits
optimizations in both GHC and LLVM, and we would like to remove the need for it.
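For illustration, the split-function workaround looks roughly like the sketch
below. The names @foo_split and @foo_retpt are made up for this sketch (GHC's
actual output differs); the point is that the continuation code that would live
in %retpt is hoisted into its own function, so an ordinary function pointer can
be stored in the frame instead of a block address:

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

define void @foo_split (i8** %sp, ...) {
someBlk:
  ; ...
  ; finish stack frame by writing the continuation's *function* pointer
  %retAddr = bitcast void (i8**, i64)* @foo_retpt to i8*
  store i8* %retAddr, i8** %sp
  ; jump to @bar
  tail call void @bar(i8** %sp, ...)
  ret void
}

; the former continuation block %retpt, now a standalone function
define void @foo_retpt (i8** %sp2, i64 %val) {
entry:
  ; ... code that previously followed the call ...
  ret void
}

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;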
Proposal
========

I believe the lowest-impact method of fixing this problem with LLVM is the
following:

First, we add a special 'cps' call instruction marker to be used on non-tail
calls. Then, we use a specialized calling convention for these non-tail calls,
which fixes the returned values to specific locations in the machine code [2].

To help illustrate what's going on, let's rewrite the above example using the
proposed 'cps' call:

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

define { ... } @foo (i8** %sp, ...) {
someBlk:
  ; ...
  ; finish stack frame by writing return address
  %retAddr = blockaddress(@foo, %retpt)
  store i8* %retAddr, i8** %sp
  ; jump to @bar
  %retVals = cps call ghccc {i8**, i64} @bar (i8** %sp, ...)
  br label %retpt

retpt:
  %sp2 = extractvalue {i8**, i64} %retVals, 0
  %val = extractvalue {i8**, i64} %retVals, 1
  ; ...

}

define {i8**, i64} @bar (i8** %sp, ...) {
  ; perform a return
  %retAddr0 = load i8*, i8** %sp
  %retAddr1 = bitcast i8* %retAddr0 to {i8**, i64} (i8**, i64)*
  %val = bitcast i64 1 to i64
  ; jump back to %retpt in @foo, passing %sp and %val
  tail call ghccc void %retAddr1(i8** %sp, i64 %val)

  unreachable   ; <- ideally this would be our terminator,
                ;    but 'unreachable' breaks TCO, so we will
                ;    emit a ret of the struct "returned" by the call.
}

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

The important point here is that the 'cps' marked call will lower to a jump. The
'cps' call marker means that the callee knows how to return using the arguments
explicitly passed to it, i.e., the stack pointer %sp. The callee cannot use a
'ret' instruction if it is 'cps' called.

Either before or during 'cps' call lowering, any instructions following the
'cps' call to @bar are sunk into the block %retpt, and the unconditional branch
to %retpt is deleted/ignored. We include that branch to preserve control-flow
information for LLVM IR optimization passes.

The 'extractvalue' instructions are what ensure the calling convention of
%retpt, since the fields of the struct %retVals are returned in physical
registers dictated by the (modified) ghccc convention. Those same physical
registers are used by the ghccc tail call in @bar when it jumps back to %retpt.
So, the call & return convention of ghccc ensures that everything matches up.


Interaction with LLVM
=====================

(1) Caller-saved Values

One may wonder how this would work if there are caller-saved values of the 'cps'
call. But, in our first example, which closely matches what CPS code looks like,
the call to @bar was in tail position. Thus, in the second example, there are no
caller-saved values for the 'cps' call to @bar, as all live values were passed
as arguments in the call.

This caller-saved part is a bit subtle. It works fine in my experience [2] when
@bar is a function not visible to LLVM. My impression is that even if @bar is
visible to LLVM, there is still no issue, but if you can think of any corner
cases, that would be great!

(2) Inlining

My gut feeling is that we cannot inline a 'cps' marked call-site without more
effort. This is because we might end up with something odd like this once the
dust settles:

  %retAddr = blockaddress(@foo, %retpt)
  %retAddr1 = bitcast i8* %retAddr to {i8**, i64} (i8**, i64)*
  tail call ghccc %retAddr1 ( %sp, ... )

We could add a pass that turns the above sequence into just an unconditional
branch to %retpt, using a phi-node to replace each 'extractvalue' instruction in
that block.

I'm not sure whether inlining in LLVM is important for us yet, as we tend to
inline quite a lot before generating LLVM IR. I don't think this additional
fix-up pass would be too difficult to implement if it's desired.


Implementation Sketch and Conclusion
====================================

My current plan is to add this special lowering of 'cps' calls during the
translation from LLVM IR to the SelectionDAG. I welcome any suggestions or tips
on the best way to approach this. An important goal for us is to merge this into
trunk since we do not want to bundle a special version of LLVM with GHC.

Please let me know soon if you have any objections to this feature.

Thanks for reading,
Kavon


References
==========

[1] http://llvm.org/docs/LangRef.html#blockaddress
[2] http://kavon.farvard.in/papers/ml16-cwc-llvm.pdf
On 4/17/2017 8:30 AM, Kavon Farvardin via llvm-dev wrote:
> ; finish stack frame by writing return address
> %retAddr = blockaddress(@foo, %retpt)
> store i8* %retAddr, i8** %sp
> ; jump to @bar
> %retVals = cps call ghccc {i8**, i64} @bar (i8** %sp, ...)
> br label %retpt

I'm not following how explicitly representing the return address of a call in
the IR before isel actually solves any relevant issue. We already pass the
return address implicitly as an argument to every call; you can retrieve it
with llvm.returnaddress if you need it.
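For reference, that intrinsic simply reads the return address of an enclosing
native frame; a minimal usage sketch:

  declare i8* @llvm.returnaddress(i32)

  ; return address of the current LLVM-managed frame
  %ra = call i8* @llvm.returnaddress(i32 0)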
You can't branch across functions like you're proposing because the stack and
callee-save registers won't be in the right state. LLVM will inevitably save
values to the stack for calls which are not tail calls. Therefore, this
proposal simply doesn't work.

-Eli

--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux
Foundation Collaborative Project
Hi Kavon,

Is there a reason you can't use the algorithm from the paper "A Correspondence
between Continuation Passing Style and Static Single Assignment Form" to
convert your IR to LLVM's SSA IR?

-Manuel
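The correspondence that paper describes is, roughly, that a locally-used
continuation becomes a basic block and its parameter becomes a phi node. A
minimal illustration (not taken from the paper or from GHC's output):

  ; CPS:  letcont k(x) = ... in  if c then k(1) else k(2)
  ; SSA rendering in LLVM IR:
  then:
    br label %k
  else:
    br label %k
  k:
    %x = phi i64 [ 1, %then ], [ 2, %else ]
    ; ... body of k, using %x ...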
Kavon Farvardin via llvm-dev
2017-Apr-17 22:52 UTC
[llvm-dev] [RFC] Adding CPS call support
(Sorry for the 2nd email Eli, I forgot to reply-all.)

> I'm not following how explicitly representing the return address of a call
> in the IR before isel actually solves any relevant issue. We already pass
> the return address implicitly as an argument to every call; you can
> retrieve it with llvm.returnaddress if you need it.

Unfortunately the @llvm.returnaddress intrinsic does not solve the problem, as
it only reads the return address pushed onto the LLVM stack by a call. We would
then need a way to move the LLVM stack pointer back to where it was before the
call, because a CPS call _must_ not grow the LLVM stack (it will never be
popped!), so a 'call' instruction will not do.

> You can't branch across functions like you're proposing because the stack
> and callee-save registers won't be in the right state. LLVM will inevitably
> save values to the stack for calls which are not tail calls. Therefore, this
> proposal simply doesn't work.

It might be helpful to think of the CPS transformation as turning the program
into a form such that it always moves "forward", in the sense that the tail
calls _never_ return. A "return" is simply another tail call (i.e., a jump) to
a return address, which looks no different than a function call. Thus, we have
no callee-saved registers to worry about, since the tail calls never come back.

In addition, there are no live values in the LLVM stack frame, since they are
all passed in the CPS call (which uses a different stack). Thus, %retpt
receives all of its values from the struct it extracts from. While the 'cps'
marked call appears to be non-tail, it will be lowered to a tail call.

It's also worth noting that this proposal is based on something that already
works in LLVM, using a special assembly routine. We currently use this
workaround in Manticore when a thread is preempted. But it has too much
overhead for GHC, since the workaround would penalize general function calls.
This is why dedicated support is necessary for GHC (and related compilers) to
use LLVM effectively.

~kavon
Kavon Farvardin via llvm-dev
2017-Apr-17 23:04 UTC
[llvm-dev] [RFC] Adding CPS call support
> Is there a reason you can't use the algorithm from the paper "A
> Correspondence between Continuation Passing Style and Static Single
> Assignment Form" to convert your IR to LLVM's SSA IR?

Yes, there are a few reasons. Undoing the CPS transformation earlier in the
pipeline would mean that we are using LLVM's built-in stack. The special layout
and usage of the stack in GHC is achieved through CPS, so it is baked into the
compiler and garbage-collected runtime system.

~kavon
On 04/17/2017 08:30 AM, Kavon Farvardin via llvm-dev wrote:
> The workaround we have been using in GHC for LLVM is to break apart every
> function, placing the code for the continuation of each call into a new
> function. We do this only so that we can store a function pointer instead of
> a block address to our stack. This particularly gross transformation inhibits
> optimizations in both GHC and LLVM, and we would like to remove the need for
> it.

Can you give a bit more information on what this transformation breaks,
optimization-wise? You're essentially explicitly representing the continuation
and converting all calls into tail calls with an explicit continuation
argument. I would expect the LLVM optimizer to do reasonably well with such a
representation.
> Please let me know soon if you have any objections to this feature.

Honestly, this proposed design seems seriously problematic. The semantics
around inlining alone are problematic enough to warrant serious hesitation. I
think you need to take a step back, describe the original problem you're
trying to solve, and explain why the standard conversion approaches don't work
out. You'd need to strongly motivate a change of this magnitude, and I really
don't feel like you've done so thus far.
This seems to be solving a problem very similar to C++ coroutines. You might
find it helpful to read the past discussion of their design in LLVM.

What do you think is more important for GHC: getting good or highly tunable
low-level code, or enabling mid-level optimizations such as GVN across calls?
Is GHC using LLVM more as an optimizer or as a code generator?

If GHC's code generation pipeline leaves opportunities for LLVM mid-level
optimization, then I think you want to pursue a design similar to C++
coroutines, where we leave the control flow graph "uninverted" until code
generation. I might be making up terminology here, but I think of transforming
a program representation to continuation-passing style as "inverting" the CFG.
I'm not suggesting that you use the native stack to store VM data; I'm just
suggesting that you hide the fact that CPS calls will really be tail calls and
returns from LLVM until code generation.
It does look similar. I was considering adding a few more intrinsics to support
the CPS case, but decided to defer until we have use cases and people willing
to do the work ☺.

Coroutine passes chop up the function into smaller pieces at suspend points; a
CPS call can be represented as a suspend point. Currently a suspend point looks
like this:

  %sp1 = call token @llvm.coro.save()
  <some code may go here>   ; if this is a call, it will be replaced with a
                            ; tail call where the calling convention allows
  call i8 @llvm.coro.suspend(token %sp1)

Two extra intrinsics I was considering were:

  %addr = i8* @llvm.coro.resume.addr(tok)      ; gives the address of an
                                               ; extracted continuation function
  <type> @llvm.coro.resume.value.<type>(tok)   ; gives a return value on resume

With those, this function:

  int foo() {
    PartA ...
    int r = bar(args);
    PartB ...
  }

can get represented as:

  int foo() {
    PartA
    %sp1 = call token @llvm.coro.save()
    %resume_addr = call @llvm.coro.resume.addr(%sp1)
    call bar(args, %resume_addr)
    call @llvm.coro.suspend(%sp1)
    %r = call @llvm.coro.resume.value.<int>(%sp1)
    PartB
  }

After foo() is optimized, the coroutine passes will extract the continuation
into a separate function:

  foo$resume1(int r) { PartB }

and the suspend point will be replaced with a jump to bar().
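Rendered as very rough LLVM IR, and with the caveat that @llvm.coro.resume.addr
and @llvm.coro.resume.value are only the hypothetical intrinsics sketched above
(a real coroutine also needs the usual llvm.coro.id/llvm.coro.begin setup,
omitted here), the suspend point for the call to bar might look something like:

  %save  = call token @llvm.coro.save(i8* null)
  %raddr = call i8* @llvm.coro.resume.addr(token %save)         ; hypothetical
  tail call void @bar(i64 %arg, i8* %raddr)
  %st    = call i8 @llvm.coro.suspend(token %save, i1 false)
  %r     = call i64 @llvm.coro.resume.value.i64(token %save)    ; hypothetical
  ; ... PartB, using %r ...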
Kavon Farvardin via llvm-dev
2017-Apr-19 11:22 UTC
[llvm-dev] [RFC] Adding CPS call support
> What do you think is more important for GHC: getting good or highly tunable low-level code, or enabling mid-level optimizations such as GVN across calls? Is GHC using LLVM more as an optimizer or as a code generator?

GHC currently uses LLVM more as a code generator; it's an alternative backend. While some optimizations may not be effective on the proposed construct, we already produce rather horrible-looking code just to use LLVM. Our function-splitting transformation that turns it into this mess is complicated to maintain, so we'd like to remove it (a rough sketch of the splitting appears at the end of this message).

I see at least one performance benefit of not breaking the program into hundreds of tiny functions: better code placement for I-cache locality.

~kavon

> On Apr 18, 2017, at 11:00 PM, Reid Kleckner <rnk at google.com> wrote:
>
> This seems to be solving a problem very similar to C++ coroutines. You might find it helpful to read the past discussion of their design in LLVM.
>
> What do you think is more important for GHC: getting good or highly tunable low-level code, or enabling mid-level optimizations such as GVN across calls? Is GHC using LLVM more as an optimizer or as a code generator?
>
> If GHC's code generation pipeline leaves opportunities for LLVM mid-level optimization, then I think you want to pursue a design similar to C++ coroutines, where we leave the control flow graph "uninverted" until code generation. I might be making up terminology here, but I think of transforming a program representation to continuation-passing style as "inverting" the CFG. I'm not suggesting that you use the native stack to store VM data, I'm just suggesting that you hide the fact that CPS calls will really be tail calls and returns from LLVM until code generation.
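For reference, the splitting looks roughly like the following. This is a simplified, hand-written sketch rather than GHC's actual output; the names @foo_split, @foo_ret, and @bar are made up for the example, and real GHC-generated code passes additional STG registers as arguments. It is also the "inverted" form described above: the IR only ever contains tail calls, never a call that returns.

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

declare ghccc void @bar(i8**)

; the caller: everything after the CPS call has been outlined into
; @foo_ret, so a plain function pointer (not a block address) can be
; stored as the return address
define ghccc void @foo_split(i8** %sp) {
entry:
  ; finish the stack frame by writing the return address
  %retAddr = bitcast void (i8**, i64)* @foo_ret to i8*
  store i8* %retAddr, i8** %sp
  ; the non-tail CPS call becomes a tail call; control never returns here
  tail call ghccc void @bar(i8** %sp)
  ret void
}

; the continuation: the code that used to follow the call,
; now a separate top-level function
define ghccc void @foo_ret(i8** %sp, i64 %val) {
entry:
  ; ...
  ret void
}

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;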
Kavon Farvardin via llvm-dev
2017-Apr-19 17:38 UTC
[llvm-dev] [RFC] Adding CPS call support
> The semantics around inlining alone are problematic enough to warrant serious hesitation.

There are nicer ways to embed CPS call/return into LLVM; I just figured that there would not be much support for adding a new terminator, because it would change a lot of code. Ideally we would have a block terminator like:

  cps call ghccc @bar (.. args ..) returnsto label %retpt

where the "returnsto" label is optional, so that the same terminator covers both CPS calls and CPS returns. Another alternative would be to emulate this terminator using an intrinsic. (A sketch of how the earlier example might look with such a terminator appears at the end of this message.)

So, my question is: would there be more support for a version of this terminator (whether as an intrinsic or not) instead of what I was proposing earlier?

~kavon

> On Apr 18, 2017, at 7:24 PM, Philip Reames via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> Can you give a bit more information on what this transformation breaks, optimization-wise? You're essentially explicitly representing the continuation and converting all calls into tail calls with an explicit continuation argument.
> I would expect the LLVM optimizer to do reasonably well with such a representation.
>
> Honestly, this proposed design seems seriously problematic. The semantics around inlining alone are problematic enough to warrant serious hesitation. I think you need to take a step back, describe the original problem you're trying to solve, and explain why the standard conversion approaches don't work out. You'd need to strongly motivate a change of this magnitude, and I really don't feel like you've done so, so far.
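For concreteness, here is roughly how the non-tail call from the earlier example might be written with such a terminator. The syntax is hypothetical (neither 'cps call' nor 'returnsto' exists in LLVM IR), and making the returned struct available only in the continuation block, much like an 'invoke' result is only available on the normal edge, is just one possible reading of the idea.

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

define {i8**, i64} @foo (i8** %sp, ...) {
someBlk:
  ; ...
  ; finish stack frame by writing return address
  %retAddr = blockaddress(@foo, %retpt)
  store i8* %retAddr, i8** %sp
  ; the CPS call is now a block terminator: it lowers to a jump,
  ; and %retpt is its only successor
  %retVals = cps call ghccc {i8**, i64} @bar (i8** %sp, ...) returnsto label %retpt

retpt:
  %sp2 = extractvalue {i8**, i64} %retVals, 0
  %val = extractvalue {i8**, i64} %retVals, 1
  ; ...
}

; a CPS return would be the same terminator without a 'returnsto' label,
; since control never comes back:
;
;   cps call ghccc void %retAddr1(i8** %sp, i64 %val)

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;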