thr3ads.net - llvm dev - [llvm-dev] [RFC] Adding CPS call support [Apr 2017]

If this information is useful, please help other people find it:
Share via:

Kavon Farvardin via llvm-dev

2017-Apr-17 22:52 UTC

[llvm-dev] [RFC] Adding CPS call support

(Sorry for the 2nd email Eli, I forgot to reply-all).
> I'm not following how explicitly representing the return address of a
call in the IR before isel actually solves any relevant issue. We already pass
the return address implicitly as an argument to every call; you can retrieve it
with llvm.returnaddress if you need it.

Unfortunately the @llvm.returnaddress intrinsic does not solve the problem, as
it only reads the return address pushed onto the LLVM stack by a call. We would
then need a way to move the LLVM stack pointer back to where it was before the
call, because a CPS call _must_ not grow the LLVM stack (it will never be
popped!), so a 'call' instruction will not do.

> You can't branch across functions like you're proposing because the
stack and callee-save registers won't be in the right state.  LLVM will
inevitably save values to the stack for calls which are not tail calls. 
Therefore, this proposal simply doesn't work.
It might be helpful to think of the CPS transformation as turning the program
into a form such that it always moves "forward", in the sense that the
tail calls _never_ return. A "return" is simply another tail call
(i.e., a jump) to a return address, which looks no different than a function
call.

Thus, we have no callee-saved registers to worry about since the tail calls
never come back. In addition, there are no live values in the LLVM stack frame,
since they are all passed in the CPS call (which uses a different stack). Thus,
retpt receives all of its values from the struct it extracts from. While the
'cps' marked call appears to be non-tail, it will be lowered to a tail
call.

It's also worth noting that this proposal is based on something that already
works in LLVM, using a special assembly routine. We currently this workaround in
Manticore when a thread is preempted. But, it has too much overhead for GHC,
since the workaround would penalize general function calls. This is why support
is necessary for GHC (and related compilers) in order to use LLVM effectively.

~kavon
> On Apr 17, 2017, at 7:47 PM, Friedman, Eli <efriedma at
codeaurora.org> wrote:
> 
> On 4/17/2017 8:30 AM, Kavon Farvardin via llvm-dev wrote:
>> Summary
>> ======>> 
>> There is a need for dedicated continuation-passing style (CPS) calls in
LLVM to
>> support functional languages. Herein I describe the problem and propose
a
>> solution. Feedback and/or tips are greatly appreciated, as our goal is
to
>> implement these changes so they can be merged into LLVM trunk.
>> 
>> 
>> Problem
>> ======>> 
>> Implementations of functional languages like Haskell and ML (e.g., GHC
and
>> Manticore) use a continuation-passing style (CPS) transformation in
order to
>> manage the call stack explicitly. This is done prior to generating LLVM
IR, so
>> the implicit call stack within LLVM is not used for call and return.
>> 
>> When making a non-tail call while in CPS, we initialize a stack frame
for the
>> return through our own stack pointer, and then pass that stack pointer
to the
>> callee when we jump to it. It is here when we run into a problem in
LLVM.
>> 
>> Consider the following CPS call to @bar and how it will return:
>> 
>> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
>> 
>> define void @foo (i8** %sp, ...) {
>> someBlk:
>>     ; ...
>>     ; finish stack frame by writing return address
>>   %retAddr = blockaddress(@foo, %retpt)
>>   store i8* %retAddr, i8** %sp
>>     ; jump to @bar
>>   tail call void @bar(i8** %sp, ...)
>> 
>>  retpt: 				; <- how can @bar "call" %retpt?
>>    %sp2 = ???
>>    %val = ???
>>    ; ...
>> 
>>  }
>> 
>>  define void @bar (i8** %sp, ...) {
>>  	  ; perform a return
>>  	%retAddr0 = load i8*, i8** %sp
>>  	%retAddr1 = bitcast i8* %retAddr0 to void (i8**, i64)*
>>  	%val = bitcast i64 1 to i64
>>  	  ; jump back to %retpt in @foo, passing %sp and %val
>>  	tail call void %retAddr1(i8** %sp, i64 %val)
>>  }
>> 
>> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
>> 
>> There is currently no way to jump back to %retpt from another function,
as block
>> addresses have restricted usage in LLVM [1]. Our main difficulty is
that we
>> cannot jump to a block address without knowing its calling convention,
i.e., the
>> particular machine registers (or memory locations) that the block
expects
>> incoming values to be passed in.
>> 
>> The workaround we have been using in GHC for LLVM is to break apart
every
>> function, placing the code for the continuation of each call into a new
>> function. We do this only so that we can store a function pointer
instead of a
>> block address to our stack. This particularly gross transformation
inhibits
>> optimizations in both GHC and LLVM, and we would like to remove the
need for it.
>> 
>> 
>> Proposal
>> =======>> 
>> I believe the lowest-impact method of fixing this problem with LLVM is
the
>> following:
>> 
>> First, we add a special 'cps' call instruction marker to be
used on non-tail
>> calls. Then, we use a specialized calling convention for these non-tail
calls,
>> which fix the returned values to specific locations in the machine code
[2].
>> 
>> To help illustrate what's going on, let's rewrite the above
example using the
>> proposed 'cps' call:
>> 
>> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
>> 
>> define { ... } @foo (i8** %sp, ...) {
>> someBlk:
>>     ; ...
>>     ; finish stack frame by writing return address
>>   %retAddr = blockaddress(@foo, %retpt)
>>   store i8* %retAddr, i8** %sp
>>     ; jump to @bar
>>   %retVals = cps call ghccc {i8**, i64} @bar (i8** %sp, ...)
>>   br label %retpt
> 
> I'm not following how explicitly representing the return address of a
call in the IR before isel actually solves any relevant issue.  We already pass
the return address implicitly as an argument to every call; you can retrieve it
with llvm.returnaddress if you need it.
> 
> You can't branch across functions like you're proposing because the
stack and callee-save registers won't be in the right state.  LLVM will
inevitably save values to the stack for calls which are not tail calls. 
Therefore, this proposal simply doesn't work.
> 
> -Eli
> 
> -- 
> Employee of Qualcomm Innovation Center, Inc.
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux
Foundation Collaborative Project
>

Friedman, Eli via llvm-dev

2017-Apr-18 01:32 UTC

head link

[llvm-dev] [RFC] Adding CPS call support

On 4/17/2017 3:52 PM, Kavon Farvardin wrote:> (Sorry for the 2nd email Eli, I forgot to reply-all).
>
>> I'm not following how explicitly representing the return address of
a call in the IR before isel actually solves any relevant issue. We already pass
the return address implicitly as an argument to every call; you can retrieve it
with llvm.returnaddress if you need it.
>
> Unfortunately the @llvm.returnaddress intrinsic does not solve the problem,
as it only reads the return address pushed onto the LLVM stack by a call. We
would then need a way to move the LLVM stack pointer back to where it was before
the call, because a CPS call _must_ not grow the LLVM stack (it will never be
popped!), so a 'call' instruction will not do.
Most architectures have a call instruction which does not push anything 
onto the stack; e.g. on ARM, the "BL" instruction saves the return 
address in LR.  On other architectures, you can emulate this (for 
example, you could lower an IR "call" to LEA+JMP on x86-64).
>> You can't branch across functions like you're proposing because
the stack and callee-save registers won't be in the right state.  LLVM will
inevitably save values to the stack for calls which are not tail calls. 
Therefore, this proposal simply doesn't work.
> It might be helpful to think of the CPS transformation as turning the
program into a form such that it always moves "forward", in the sense
that the tail calls _never_ return. A "return" is simply another tail
call (i.e., a jump) to a return address, which looks no different than a
function call.
>
> Thus, we have no callee-saved registers to worry about since the tail calls
never come back. In addition, there are no live values in the LLVM stack frame,
since they are all passed in the CPS call (which uses a different stack). Thus,
retpt receives all of its values from the struct it extracts from. While the
'cps' marked call appears to be non-tail, it will be lowered to a tail
call.
The return address for the current function is fundamentally live across 
any non-tail call.  And LLVM will hoist other operations across non-tail 
calls, and in the process introduce values which are live across calls.  
You need to save those values somewhere.  The key question is where.  
Your proposal tries to address that by explicitly saving/restoring the 
return address onto the GHC stack, but you're working at the wrong 
level; you can't get a complete list of which values are live across 
calls until register allocation.

-Eli

-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux
Foundation Collaborative Project

Kavon Farvardin via llvm-dev

2017-Apr-18 10:47 UTC

head link

[llvm-dev] [RFC] Adding CPS call support

> Most architectures have a call instruction which does not push anything
onto the stack; e.g. on ARM, the "BL" instruction saves the return
address in LR.  On other architectures, you can emulate this (for example, you
could lower an IR "call" to LEA+JMP on x86-64).
This is similar to what I was originally thinking, since the end goal of all of
this is the following machine code for a call (I'll use an x86-64 example):

    leaq  _retpt(%rip), %scratchReg
    movq  %scratchReg, (%ghcSPReg)
    jmp   _bar

But, if we want to end up with the above machine code using a custom lowering of
just the following IR instruction,

    %retVals = cps call ghccc @bar (... args ...) 

_without_ explicitly taking the block address and writing it to the GHC stack
pointer beforehand, there are some things to consider:

1. How do we prevent IR optimizations from eliminating the %retpt block?

  If we do not explicitly take the address of the %retpt block, a pass such as
simplifycfg has no reason to preserve the block, and may merge it with its only
predecessor.

2. Where in the GHC stack frame do we write the block address? 

We could hardcode the fact that the stack pointer must be pointing to the field
where the return address should be written to when making a 'cps' IR
call. This is fine for GHC, but is a seemingly unnecessary restriction.

3. How do we know which argument is the GHC stack pointer?  

  Similarly, we could hardcode the fact that, say, the first argument of every
'cps' IR call is the GHC stack pointer. Then we know from the calling
convention which register it is in. This again fine for GHC, but reduces the
generality of the feature.

> The return address for the current function is fundamentally live across
any non-tail call.
I'm not quite sure what you mean by this.

While there may be a return address handed to the main function (and thus passed
to every other function) it's either unused or not known to LLVM.

> And LLVM will hoist other operations across non-tail calls, and in the
process introduce values which are live across calls.  You need to save those
values somewhere.  The key question is where.  Your proposal tries to address
that by explicitly saving/restoring the return address onto the GHC stack, but
you're working at the wrong level; you can't get a complete list of
which values are live across calls until register allocation.
The part I had omitted from the proposal is the initialization of the rest of
the GHC stack frame, which is laid out in GHC by finding all of the values that
are live across the call before CPS conversion.

While the full list of values preserved by LLVM are not known until register
allocation, there are some properties of the LLVM IR we will initially generate
that are important to consider:

1. There are no IR values live across the non-tail 'cps' IR call, since
they are all passed to the callee.
2. All IR values used after the 'cps' IR call come from the struct
returned by that call.

Thus, it should not be possible to hoist instructions above this particular
non-tail call, as they are all derived from the values returned in the struct.
Any register spills allocated to the LLVM stack before the non-tail call are
dead, as they have been copied to the GHC stack frame if they were live.

If you have a particular example in mind that would be great.

~kavon

> On Apr 18, 2017, at 2:32 AM, Friedman, Eli <efriedma at
codeaurora.org> wrote:
> 
> On 4/17/2017 3:52 PM, Kavon Farvardin wrote:
>> (Sorry for the 2nd email Eli, I forgot to reply-all).
>> 
>>> I'm not following how explicitly representing the return
address of a call in the IR before isel actually solves any relevant issue. We
already pass the return address implicitly as an argument to every call; you can
retrieve it with llvm.returnaddress if you need it.
>> 
>> Unfortunately the @llvm.returnaddress intrinsic does not solve the
problem, as it only reads the return address pushed onto the LLVM stack by a
call. We would then need a way to move the LLVM stack pointer back to where it
was before the call, because a CPS call _must_ not grow the LLVM stack (it will
never be popped!), so a 'call' instruction will not do.
> 
> Most architectures have a call instruction which does not push anything
onto the stack; e.g. on ARM, the "BL" instruction saves the return
address in LR.  On other architectures, you can emulate this (for example, you
could lower an IR "call" to LEA+JMP on x86-64).
> 
>>> You can't branch across functions like you're proposing
because the stack and callee-save registers won't be in the right state. 
LLVM will inevitably save values to the stack for calls which are not tail
calls.  Therefore, this proposal simply doesn't work.
>> It might be helpful to think of the CPS transformation as turning the
program into a form such that it always moves "forward", in the sense
that the tail calls _never_ return. A "return" is simply another tail
call (i.e., a jump) to a return address, which looks no different than a
function call.
>> 
>> Thus, we have no callee-saved registers to worry about since the tail
calls never come back. In addition, there are no live values in the LLVM stack
frame, since they are all passed in the CPS call (which uses a different stack).
Thus, retpt receives all of its values from the struct it extracts from. While
the 'cps' marked call appears to be non-tail, it will be lowered to a
tail call.
> 
> The return address for the current function is fundamentally live across
any non-tail call.  And LLVM will hoist other operations across non-tail calls,
and in the process introduce values which are live across calls.  You need to
save those values somewhere.  The key question is where.  Your proposal tries to
address that by explicitly saving/restoring the return address onto the GHC
stack, but you're working at the wrong level; you can't get a complete
list of which values are live across calls until register allocation.
> 
> -Eli
> 
> -- 
> Employee of Qualcomm Innovation Center, Inc.
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux
Foundation Collaborative Project

llvm dev - Apr 2017 - [RFC] Adding CPS call support

[llvm-dev] [RFC] Adding CPS call support

[llvm-dev] [RFC] Adding CPS call support

[llvm-dev] [RFC] Adding CPS call support