thr3ads.net - llvm dev - [LLVMdev] Tail call optimization thoughts [Aug 2007]

If this information is useful, please help other people find it:
Share via:

Anton Korobeynikov

2007-Aug-09 21:33 UTC

[LLVMdev] Tail call optimization thoughts

Hello, Arnold.

Only quick comments, I'll try to make a full review a little bit later.
> 0.)a fast calling convention (maybe use the current
>     CallingConv::Fast, or create a CallingConv::TailCall)
> 1.) lowering of formal arguments
>     like for example x86_LowerCCCArguments in  stdcall mode
>     we need to make sure that later mentioned CALL_CLOBBERED_REG is
>     not used (remove it from availableregisters in callingconv
>     for argument passing?)This can be acceptable, only if function has internal linkage.
Otherwise, we cannot change calling convention of the function. So
extension if this problem:
Let us have the following sequence of calls: a->b->c
Call b->c is tail call.

We need to formulate conditions of the call to be "suitable for tail
call". On x86 it's the following:
1. The calling conventions of b and c are "compatible". There are two
scenarios: caller cleans the stack (C calling convention), callee cleans
the stack (stdcall calling convention). The call is not suitable for
tail call, if b and c cleans the stack differently. For example, let b
has C CC and c - stdcall CC. In this case making tail call results to
double cleaning (on in a and one in c), which is incorrect.
2. The same situtation is for struct return functions: the caller pushes
extra pointer to return value, but callee is required to clean this
pointer out of the stack. So, call b->c is "suitable for tail call"
iff
both b and c are struct return.
3. Various tricky cases with FP. I don't remeber correctly, but there
can be some problems with functions returning FP values.
4. PIC. %ebx should be live in order to make a call via PLT.
Unfortunately, %ebx is callee-saved, that's why we can assemble tail
calls only to functions within the same module (PLT is not required).
5. %ecx is livein for regparm(3) functions.
... (maybe something I forget)

--
WBR, Anton Korobeynikov

Arnold Schwaighofer

2007-Aug-09 22:26 UTC

head link

[LLVMdev] Tail call optimization thoughts

On 8/9/07, Anton Korobeynikov <asl at math.spbu.ru>
wrote:> Hello, Arnold.
>
> Only quick comments, I'll try to make a full review a little bit later.
>
> > 0.)a fast calling convention (maybe use the current
> >     CallingConv::Fast, or create a CallingConv::TailCall)
> > 1.) lowering of formal arguments
> >     like for example x86_LowerCCCArguments in  stdcall mode
> >     we need to make sure that later mentioned CALL_CLOBBERED_REG is
> >     not used (remove it from availableregisters in callingconv
> >     for argument passing?)
> This can be acceptable, only if function has internal linkage.
> Otherwise, we cannot change calling convention of the function. So
> extension if this problem:again i did not express myself clearly -sorry - what i meant is not to
change the calling convention of the function but only do tail call
optimization for functions (caller-callee) that obey a special calling
convention (like for example x86_stdcall convention - with the
exception that x86_stdcall as implemented in the backend currently
allows for params passed in reg eax,ecx and edx, but when used in
future with tailcallopt one of those registers must be reserved for
the tailcallee address resulting in only 2 free registers for param
passing). if both caller and callee obey that "callingconv::fast" -
maximum 2args in register and keep one call clobbered free than do
tailcallopt.

what i also notice (in my first attempt) is that one has do keep the
stack aligned (on mac os x, calls to dynamically loaded functions want
the stack to be 16 byte aligned). when doing tailcalloptimization the
naive way (my 1. try) was to just adjust the stack by the argument
difference. but that could result in messing with the stack alignment.
so one has to round the argument on stack size to 16n+12 such that the
frame pointer difference of two functions is always a multiple of 16
(that is if the stackalignment has to be 16 byte).
> Let us have the following sequence of calls: a->b->c
> Call b->c is tail call.
>
> We need to formulate conditions of the call to be "suitable for tail
> call". On x86 it's the following:
> 1. The calling conventions of b and c are "compatible". There are
two
> scenarios: caller cleans the stack (C calling convention), callee cleans
> the stack (stdcall calling convention). The call is not suitable for
> tail call, if b and c cleans the stack differently. For example, let b
> has C CC and c - stdcall CC. In this case making tail call results to
> double cleaning (on in a and one in c), which is incorrect.i was thinking of just doing tailcallopt with one CC in the beginning
(~stdcall) but yes that is true> 2. The same situtation is for struct return functions: the caller pushes
> extra pointer to return value, but callee is required to clean this
> pointer out of the stack. So, call b->c is "suitable for tail
call" iff
> both b and c are struct return.
> 3. Various tricky cases with FP. I don't remeber correctly, but there
> can be some problems with functions returning FP values.
yes i will have to look a those> 4. PIC. %ebx should be live in order to make a call via PLT.
> Unfortunately, %ebx is callee-saved, that's why we can assemble tail
> calls only to functions within the same module (PLT is not required).
> 5. %ecx is livein for regparm(3) functions.i guess you mean functions which have their address loaded to a
register (call via a function pointer) by regparam
functions> ... (maybe something I forget)
... even more that i don't know :)>
> --
> WBR, Anton Korobeynikov
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>

Arnold Schwaighofer

2007-Aug-11 20:25 UTC

head link

[LLVMdev] Tail call optimization deeds

Okay so i implemented an(other :) initial version for X86-32 backend,  
this time based on TOT:

It is not very generic at the moment. Only functions with  
callingconv::fastcc and the tail call attribute will be optimized.  
Maybe the next step should be to integrate the code into the other  
calling convention lowering. Here is what i have at the moment:

If callingconv::fastcc is used the function will be custom lowered.  
(LowerX86_32FastCCCallTo and LowerX86_32FastCCArguments, the code is  
based on the std calling convention minus ecx as inreg)
The lowering code decides whether the function call really is  
eligible for tail call optimization tco (caller-callee agree, tail  
call opt is enabled, tail position, TODO: check for PIC code).
If it is okay to do tco it calculates the stack adjustment.
Repositions the return address,
Puts the arguments on the correct position relative to the (virtual)  
frame pointer.
Moves the function address to register ECX if the tailcallee is in a  
register. (TODO: set ECX as live out?)
TODO: check what to do with pic code
Adds register arguments (EAX, EDX) as live in if they are used for  
arguments.
Create a TAILCALL node (a pseudo instruction) holding the Callee  
(Register or TargetGlobalAddress/ExternalSymbol), the stack  
adjustment as a constant, and the live-in registers. The  
TargetLowering class has a IsLastCallTailCall state field which is  
set to true such that the LowerRET function nows it has to produce  
different code. The TAILCALL node is then returned.

If it is not okay the code produces a function call like the std  
calling convention (callee pops arguments on return).
There is one difference though: since tail call functions might  
arbitrarily change stack alignment (since they adjust the stack by  
the difference of the argument stack size) the stack adjustment (by  
that i mean the number of bytes a function requires as its stack  
size) i when i calculate that stack size/ stack adjustment i take  
care that the stack sizes are always multiples of the targets stack  
alignment. (this code at the moment can be turned off by an option  
eg.:if it's known that no dynamically linked functions are called).
I observed the behavior of dynamically linked functions requiring 16  
byte stack alignment on mac os x. This might also render it difficult  
to do general tail call optimization (for normal std call, c call  
functions it would only bem possible if the stack size difference is  
a multiple of the stack alignment). But maybe i am misinterpreting  
the situation. (TODO: look into that:)

The lowering of arguments is pretty similar to  std call.

The LowerRET function checks for the IsLastCallTailCall field and if  
its true does the following otherwise it does its usual duty.
It looks for the TAILCALL node and uses the operand holding the  
callee and the stack adjustment to build and return a TC_RETURN node  
with preceeding operands as its own operands.

The TAILCALL node is converted to a noop machine instruction.
Ditto the TC_RETURN node. (it declares the iscall=1, isbarrier=1,  
isret=1)

The machine instructions needed to adjust the stack and jump to the  
callee is added in the epilogue generating function in  
X86RegisterInfo. It uses the two operands of the TC_RETURN node for  
the required information.


So modifications are:
a LowerX86_32FastCCCallTo function
a LowerX86_32FastCCArguments function
change to LowerRET to generate the TC_RETURN node
some code at the end of X86RegisterInfo to do stack adjustment/ 
jumping to tailcallee (like the EH_RETURN code, thanks Anton for the  
pointers!)
small modifications to X86InstrInfo.td
(a yes and at the moment i remove generating TAILCALL nodes in the  
other functions - LowerCCCall and the like)

Another TODO: look a floating point code to see whether there are any  
kinks. Look at other calling conventions to see what is possible  
there. (see Anton's remarks)

I am away now for 2 weeks (isle of cres - camping). but after that i  
will be looking into further improving the code (and also look at the  
other architectures).
Or maybe i have answers telling me that i am going into a totally  
wrong direction , and i should let the experts do this stuff and stop  
nagging people :).

regards arnold

>  need to formulate conditions of the call to be "suitable for tail
> call". On x86 it's the following:
> 1. The calling conventions of b and c are "compatible". There are
two
> scenarios: caller cleans the stack (C calling convention), callee  
> cleans
> the stack (stdcall calling convention). The call is not suitable for
> tail call, if b and c cleans the stack differently. For example, let b
> has C CC and c - stdcall CC. In this case making tail call results to
> double cleaning (on in a and one in c), which is incorrect.
> 2. The same situtation is for struct return functions: the caller  
> pushes
> extra pointer to return value, but callee is required to clean this
> pointer out of the stack. So, call b->c is "suitable for tail
call"
> iff
> both b and c are struct return.
> 3. Various tricky cases with FP. I don't remeber correctly, but there
> can be some problems with functions returning FP values.
> 4. PIC. %ebx should be live in order to make a call via PLT.
> Unfortunately, %ebx is callee-saved, that's why we can assemble tail
> calls only to functions within the same module (PLT is not required).
> 5. %ecx is livein for regparm(3) functions.
> ... (maybe something I forget)
>
> --
> WBR, Anton Korobeynikov
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Evan Cheng

2007-Aug-13 21:12 UTC

head link

[LLVMdev] Tail call optimization deeds

Hi Arnold and Anton,

Sorry I have been ignoring your emails on this topic. It's an  
important task and I really need sometime to think about it (and talk  
to Chris about it!) But this has been an especially hectic week. I am  
also going to vacation soon so I am not sure when I would get around  
to it.

If Chris has time, I am sure he has lots to say on this topic. :-)  
Otherwise, please be patient. :-)

Evan

On Aug 11, 2007, at 1:25 PM, Arnold Schwaighofer wrote:
> Okay so i implemented an(other :) initial version for X86-32 backend,
> this time based on TOT:
>
> It is not very generic at the moment. Only functions with
> callingconv::fastcc and the tail call attribute will be optimized.
> Maybe the next step should be to integrate the code into the other
> calling convention lowering. Here is what i have at the moment:
>
> If callingconv::fastcc is used the function will be custom lowered.
> (LowerX86_32FastCCCallTo and LowerX86_32FastCCArguments, the code is
> based on the std calling convention minus ecx as inreg)
> The lowering code decides whether the function call really is
> eligible for tail call optimization tco (caller-callee agree, tail
> call opt is enabled, tail position, TODO: check for PIC code).
> If it is okay to do tco it calculates the stack adjustment.
> Repositions the return address,
> Puts the arguments on the correct position relative to the (virtual)
> frame pointer.
> Moves the function address to register ECX if the tailcallee is in a
> register. (TODO: set ECX as live out?)
> TODO: check what to do with pic code
> Adds register arguments (EAX, EDX) as live in if they are used for
> arguments.
> Create a TAILCALL node (a pseudo instruction) holding the Callee
> (Register or TargetGlobalAddress/ExternalSymbol), the stack
> adjustment as a constant, and the live-in registers. The
> TargetLowering class has a IsLastCallTailCall state field which is
> set to true such that the LowerRET function nows it has to produce
> different code. The TAILCALL node is then returned.
>
> If it is not okay the code produces a function call like the std
> calling convention (callee pops arguments on return).
> There is one difference though: since tail call functions might
> arbitrarily change stack alignment (since they adjust the stack by
> the difference of the argument stack size) the stack adjustment (by
> that i mean the number of bytes a function requires as its stack
> size) i when i calculate that stack size/ stack adjustment i take
> care that the stack sizes are always multiples of the targets stack
> alignment. (this code at the moment can be turned off by an option
> eg.:if it's known that no dynamically linked functions are called).
> I observed the behavior of dynamically linked functions requiring 16
> byte stack alignment on mac os x. This might also render it difficult
> to do general tail call optimization (for normal std call, c call
> functions it would only bem possible if the stack size difference is
> a multiple of the stack alignment). But maybe i am misinterpreting
> the situation. (TODO: look into that:)
>
> The lowering of arguments is pretty similar to  std call.
>
> The LowerRET function checks for the IsLastCallTailCall field and if
> its true does the following otherwise it does its usual duty.
> It looks for the TAILCALL node and uses the operand holding the
> callee and the stack adjustment to build and return a TC_RETURN node
> with preceeding operands as its own operands.
>
> The TAILCALL node is converted to a noop machine instruction.
> Ditto the TC_RETURN node. (it declares the iscall=1, isbarrier=1,
> isret=1)
>
> The machine instructions needed to adjust the stack and jump to the
> callee is added in the epilogue generating function in
> X86RegisterInfo. It uses the two operands of the TC_RETURN node for
> the required information.
>
>
> So modifications are:
> a LowerX86_32FastCCCallTo function
> a LowerX86_32FastCCArguments function
> change to LowerRET to generate the TC_RETURN node
> some code at the end of X86RegisterInfo to do stack adjustment/
> jumping to tailcallee (like the EH_RETURN code, thanks Anton for the
> pointers!)
> small modifications to X86InstrInfo.td
> (a yes and at the moment i remove generating TAILCALL nodes in the
> other functions - LowerCCCall and the like)
>
> Another TODO: look a floating point code to see whether there are any
> kinks. Look at other calling conventions to see what is possible
> there. (see Anton's remarks)
>
> I am away now for 2 weeks (isle of cres - camping). but after that i
> will be looking into further improving the code (and also look at the
> other architectures).
> Or maybe i have answers telling me that i am going into a totally
> wrong direction , and i should let the experts do this stuff and stop
> nagging people :).
>
> regards arnold
>
>
>> need to formulate conditions of the call to be "suitable for tail
>> call". On x86 it's the following:
>> 1. The calling conventions of b and c are "compatible". There
are two
>> scenarios: caller cleans the stack (C calling convention), callee
>> cleans
>> the stack (stdcall calling convention). The call is not suitable for
>> tail call, if b and c cleans the stack differently. For example,  
>> let b
>> has C CC and c - stdcall CC. In this case making tail call results to
>> double cleaning (on in a and one in c), which is incorrect.
>> 2. The same situtation is for struct return functions: the caller
>> pushes
>> extra pointer to return value, but callee is required to clean this
>> pointer out of the stack. So, call b->c is "suitable for tail
call"
>> iff
>> both b and c are struct return.
>> 3. Various tricky cases with FP. I don't remeber correctly, but
there
>> can be some problems with functions returning FP values.
>> 4. PIC. %ebx should be live in order to make a call via PLT.
>> Unfortunately, %ebx is callee-saved, that's why we can assemble
tail
>> calls only to functions within the same module (PLT is not required).
>> 5. %ecx is livein for regparm(3) functions.
>> ... (maybe something I forget)
>>
>> --
>> WBR, Anton Korobeynikov
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Evan Cheng

2007-Sep-05 22:57 UTC

head link

[LLVMdev] Tail call optimization deeds

Hi,

I finally had a little time to talk to Chris about tail call. But what  
I have is nothing concrete and also very vague.

We'd like to see tail call optimization to be similar to the target  
independent lowering of ISD::CALL nodes. These are auto-generated  
from ???CallingConv.td files. Some target specific details such as  
function address register (ECX in your example) should be coded in  
these files. Ditto for stack offsets, etc. These should be part of the  
fastcc calling convention. The tail call lowering will use these  
information and target hooks to perform tail call optimization on  
fastcc calls that are really tail calls.

I think it's reasonable to start by doing a (clean) target specific  
implementation for X86. Just understand we'd like to move to a target  
independent mechanism so the earlier implementation may not survive.

As for what you described here. I am having a hard time following  
it. :-) Please send patch.

Thanks,

Evan

On Aug 11, 2007, at 1:25 PM, Arnold Schwaighofer wrote:
> Okay so i implemented an(other :) initial version for X86-32 backend,
> this time based on TOT:
>
> It is not very generic at the moment. Only functions with
> callingconv::fastcc and the tail call attribute will be optimized.
> Maybe the next step should be to integrate the code into the other
> calling convention lowering. Here is what i have at the moment:
>
> If callingconv::fastcc is used the function will be custom lowered.
> (LowerX86_32FastCCCallTo and LowerX86_32FastCCArguments, the code is
> based on the std calling convention minus ecx as inreg)
> The lowering code decides whether the function call really is
> eligible for tail call optimization tco (caller-callee agree, tail
> call opt is enabled, tail position, TODO: check for PIC code).
> If it is okay to do tco it calculates the stack adjustment.
> Repositions the return address,
> Puts the arguments on the correct position relative to the (virtual)
> frame pointer.
> Moves the function address to register ECX if the tailcallee is in a
> register. (TODO: set ECX as live out?)
> TODO: check what to do with pic code
> Adds register arguments (EAX, EDX) as live in if they are used for
> arguments.
> Create a TAILCALL node (a pseudo instruction) holding the Callee
> (Register or TargetGlobalAddress/ExternalSymbol), the stack
> adjustment as a constant, and the live-in registers. The
> TargetLowering class has a IsLastCallTailCall state field which is
> set to true such that the LowerRET function nows it has to produce
> different code. The TAILCALL node is then returned.
>
> If it is not okay the code produces a function call like the std
> calling convention (callee pops arguments on return).
> There is one difference though: since tail call functions might
> arbitrarily change stack alignment (since they adjust the stack by
> the difference of the argument stack size) the stack adjustment (by
> that i mean the number of bytes a function requires as its stack
> size) i when i calculate that stack size/ stack adjustment i take
> care that the stack sizes are always multiples of the targets stack
> alignment. (this code at the moment can be turned off by an option
> eg.:if it's known that no dynamically linked functions are called).
> I observed the behavior of dynamically linked functions requiring 16
> byte stack alignment on mac os x. This might also render it difficult
> to do general tail call optimization (for normal std call, c call
> functions it would only bem possible if the stack size difference is
> a multiple of the stack alignment). But maybe i am misinterpreting
> the situation. (TODO: look into that:)
>
> The lowering of arguments is pretty similar to  std call.
>
> The LowerRET function checks for the IsLastCallTailCall field and if
> its true does the following otherwise it does its usual duty.
> It looks for the TAILCALL node and uses the operand holding the
> callee and the stack adjustment to build and return a TC_RETURN node
> with preceeding operands as its own operands.
>
> The TAILCALL node is converted to a noop machine instruction.
> Ditto the TC_RETURN node. (it declares the iscall=1, isbarrier=1,
> isret=1)
>
> The machine instructions needed to adjust the stack and jump to the
> callee is added in the epilogue generating function in
> X86RegisterInfo. It uses the two operands of the TC_RETURN node for
> the required information.
>
>
> So modifications are:
> a LowerX86_32FastCCCallTo function
> a LowerX86_32FastCCArguments function
> change to LowerRET to generate the TC_RETURN node
> some code at the end of X86RegisterInfo to do stack adjustment/
> jumping to tailcallee (like the EH_RETURN code, thanks Anton for the
> pointers!)
> small modifications to X86InstrInfo.td
> (a yes and at the moment i remove generating TAILCALL nodes in the
> other functions - LowerCCCall and the like)
>
> Another TODO: look a floating point code to see whether there are any
> kinks. Look at other calling conventions to see what is possible
> there. (see Anton's remarks)
>
> I am away now for 2 weeks (isle of cres - camping). but after that i
> will be looking into further improving the code (and also look at the
> other architectures).
> Or maybe i have answers telling me that i am going into a totally
> wrong direction , and i should let the experts do this stuff and stop
> nagging people :).
>
> regards arnold
>
>
>> need to formulate conditions of the call to be "suitable for tail
>> call". On x86 it's the following:
>> 1. The calling conventions of b and c are "compatible". There
are two
>> scenarios: caller cleans the stack (C calling convention), callee
>> cleans
>> the stack (stdcall calling convention). The call is not suitable for
>> tail call, if b and c cleans the stack differently. For example,  
>> let b
>> has C CC and c - stdcall CC. In this case making tail call results to
>> double cleaning (on in a and one in c), which is incorrect.
>> 2. The same situtation is for struct return functions: the caller
>> pushes
>> extra pointer to return value, but callee is required to clean this
>> pointer out of the stack. So, call b->c is "suitable for tail
call"
>> iff
>> both b and c are struct return.
>> 3. Various tricky cases with FP. I don't remeber correctly, but
there
>> can be some problems with functions returning FP values.
>> 4. PIC. %ebx should be live in order to make a call via PLT.
>> Unfortunately, %ebx is callee-saved, that's why we can assemble
tail
>> calls only to functions within the same module (PLT is not required).
>> 5. %ecx is livein for regparm(3) functions.
>> ... (maybe something I forget)
>>
>> --
>> WBR, Anton Korobeynikov
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Reasonably Related Threads

Search for more possibly parallel threads

llvm dev - Aug 2007 - [LLVMdev] Tail call optimization thoughts

[LLVMdev] Tail call optimization thoughts

[LLVMdev] Tail call optimization thoughts

[LLVMdev] Tail call optimization deeds

[LLVMdev] Tail call optimization deeds

[LLVMdev] Tail call optimization deeds

Reasonably Related Threads