thr3ads.net - llvm dev - [LLVMdev] RFC: Tail call optimization X86 [Sep 2007]

If this information is useful, please help other people find it:
Share via:

Arnold Schwaighofer

2007-Sep-24 09:25 UTC

[LLVMdev] RFC: Tail call optimization X86

On 24 Sep 2007, at 09:18, Evan Cheng wrote:> +; RUN: llvm-as < %s | llc -march=x86 -mattr=+sse2 -stats -info-
> output-file - | grep asm-printer | grep 9
> +; change preceeding line form ... | grep 8 to ..| grep 9 since
> +; with new fastcc has std call semantics causing a stack adjustment
> +; after the function call
>
> Not sure if I understand this. Can you illustrate with an example?
Sure

The code generated used to be
_array:
         subl    $12, %esp
         movss   LCPI1_0, %xmm0
         mulss   16(%esp), %xmm0
         movss   %xmm0, (%esp)
         call    L_qux$stub
         mulss   LCPI1_0, %xmm0
         addl    $12, %esp
         ret

FastCC use to be caller pops arguments so there was no stack  
adjustment after the
call to qux. Now FastCC has callee pops arguments on return semantics  
so the
x86 backend inserts a stack adjustment after the call.

_array:
  	subl    $12, %esp
         movss   LCPI1_0, %xmm0
         mulss   16(%esp), %xmm0
         movss   %xmm0, (%esp)
         call    L_qux$stub
         subl    $4, %esp   << stack adjustment because qux pops  
arguments on return
         mulss   LCPI1_0, %xmm0
         addl    $12, %esp
         ret     $4
> Index: include/llvm/Target/TargetLowering.h
> ==================================================================> ---
include/llvm/Target/TargetLowering.h	(revision 42247)
> +++ include/llvm/Target/TargetLowering.h	(working copy)
> @@ -851,8 +851,18 @@
>     virtual std::pair<SDOperand, SDOperand>
>     LowerCallTo(SDOperand Chain, const Type *RetTy, bool  
> RetTyIsSigned,
>                 bool isVarArg, unsigned CallingConv, bool isTailCall,
> -              SDOperand Callee, ArgListTy &Args, SelectionDAG
&DAG);
> +              bool isNextInstRet, SDOperand Callee, ArgListTy &Args,
> +              SelectionDAG &DAG);
> +  // IsEligibleForTailCallElimination - Check whether the call is
> eligible for
> +  // tailcall elimination
> +  virtual bool IsEligibleForTailCallElimination(SelectionDAG& DAG,
> +                                                bool IsNextInstRet,
> +                                                SDOperand Callee,
> +                                                unsigned CalleeCC)
> const {
> +    return false;
> +  }
> +
>
> I am not too keen on "isNextInstRet". We should never use
instruction
> ordering before scheduling to determine whether it's possible to
> perform an optimization. IsEligibleForTailCallElimination() should
> determine the feasibility on its own, no?
My assumption:
Tail call optimization is performed if the instruction following the  
call is
a return and the return uses the result of the call or is a void return.

The problem is that at the point (in TargetLowering:LowerCallTo) where
IsEligibleForTailCallElimination is called the SDOperand node for the
following instructions (we are looking for a return) has not been build.
So IsEligibleForTailCallElimination can't determine - using 'Callee'
-
whether the instruction following the call is a return. That is the  
reason
why i pass  the IsNextInstRet flag to TargetLowering::LowerCallTo.

TargetLowering::LowerCallTo(SDOperand Chain, const Type *RetTy,
                             bool RetTyIsSigned, bool isVarArg,
                             unsigned CallingConv, bool isTailCall,
                             bool isNextInstRet, SDOperand Callee,
                             ArgListTy &Args, SelectionDAG &DAG) {
   SmallVector<SDOperand, 32> Ops;
   Ops.push_back(Chain);   // Op#0 - Chain
   Ops.push_back(DAG.getConstant(CallingConv, getPointerTy())); //  
Op#1 - CC
   Ops.push_back(DAG.getConstant(isVarArg, getPointerTy()));    //  
Op#2 - VarArg
   if (isTailCall &&
       IsEligibleForTailCallElimination(DAG, isNextInstRet, Callee,  
CallingConv))
     Ops.push_back(DAG.getConstant(true, getPointerTy())); // Op#3 -  
Tail

The instructions i am refering here to are LLVM IR instructions not
target machine instructions. I could remove isNextInstRet from
IsEligibleForTailCallElimination because in TargetLowering::LowerCallTo
it is clear that we can't do tail call optimization if isNextInstRet  
(this time coming from
SelectionDAGLowering::LowerCallTo) is false.

if (isTailCall && isNextInstRet &&
       IsEligibleForTailCallElimination(DAG,, Callee, CallingConv))

Note that I interpreted the following paragraph of you:> IsEligibleForTailCallElimination() should be a target hook. This way
> TargetLowering::LowerCallTo() can determine whether a call is
> eligible for tail call elimination and insert the current ISD::CALL
> node.  X86TargetLowering::LowerX86_32FastCCCallTo() will not have to
> handle non-tail calls.
in the following way:
I check in TargetLowering:LowerCallTo whether the call  
IsEligibleForTailCallElimination
and insert a ISD:CALL node with the tailcall attribute set or not.

Then in X86TargetLowering::LowerCALL i can dispatch the right  
lowering code:
> SDOperand X86TargetLowering::LowerCALL(SDOperand Op, SelectionDAG  
> &DAG) {
>   unsigned CallingConv = cast<ConstantSDNode>(Op.getOperand(1))- 
> >getValue();
>   bool isTailCall =
cast<ConstantSDNode>(Op.getOperand(3))->getValue
> () != 0;
>
>   case CallingConv::Fast:
>       if (isTailCall && PerformTailCallOpt)
>         return LowerX86_TailCallTo(Op, DAG, CallingConv);
>       else
>         return LowerCCCCallTo(Op,DAG, CallingConv);
> Some stylistic nitpicks. Please write the comments as:
> /// IsNextInstructionReturn - Check whether..
Will do.> +  assert(BI != BB->end() && "Woohooa");
> Better assertion messages please. :-)
>
> Why not write it like this:
>
okay> Also, shouldn't this function be "static"?
okay> Please fix the inconsistency: "TAILCALL" vs. "TAIL
CALL".
>
okay> +SDOperand GetPossiblePreceedingTailCall(SDOperand Chain) {
>
> This should be "static"?
okay> +
> +  Chain = DAG.getNode( X86ISD::CALL,
>                         NodeTys, &Ops[0], Ops.size());
>
> Stylistic nitpick: please merge 2 lines and remove the space after  
> '('.
okay> -    Chain = DAG.getNode(X86ISD::FP_SET_RESULT, Tys, Ops, 2);
> -    Flag = Chain.getValue(1);
> +      Chain = DAG.getNode(X86ISD::FP_SET_RESULT, Tys, Ops, 2);
> +      Flag = Chain.getValue(1);
>
> What happened here?
Ups corrected.>
> +// Check to see whether the next instruction following the call is a
> return
> +// A function is eligible if caller/callee calling conventions
> match, currently
> +// only fastcc supports tail calls, and the function CALL is
> immediatly followed
> +// by a RET
> +bool X86TargetLowering::IsEligibleForTailCallElimination
> (SelectionDAG& DAG,
> +                                                         bool
> IsNextInstRet,
> +                                                         SDOperand
> Callee,
> +                                                         unsigned
> CalleeCC)
> +  const {
>
> Having "const {" on a separate line looks funky. :-)
>
will change ;)> Please remove code that's commented out.
> if( G  ==> if (G  :-)
> No need to set IsEligible to false since it's initialized to false.
>;) okay dokey

regards

Evan Cheng

2007-Sep-24 21:37 UTC

head link

[LLVMdev] RFC: Tail call optimization X86

On Sep 24, 2007, at 2:25 AM, Arnold Schwaighofer wrote:
>
> On 24 Sep 2007, at 09:18, Evan Cheng wrote:
>> +; RUN: llvm-as < %s | llc -march=x86 -mattr=+sse2 -stats -info-
>> output-file - | grep asm-printer | grep 9
>> +; change preceeding line form ... | grep 8 to ..| grep 9 since
>> +; with new fastcc has std call semantics causing a stack adjustment
>> +; after the function call
>>
>> Not sure if I understand this. Can you illustrate with an example?
>
> Sure
>
> The code generated used to be
> _array:
>         subl    $12, %esp
>         movss   LCPI1_0, %xmm0
>         mulss   16(%esp), %xmm0
>         movss   %xmm0, (%esp)
>         call    L_qux$stub
>         mulss   LCPI1_0, %xmm0
>         addl    $12, %esp
>         ret
>
> FastCC use to be caller pops arguments so there was no stack
> adjustment after the
> call to qux. Now FastCC has callee pops arguments on return semantics
> so the
> x86 backend inserts a stack adjustment after the call.
>
> _array:
>  	subl    $12, %esp
>         movss   LCPI1_0, %xmm0
>         mulss   16(%esp), %xmm0
>         movss   %xmm0, (%esp)
>         call    L_qux$stub
>         subl    $4, %esp   << stack adjustment because qux pops
> arguments on return
>         mulss   LCPI1_0, %xmm0
>         addl    $12, %esp
>         ret     $4
Ok. Please make sure for now this only takes effect if tail call  
optimization is  turned on though. Also, we need to fold the subl into  
addl if possible.
>
>
>> Index: include/llvm/Target/TargetLowering.h
>>
==================================================================>> ---
include/llvm/Target/TargetLowering.h	(revision 42247)
>> +++ include/llvm/Target/TargetLowering.h	(working copy)
>> @@ -851,8 +851,18 @@
>>    virtual std::pair<SDOperand, SDOperand>
>>    LowerCallTo(SDOperand Chain, const Type *RetTy, bool
>> RetTyIsSigned,
>>                bool isVarArg, unsigned CallingConv, bool isTailCall,
>> -              SDOperand Callee, ArgListTy &Args, SelectionDAG
&DAG);
>> +              bool isNextInstRet, SDOperand Callee, ArgListTy
&Args,
>> +              SelectionDAG &DAG);
>> +  // IsEligibleForTailCallElimination - Check whether the call is
>> eligible for
>> +  // tailcall elimination
>> +  virtual bool IsEligibleForTailCallElimination(SelectionDAG& DAG,
>> +                                                bool IsNextInstRet,
>> +                                                SDOperand Callee,
>> +                                                unsigned CalleeCC)
>> const {
>> +    return false;
>> +  }
>> +
>>
>> I am not too keen on "isNextInstRet". We should never use
instruction
>> ordering before scheduling to determine whether it's possible to
>> perform an optimization. IsEligibleForTailCallElimination() should
>> determine the feasibility on its own, no?
>
> My assumption:
> Tail call optimization is performed if the instruction following the
> call is
> a return and the return uses the result of the call or is a void  
> return.
Ok. But using whether the next instruction is a return at this time is  
too restrictive. It's possible there may be instructions that are  
between the call and the return that do not interfere with tail call  
optimization.
>
>
> The problem is that at the point (in TargetLowering:LowerCallTo) where
> IsEligibleForTailCallElimination is called the SDOperand node for the
> following instructions (we are looking for a return) has not been  
> build.
> So IsEligibleForTailCallElimination can't determine - using  
> 'Callee'  -
> whether the instruction following the call is a return. That is the
> reason
> why i pass  the IsNextInstRet flag to TargetLowering::LowerCallTo.
Right. But I think the right way to deal with this is to assume all  
the tail calls are eligible for tail call optimizations at this point.  
Then checks for eligibility and fix the call nodes after all the  
instructions are lowered.

Evan
>
>
> TargetLowering::LowerCallTo(SDOperand Chain, const Type *RetTy,
>                             bool RetTyIsSigned, bool isVarArg,
>                             unsigned CallingConv, bool isTailCall,
>                             bool isNextInstRet, SDOperand Callee,
>                             ArgListTy &Args, SelectionDAG &DAG) {
>   SmallVector<SDOperand, 32> Ops;
>   Ops.push_back(Chain);   // Op#0 - Chain
>   Ops.push_back(DAG.getConstant(CallingConv, getPointerTy())); //
> Op#1 - CC
>   Ops.push_back(DAG.getConstant(isVarArg, getPointerTy()));    //
> Op#2 - VarArg
>   if (isTailCall &&
>       IsEligibleForTailCallElimination(DAG, isNextInstRet, Callee,
> CallingConv))
>     Ops.push_back(DAG.getConstant(true, getPointerTy())); // Op#3 -
> Tail
>
> The instructions i am refering here to are LLVM IR instructions not
> target machine instructions. I could remove isNextInstRet from
> IsEligibleForTailCallElimination because in  
> TargetLowering::LowerCallTo
> it is clear that we can't do tail call optimization if isNextInstRet
> (this time coming from
> SelectionDAGLowering::LowerCallTo) is false.
>
> if (isTailCall && isNextInstRet &&
>       IsEligibleForTailCallElimination(DAG,, Callee, CallingConv))
>
> Note that I interpreted the following paragraph of you:
>> IsEligibleForTailCallElimination() should be a target hook. This way
>> TargetLowering::LowerCallTo() can determine whether a call is
>> eligible for tail call elimination and insert the current ISD::CALL
>> node.  X86TargetLowering::LowerX86_32FastCCCallTo() will not have to
>> handle non-tail calls.
>
> in the following way:
> I check in TargetLowering:LowerCallTo whether the call
> IsEligibleForTailCallElimination
> and insert a ISD:CALL node with the tailcall attribute set or not.
>
> Then in X86TargetLowering::LowerCALL i can dispatch the right
> lowering code:
>
>> SDOperand X86TargetLowering::LowerCALL(SDOperand Op, SelectionDAG
>> &DAG) {
>>  unsigned CallingConv = cast<ConstantSDNode>(Op.getOperand(1))-
>>> getValue();
>>  bool isTailCall =
cast<ConstantSDNode>(Op.getOperand(3))->getValue
>> () != 0;
>>
>>  case CallingConv::Fast:
>>      if (isTailCall && PerformTailCallOpt)
>>        return LowerX86_TailCallTo(Op, DAG, CallingConv);
>>      else
>>        return LowerCCCCallTo(Op,DAG, CallingConv);
>
>
>> Some stylistic nitpicks. Please write the comments as:
>> /// IsNextInstructionReturn - Check whether..
> Will do.
>> +  assert(BI != BB->end() && "Woohooa");
>> Better assertion messages please. :-)
>>
>> Why not write it like this:
>>
> okay
>> Also, shouldn't this function be "static"?
> okay
>> Please fix the inconsistency: "TAILCALL" vs. "TAIL
CALL".
>>
> okay
>> +SDOperand GetPossiblePreceedingTailCall(SDOperand Chain) {
>>
>> This should be "static"?
>
> okay
>> +
>> +  Chain = DAG.getNode( X86ISD::CALL,
>>                        NodeTys, &Ops[0], Ops.size());
>>
>> Stylistic nitpick: please merge 2 lines and remove the space after
>> '('.
> okay
>> -    Chain = DAG.getNode(X86ISD::FP_SET_RESULT, Tys, Ops, 2);
>> -    Flag = Chain.getValue(1);
>> +      Chain = DAG.getNode(X86ISD::FP_SET_RESULT, Tys, Ops, 2);
>> +      Flag = Chain.getValue(1);
>>
>> What happened here?
> Ups corrected.
>>
>> +// Check to see whether the next instruction following the call is a
>> return
>> +// A function is eligible if caller/callee calling conventions
>> match, currently
>> +// only fastcc supports tail calls, and the function CALL is
>> immediatly followed
>> +// by a RET
>> +bool X86TargetLowering::IsEligibleForTailCallElimination
>> (SelectionDAG& DAG,
>> +                                                         bool
>> IsNextInstRet,
>> +                                                         SDOperand
>> Callee,
>> +                                                         unsigned
>> CalleeCC)
>> +  const {
>>
>> Having "const {" on a separate line looks funky. :-)
>>
> will change ;)
>> Please remove code that's commented out.
>
>> if( G  ==> if (G  :-)
>
>> No need to set IsEligible to false since it's initialized to false.
>>
> ;) okay dokey
>
> regards
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Arnold Schwaighofer

2007-Sep-25 07:01 UTC

head link

[LLVMdev] RFC: Tail call optimization X86

> > FastCC use to be caller pops arguments so there was no stack
> > adjustment after the
> > call to qux. Now FastCC has callee pops arguments on return semantics
> > so the
> > x86 backend inserts a stack adjustment after the call.
> >
> > _array:
> >       subl    $12, %esp
> >         movss   LCPI1_0, %xmm0
> >         mulss   16(%esp), %xmm0
> >         movss   %xmm0, (%esp)
> >         call    L_qux$stub
> >         subl    $4, %esp   << stack adjustment because qux pops
> > arguments on return
> >         mulss   LCPI1_0, %xmm0
> >         addl    $12, %esp
> >         ret     $4
>
> Ok. Please make sure for now this only takes effect if tail call
> optimization is  turned on though. Also, we need to fold the subl into
> addl if possible.I think you misunderstood me. I have not added code to insert this
subl it is done already by the backend (before my changes) with
certain calling conventions (callee pops arguments). There used to be
already calling conventions e.g. x86_fastcall that would have cause
the stack adjustment only fastcc was not one of them. Now that fastcc
can cause tail call optimization i had to change the convention from
caller pops arguments to callee pops arguments in order to allow tail
call optimization in a general way.>
> >
> >
> >> Index: include/llvm/Target/TargetLowering.h
> >>
==================================================================> >>
--- include/llvm/Target/TargetLowering.h     (revision 42247)
> >> +++ include/llvm/Target/TargetLowering.h     (working copy)
> >> @@ -851,8 +851,18 @@
> >>    virtual std::pair<SDOperand, SDOperand>
> >>    LowerCallTo(SDOperand Chain, const Type *RetTy, bool
> >> RetTyIsSigned,
> >>                bool isVarArg, unsigned CallingConv, bool
isTailCall,
> >> -              SDOperand Callee, ArgListTy &Args, SelectionDAG
&DAG);
> >> +              bool isNextInstRet, SDOperand Callee, ArgListTy
&Args,
> >> +              SelectionDAG &DAG);
> >> +  // IsEligibleForTailCallElimination - Check whether the call is
> >> eligible for
> >> +  // tailcall elimination
> >> +  virtual bool IsEligibleForTailCallElimination(SelectionDAG&
DAG,
> >> +                                                bool
IsNextInstRet,
> >> +                                                SDOperand Callee,
> >> +                                                unsigned
CalleeCC)
> >> const {
> >> +    return false;
> >> +  }
> >> +
> >>
> >> I am not too keen on "isNextInstRet". We should never
use instruction
> >> ordering before scheduling to determine whether it's possible
to
> >> perform an optimization. IsEligibleForTailCallElimination() should
> >> determine the feasibility on its own, no?
> >
> > My assumption:
> > Tail call optimization is performed if the instruction following the
> > call is
> > a return and the return uses the result of the call or is a void
> > return.
>
> Ok. But using whether the next instruction is a return at this time is
> too restrictive. It's possible there may be instructions that are
> between the call and the return that do not interfere with tail call
> optimization.
>
> >
> >
> > The problem is that at the point (in TargetLowering:LowerCallTo) where
> > IsEligibleForTailCallElimination is called the SDOperand node for the
> > following instructions (we are looking for a return) has not been
> > build.
> > So IsEligibleForTailCallElimination can't determine - using
> > 'Callee'  -
> > whether the instruction following the call is a return. That is the
> > reason
> > why i pass  the IsNextInstRet flag to TargetLowering::LowerCallTo.
>
> Right. But I think the right way to deal with this is to assume all
> the tail calls are eligible for tail call optimizations at this point.
> Then checks for eligibility and fix the call nodes after all the
> instructions are lowered.I am not sure that i understand where this point would be. As I view
it the only place would be in LegalizeDAG.cpp in

SDOperand SelectionDAGLegalize::LegalizeOp(SDOperand Op) {
...
 case ISD::CALL:
    // The only option for this is to custom lower it.
    Tmp3 = TLI.LowerOperation(Result.getValue(0), DAG);

}

would change to (pseudo code)
SDOperand SelectionDAGLegalize::LegalizeOp(SDOperand Op) {
 SDOperand Result = Op;
...
 case ISD::CALL:
    // The only option for this is to custom lower it.

    Result =
modifyCallSoThatTailAttributeReallyIndicatesWhetherThisCallIsEligibleForTailCallOptimization(Result);

    Tmp3 = TLI.LowerOperation(Result.getValue(0), DAG);

}

because any point after that (e.g. scheduling) would be to late
because the target specific loweringcode (e.g. LowerCALL ->
LowerCCCall)

Reasonably Related Threads

Search for more apparently analagous threads

llvm dev - Sep 2007 - [LLVMdev] RFC: Tail call optimization X86

[LLVMdev] RFC: Tail call optimization X86

[LLVMdev] RFC: Tail call optimization X86

[LLVMdev] RFC: Tail call optimization X86

Reasonably Related Threads