On 24 Sep 2007, at 09:18, Evan Cheng wrote:> +; RUN: llvm-as < %s | llc -march=x86 -mattr=+sse2 -stats -info- > output-file - | grep asm-printer | grep 9 > +; change preceeding line form ... | grep 8 to ..| grep 9 since > +; with new fastcc has std call semantics causing a stack adjustment > +; after the function call > > Not sure if I understand this. Can you illustrate with an example?Sure The code generated used to be _array: subl $12, %esp movss LCPI1_0, %xmm0 mulss 16(%esp), %xmm0 movss %xmm0, (%esp) call L_qux$stub mulss LCPI1_0, %xmm0 addl $12, %esp ret FastCC use to be caller pops arguments so there was no stack adjustment after the call to qux. Now FastCC has callee pops arguments on return semantics so the x86 backend inserts a stack adjustment after the call. _array: subl $12, %esp movss LCPI1_0, %xmm0 mulss 16(%esp), %xmm0 movss %xmm0, (%esp) call L_qux$stub subl $4, %esp << stack adjustment because qux pops arguments on return mulss LCPI1_0, %xmm0 addl $12, %esp ret $4> Index: include/llvm/Target/TargetLowering.h > ==================================================================> --- include/llvm/Target/TargetLowering.h (revision 42247) > +++ include/llvm/Target/TargetLowering.h (working copy) > @@ -851,8 +851,18 @@ > virtual std::pair<SDOperand, SDOperand> > LowerCallTo(SDOperand Chain, const Type *RetTy, bool > RetTyIsSigned, > bool isVarArg, unsigned CallingConv, bool isTailCall, > - SDOperand Callee, ArgListTy &Args, SelectionDAG &DAG); > + bool isNextInstRet, SDOperand Callee, ArgListTy &Args, > + SelectionDAG &DAG); > + // IsEligibleForTailCallElimination - Check whether the call is > eligible for > + // tailcall elimination > + virtual bool IsEligibleForTailCallElimination(SelectionDAG& DAG, > + bool IsNextInstRet, > + SDOperand Callee, > + unsigned CalleeCC) > const { > + return false; > + } > + > > I am not too keen on "isNextInstRet". We should never use instruction > ordering before scheduling to determine whether it's possible to > perform an optimization. IsEligibleForTailCallElimination() should > determine the feasibility on its own, no?My assumption: Tail call optimization is performed if the instruction following the call is a return and the return uses the result of the call or is a void return. The problem is that at the point (in TargetLowering:LowerCallTo) where IsEligibleForTailCallElimination is called the SDOperand node for the following instructions (we are looking for a return) has not been build. So IsEligibleForTailCallElimination can't determine - using 'Callee' - whether the instruction following the call is a return. That is the reason why i pass the IsNextInstRet flag to TargetLowering::LowerCallTo. TargetLowering::LowerCallTo(SDOperand Chain, const Type *RetTy, bool RetTyIsSigned, bool isVarArg, unsigned CallingConv, bool isTailCall, bool isNextInstRet, SDOperand Callee, ArgListTy &Args, SelectionDAG &DAG) { SmallVector<SDOperand, 32> Ops; Ops.push_back(Chain); // Op#0 - Chain Ops.push_back(DAG.getConstant(CallingConv, getPointerTy())); // Op#1 - CC Ops.push_back(DAG.getConstant(isVarArg, getPointerTy())); // Op#2 - VarArg if (isTailCall && IsEligibleForTailCallElimination(DAG, isNextInstRet, Callee, CallingConv)) Ops.push_back(DAG.getConstant(true, getPointerTy())); // Op#3 - Tail The instructions i am refering here to are LLVM IR instructions not target machine instructions. I could remove isNextInstRet from IsEligibleForTailCallElimination because in TargetLowering::LowerCallTo it is clear that we can't do tail call optimization if isNextInstRet (this time coming from SelectionDAGLowering::LowerCallTo) is false. if (isTailCall && isNextInstRet && IsEligibleForTailCallElimination(DAG,, Callee, CallingConv)) Note that I interpreted the following paragraph of you:> IsEligibleForTailCallElimination() should be a target hook. This way > TargetLowering::LowerCallTo() can determine whether a call is > eligible for tail call elimination and insert the current ISD::CALL > node. X86TargetLowering::LowerX86_32FastCCCallTo() will not have to > handle non-tail calls.in the following way: I check in TargetLowering:LowerCallTo whether the call IsEligibleForTailCallElimination and insert a ISD:CALL node with the tailcall attribute set or not. Then in X86TargetLowering::LowerCALL i can dispatch the right lowering code:> SDOperand X86TargetLowering::LowerCALL(SDOperand Op, SelectionDAG > &DAG) { > unsigned CallingConv = cast<ConstantSDNode>(Op.getOperand(1))- > >getValue(); > bool isTailCall = cast<ConstantSDNode>(Op.getOperand(3))->getValue > () != 0; > > case CallingConv::Fast: > if (isTailCall && PerformTailCallOpt) > return LowerX86_TailCallTo(Op, DAG, CallingConv); > else > return LowerCCCCallTo(Op,DAG, CallingConv);> Some stylistic nitpicks. Please write the comments as: > /// IsNextInstructionReturn - Check whether..Will do.> + assert(BI != BB->end() && "Woohooa"); > Better assertion messages please. :-) > > Why not write it like this: >okay> Also, shouldn't this function be "static"?okay> Please fix the inconsistency: "TAILCALL" vs. "TAIL CALL". >okay> +SDOperand GetPossiblePreceedingTailCall(SDOperand Chain) { > > This should be "static"?okay> + > + Chain = DAG.getNode( X86ISD::CALL, > NodeTys, &Ops[0], Ops.size()); > > Stylistic nitpick: please merge 2 lines and remove the space after > '('.okay> - Chain = DAG.getNode(X86ISD::FP_SET_RESULT, Tys, Ops, 2); > - Flag = Chain.getValue(1); > + Chain = DAG.getNode(X86ISD::FP_SET_RESULT, Tys, Ops, 2); > + Flag = Chain.getValue(1); > > What happened here?Ups corrected.> > +// Check to see whether the next instruction following the call is a > return > +// A function is eligible if caller/callee calling conventions > match, currently > +// only fastcc supports tail calls, and the function CALL is > immediatly followed > +// by a RET > +bool X86TargetLowering::IsEligibleForTailCallElimination > (SelectionDAG& DAG, > + bool > IsNextInstRet, > + SDOperand > Callee, > + unsigned > CalleeCC) > + const { > > Having "const {" on a separate line looks funky. :-) >will change ;)> Please remove code that's commented out.> if( G ==> if (G :-)> No need to set IsEligible to false since it's initialized to false. >;) okay dokey regards
On Sep 24, 2007, at 2:25 AM, Arnold Schwaighofer wrote:> > On 24 Sep 2007, at 09:18, Evan Cheng wrote: >> +; RUN: llvm-as < %s | llc -march=x86 -mattr=+sse2 -stats -info- >> output-file - | grep asm-printer | grep 9 >> +; change preceeding line form ... | grep 8 to ..| grep 9 since >> +; with new fastcc has std call semantics causing a stack adjustment >> +; after the function call >> >> Not sure if I understand this. Can you illustrate with an example? > > Sure > > The code generated used to be > _array: > subl $12, %esp > movss LCPI1_0, %xmm0 > mulss 16(%esp), %xmm0 > movss %xmm0, (%esp) > call L_qux$stub > mulss LCPI1_0, %xmm0 > addl $12, %esp > ret > > FastCC use to be caller pops arguments so there was no stack > adjustment after the > call to qux. Now FastCC has callee pops arguments on return semantics > so the > x86 backend inserts a stack adjustment after the call. > > _array: > subl $12, %esp > movss LCPI1_0, %xmm0 > mulss 16(%esp), %xmm0 > movss %xmm0, (%esp) > call L_qux$stub > subl $4, %esp << stack adjustment because qux pops > arguments on return > mulss LCPI1_0, %xmm0 > addl $12, %esp > ret $4Ok. Please make sure for now this only takes effect if tail call optimization is turned on though. Also, we need to fold the subl into addl if possible.> > >> Index: include/llvm/Target/TargetLowering.h >> ==================================================================>> --- include/llvm/Target/TargetLowering.h (revision 42247) >> +++ include/llvm/Target/TargetLowering.h (working copy) >> @@ -851,8 +851,18 @@ >> virtual std::pair<SDOperand, SDOperand> >> LowerCallTo(SDOperand Chain, const Type *RetTy, bool >> RetTyIsSigned, >> bool isVarArg, unsigned CallingConv, bool isTailCall, >> - SDOperand Callee, ArgListTy &Args, SelectionDAG &DAG); >> + bool isNextInstRet, SDOperand Callee, ArgListTy &Args, >> + SelectionDAG &DAG); >> + // IsEligibleForTailCallElimination - Check whether the call is >> eligible for >> + // tailcall elimination >> + virtual bool IsEligibleForTailCallElimination(SelectionDAG& DAG, >> + bool IsNextInstRet, >> + SDOperand Callee, >> + unsigned CalleeCC) >> const { >> + return false; >> + } >> + >> >> I am not too keen on "isNextInstRet". We should never use instruction >> ordering before scheduling to determine whether it's possible to >> perform an optimization. IsEligibleForTailCallElimination() should >> determine the feasibility on its own, no? > > My assumption: > Tail call optimization is performed if the instruction following the > call is > a return and the return uses the result of the call or is a void > return.Ok. But using whether the next instruction is a return at this time is too restrictive. It's possible there may be instructions that are between the call and the return that do not interfere with tail call optimization.> > > The problem is that at the point (in TargetLowering:LowerCallTo) where > IsEligibleForTailCallElimination is called the SDOperand node for the > following instructions (we are looking for a return) has not been > build. > So IsEligibleForTailCallElimination can't determine - using > 'Callee' - > whether the instruction following the call is a return. That is the > reason > why i pass the IsNextInstRet flag to TargetLowering::LowerCallTo.Right. But I think the right way to deal with this is to assume all the tail calls are eligible for tail call optimizations at this point. Then checks for eligibility and fix the call nodes after all the instructions are lowered. Evan> > > TargetLowering::LowerCallTo(SDOperand Chain, const Type *RetTy, > bool RetTyIsSigned, bool isVarArg, > unsigned CallingConv, bool isTailCall, > bool isNextInstRet, SDOperand Callee, > ArgListTy &Args, SelectionDAG &DAG) { > SmallVector<SDOperand, 32> Ops; > Ops.push_back(Chain); // Op#0 - Chain > Ops.push_back(DAG.getConstant(CallingConv, getPointerTy())); // > Op#1 - CC > Ops.push_back(DAG.getConstant(isVarArg, getPointerTy())); // > Op#2 - VarArg > if (isTailCall && > IsEligibleForTailCallElimination(DAG, isNextInstRet, Callee, > CallingConv)) > Ops.push_back(DAG.getConstant(true, getPointerTy())); // Op#3 - > Tail > > The instructions i am refering here to are LLVM IR instructions not > target machine instructions. I could remove isNextInstRet from > IsEligibleForTailCallElimination because in > TargetLowering::LowerCallTo > it is clear that we can't do tail call optimization if isNextInstRet > (this time coming from > SelectionDAGLowering::LowerCallTo) is false. > > if (isTailCall && isNextInstRet && > IsEligibleForTailCallElimination(DAG,, Callee, CallingConv)) > > Note that I interpreted the following paragraph of you: >> IsEligibleForTailCallElimination() should be a target hook. This way >> TargetLowering::LowerCallTo() can determine whether a call is >> eligible for tail call elimination and insert the current ISD::CALL >> node. X86TargetLowering::LowerX86_32FastCCCallTo() will not have to >> handle non-tail calls. > > in the following way: > I check in TargetLowering:LowerCallTo whether the call > IsEligibleForTailCallElimination > and insert a ISD:CALL node with the tailcall attribute set or not. > > Then in X86TargetLowering::LowerCALL i can dispatch the right > lowering code: > >> SDOperand X86TargetLowering::LowerCALL(SDOperand Op, SelectionDAG >> &DAG) { >> unsigned CallingConv = cast<ConstantSDNode>(Op.getOperand(1))- >>> getValue(); >> bool isTailCall = cast<ConstantSDNode>(Op.getOperand(3))->getValue >> () != 0; >> >> case CallingConv::Fast: >> if (isTailCall && PerformTailCallOpt) >> return LowerX86_TailCallTo(Op, DAG, CallingConv); >> else >> return LowerCCCCallTo(Op,DAG, CallingConv); > > >> Some stylistic nitpicks. Please write the comments as: >> /// IsNextInstructionReturn - Check whether.. > Will do. >> + assert(BI != BB->end() && "Woohooa"); >> Better assertion messages please. :-) >> >> Why not write it like this: >> > okay >> Also, shouldn't this function be "static"? > okay >> Please fix the inconsistency: "TAILCALL" vs. "TAIL CALL". >> > okay >> +SDOperand GetPossiblePreceedingTailCall(SDOperand Chain) { >> >> This should be "static"? > > okay >> + >> + Chain = DAG.getNode( X86ISD::CALL, >> NodeTys, &Ops[0], Ops.size()); >> >> Stylistic nitpick: please merge 2 lines and remove the space after >> '('. > okay >> - Chain = DAG.getNode(X86ISD::FP_SET_RESULT, Tys, Ops, 2); >> - Flag = Chain.getValue(1); >> + Chain = DAG.getNode(X86ISD::FP_SET_RESULT, Tys, Ops, 2); >> + Flag = Chain.getValue(1); >> >> What happened here? > Ups corrected. >> >> +// Check to see whether the next instruction following the call is a >> return >> +// A function is eligible if caller/callee calling conventions >> match, currently >> +// only fastcc supports tail calls, and the function CALL is >> immediatly followed >> +// by a RET >> +bool X86TargetLowering::IsEligibleForTailCallElimination >> (SelectionDAG& DAG, >> + bool >> IsNextInstRet, >> + SDOperand >> Callee, >> + unsigned >> CalleeCC) >> + const { >> >> Having "const {" on a separate line looks funky. :-) >> > will change ;) >> Please remove code that's commented out. > >> if( G ==> if (G :-) > >> No need to set IsEligible to false since it's initialized to false. >> > ;) okay dokey > > regards > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> > FastCC use to be caller pops arguments so there was no stack > > adjustment after the > > call to qux. Now FastCC has callee pops arguments on return semantics > > so the > > x86 backend inserts a stack adjustment after the call. > > > > _array: > > subl $12, %esp > > movss LCPI1_0, %xmm0 > > mulss 16(%esp), %xmm0 > > movss %xmm0, (%esp) > > call L_qux$stub > > subl $4, %esp << stack adjustment because qux pops > > arguments on return > > mulss LCPI1_0, %xmm0 > > addl $12, %esp > > ret $4 > > Ok. Please make sure for now this only takes effect if tail call > optimization is turned on though. Also, we need to fold the subl into > addl if possible.I think you misunderstood me. I have not added code to insert this subl it is done already by the backend (before my changes) with certain calling conventions (callee pops arguments). There used to be already calling conventions e.g. x86_fastcall that would have cause the stack adjustment only fastcc was not one of them. Now that fastcc can cause tail call optimization i had to change the convention from caller pops arguments to callee pops arguments in order to allow tail call optimization in a general way.> > > > > > >> Index: include/llvm/Target/TargetLowering.h > >> ==================================================================> >> --- include/llvm/Target/TargetLowering.h (revision 42247) > >> +++ include/llvm/Target/TargetLowering.h (working copy) > >> @@ -851,8 +851,18 @@ > >> virtual std::pair<SDOperand, SDOperand> > >> LowerCallTo(SDOperand Chain, const Type *RetTy, bool > >> RetTyIsSigned, > >> bool isVarArg, unsigned CallingConv, bool isTailCall, > >> - SDOperand Callee, ArgListTy &Args, SelectionDAG &DAG); > >> + bool isNextInstRet, SDOperand Callee, ArgListTy &Args, > >> + SelectionDAG &DAG); > >> + // IsEligibleForTailCallElimination - Check whether the call is > >> eligible for > >> + // tailcall elimination > >> + virtual bool IsEligibleForTailCallElimination(SelectionDAG& DAG, > >> + bool IsNextInstRet, > >> + SDOperand Callee, > >> + unsigned CalleeCC) > >> const { > >> + return false; > >> + } > >> + > >> > >> I am not too keen on "isNextInstRet". We should never use instruction > >> ordering before scheduling to determine whether it's possible to > >> perform an optimization. IsEligibleForTailCallElimination() should > >> determine the feasibility on its own, no? > > > > My assumption: > > Tail call optimization is performed if the instruction following the > > call is > > a return and the return uses the result of the call or is a void > > return. > > Ok. But using whether the next instruction is a return at this time is > too restrictive. It's possible there may be instructions that are > between the call and the return that do not interfere with tail call > optimization. > > > > > > > The problem is that at the point (in TargetLowering:LowerCallTo) where > > IsEligibleForTailCallElimination is called the SDOperand node for the > > following instructions (we are looking for a return) has not been > > build. > > So IsEligibleForTailCallElimination can't determine - using > > 'Callee' - > > whether the instruction following the call is a return. That is the > > reason > > why i pass the IsNextInstRet flag to TargetLowering::LowerCallTo. > > Right. But I think the right way to deal with this is to assume all > the tail calls are eligible for tail call optimizations at this point. > Then checks for eligibility and fix the call nodes after all the > instructions are lowered.I am not sure that i understand where this point would be. As I view it the only place would be in LegalizeDAG.cpp in SDOperand SelectionDAGLegalize::LegalizeOp(SDOperand Op) { ... case ISD::CALL: // The only option for this is to custom lower it. Tmp3 = TLI.LowerOperation(Result.getValue(0), DAG); } would change to (pseudo code) SDOperand SelectionDAGLegalize::LegalizeOp(SDOperand Op) { SDOperand Result = Op; ... case ISD::CALL: // The only option for this is to custom lower it. Result = modifyCallSoThatTailAttributeReallyIndicatesWhetherThisCallIsEligibleForTailCallOptimization(Result); Tmp3 = TLI.LowerOperation(Result.getValue(0), DAG); } because any point after that (e.g. scheduling) would be to late because the target specific loweringcode (e.g. LowerCALL -> LowerCCCall)