Hi Evan, first off thanks to you and Chris for taking time. On 6 Sep 2007, at 00:57, Evan Cheng wrote:> We'd like to see tail call optimization to be similar to the target > independent lowering of ISD::CALL nodes. These are auto-generated > from ???CallingConv.td files. Some target specific details such as > function address register (ECX in your example) should be coded in > these files. Ditto for stack offsets, etc. These should be part of the > fastcc calling convention. The tail call lowering will use these > information and target hooks to perform tail call optimization on > fastcc calls that are really tail calls. > > I think it's reasonable to start by doing a (clean) target specific > implementation for X86. Just understand we'd like to move to a target > independent mechanism so the earlier implementation may not survive. >Good that roughly sounds like what i did. (except probably for the clean bit :)> As for what you described here. I am having a hard time following > it. :-) Please send patch. >Okay here is a patch containing the tail call optimization code for X86. It mostly does what i have described in preceeding emails: There is new code for lowering fastcc calls: LowerX86_32FastCCCallTo LowerX86_32FastCCArguments There is some code checking whether a TAIL CALL really is eligible for tail call optimization. I modified: LowerRET to create a TC_RETURN node. (used for adjusting stackpointer and jumping to function in epilogue, similar to EH_RETURN) There is a new calling convention: CC_X86_32_TailCall which is ~ CC_X86_32_C minus ECX. There are some modifications in X86InstrInfo.td There are new X86 DAG nodes: TRUETAILCALL TC_RETURN There is some voodoo in X86RegisterInfo.cpp to deal with the RETADDR problem described in X86ISelLowering.cpp comments. (callee has more args than caller -> there needs to be space between RETADDR stack slot and ebp/spilled registers for safely moving the RETADDR to) any comments, critique, suggestions welcome. regards arnold -------------- next part -------------- A non-text attachment was scrubbed... Name: tailcall10.patch Type: application/octet-stream Size: 43991 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20070906/270e6d7a/attachment.obj>
Hi Arnold, Thanks for the patch. Some questions and commons: 1. Have you test it against the llvm test suite? Does it work if fp elimination optimization is turned off? 2. Please follow llvm coding convention and make sure every line fits in 80 columns. 3. enum NameDecorationStyle { None, StdCall, - FastCall + FastCall, + FastCC // the normal fastcc calling convention }; Why is FastCC necessary? Can't you just use FastCall? 4. def X86tailcall: SDNode<"X86ISD::TAILCALL", SDT_X86Call, [SDNPHasChain, SDNPOutFlag, SDNPOptInFlag]>; +def X86truetailcall: SDNode<"X86ISD::TRUETAILCALL", SDT_X86Call, + [SDNPHasChain, SDNPOutFlag, SDNPOptInFlag]>; + Please use X86tailcall. It's not currently used so feel free to change its patterns, etc. 5. +// the following two instructions are used to adjust the stack pointer +// in the case where the callee has more arguments than the caller +// an area is created where the return addr can be safely moved to +let isConvertibleToThreeAddress = 1 in { // Can transform into LEA. +def TCADD32ri : Ii32<0x81, MRM0r, (outs GR32:$dst), (ins GR32: $src1, i32imm:$src2), + "add{l}\t{$src2, $dst|$dst, $src2}", + []>; +} + +def TCSUB32ri : Ii32<0x81, MRM5r, (outs GR32:$dst), (ins GR32: $src1, i32imm:$src2), + "sub{l}\t{$src2, $dst|$dst, $src2}", + []>; + +//let isCall = 1, isTerminator = 1, isReturn = 1, isBarrier = 1 in +//def TCRETURNmi : I<0, Pseudo, (outs), (ins i32mem:$dst, i32imm: $offset), +// "#TC_RETURN $dst $offset", +// []>; + Why are these needed? They don't look any different from normal add and subtraction instructions. Why not just use ADJCALLSTACKDOWN and ADJCALLSTACKUP? 6. + } else if (RetOpcode == X86::TCRETURNdi) { + // a tailcall adjust the stack ... + } else if (RetOpcode == X86::TCRETURNri) { + MBBI = prior(MBB.end()); Seems like there are quite a bit of duplicate code between the 2 cases. Please refactor. 7. X86TargetLowering::X86TargetLowering(TargetMachine &TM) : TargetLowering(TM) { + IsLastCallTailCall = false; This is probably not a good idea. You are assuming nodes are lowered in certain order, that is dangerous. It's probably better to check whether the last call is a tail call on the fly as you are processing the return node. 8. - - SDOperand Chain = Op.getOperand(0); - SDOperand Flag; - - // Copy the result values into the output registers. - if (RVLocs.size() != 1 || !RVLocs[0].isRegLoc() || - RVLocs[0].getLocReg() != X86::ST0) { - for (unsigned i = 0; i != RVLocs.size(); ++i) { - CCValAssign &VA = RVLocs[i]; - assert(VA.isRegLoc() && "Can only return in registers!"); - Chain = DAG.getCopyToReg(Chain, VA.getLocReg(), Op.getOperand (i*2+1), - Flag); - Flag = Chain.getValue(1); - } + if (IsLastCallTailCall) { + IsLastCallTailCall = false; + SDOperand TailCall = GetTailCall(Op); + SDOperand TargetAddress = TailCall.getOperand(1); + SDOperand StackAdjustment = TailCall.getOperand(2); + assert ( ((TargetAddress.getOpcode() == ISD::Register && + cast<RegisterSDNode>(TargetAddress)->getReg() == X86::ECX) || + TargetAddress.getOpcode() == ISD::TargetExternalSymbol || + TargetAddress.getOpcode() == ISD::TargetGlobalAddress) && + "Expecting an global address, external symbol, or register"); + assert( StackAdjustment.getOpcode() == ISD::Constant && "Expecting a const value"); + // TODO: should we add flag from tail call as last operand? + return DAG.getNode(X86ISD::TC_RETURN, MVT::Other, Op.getOperand (0), TargetAddress, StackAdjustment); } else { Since if (IsLastCallTailCall) { ... } always ends with an early exist, you can just place the block before if (RVLocs.size() ...) and there is no need to change indentation for the rest of the function. 9. +// Check to see whether the next instruction following the call is a return +// A function is eligable if caller/callee calling conventions match and the +// function CALL is immediatly followed by a RET +bool X86TargetLowering::IsEligibleForTailCallElimination(SDOperand Call, SelectionDAG& DAG, unsigned CalleeCC, SDOperand Callee) { + bool IsEligible = false; + SDNode * CallNode = Call.Val; ... +SDOperand X86TargetLowering::LowerX86_32FastCCCallTo(SDOperand Op, + SelectionDAG &DAG, + unsigned CC) { + DOUT << "LowerX86_32FastCCCallTo\n"; + SDOperand Chain = Op.getOperand(0); + bool isVarArg = cast<ConstantSDNode>(Op.getOperand(2))- >getValue() != 0; + bool isTailCall = cast<ConstantSDNode>(Op.getOperand(3))- >getValue() != 0; + SDOperand Callee = Op.getOperand(4); + //unsigned NumOps = (Op.getNumOperands() - 5) / 2; + + // Analyze operands of the call, assigning locations to each operand. + SmallVector<CCValAssign, 16> ArgLocs; + CCState CCInfo(CC, isVarArg, getTargetMachine(), ArgLocs); + CCInfo.AnalyzeCallOperands(Op.Val, CC_X86_32_TailCall); + if (isTailCall && + IsEligibleForTailCallElimination(Op, DAG,CC, Callee) && + PerformTailCallOpt) { IsEligibleForTailCallElimination() should be a target hook. This way TargetLowering::LowerCallTo() can determine whether a call is eligible for tail call elimination and insert the current ISD::CALL node. X86TargetLowering::LowerX86_32FastCCCallTo() will not have to handle non-tail calls. Thanks, Evan On Sep 6, 2007, at 7:20 AM, Arnold Schwaighofer wrote:> Hi Evan, > > first off thanks to you and Chris for taking time. > On 6 Sep 2007, at 00:57, Evan Cheng wrote: > >> We'd like to see tail call optimization to be similar to the target >> independent lowering of ISD::CALL nodes. These are auto-generated >> from ???CallingConv.td files. Some target specific details such as >> function address register (ECX in your example) should be coded in >> these files. Ditto for stack offsets, etc. These should be part of >> the >> fastcc calling convention. The tail call lowering will use these >> information and target hooks to perform tail call optimization on >> fastcc calls that are really tail calls. >> >> I think it's reasonable to start by doing a (clean) target specific >> implementation for X86. Just understand we'd like to move to a target >> independent mechanism so the earlier implementation may not survive. >> > Good that roughly sounds like what i did. > (except probably for the clean bit :) > > >> As for what you described here. I am having a hard time following >> it. :-) Please send patch. >> > > Okay here is a patch containing the tail call optimization code > for X86. It mostly does what i have described in preceeding > emails: > > There is new code for lowering fastcc calls: > LowerX86_32FastCCCallTo > LowerX86_32FastCCArguments > > There is some code checking whether a TAIL CALL really is > eligible for tail call optimization. > > I modified: > LowerRET > to create a TC_RETURN node. (used for adjusting stackpointer and > jumping to function in epilogue, similar to EH_RETURN) > > There is a new calling convention: > CC_X86_32_TailCall > which is ~ CC_X86_32_C minus ECX. > > There are some modifications in > X86InstrInfo.td > > There are new X86 DAG nodes: > TRUETAILCALL > TC_RETURN > > > There is some voodoo in X86RegisterInfo.cpp to deal with the > RETADDR problem described in X86ISelLowering.cpp comments. > (callee has more args than caller -> there needs to be space > between RETADDR stack slot and ebp/spilled registers for safely > moving the RETADDR to) > > any comments, critique, suggestions welcome. > regards arnold > > <tailcall10.patch> > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Begin forwarded message:> From: Evan Cheng <evan.cheng at apple.com> > Date: 11 September 2007 19:26:39 GMT+02:00 > To: LLVM Developers Mailing List <llvmdev at cs.uiuc.edu> > Subject: Re: [LLVMdev] RFC: Tail call optimization X86 > Reply-To: LLVM Developers Mailing List <llvmdev at cs.uiuc.edu> > > Hi Arnold, > > Thanks for the patch. Some questions and commons: > > 1. Have you test it against the llvm test suite?No not yet.> Does it work if fp > elimination optimization is turned off?For my test cases - yes.> 2. Please follow llvm coding convention and make sure every line fits > in 80 columns.Okay dokey. i was looking through the files manually - must have missed some - sorry.> 3. > enum NameDecorationStyle { > None, > StdCall, > - FastCall > + FastCall, > + FastCC // the normal fastcc calling convention > }; > > Why is FastCC necessary? Can't you just use FastCall?FastCall is used to indicate a function has x86_FastCall semantics if i am not mistaken. I needed to differentiate between a normal fastcc and the x86_fastcall semantics in an older version of my code. I no longer depend on that so it can be removed as you suggest. sorry for the code corpse :)> 4. > def X86tailcall: SDNode<"X86ISD::TAILCALL", SDT_X86Call, > [SDNPHasChain, SDNPOutFlag, SDNPOptInFlag]>; > +def X86truetailcall: SDNode<"X86ISD::TRUETAILCALL", SDT_X86Call, > + [SDNPHasChain, SDNPOutFlag, SDNPOptInFlag]>; > + > > Please use X86tailcall. It's not currently used so feel free to > change its patterns, etc. >Okay dokey.> 5. > +// the following two instructions are used to adjust the stack > pointer > +// in the case where the callee has more arguments than the caller > +// an area is created where the return addr can be safely moved to > +let isConvertibleToThreeAddress = 1 in { // Can transform into LEA. > +def TCADD32ri : Ii32<0x81, MRM0r, (outs GR32:$dst), (ins GR32: > $src1, i32imm:$src2), > + "add{l}\t{$src2, $dst|$dst, $src2}", > + []>; > +} > + > +def TCSUB32ri : Ii32<0x81, MRM5r, (outs GR32:$dst), (ins GR32: > $src1, i32imm:$src2), > + "sub{l}\t{$src2, $dst|$dst, $src2}", > + []>; > +In an intermediate implementation (that did not work anyway) i need to differentiate between the normal ADD32ri/SUB32ri that were in the code and the stack adjustments i added. I no longer need them so i will remove them. again a code corpse sorry for that.> +//let isCall = 1, isTerminator = 1, isReturn = 1, isBarrier = 1 in > +//def TCRETURNmi : I<0, Pseudo, (outs), (ins i32mem:$dst, i32imm: > $offset), > +// "#TC_RETURN $dst $offset", > +// []>; > +This node holds the stack adjustment delta and the jmp destination of the call. The information is used in epilogue generation.> Why are these needed? They don't look any different from normal add > and subtraction instructions. Why not just use ADJCALLSTACKDOWN and > ADJCALLSTACKUP? >> 6. > + } else if (RetOpcode == X86::TCRETURNdi) { > + // a tailcall adjust the stack > ... > + } else if (RetOpcode == X86::TCRETURNri) { > + MBBI = prior(MBB.end()); > > Seems like there are quite a bit of duplicate code between the 2 > cases. Please refactor.Okay dokey> 7. > X86TargetLowering::X86TargetLowering(TargetMachine &TM) > : TargetLowering(TM) { > + IsLastCallTailCall = false; > > This is probably not a good idea. You are assuming nodes are lowered > in certain order, that is dangerous. It's probably better to check > whether the last call is a tail call on the fly as you are processing > the return node.Will do so.> 8. > - > - SDOperand Chain = Op.getOperand(0); > - SDOperand Flag; > - > - // Copy the result values into the output registers. > - if (RVLocs.size() != 1 || !RVLocs[0].isRegLoc() || > - RVLocs[0].getLocReg() != X86::ST0) { > - for (unsigned i = 0; i != RVLocs.size(); ++i) { > - CCValAssign &VA = RVLocs[i]; > - assert(VA.isRegLoc() && "Can only return in registers!"); > - Chain = DAG.getCopyToReg(Chain, VA.getLocReg(), Op.getOperand > (i*2+1), > - Flag); > - Flag = Chain.getValue(1); > - } > + if (IsLastCallTailCall) { > + IsLastCallTailCall = false; > + SDOperand TailCall = GetTailCall(Op); > + SDOperand TargetAddress = TailCall.getOperand(1); > + SDOperand StackAdjustment = TailCall.getOperand(2); > + assert ( ((TargetAddress.getOpcode() == ISD::Register && > + cast<RegisterSDNode>(TargetAddress)->getReg() => X86::ECX) || > + TargetAddress.getOpcode() == > ISD::TargetExternalSymbol || > + TargetAddress.getOpcode() == > ISD::TargetGlobalAddress) && > + "Expecting an global address, external symbol, or > register"); > + assert( StackAdjustment.getOpcode() == ISD::Constant && > "Expecting a const value"); > + // TODO: should we add flag from tail call as last operand? > + return DAG.getNode(X86ISD::TC_RETURN, MVT::Other, Op.getOperand > (0), TargetAddress, StackAdjustment); > } else { > > Since if (IsLastCallTailCall) { ... } always ends with an early > exist, you can just place the block before if (RVLocs.size() ...) and > there is no need to change indentation for the rest of the function. >Okay> 9. > +// Check to see whether the next instruction following the call is a > return > +// A function is eligable if caller/callee calling conventions match > and the > +// function CALL is immediatly followed by a RET > +bool X86TargetLowering::IsEligibleForTailCallElimination(SDOperand > Call, SelectionDAG& DAG, unsigned CalleeCC, SDOperand Callee) { > + bool IsEligible = false; > + SDNode * CallNode = Call.Val; > ... > > +SDOperand X86TargetLowering::LowerX86_32FastCCCallTo(SDOperand Op, > + SelectionDAG > &DAG, > + unsigned CC) { > + DOUT << "LowerX86_32FastCCCallTo\n"; > + SDOperand Chain = Op.getOperand(0); > + bool isVarArg = cast<ConstantSDNode>(Op.getOperand(2))- >> getValue() != 0; > + bool isTailCall = cast<ConstantSDNode>(Op.getOperand(3))- >> getValue() != 0; > + SDOperand Callee = Op.getOperand(4); > + //unsigned NumOps = (Op.getNumOperands() - 5) / 2; > + > + // Analyze operands of the call, assigning locations to each > operand. > + SmallVector<CCValAssign, 16> ArgLocs; > + CCState CCInfo(CC, isVarArg, getTargetMachine(), ArgLocs); > + CCInfo.AnalyzeCallOperands(Op.Val, CC_X86_32_TailCall); > + if (isTailCall && > + IsEligibleForTailCallElimination(Op, DAG,CC, Callee) && > + PerformTailCallOpt) { > > > IsEligibleForTailCallElimination() should be a target hook. This way > TargetLowering::LowerCallTo() can determine whether a call is > eligible for tail call elimination and insert the current ISD::CALL > node.Okay> X86TargetLowering::LowerX86_32FastCCCallTo() will not have to > handle non-tail calls. >I will then add code to LowerCCCCallTo() to check whether to use the normal calling convention CC_X86_32_C (3 registers free for argument passing) or the tail call (fastcc) calling convention CC_X86_32_TailCall (2 register free). Expect to see a new patch after i have done the testing with the whole test suite. regards arnold