Comments: CheckDAGForTailCallsAndFixThem - 1. for (SelectionDAG::allnodes_iterator BE = DAG.allnodes_begin(), + BI = prior(DAG.allnodes_end()); BI != BE; BI--) { Please use pre-decrement instead of post-decrement. 2. The function is slower than it should be. You are scanning all the nodes in the DAG twice. You should just examine DAG.getRoot() to make determine whether it's a return or not. @@ -4618,6 +4662,12 @@ // Lower the terminator after the copies are emitted. SDL.visit(*LLVMBB->getTerminator()); + // check whether calls in this block are real tail calls fix up CALL nodes + // with correct tailcall attribute so that the target can rely on the tailcall + // attribute indicating whether the call is really eligible for tail call + // optimization + CheckDAGForTailCallsAndFixThem(DAG, TLI); + check -> Check to start a sentence. :-) + // Skip the RETADDR move area + X86MachineFunctionInfo *X86FI = MF.getInfo<X86MachineFunctionInfo>(); + int32_t TailCallReturnAddrDelta = X86FI->getTCReturnAddrDelta(); Why using int32_t instead of int in some of the places? Nothing "wrong" with it, just inconsistent. + // If there is an SUB32ri of ESP immediately before this instruction, + // merge the two. This can be the case when tail call elimination is + // enabled and the callee has more arguments then the caller + if (MBBI != MBB.begin()) { + MachineBasicBlock::iterator PI = prior(MBBI); + unsigned Opc = PI->getOpcode(); + if ((Opc == X86::SUB64ri32 || Opc == X86::SUB64ri8 || + Opc == X86::SUB32ri || Opc == X86::SUB32ri8) && Weird indentation? +/// CheckAndDeletePreceedingADD - looks at instruction before the passed +/// instruction if it is an ADD instruction it is deleted and the size is +/// returned +static int32_t CheckAndDeletePreceedingADD(MachineBasicBlock::iterator MBBI, + MachineBasicBlock &MBB, + unsigned StackPtr) { + int32_t Offset = 0; + if (MBBI != MBB.begin()) { + MachineBasicBlock::iterator PI = prior(MBBI); + unsigned Opc = PI->getOpcode(); + if ((Opc == X86::ADD64ri32 || Opc == X86::ADD64ri8 || + Opc == X86::ADD32ri || Opc == X86::ADD32ri8) && + PI->getOperand(0).getReg() == StackPtr){ + Offset += PI->getOperand(2).getImm(); + MBB.erase(PI); + } + } + return Offset; +} + Seems like the previous chunk can be changed to use this function with some refactoring? + +static cl::opt<bool> + PerformTailCallOpt("tail-call-opt", cl::Hidden, + cl::desc("Turn on tail call optimization")); Perhaps it's better to move this out of x86. Eventually we want to do tail call optimization for more targets. See include/llvm/Target/ TargetOptions.h, lib/Target/TargetMachine.cpp Also, moving the option there will allow us to change fastcc ABI (callee popping arguments) only when this option is on. See Chris' email: > Sure it can be, you can set up custom predicates, for example the > X86CallingConv.td file has: > > class CCIfSubtarget<string F, CCAction A> > : CCIf<! strconcat("State.getTarget().getSubtarget<X86Subtarget>().", F), A>; > > It would be straight-forward to have a CCIf defined to check some command > line argument. +/// IsEligibleForTailCallElimination - Check to see whether the next instruction +// following the call is a return. A function is eligible if caller/ callee +// calling conventions match, currently only fastcc supports tail calls, and the +// function CALL is immediatly followed by a RET +bool X86TargetLowering::IsEligibleForTailCallOptimization(SDOperand Call, /// vs. // + // copy from stack slots to stack slot of a tail called function. this needs + // to be done because if we would lower the arguments directly to their real + // stack slot we might end up overwriting each other + // TODO: to make this more efficient (sometimes saving a store/ load) we could + // analyse the arguments and emit this store/load/store sequence only for + // arguments which would be overwritten otherwise Please capitalize "copy", "this", etc. Also please end sentences with proper punctuation. :-) Overall, a very good patch. Please also incorporate the minor changes mentions in other emails (LLCBETAOPTION, -tail-call-opt-align-stack). I think it'll be ready for svn comit after one more iteration. :-) Evan On Oct 2, 2007, at 2:27 AM, Arnold Schwaighofer wrote:> Hi all, > > I changed the code that checks whether a tail call is really > eligible for optimization so that it performs the check/fix in > SelectionDAGISel.cpp:BuildSelectionDAG() as suggest by Evan. Also > eliminated an error that caused the remaining failing test cases in > the test-suite. > > The results look very nice (on darwin x86, r42486). > The same number (46) of failing test cases on patched version and > vanilla version. LCCBETA was enabled on both. I changed the LCCBETA > option in Makefile.programs in the patched version to include the > tail-call-opt flags so that the optimization code gets exercised. > > ifeq ($(ARCH),x86) > LLCBETAOPTION := -regalloc=local -fast -tail-call-opt -tail-call-opt- > align-stack > > On 26 Sep 2007, at 02:26, Chris Lattner wrote: >> I think enabling this as llcbeta for a few nights makes >> sense before turning it on by default. >> >> -Chris > > What does turning on by default mean? Does that mean it is shown in > llc -help or that it is performed every time (without requesting via > the command line) when compiling? > > I would not do the latter since tail call optimization clears the > stack frame of the caller (sequence), possibly causing confusion > when debugging also resulting code is probably not faster (since > arguments are lowered to regular stack slot -where they would be if > the call was a normal call - first and then moved to the real stack > slot - onto the callers parameters. As noted in the source file this > might be optimized in many cases by looking whether the actual > parameters come from the caller function parameters and would be > overwritten if they are not moved. In cases where they would not be > overwritten they could be directly lowered to the real stack slot). > > regards arnold > > > <tailcall-r42525- > src.patch>_______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Hi Evan, I incoporated the changes you request but to the following i have got a question:> Also, moving the option > there will allow us to change fastcc ABI (callee popping arguments) > only when this option is on. See Chris' email:I am not to sure on that. because that would make modules compiled with the flag on incompatible with ones compiled without the flag off as stack behaviour would mismatch. It would be no problem to make the behaviour dependent on the -tail- call-opt flag. i am not sure that this is a good idea? pseudocode: module 1: with -tailcallopt enabled fastcc int callee(int arg1, int arg2) { ... -> onreturn: pops 8 byte } module 2: no -tailcallopt fastcc int caller() { int x= call fastcc callee(); //!! caller pops the arguments => stack mismatch callee pops the arguments but caller also wants to pop the arguments of the stack Apparently i forgot to send the answer email to chris reponse. sorry for that.> >> Hmmm. Ok. So this is due to X86CallingConv.td changes? Unfortunately >> that's not controlled by options. Ok then. >> > > Sure it can be, you can set up custom predicates, for example the > X86CallingConv.td file has: > > class CCIfSubtarget<string F, CCAction A> > : CCIf<!strconcat("State.getTarget().getSubtarget<X86Subtarget> > ().", F), A>; > > It would be straight-forward to have a CCIf defined to check some > command > line argument. I think enabling this as llcbeta for a few nights > makes > sense before turning it on by default.No not directly. The code related to "caller/callee cleans arguments off the stack" is not controlled by the .td. It's controlled in code by the operands of CALLSEQ_END. for example in SDOperand X86TargetLowering::LowerCCCCallTo: ... if (CC == CallingConv::X86_StdCall || CC == CallingConv::Fast) { if (isVarArg) NumBytesForCalleeToPush = isSRet ? 4 : 0; else NumBytesForCalleeToPush = NumBytes; assert(!(isVarArg && CC==CallingConv::Fast) && "CallingConv::Fast does not support varargs."); } else { // If this is is a call to a struct-return function, the callee // pops the hidden struct pointer, so we have to push it back. // This is common for Darwin/X86, Linux & Mingw32 targets. NumBytesForCalleeToPush = isSRet ? 4 : 0; } NodeTys = DAG.getVTList(MVT::Other, MVT::Flag); Ops.clear(); Ops.push_back(Chain); Ops.push_back(DAG.getConstant(NumBytes, getPointerTy())); Ops.push_back(DAG.getConstant(NumBytesForCalleeToPush, getPointerTy ())); Ops.push_back(InFlag); Chain = DAG.getNode(ISD::CALLSEQ_END, NodeTys, &Ops[0], Ops.size()); InFlag = Chain.getValue(1); The third operand is the number of bytes the callee pops of the stack on return (on x86). This gets lowered to a ADJCALLSTACKUP pseudo machineinstruction. Later when X86RegisterInfo::eliminateCallFramePseudoInstr is called and framepointerelimination is enabled the following code gets called: ... else if (I->getOpcode() == X86::ADJCALLSTACKUP) { // If we are performing frame pointer elimination and if the callee pops // something off the stack pointer, add it back. We do this until we have // more advanced stack pointer tracking ability. if (uint64_t CalleeAmt = I->getOperand(1).getImm()) { unsigned Opc = (CalleeAmt < 128) ? (Is64Bit ? X86::SUB64ri8 : X86::SUB32ri8) : (Is64Bit ? X86::SUB64ri32 : X86::SUB32ri); MachineInstr *New BuildMI(TII.get(Opc), StackPtr).addReg(StackPtr).addImm (CalleeAmt); MBB.insert(I, New); } } I am not sure about a command line switch would toggling the stack adjusting behaviour of a function. Because if the function is called from a different module which was compiled with the opposite command line switch all hell would break loose (because it assumes callee pops arguments when it does not).
On Oct 5, 2007, at 2:42 AM, Arnold Schwaighofer wrote:> Hi Evan, > I incoporated the changes you request but to the following i have got > a question: > >> Also, moving the option >> there will allow us to change fastcc ABI (callee popping arguments) >> only when this option is on. See Chris' email: > > I am not to sure on that. because that would make modules compiled > with the flag on incompatible with ones compiled without the flag off > as stack behaviour would mismatch. > It would be no problem to make the behaviour dependent on the -tail- > call-opt flag. i am not sure that this is a good idea?In theory, any function can be marked fastcc. But llvm-gcc c / c++ frontend does *not* mark any external functions fastcc. Also, the optimizer only changes the calling convention of internal functions to fastcc. We can set a policy of treating fastcc external functions as c functions. Then there is no chance of introducing ABI incompatibility.> > pseudocode: > > module 1: with -tailcallopt enabled > fastcc int callee(int arg1, int arg2) { > ... > -> onreturn: pops 8 byte > } > > module 2: no -tailcallopt > fastcc int caller() { > int x= call fastcc callee(); > //!! caller pops the arguments => stack mismatch > > callee pops the arguments but caller also wants to pop the arguments > of the stack > > Apparently i forgot to send the answer email to chris reponse. sorry > for that. >> >>> Hmmm. Ok. So this is due to X86CallingConv.td changes? Unfortunately >>> that's not controlled by options. Ok then. >>> >> >> Sure it can be, you can set up custom predicates, for example the >> X86CallingConv.td file has: >> >> class CCIfSubtarget<string F, CCAction A> >> : CCIf<!strconcat("State.getTarget().getSubtarget<X86Subtarget> >> ().", F), A>; >> >> It would be straight-forward to have a CCIf defined to check some >> command >> line argument. I think enabling this as llcbeta for a few nights >> makes >> sense before turning it on by default. > No not directly. The code related to "caller/callee cleans arguments > off the stack" is not controlled by the .td. It's controlled in code > by the operands of CALLSEQ_END.Ok, that's even easier to have it be controlled by the command line option. Evan> > for example in SDOperand X86TargetLowering::LowerCCCCallTo: > ... > if (CC == CallingConv::X86_StdCall || CC == CallingConv::Fast) { > if (isVarArg) > NumBytesForCalleeToPush = isSRet ? 4 : 0; > else > NumBytesForCalleeToPush = NumBytes; > assert(!(isVarArg && CC==CallingConv::Fast) && > "CallingConv::Fast does not support varargs."); > } else { > // If this is is a call to a struct-return function, the callee > // pops the hidden struct pointer, so we have to push it back. > // This is common for Darwin/X86, Linux & Mingw32 targets. > NumBytesForCalleeToPush = isSRet ? 4 : 0; > } > > NodeTys = DAG.getVTList(MVT::Other, MVT::Flag); > Ops.clear(); > Ops.push_back(Chain); > Ops.push_back(DAG.getConstant(NumBytes, getPointerTy())); > Ops.push_back(DAG.getConstant(NumBytesForCalleeToPush, getPointerTy > ())); > Ops.push_back(InFlag); > Chain = DAG.getNode(ISD::CALLSEQ_END, NodeTys, &Ops[0], Ops.size > ()); > InFlag = Chain.getValue(1); > > The third operand is the number of bytes the callee pops of the > stack on return (on x86). This gets lowered to a ADJCALLSTACKUP > pseudo machineinstruction. > > Later when X86RegisterInfo::eliminateCallFramePseudoInstr is called > and framepointerelimination is enabled the following code gets called: > ... > else if (I->getOpcode() == X86::ADJCALLSTACKUP) { > // If we are performing frame pointer elimination and if the > callee pops > // something off the stack pointer, add it back. We do this > until we have > // more advanced stack pointer tracking ability. > if (uint64_t CalleeAmt = I->getOperand(1).getImm()) { > unsigned Opc = (CalleeAmt < 128) ? > (Is64Bit ? X86::SUB64ri8 : X86::SUB32ri8) : > (Is64Bit ? X86::SUB64ri32 : X86::SUB32ri); > MachineInstr *New > BuildMI(TII.get(Opc), StackPtr).addReg(StackPtr).addImm > (CalleeAmt); > MBB.insert(I, New); > } > } > > I am not sure about a command line switch would toggling the stack > adjusting behaviour of a function. Because if the function is called > from a different module which was compiled with the opposite command > line switch all hell would break loose (because it assumes callee > pops arguments when it does not). > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Dale,> The user can specify the "fastcall" and "stdcall" attributes, and it > looks to me like llvm-gcc honors that. Certainly it should.fastcall and stdcall are specific calling conventions. They are not connected with fastcc at all. -- With best regards, Anton Korobeynikov. Faculty of Mathematics & Mechanics, Saint Petersburg State University.