Begin forwarded message:
> From: Evan Cheng <evan.cheng at apple.com>
> Date: 11 September 2007 19:26:39 GMT+02:00
> To: LLVM Developers Mailing List <llvmdev at cs.uiuc.edu>
> Subject: Re: [LLVMdev] RFC: Tail call optimization X86
> Reply-To: LLVM Developers Mailing List <llvmdev at cs.uiuc.edu>
>
> Hi Arnold,
>
> Thanks for the patch. Some questions and commons:
>
> 1. Have you test it against the llvm test suite?
No not yet.
> Does it work if fp
> elimination optimization is turned off?
For my test cases - yes.> 2. Please follow llvm coding convention and make sure every line fits
> in 80 columns.
Okay dokey. i was looking through the files manually - must have
missed some - sorry.> 3.
> enum NameDecorationStyle {
> None,
> StdCall,
> - FastCall
> + FastCall,
> + FastCC // the normal fastcc calling convention
> };
>
> Why is FastCC necessary? Can't you just use FastCall?
FastCall is used to indicate a function has x86_FastCall semantics if
i am not mistaken.
I needed to differentiate between a normal fastcc and the
x86_fastcall semantics in an older version
of my code. I no longer depend on that so it can be removed as you
suggest. sorry for the code corpse :)> 4.
> def X86tailcall: SDNode<"X86ISD::TAILCALL", SDT_X86Call,
> [SDNPHasChain, SDNPOutFlag, SDNPOptInFlag]>;
> +def X86truetailcall: SDNode<"X86ISD::TRUETAILCALL",
SDT_X86Call,
> + [SDNPHasChain, SDNPOutFlag, SDNPOptInFlag]>;
> +
>
> Please use X86tailcall. It's not currently used so feel free to
> change its patterns, etc.
>
Okay dokey.
> 5.
> +// the following two instructions are used to adjust the stack
> pointer
> +// in the case where the callee has more arguments than the caller
> +// an area is created where the return addr can be safely moved to
> +let isConvertibleToThreeAddress = 1 in { // Can transform into LEA.
> +def TCADD32ri : Ii32<0x81, MRM0r, (outs GR32:$dst), (ins GR32:
> $src1, i32imm:$src2),
> + "add{l}\t{$src2, $dst|$dst, $src2}",
> + []>;
> +}
> +
> +def TCSUB32ri : Ii32<0x81, MRM5r, (outs GR32:$dst), (ins GR32:
> $src1, i32imm:$src2),
> + "sub{l}\t{$src2, $dst|$dst, $src2}",
> + []>;
> +
In an intermediate implementation (that did not work anyway) i need
to differentiate
between the normal ADD32ri/SUB32ri that were in the code and the stack
adjustments i added. I no longer need them so i will remove them.
again a code
corpse sorry for that.> +//let isCall = 1, isTerminator = 1, isReturn = 1, isBarrier = 1 in
> +//def TCRETURNmi : I<0, Pseudo, (outs), (ins i32mem:$dst, i32imm:
> $offset),
> +// "#TC_RETURN $dst $offset",
> +// []>;
> +
This node holds the stack adjustment delta and the jmp destination of
the call.
The information is used in epilogue generation.> Why are these needed? They don't look any different from normal add
> and subtraction instructions. Why not just use ADJCALLSTACKDOWN and
> ADJCALLSTACKUP?
>
> 6.
> + } else if (RetOpcode == X86::TCRETURNdi) {
> + // a tailcall adjust the stack
> ...
> + } else if (RetOpcode == X86::TCRETURNri) {
> + MBBI = prior(MBB.end());
>
> Seems like there are quite a bit of duplicate code between the 2
> cases. Please refactor.
Okay dokey> 7.
> X86TargetLowering::X86TargetLowering(TargetMachine &TM)
> : TargetLowering(TM) {
> + IsLastCallTailCall = false;
>
> This is probably not a good idea. You are assuming nodes are lowered
> in certain order, that is dangerous. It's probably better to check
> whether the last call is a tail call on the fly as you are processing
> the return node.
Will do so.> 8.
> -
> - SDOperand Chain = Op.getOperand(0);
> - SDOperand Flag;
> -
> - // Copy the result values into the output registers.
> - if (RVLocs.size() != 1 || !RVLocs[0].isRegLoc() ||
> - RVLocs[0].getLocReg() != X86::ST0) {
> - for (unsigned i = 0; i != RVLocs.size(); ++i) {
> - CCValAssign &VA = RVLocs[i];
> - assert(VA.isRegLoc() && "Can only return in
registers!");
> - Chain = DAG.getCopyToReg(Chain, VA.getLocReg(), Op.getOperand
> (i*2+1),
> - Flag);
> - Flag = Chain.getValue(1);
> - }
> + if (IsLastCallTailCall) {
> + IsLastCallTailCall = false;
> + SDOperand TailCall = GetTailCall(Op);
> + SDOperand TargetAddress = TailCall.getOperand(1);
> + SDOperand StackAdjustment = TailCall.getOperand(2);
> + assert ( ((TargetAddress.getOpcode() == ISD::Register &&
> + cast<RegisterSDNode>(TargetAddress)->getReg()
=> X86::ECX) ||
> + TargetAddress.getOpcode() ==
> ISD::TargetExternalSymbol ||
> + TargetAddress.getOpcode() ==
> ISD::TargetGlobalAddress) &&
> + "Expecting an global address, external symbol, or
> register");
> + assert( StackAdjustment.getOpcode() == ISD::Constant &&
> "Expecting a const value");
> + // TODO: should we add flag from tail call as last operand?
> + return DAG.getNode(X86ISD::TC_RETURN, MVT::Other, Op.getOperand
> (0), TargetAddress, StackAdjustment);
> } else {
>
> Since if (IsLastCallTailCall) { ... } always ends with an early
> exist, you can just place the block before if (RVLocs.size() ...) and
> there is no need to change indentation for the rest of the function.
>
Okay> 9.
> +// Check to see whether the next instruction following the call is a
> return
> +// A function is eligable if caller/callee calling conventions match
> and the
> +// function CALL is immediatly followed by a RET
> +bool X86TargetLowering::IsEligibleForTailCallElimination(SDOperand
> Call, SelectionDAG& DAG, unsigned CalleeCC, SDOperand Callee) {
> + bool IsEligible = false;
> + SDNode * CallNode = Call.Val;
> ...
>
> +SDOperand X86TargetLowering::LowerX86_32FastCCCallTo(SDOperand Op,
> + SelectionDAG
> &DAG,
> + unsigned CC) {
> + DOUT << "LowerX86_32FastCCCallTo\n";
> + SDOperand Chain = Op.getOperand(0);
> + bool isVarArg = cast<ConstantSDNode>(Op.getOperand(2))-
>> getValue() != 0;
> + bool isTailCall = cast<ConstantSDNode>(Op.getOperand(3))-
>> getValue() != 0;
> + SDOperand Callee = Op.getOperand(4);
> + //unsigned NumOps = (Op.getNumOperands() - 5) / 2;
> +
> + // Analyze operands of the call, assigning locations to each
> operand.
> + SmallVector<CCValAssign, 16> ArgLocs;
> + CCState CCInfo(CC, isVarArg, getTargetMachine(), ArgLocs);
> + CCInfo.AnalyzeCallOperands(Op.Val, CC_X86_32_TailCall);
> + if (isTailCall &&
> + IsEligibleForTailCallElimination(Op, DAG,CC, Callee) &&
> + PerformTailCallOpt) {
>
>
> IsEligibleForTailCallElimination() should be a target hook. This way
> TargetLowering::LowerCallTo() can determine whether a call is
> eligible for tail call elimination and insert the current ISD::CALL
> node.
Okay
> X86TargetLowering::LowerX86_32FastCCCallTo() will not have to
> handle non-tail calls.
>
I will then add code to LowerCCCCallTo() to check whether to use the
normal
calling convention CC_X86_32_C (3 registers free for argument
passing) or
the tail call (fastcc) calling convention CC_X86_32_TailCall (2
register free).
Expect to see a new patch after i have done the testing with the
whole test suite.
regards arnold