thr3ads.net - llvm dev - [LLVMdev] RFC: Tail call optimization X86 [Sep 2007]

If this information is useful, please help other people find it:
Share via:

Arnold Schwaighofer

2007-Sep-11 18:21 UTC

[LLVMdev] RFC: Tail call optimization X86

Begin forwarded message:
> From: Evan Cheng <evan.cheng at apple.com>
> Date: 11 September 2007 19:26:39 GMT+02:00
> To: LLVM Developers Mailing List <llvmdev at cs.uiuc.edu>
> Subject: Re: [LLVMdev] RFC: Tail call optimization X86
> Reply-To: LLVM Developers Mailing List <llvmdev at cs.uiuc.edu>
>
> Hi Arnold,
>
> Thanks for the patch. Some questions and commons:
>
> 1. Have you test it against the llvm test suite?No not yet.
> Does it work if fp
> elimination optimization is turned off?
For my test cases - yes.> 2. Please follow llvm coding convention and make sure every line fits
> in 80 columns.Okay dokey. i was looking through the files manually - must have  
missed some - sorry.> 3.
> enum NameDecorationStyle {
>     None,
>     StdCall,
> -  FastCall
> +  FastCall,
> +  FastCC // the normal fastcc calling convention
> };
>
> Why is FastCC necessary? Can't you just use FastCall?
FastCall is used to indicate a function has x86_FastCall semantics if  
i am not mistaken.
I needed to differentiate between a normal fastcc and the  
x86_fastcall semantics in an older version
of my code. I no longer depend on that so it can be removed as you  
suggest. sorry for the code corpse :)> 4.
> def X86tailcall: SDNode<"X86ISD::TAILCALL",     SDT_X86Call,
>                           [SDNPHasChain, SDNPOutFlag, SDNPOptInFlag]>;
> +def X86truetailcall: SDNode<"X86ISD::TRUETAILCALL",    
SDT_X86Call,
> +                        [SDNPHasChain, SDNPOutFlag, SDNPOptInFlag]>;
> +
>
> Please use X86tailcall. It's not currently used so feel free to
> change its patterns, etc.
>
Okay dokey.
> 5.
> +// the following two instructions are used to adjust the stack  
> pointer
> +// in the case where the callee has more arguments than the caller
> +// an area is created where the return addr can be safely moved to
> +let isConvertibleToThreeAddress = 1 in {   // Can transform into LEA.
> +def TCADD32ri  : Ii32<0x81, MRM0r, (outs GR32:$dst), (ins GR32:
> $src1, i32imm:$src2),
> +                    "add{l}\t{$src2, $dst|$dst, $src2}",
> +                    []>;
> +}
> +
> +def TCSUB32ri  : Ii32<0x81, MRM5r, (outs GR32:$dst), (ins GR32:
> $src1, i32imm:$src2),
> +                    "sub{l}\t{$src2, $dst|$dst, $src2}",
> +                    []>;
> +In an intermediate implementation (that did not work anyway) i need  
to differentiate
  between the normal ADD32ri/SUB32ri that were in the code and the stack
adjustments i added. I  no longer need them so i will remove them.  
again a code
corpse sorry for that.> +//let isCall = 1, isTerminator = 1, isReturn = 1, isBarrier = 1 in
> +//def TCRETURNmi : I<0, Pseudo, (outs), (ins i32mem:$dst, i32imm:
> $offset),
> +//                 "#TC_RETURN $dst $offset",
> +//                 []>;
> +
This node holds the stack adjustment delta and the jmp destination of  
the call.
The information is used in epilogue generation.> Why are these needed? They don't look any different from normal add
> and subtraction instructions. Why not just use ADJCALLSTACKDOWN and
> ADJCALLSTACKUP?
>
> 6.
> +  } else if (RetOpcode ==  X86::TCRETURNdi) {
> +    // a tailcall adjust the stack
> ...
> +  } else if (RetOpcode == X86::TCRETURNri) {
> +    MBBI = prior(MBB.end());
>
> Seems like there are quite a bit of duplicate code between the 2
> cases. Please refactor.
Okay dokey> 7.
> X86TargetLowering::X86TargetLowering(TargetMachine &TM)
>     : TargetLowering(TM) {
> +  IsLastCallTailCall = false;
>
> This is probably not a good idea. You are assuming nodes are lowered
> in certain order, that is dangerous. It's probably better to check
> whether the last call is a tail call on the fly as you are processing
> the return node.
Will do so.> 8.
> -
> -  SDOperand Chain = Op.getOperand(0);
> -  SDOperand Flag;
> -
> -  // Copy the result values into the output registers.
> -  if (RVLocs.size() != 1 || !RVLocs[0].isRegLoc() ||
> -      RVLocs[0].getLocReg() != X86::ST0) {
> -    for (unsigned i = 0; i != RVLocs.size(); ++i) {
> -      CCValAssign &VA = RVLocs[i];
> -      assert(VA.isRegLoc() && "Can only return in
registers!");
> -      Chain = DAG.getCopyToReg(Chain, VA.getLocReg(), Op.getOperand
> (i*2+1),
> -                               Flag);
> -      Flag = Chain.getValue(1);
> -    }
> +  if (IsLastCallTailCall) {
> +    IsLastCallTailCall = false;
> +    SDOperand TailCall = GetTailCall(Op);
> +    SDOperand TargetAddress = TailCall.getOperand(1);
> +    SDOperand StackAdjustment = TailCall.getOperand(2);
> +    assert ( ((TargetAddress.getOpcode() == ISD::Register &&
> +               cast<RegisterSDNode>(TargetAddress)->getReg()
=> X86::ECX) ||
> +              TargetAddress.getOpcode() ==  
> ISD::TargetExternalSymbol ||
> +              TargetAddress.getOpcode() ==  
> ISD::TargetGlobalAddress) &&
> +             "Expecting an global address, external symbol, or
> register");
> +    assert( StackAdjustment.getOpcode() == ISD::Constant &&
> "Expecting a const value");
> +    // TODO: should we add flag from tail call as last operand?
> +    return DAG.getNode(X86ISD::TC_RETURN, MVT::Other, Op.getOperand
> (0), TargetAddress, StackAdjustment);
>     } else {
>
> Since if (IsLastCallTailCall) { ... } always ends with an early
> exist, you can just place the block before if (RVLocs.size() ...) and
> there is no need to change indentation for the rest of the function.
>
Okay> 9.
> +// Check to see whether the next instruction following the call is a
> return
> +// A function is eligable if caller/callee calling conventions match
> and the
> +// function CALL is immediatly followed by a RET
> +bool X86TargetLowering::IsEligibleForTailCallElimination(SDOperand
> Call, SelectionDAG& DAG, unsigned CalleeCC, SDOperand Callee) {
> +  bool IsEligible = false;
> +  SDNode * CallNode = Call.Val;
> ...
>
> +SDOperand X86TargetLowering::LowerX86_32FastCCCallTo(SDOperand Op,
> +                                                     SelectionDAG  
> &DAG,
> +                                                     unsigned CC) {
> +  DOUT << "LowerX86_32FastCCCallTo\n";
> +  SDOperand Chain     = Op.getOperand(0);
> +  bool isVarArg       = cast<ConstantSDNode>(Op.getOperand(2))-
>> getValue() != 0;
> +  bool isTailCall     = cast<ConstantSDNode>(Op.getOperand(3))-
>> getValue() != 0;
> +  SDOperand Callee    = Op.getOperand(4);
> +  //unsigned NumOps     = (Op.getNumOperands() - 5) / 2;
> +
> +  // Analyze operands of the call, assigning locations to each  
> operand.
> +  SmallVector<CCValAssign, 16> ArgLocs;
> +  CCState CCInfo(CC, isVarArg, getTargetMachine(), ArgLocs);
> +  CCInfo.AnalyzeCallOperands(Op.Val, CC_X86_32_TailCall);
> +  if (isTailCall &&
> +      IsEligibleForTailCallElimination(Op, DAG,CC, Callee) &&
> +      PerformTailCallOpt) {
>
>
> IsEligibleForTailCallElimination() should be a target hook. This way
> TargetLowering::LowerCallTo() can determine whether a call is
> eligible for tail call elimination and insert the current ISD::CALL
> node.Okay
> X86TargetLowering::LowerX86_32FastCCCallTo() will not have to
> handle non-tail calls.
>I will then add code to LowerCCCCallTo() to check whether to use the  
normal
calling convention CC_X86_32_C (3 registers free for argument  
passing) or
the tail call (fastcc) calling convention CC_X86_32_TailCall (2  
register free).

Expect to see a new patch after i have done the testing with the  
whole test suite.

regards arnold

Evan Cheng

2007-Sep-12 17:36 UTC

head link

[LLVMdev] RFC: Tail call optimization X86

On Sep 11, 2007, at 11:21 AM, Arnold Schwaighofer wrote:
> Begin forwarded message:
>
>> From: Evan Cheng <evan.cheng at apple.com>
>> Date: 11 September 2007 19:26:39 GMT+02:00
>> To: LLVM Developers Mailing List <llvmdev at cs.uiuc.edu>
>> Subject: Re: [LLVMdev] RFC: Tail call optimization X86
>> Reply-To: LLVM Developers Mailing List <llvmdev at cs.uiuc.edu>
>>
>> Hi Arnold,
>>
>> Thanks for the patch. Some questions and commons:
>>
>> 1. Have you test it against the llvm test suite?
> No not yet.
Ok, I don't expect everything would just work. But please run at least  
SingleSource and MultiSource and let us know what breaks. Once it's  
proven to pass a good chunk of the tests it can be checked in but  
turned off. This will allow us to test it as llcbeta.
>
> Expect to see a new patch after i have done the testing with the
> whole test suite.
Sounds good. Thanks.

Evan
>
>
> regards arnold
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Arnold Schwaighofer

2007-Sep-12 22:50 UTC

head link

[LLVMdev] RFC: Tail call optimization X86

> Ok, I don't expect everything would just work. But please run at least
> SingleSource and MultiSource and let us know what breaks. Once it's
> proven to pass a good chunk of the tests it can be checked in but
> turned off. This will allow us to test it as llcbeta.
>I just ran the test suite in llvm-test with a vanilla copy of the svn
tree on darwin-x86.
To perform the tests i downloaded the test-suite as instructed on the web.
cd'ed into llvm-test. configured (--withllvmfoo etc) and make'd ;).
What worries me is that 148 tests are failing. to determine this
number i redirected the output (stdout+stderr) to a file and grep FAIL
| wc -l on it.
Is the number of 148 failing tests sane or could this indicate there
is something bogus with my setup?
When i do a make in the llvm-test directory all tests are executed right?

regards arnold

Possibly Parallel Threads

Search for more possibly parallel threads

llvm dev - Sep 2007 - [LLVMdev] RFC: Tail call optimization X86

[LLVMdev] RFC: Tail call optimization X86

[LLVMdev] RFC: Tail call optimization X86

[LLVMdev] RFC: Tail call optimization X86

Possibly Parallel Threads