thr3ads.net - llvm dev - [LLVMdev] Question about fastcc assumptions and seemingly superfluous %esp updates [Feb 2013]


Eli Bendersky

2013-Feb-14 22:45 UTC

[LLVMdev] Question about fastcc assumptions and seemingly superfluous %esp updates

Hello,

While investigating one of the existing tests
(test/CodeGen/X86/tailcallpic2.ll), I ran into IR that produces some
interesting code. The IR is very straightforward:

define protected fastcc i32 @tailcallee(i32 %a1, i32 %a2, i32 %a3, i32 %a4) {
entry:
  ret i32 %a3
}

define fastcc i32 @tailcaller(i32 %in1, i32 %in2) {
entry:
  %tmp11 = tail call fastcc i32 @tailcallee(i32 %in1, i32 %in2, i32 %in1, i32 %in2)
  ret i32 %tmp11
}

define i32 @foo(i32 %in1, i32 %in2) {
entry:
  %q = call fastcc i32 @tailcaller(i32 %in2, i32 %in1)
  %ww = sub i32 %q, 6
  ret i32 %ww
}

Built with (ToT LLVM):
llc < ~/temp/z.ll  -march=x86 -tailcallopt -O3

The produced code is (cleaned up a bit):

tailcallee:                             # @tailcallee
  movl  4(%esp), %eax
  ret  $12

tailcaller:                             # @tailcaller
  subl  $12, %esp
  movl  %edx, 20(%esp)
  movl  %ecx, 16(%esp)
  addl  $12, %esp
  jmp  tailcallee              # TAILCALL

foo:                                    # @foo
  subl  $12, %esp
  movl  20(%esp), %ecx
  movl  16(%esp), %edx
  calll  tailcaller
  subl  $12, %esp
  addl  $-6, %eax
  addl  $12, %esp
  ret

A number of questions arise here:

1) Notice that 'tailcaller' goes beyond its own stack frame when
arranging arguments for 'tailcallee'. It subs 12 from %esp, but then
writes to 20(%esp). Clearly, something in the fastcc convention allows
it to assume that stack space will be available there? What is it?

2) Note the %esp dance 'tailcaller' is doing - completely useless sub
followed by add. Does this have an inherent goal or can it be
eliminated?

3) The %esp dance of 'foo' is even stranger:

  subl  $12, %esp
  addl  $-6, %eax
  addl  $12, %esp

The subl and addl to %esp cancel out, and with an unrelated operation
in between. Why are they needed?

I'll be very grateful if someone could shed some light on this.

Eli

Cameron McInally

2013-Feb-15 00:15 UTC

Hey Eli,

On Thu, Feb 14, 2013 at 5:45 PM, Eli Bendersky <eliben at google.com>
wrote:
> 1) Notice that 'tailcaller' goes beyond its own stack frame when
> arranging arguments for 'tailcallee'. It subs 12 from %esp, but then
> writes to 20(%esp). Clearly, something in the fastcc convention allows
> it to assume that stack space will be available there? What is it?

It looks like your call is being converted to a tailcall. I agree that
those stack writes are setting up the arguments for tailcallee. Although, I
haven't done the stack frame math to say for sure.

I suspect that this is legal since tailcallee is a leaf function and the
writes are into the "red zone".

> 2) Note the %esp dance 'tailcaller' is doing - completely useless sub
> followed by add. Does this have an inherent goal or can it be
> eliminated?
>
> 3) The %esp dance of 'foo' is even stranger:
>
>   subl  $12, %esp
>   addl  $-6, %eax
>   addl  $12, %esp
>
> The subl and addl to %esp cancel out, and with an unrelated operation
> in between. Why are they needed?
>
I'm not an expert in this area, but I believe that "ret $12" cleans up the
stack by adding 12 bytes to %esp; an artifact of the tailcall conversion.
So,

  subl  $12, %esp   <= Matches the "ret $12" from tailcallee's epilogue.
  addl  $-6, %eax
  addl  $12, %esp   <= Matches the "subl $12, %esp" from foo's prologue.

I suppose they're explicitly needed in case a stack operation occurs after
the call and before the return. I wonder if the spiller has not run yet
when the tailcall decision is made, or something similar.

-Cameron
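
The matching described above can be checked by following %esp through the whole call sequence. Below is a minimal Python sketch, assuming a 4-byte return address and modelling only the instructions from the listing that move %esp. The helper name `trace_esp` and the choice of "offset 0 = %esp on entry to foo" are made up for illustration.

```python
# Model only the instructions that move %esp. Offsets are relative to
# %esp on entry to foo (offset 0 holds foo's return address).
WORD = 4  # assumed size of a return address on 32-bit x86

steps = [
    ("foo:        subl $12, %esp", -12),
    ("foo:        calll tailcaller (pushes return address)", -WORD),
    ("tailcaller: subl $12, %esp", -12),
    ("tailcaller: addl $12, %esp", +12),
    ("tailcallee: ret $12 (pops return address + 12 bytes)", WORD + 12),
    ("foo:        subl $12, %esp", -12),
    ("foo:        addl $12, %esp", +12),
]

def trace_esp(steps):
    """Return the %esp offset after each step."""
    esp, offsets = 0, []
    for _, delta in steps:
        esp += delta
        offsets.append(esp)
    return offsets

offsets = trace_esp(steps)
for (label, _), esp in zip(steps, offsets):
    print(f"{esp:+4d}  {label}")

# tailcaller's two stores happen while %esp is at its lowest point.
esp_at_stores = offsets[2]
store_addrs = [esp_at_stores + 16, esp_at_stores + 20]
print("stores land at offsets", store_addrs)
```

In this model %esp is already back at 0 right after "ret $12", so the subl/addl pair that follows in foo nets to zero. The two stores in tailcaller land at offsets -12 and -8, that is, inside the 12 bytes foo reserved before the call.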

Eli Bendersky

2013-Feb-15 00:50 UTC

>> 1) Notice that 'tailcaller' goes beyond its own stack frame when
>> arranging arguments for 'tailcallee'. It subs 12 from %esp, but then
>> writes to 20(%esp). Clearly, something in the fastcc convention allows
>> it to assume that stack space will be available there? What is it?
>
> It looks like your call is being converted to a tailcall. I agree that those
> stack writes are setting up the arguments for tailcallee. Although, I
> haven't done the stack frame math to say for sure.
>
> I suspect that this is legal since tailcallee is a leaf function and the
> writes are into the "red zone".

Thanks for answering, Cameron.

I don't think this is red-zone related, because (1) the red zone is in
the callee's, not the caller's, stack frame (i.e. it's *below* the return
address) and (2) the red zone is x86-64 specific and this code is
generated for 32-bit x86.

The math is pretty simple here. tailcaller gets two int arguments,
both passed in registers (fastcc; %ecx and %edx in the generated code).
So when it's entered there's only the return address on the stack. It
subs 12 from %esp but then writes into 20(%esp), which is above the
return address and hence in its caller's frame.
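
The offset arithmetic in the paragraph above can be written out as a minimal Python sketch. It assumes a 4-byte return address sitting at the entry value of %esp and measures everything relative to that value; `ENTRY_ESP` and `effective_address` are hypothetical names used only for this illustration.

```python
# Offsets are relative to %esp on entry to tailcaller, where the
# return address occupies bytes [0, 4). Offsets >= 4 are above the
# return address (caller's frame); negative offsets are tailcaller's own.
ENTRY_ESP = 0          # hypothetical reference point
RETURN_ADDR_SIZE = 4   # assumed: 32-bit x86

def effective_address(esp, displacement):
    """Address referenced by displacement(%esp)."""
    return esp + displacement

esp = ENTRY_ESP - 12                      # subl $12, %esp
store_edx = effective_address(esp, 20)    # movl %edx, 20(%esp)
store_ecx = effective_address(esp, 16)    # movl %ecx, 16(%esp)

for name, addr in (("%edx", store_edx), ("%ecx", store_ecx)):
    above = addr >= ENTRY_ESP + RETURN_ADDR_SIZE
    where = "caller's frame" if above else "own frame"
    print(f"{name} stored at entry_esp{addr:+d} ({where})")
```

Both stores come out at positive offsets (+8 and +4), so neither touches the 12 bytes tailcaller itself allocated.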
>
>>
>> 2) Note the %esp dance 'tailcaller' is doing - completely useless sub
>> followed by add. Does this have an inherent goal or can it be
>> eliminated?
>>
>> 3) The %esp dance of 'foo' is even stranger:
>>
>>   subl  $12, %esp
>>   addl  $-6, %eax
>>   addl  $12, %esp
>>
>> The subl and addl to %esp cancel out, and with an unrelated operation
>> in between. Why are they needed?
>
> I'm not an expert in this area, but I believe that "ret $12" cleans up the
> stack by adding 12 bytes to %esp; an artifact of the tailcall conversion.
> So,
>
>   subl  $12, %esp   <= Matches the "ret $12" from tailcallee's epilogue.
>   addl  $-6, %eax
>   addl  $12, %esp   <= Matches the "subl $12, %esp" from foo's prologue.
>
> I suppose they're explicitly needed in case a stack operation occurs after
> the call and before the return. I wonder if the spiller has not run yet when
> the tailcall decision is made, or something similar.

Yep, I agree about their purpose. It's just that they could (and
should) have been optimized away, I think.

Eli
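
To illustrate what "optimized away" could mean for the two sequences in question, here is a toy peephole in Python that works on instruction text. It is only a sketch under stated assumptions, not how LLVM represents or rewrites machine code: the function names are invented, it cancels a sub/add pair on %esp only when nothing in between calls, jumps, pushes, pops or otherwise uses %esp as a plain register, and it rebases any displacement(%esp) operands in between. It refuses to produce a negative displacement, since there is no red zone on 32-bit x86.

```python
import re

ESP_ADJ = re.compile(r"^(subl|addl)\s+\$(-?\d+),\s*%esp$")
ESP_MEM = re.compile(r"(-?\d+)?\(%esp\)")
UNSAFE = ("call", "push", "pop", "ret", "jmp")  # conservative stop list

def esp_delta(insn):
    """Change an 'addl/subl $imm, %esp' makes to %esp, else None."""
    m = ESP_ADJ.match(insn.strip())
    if not m:
        return None
    amount = int(m.group(2))
    return -amount if m.group(1) == "subl" else amount

def rebase(insn, delta):
    """Rewrite disp(%esp) operands as if %esp had not moved by delta.
    Returns None when the rewrite is not clearly safe."""
    if insn.count("%esp") != len(ESP_MEM.findall(insn)):
        return None  # %esp used in some form this toy does not model
    ok = True
    def repl(m):
        nonlocal ok
        disp = int(m.group(1) or 0) + delta
        if disp < 0:
            ok = False  # would address memory below %esp
        return f"{disp}(%esp)"
    new = ESP_MEM.sub(repl, insn)
    return new if ok else None

def fold_esp_pairs(block):
    """Drop %esp adjustment pairs that cancel within a straight-line block."""
    out = [insn.strip() for insn in block]
    i = 0
    while i < len(out):
        delta = esp_delta(out[i])
        if delta is None:
            i += 1
            continue
        j = i + 1
        while (j < len(out) and esp_delta(out[j]) is None
               and not out[j].split()[0].startswith(UNSAFE)):
            j += 1
        if j < len(out) and esp_delta(out[j]) == -delta:
            middle = [rebase(insn, delta) for insn in out[i + 1:j]]
            if all(insn is not None for insn in middle):
                out[i:j + 1] = middle
                continue
        i += 1
    return out

tailcaller = ["subl $12, %esp", "movl %edx, 20(%esp)",
              "movl %ecx, 16(%esp)", "addl $12, %esp", "jmp tailcallee"]
foo_tail = ["subl $12, %esp", "addl $-6, %eax", "addl $12, %esp", "ret"]

print(fold_esp_pairs(tailcaller))
print(fold_esp_pairs(foo_tail))
```

On the two snippets from the listing, the toy leaves tailcaller with stores to 8(%esp) and 4(%esp) followed by the jmp, and leaves the tail of foo as just the addl to %eax and the ret.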
