thr3ads.net - llvm dev - [LLVMdev] Possible missed optimization on function calling? [Sep 2010]

Borja Ferrer

2010-Sep-21 20:21 UTC

[LLVMdev] Possible missed optimization on function calling?

Hello, I noticed that the following code could be improved a little
further. If this optimization is too tricky for the compiler, or if the code
is generated this way by design, please forgive me; in any case I just
wanted to point it out.

Consider the following C code:

extern int mcos(int a);
extern int msin(int a);
extern int mdiv(int a, int b);

int foo(int a, int b)
{
    int a4 = mdiv(mcos(a), msin(b));
    return a4;
}

I noticed this while testing the backend I'm currently developing, but
exactly the same code is produced for other targets as well:

march = msp430:
    push.w    r11
    push.w    r10
    push.w    r9
    push.w    r8
    mov.w    r14, r11
    mov.w    r15, r10   ; store a
    mov.w    r13, r15
    mov.w    r12, r14   ; pass b
    call    #msin
    mov.w    r15, r9
    mov.w    r14, r8   ; store msin(b)
    mov.w    r10, r15
    mov.w    r11, r14 ; pass a
    call    #mcos
    mov.w    r9, r13   ;  pass msin(b)
    mov.w    r8, r12
    call    #mdiv
    pop.w    r8
    pop.w    r9
    pop.w    r10
    pop.w    r11
    ret

march = thumb:
    push    {r4, r5, lr}
    mov    r4, r0
    mov    r0, r1
    bl    msin
    mov    r5, r0
    mov    r0, r4
    bl    mcos
    mov    r1, r5
    bl    mdiv
    pop    {r4, r5, pc}

Using the MSP430 example above, it could have produced:

    push.w    r11
    push.w    r10
    mov.w    r14, r11
    mov.w    r15, r10   ; store a
    mov.w    r13, r15
    mov.w    r12, r14   ; pass b
    call    #msin
    ; SWAP MSIN(B) AND ARGUMENT "a" USING R13:R12
    mov.w    r15, r13
    mov.w    r14, r12  ; store msin(b)
    mov.w    r11, r14
    mov.w    r10, r15 ;  pass a
    mov.w    r13, r11
    mov.w    r12, r10  ; save msin(b) into callee saved regs
    call    #mcos
    mov.w    r11, r13   ;  pass msin(b)
    mov.w    r10, r12
    call    #mdiv
    pop.w    r10
    pop.w    r11
    ret

The basic idea is that r13:r12 can serve as scratch registers after msin()
returns, so they can be used to swap the result of msin(b) with the saved
argument a. This avoids pushing and popping r9 and r8 at the cost of two
extra moves: a net saving of 2 instructions and 4 memory accesses.
In the case of my backend, which targets an 8-bit architecture that supports
16-bit moves, it avoids pushing and popping four 8-bit registers, which saves
6 instructions (8 pushes and pops removed, 2 moves added) and 8 memory
accesses. In terms of speed it saves 14 cycles (2 cycles per push/pop, less
the cost of the two extra moves).

As a side note, GCC produces the same code as LLVM here.

Thanks.

Borja Ferrer

2010-Oct-06 12:54 UTC

[LLVMdev] Possible missed optimization on function calling?

I'm bringing this email up again in case it wasn't noticed or got lost in
the depths of the mailing list. Please let me know whether this issue
requires opening a bug report or whether it can be handled another way.

Thanks, and congrats on the new release!

