thr3ads.net - llvm dev - [LLVMdev] [PATCH] Unwanted r11 in push/pop on Cortex-M. [Oct 2013]

If this information is useful, please help other people find it:
Share via:

Andrea Mucignat

2013-Oct-21 00:06 UTC

[LLVMdev] [PATCH] Unwanted r11 in push/pop on Cortex-M.

To recap, this is what I was trying to solve:

This C code:

int bar(int a, int b, int c, int d, int e, int f);

int foo(int a, int b, int c, int d, int e )
{
    int x = 3*a;
    return bar3(a,b,c,d,e,x);
}

Produced the following assembly output:

foo:
    push    {r11, lr}
    sub    sp, #8
    bl    bar
    add    sp, #8
    pop    {r11, pc}


The part I didn't like is that push/pop become 4 bytes instructions because
the register that we picked to "align" the stack pointer does not
belong to
the low register set (r0-r7).
After a bit of digging, the following patch seems to take care of it.

Basically what happens is:
- the C function in the example doesn't have any local to save
- the only spilled register is LR
- the Prolog/Epilog Inserted calls the target specific
ARMFrameLowering::processFunctionBeforeCalleeSavedScan
- at this point the we need to fix the fact that SP would not be EABI
compliant because only 1 register has been spilled so far
- inside processFunctionBeforeCalleeSavedScan, there a check: if
TargetAlign == 8 && (NumGPRSpills & 1) then we align the stack by
pushing
one of the _unspilled_ registers.

The problem is that this "unspilled" register list starts with r11
(for
reasons I'm not clear about), so I decided to skip all the registers r8-r15.
The part I'm not sure about is: what if r4-r7 are all spilled already, is
the SP going to be aligned by some other routine later on (maybe by adding
a "sub sp, sp, #4")?

Any comment?

Thanks,
Andrea

diff -aruN llvm-source/lib/Target/ARM/ARMFrameLowering.cpp
llvm-source-new/lib/Target/ARM/ARMFrameLowering.cpp
--- llvm-source/lib/Target/ARM/ARMFrameLowering.cpp    2013-07-29
10:50:07.000000000 -0700
+++ llvm-source-new/lib/Target/ARM/ARMFrameLowering.cpp    2013-10-20
16:33:59.905251552 -0700
@@ -1313,7 +1313,7 @@
         for (unsigned i = 0, e = UnspilledCS1GPRs.size(); i != e; ++i) {
           unsigned Reg = UnspilledCS1GPRs[i];
           // Don't spill high register if the function is thumb1
-          if (!AFI->isThumb1OnlyFunction() ||
+          if (!AFI->isThumbFunction() ||
               isARMLowRegister(Reg) || Reg == ARM::LR) {
             MRI.setPhysRegUsed(Reg);
             if (!MRI.isReserved(Reg))



On Tue, Oct 15, 2013 at 10:15 AM, Andrea Mucignat <andrea at
nestlabs.com>wrote:
> Umesh,
>
> From Target/ARM/ARMRegisterInfo.td, it looks like FP for Thumb should be
> r7 instead of r11.
>
> // Register classes.
> //
> // pc  == Program Counter
> // lr  == Link Register
> // sp  == Stack Pointer
> // r12 == ip (scratch)
> // r7  == Frame Pointer (thumb-style backtraces)
> // r9  == May be reserved as Thread Register
> // r11 == Frame Pointer (arm-style backtraces)
> // r10 == Stack Limit
>
> Regardless, I'm compiling with -fomit-frame-pointer, so it should not
> really matter, right?
>
> Andrea
>
>
>
> On Tue, Oct 15, 2013 at 8:33 AM, Umesh Kalappa <umesh.kalappa0 at
gmail.com>wrote:
>
>> Hi andrea,
>> R11 treated as  frame pointer at arm backend , which is fixed again .
>>
>> Thanks
>> Umesh
>>
>>
>> On Tuesday, October 15, 2013, Andrea Mucignat <andrea at
nestlabs.com>
>> wrote:
>> > Umesh,
>> > Makes some sort of sense to me, OTOH:
>> > If instead of choosing r11 as a "dummy" to align the
stack we had
>> chosen some other register in the range r0-r7 then we could have
emitted
>> the PUSH encoding T1 (2 bytes opcode) as opposed to the encoding T2
(which
>> is a 4 bytes opcode).
>> > A
>> >
>> > On Tue, Oct 15, 2013 at 2:59 AM, Umesh Kalappa <
>> umesh.kalappa0 at gmail.com> wrote:
>> >
>> > Hi Andrea,
>> >
>> > That is because the LR is the  fixed register as per the
>> >
>>
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0042e/IHI0042E_aapcs.pdf
>> > and out_char() function is not  the leaf function ,Hence compiler
>> > tends to save and restore the  LR  and the save  and restore of
>> > register r11 is to align stack for 8 bytes as per ARM EABI.
>> >
>> >
>> >
>> > Thanks
>> >
>> > ~Umesh
>> >
>> >
>> > On Tuesday, October 15, 2013, Umesh Kalappa <umesh.kalappa0 at
gmail.com>
>> wrote:
>> >> ---------- Forwarded message ----------
>> >> From: Andrea Mucignat <andrea at nestlabs.com>
>> >> Date: Tue, Oct 15, 2013 at 7:05 AM
>> >> Subject: [LLVMdev] Unwanted push/pop on Cortex-M.
>> >> To: llvmdev at cs.uiuc.edu
>> >>
>> >>
>> >> Hi,
>> >>
>> >> I have this code:
>> >>
>> >> void platform_putchar(int, char);
>> >> void out_char( char ch );
>> >>
>> >> void out_char( char ch )
>> >> {
>> >>     platform_putchar (0, ch);
>> >> }
>> >>
>> >> I'm compiling with the following clang invocation:
>> >>
>> >> $
>> /usr/local/vendor/toolchains/llvm/3.3/armv7m/bin/armv7m-none-eabi-clang
>> >> -mcpu=cortex-m4 -mfloat-abi=soft -mthumb -nostdinc
-ffreestanding
>> >> -ffunction-sections -fdata-sections -fno-exceptions -fno-rtti
>> >> -fomit-frame-pointer -momit-leaf-frame-pointer -nostdinc -v
-Os -S -o
>> >> a.s a.c
>> >> vendor-clang version 3.3 (tags/RELEASE_33/final) (based on
LLVM 3.3)
>> >> Target: armv7m-none--eabi
>> >> Thread model: posix
>> >>
>> 
"/usr/local/vendor-20130805-b8d59d2/toolchains/llvm/3.3/armv7m/bin/armv7m-none-eabi-clang"
>> >> -cc1 -triple thumbv7em-none--eabi -S -disable-free
-main-file-name a.c
>> >> -mrelocation-model static -fmath-errno -mconstructor-aliases
>> >> -target-abi aapcs -target-cpu cortex-m4 -msoft-float
-mfloat-abi soft
>> >> -target-feature +soft-float -target-feature +soft-float-abi
>> >> -target-feature -neon -target-linker-version 2.22
>> >> -momit-leaf-frame-pointer -v -ffunction-sections
-fdata-sections
>> >> -coverage-file /tmp/a.s -nostdsysteminc -nobuiltininc
-resource-dir
>> >>
>>
/usr/local/vendor-20130805-b8d59d2/toolchains/llvm/3.3/armv7m/bin/../lib/clang/3.3
>> >> -Os -fno-dwarf-directory-asm -fdebug-compilation-dir /tmp
>> >> -ferror-limit 19 -fmessage-length 207 -ffreestanding
-mstackrealign
>> >> -fno-rtti -fno-signed-char -fobjc-runtime=gcc
>> >> -fobjc-default-synthesize-properties -fdiagnostics-show-option
>> >> -fcolor-diagnostics -backend-option -vectorize-loops -o a.s -x
c a.c
>> >> clang -cc1 version 3.3 based upon LLVM 3.3 default target
>> armv7m-none-eabi
>> >> #include "..." search starts here:
>> >> End of search list.
>> >>
>> >> $ cat a.s
>> >>     .syntax unified
>> >>     .eabi_attribute 6, 10
>> >>     .eabi_attribute 9, 2
>> >>     .eabi_attribute 10, 5
>> >>     .fpu vfpv4
>> >>     .eabi_attribute 20, 1
>> >>     .eabi_attribute 21, 1
>> >>     .eabi_attribute 23, 3
>> >>     .eabi_attribute 24, 1
>> >>     .eabi_attribute 25, 1
>> >>     .eabi_attribute 44, 1
>> >>     .file    "a.c"
>> >>     .section    .text.out_char,"ax",%progbits
>> >>     .globl    out_char
>> >>     .align    2
>> >>     .type    out_char,%function
>> >>     .code    16
>> >>     .thumb_func
>> >> out_char:
>> >>     push.w    {r11, lr}
>> >>     mov    r1, r0
>> >>     movs    r0, #0
>> >>     bl    platform_putchar
>> >>     pop.w    {r11, pc}
>> >> .Ltmp0:
>> >>     .size    out_char, .Ltmp0-out_char
>> >>
>> >> The one question I have is:
>> >> why can't this out_char function be compiled into:
>> >>
>> >> out_char:
>> >>     mov r1, r0
>> >>     mov r0, #0
>> >>     bl platform_putchar
>> >>     bx lr
>> >>
>> >> What's the clang/llvm module responsible for generating
the prologue
>> >> and epilogue for this function?
>> >> I looked into CodeGen/PrologEpilogInserter,
ARMFrameLowering.cpp
>> >> Thumb1FrameLowering.cpp but it's not 100% clear how the
code gen
>> >> decides what the stack frame look like.
>> >>
>> >> Here's the -emit-llvm output
>>
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20131020/49c8df94/attachment.html>

Renato Golin

2013-Oct-21 11:57 UTC

head link

[LLVMdev] [PATCH] Unwanted r11 in push/pop on Cortex-M.

On 21 October 2013 01:06, Andrea Mucignat <andrea at nestlabs.com> wrote:
> The part I didn't like is that push/pop become 4 bytes instructions
> because the register that we picked to "align" the stack pointer
does not
> belong to the low register set (r0-r7).
> After a bit of digging, the following patch seems to take care of it.
>
Hi Andrea,

This seems a valid intention, but, since you're in Thumb2 mode, it
shouldn't make much of a difference, except for code size. And even for
code size, you'd save four bytes per function on the binary, which is
normally not enough to warrant a change. I also believe that this was not
accidental (though I can't confirm), since using R7 for the stack pointer
would split the register bank and reduce the ability, for instance, to use
sequential registers to shuffle GPRs into Ds and Qs and vice-versa.

That said, if register pressure is low, or if you're not using VFP or NEON,
you could use R7 as the stack frame, even on Thumb2. This could be an
option, and even enabled by default when in -Oz mode, for instance.

The problem is that this "unspilled" register list starts with r11
(for> reasons I'm not clear about)
>
I guess that's because it's always good to have the frame pointer saved
on
each stack, to make it easier to unwind the stack while debugging, even
with high optimization levels. Someone with more contact with that part of
the code can chime in, but that seems pretty intentional, and it makes
sense, too.

> The part I'm not sure about is: what if r4-r7 are all spilled already,
is
> the SP going to be aligned by some other routine later on (maybe by adding
> a "sub sp, sp, #4")?
>
I think you'd have to come up with a few tests on each possible problematic
case you can think of. If you end up implementing this, I think you'll need
a good set of test cases, so that we make sure the behaviour you want
doesn't leak to unwanted cases.

cheers,
--renato
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20131021/67ac93f4/attachment.html>

Possibly Parallel Threads

Search for more possibly parallel threads

llvm dev - Oct 2013 - [LLVMdev] [PATCH] Unwanted r11 in push/pop on Cortex-M.

[LLVMdev] [PATCH] Unwanted r11 in push/pop on Cortex-M.

[LLVMdev] [PATCH] Unwanted r11 in push/pop on Cortex-M.

Possibly Parallel Threads