Andrea Mucignat
2013-Oct-21 00:06 UTC
[LLVMdev] [PATCH] Unwanted r11 in push/pop on Cortex-M.
To recap, this is what I was trying to solve: This C code: int bar(int a, int b, int c, int d, int e, int f); int foo(int a, int b, int c, int d, int e ) { int x = 3*a; return bar3(a,b,c,d,e,x); } Produced the following assembly output: foo: push {r11, lr} sub sp, #8 bl bar add sp, #8 pop {r11, pc} The part I didn't like is that push/pop become 4 bytes instructions because the register that we picked to "align" the stack pointer does not belong to the low register set (r0-r7). After a bit of digging, the following patch seems to take care of it. Basically what happens is: - the C function in the example doesn't have any local to save - the only spilled register is LR - the Prolog/Epilog Inserted calls the target specific ARMFrameLowering::processFunctionBeforeCalleeSavedScan - at this point the we need to fix the fact that SP would not be EABI compliant because only 1 register has been spilled so far - inside processFunctionBeforeCalleeSavedScan, there a check: if TargetAlign == 8 && (NumGPRSpills & 1) then we align the stack by pushing one of the _unspilled_ registers. The problem is that this "unspilled" register list starts with r11 (for reasons I'm not clear about), so I decided to skip all the registers r8-r15. The part I'm not sure about is: what if r4-r7 are all spilled already, is the SP going to be aligned by some other routine later on (maybe by adding a "sub sp, sp, #4")? Any comment? Thanks, Andrea diff -aruN llvm-source/lib/Target/ARM/ARMFrameLowering.cpp llvm-source-new/lib/Target/ARM/ARMFrameLowering.cpp --- llvm-source/lib/Target/ARM/ARMFrameLowering.cpp 2013-07-29 10:50:07.000000000 -0700 +++ llvm-source-new/lib/Target/ARM/ARMFrameLowering.cpp 2013-10-20 16:33:59.905251552 -0700 @@ -1313,7 +1313,7 @@ for (unsigned i = 0, e = UnspilledCS1GPRs.size(); i != e; ++i) { unsigned Reg = UnspilledCS1GPRs[i]; // Don't spill high register if the function is thumb1 - if (!AFI->isThumb1OnlyFunction() || + if (!AFI->isThumbFunction() || isARMLowRegister(Reg) || Reg == ARM::LR) { MRI.setPhysRegUsed(Reg); if (!MRI.isReserved(Reg)) On Tue, Oct 15, 2013 at 10:15 AM, Andrea Mucignat <andrea at nestlabs.com>wrote:> Umesh, > > From Target/ARM/ARMRegisterInfo.td, it looks like FP for Thumb should be > r7 instead of r11. > > // Register classes. > // > // pc == Program Counter > // lr == Link Register > // sp == Stack Pointer > // r12 == ip (scratch) > // r7 == Frame Pointer (thumb-style backtraces) > // r9 == May be reserved as Thread Register > // r11 == Frame Pointer (arm-style backtraces) > // r10 == Stack Limit > > Regardless, I'm compiling with -fomit-frame-pointer, so it should not > really matter, right? > > Andrea > > > > On Tue, Oct 15, 2013 at 8:33 AM, Umesh Kalappa <umesh.kalappa0 at gmail.com>wrote: > >> Hi andrea, >> R11 treated as frame pointer at arm backend , which is fixed again . >> >> Thanks >> Umesh >> >> >> On Tuesday, October 15, 2013, Andrea Mucignat <andrea at nestlabs.com> >> wrote: >> > Umesh, >> > Makes some sort of sense to me, OTOH: >> > If instead of choosing r11 as a "dummy" to align the stack we had >> chosen some other register in the range r0-r7 then we could have emitted >> the PUSH encoding T1 (2 bytes opcode) as opposed to the encoding T2 (which >> is a 4 bytes opcode). >> > A >> > >> > On Tue, Oct 15, 2013 at 2:59 AM, Umesh Kalappa < >> umesh.kalappa0 at gmail.com> wrote: >> > >> > Hi Andrea, >> > >> > That is because the LR is the fixed register as per the >> > >> http://infocenter.arm.com/help/topic/com.arm.doc.ihi0042e/IHI0042E_aapcs.pdf >> > and out_char() function is not the leaf function ,Hence compiler >> > tends to save and restore the LR and the save and restore of >> > register r11 is to align stack for 8 bytes as per ARM EABI. >> > >> > >> > >> > Thanks >> > >> > ~Umesh >> > >> > >> > On Tuesday, October 15, 2013, Umesh Kalappa <umesh.kalappa0 at gmail.com> >> wrote: >> >> ---------- Forwarded message ---------- >> >> From: Andrea Mucignat <andrea at nestlabs.com> >> >> Date: Tue, Oct 15, 2013 at 7:05 AM >> >> Subject: [LLVMdev] Unwanted push/pop on Cortex-M. >> >> To: llvmdev at cs.uiuc.edu >> >> >> >> >> >> Hi, >> >> >> >> I have this code: >> >> >> >> void platform_putchar(int, char); >> >> void out_char( char ch ); >> >> >> >> void out_char( char ch ) >> >> { >> >> platform_putchar (0, ch); >> >> } >> >> >> >> I'm compiling with the following clang invocation: >> >> >> >> $ >> /usr/local/vendor/toolchains/llvm/3.3/armv7m/bin/armv7m-none-eabi-clang >> >> -mcpu=cortex-m4 -mfloat-abi=soft -mthumb -nostdinc -ffreestanding >> >> -ffunction-sections -fdata-sections -fno-exceptions -fno-rtti >> >> -fomit-frame-pointer -momit-leaf-frame-pointer -nostdinc -v -Os -S -o >> >> a.s a.c >> >> vendor-clang version 3.3 (tags/RELEASE_33/final) (based on LLVM 3.3) >> >> Target: armv7m-none--eabi >> >> Thread model: posix >> >> >> "/usr/local/vendor-20130805-b8d59d2/toolchains/llvm/3.3/armv7m/bin/armv7m-none-eabi-clang" >> >> -cc1 -triple thumbv7em-none--eabi -S -disable-free -main-file-name a.c >> >> -mrelocation-model static -fmath-errno -mconstructor-aliases >> >> -target-abi aapcs -target-cpu cortex-m4 -msoft-float -mfloat-abi soft >> >> -target-feature +soft-float -target-feature +soft-float-abi >> >> -target-feature -neon -target-linker-version 2.22 >> >> -momit-leaf-frame-pointer -v -ffunction-sections -fdata-sections >> >> -coverage-file /tmp/a.s -nostdsysteminc -nobuiltininc -resource-dir >> >> >> /usr/local/vendor-20130805-b8d59d2/toolchains/llvm/3.3/armv7m/bin/../lib/clang/3.3 >> >> -Os -fno-dwarf-directory-asm -fdebug-compilation-dir /tmp >> >> -ferror-limit 19 -fmessage-length 207 -ffreestanding -mstackrealign >> >> -fno-rtti -fno-signed-char -fobjc-runtime=gcc >> >> -fobjc-default-synthesize-properties -fdiagnostics-show-option >> >> -fcolor-diagnostics -backend-option -vectorize-loops -o a.s -x c a.c >> >> clang -cc1 version 3.3 based upon LLVM 3.3 default target >> armv7m-none-eabi >> >> #include "..." search starts here: >> >> End of search list. >> >> >> >> $ cat a.s >> >> .syntax unified >> >> .eabi_attribute 6, 10 >> >> .eabi_attribute 9, 2 >> >> .eabi_attribute 10, 5 >> >> .fpu vfpv4 >> >> .eabi_attribute 20, 1 >> >> .eabi_attribute 21, 1 >> >> .eabi_attribute 23, 3 >> >> .eabi_attribute 24, 1 >> >> .eabi_attribute 25, 1 >> >> .eabi_attribute 44, 1 >> >> .file "a.c" >> >> .section .text.out_char,"ax",%progbits >> >> .globl out_char >> >> .align 2 >> >> .type out_char,%function >> >> .code 16 >> >> .thumb_func >> >> out_char: >> >> push.w {r11, lr} >> >> mov r1, r0 >> >> movs r0, #0 >> >> bl platform_putchar >> >> pop.w {r11, pc} >> >> .Ltmp0: >> >> .size out_char, .Ltmp0-out_char >> >> >> >> The one question I have is: >> >> why can't this out_char function be compiled into: >> >> >> >> out_char: >> >> mov r1, r0 >> >> mov r0, #0 >> >> bl platform_putchar >> >> bx lr >> >> >> >> What's the clang/llvm module responsible for generating the prologue >> >> and epilogue for this function? >> >> I looked into CodeGen/PrologEpilogInserter, ARMFrameLowering.cpp >> >> Thumb1FrameLowering.cpp but it's not 100% clear how the code gen >> >> decides what the stack frame look like. >> >> >> >> Here's the -emit-llvm output >> > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20131020/49c8df94/attachment.html>
Renato Golin
2013-Oct-21 11:57 UTC
[LLVMdev] [PATCH] Unwanted r11 in push/pop on Cortex-M.
On 21 October 2013 01:06, Andrea Mucignat <andrea at nestlabs.com> wrote:> The part I didn't like is that push/pop become 4 bytes instructions > because the register that we picked to "align" the stack pointer does not > belong to the low register set (r0-r7). > After a bit of digging, the following patch seems to take care of it. >Hi Andrea, This seems a valid intention, but, since you're in Thumb2 mode, it shouldn't make much of a difference, except for code size. And even for code size, you'd save four bytes per function on the binary, which is normally not enough to warrant a change. I also believe that this was not accidental (though I can't confirm), since using R7 for the stack pointer would split the register bank and reduce the ability, for instance, to use sequential registers to shuffle GPRs into Ds and Qs and vice-versa. That said, if register pressure is low, or if you're not using VFP or NEON, you could use R7 as the stack frame, even on Thumb2. This could be an option, and even enabled by default when in -Oz mode, for instance. The problem is that this "unspilled" register list starts with r11 (for> reasons I'm not clear about) >I guess that's because it's always good to have the frame pointer saved on each stack, to make it easier to unwind the stack while debugging, even with high optimization levels. Someone with more contact with that part of the code can chime in, but that seems pretty intentional, and it makes sense, too.> The part I'm not sure about is: what if r4-r7 are all spilled already, is > the SP going to be aligned by some other routine later on (maybe by adding > a "sub sp, sp, #4")? >I think you'd have to come up with a few tests on each possible problematic case you can think of. If you end up implementing this, I think you'll need a good set of test cases, so that we make sure the behaviour you want doesn't leak to unwanted cases. cheers, --renato -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20131021/67ac93f4/attachment.html>