Alexander Mitin via llvm-dev
2019-Jul-16 14:52 UTC
[llvm-dev] Custom calling convention & ARM target
Hello.

For our project we implemented a custom calling convention. The main goals are to pass function arguments in registers and to always use tail call optimization for calls to functions with our CC when applicable. Function arguments are always pointers, and the maximum number of arguments is 5. No frame pointer register is used for this CC. No varargs. Finally, there are no callee-saved registers.

This approach worked successfully on x86.

On ARM we are having trouble with the LR register. The problem is that on return from a function using our CC, the existing LLVM machinery emits a 'mov pc, lr' instruction, which looks fine. The expectation is that we return to the function which first called a function with our CC (the first call is not a tail call). But at that moment the LR register contains an invalid value because it wasn't preserved.

So the question is: what is the best way to preserve the LR register? My current idea is to write a MachineFunctionPass which would add an LR spill instruction (to the stack or some other memory) and an LR reload instruction on return.

Does this seem like a reasonable approach?

Thank you for your time and consideration.

Kind Regards,
Alexander Mitin
Tim Northover via llvm-dev
2019-Jul-16 18:20 UTC
[llvm-dev] Custom calling convention & ARM target
On Tue, 16 Jul 2019 at 18:54, Alexander Mitin via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> For ARM we are having troubles with the LR register.
> The problem is that when there is a return from a function using our CC
> the existing LLVM machinery emits 'mov pc, lr' instruction which looks fine.

It's actually pretty suspicious. You'd only realistically use that sequence on a very, very old CPU, which puts you deep into barely tested territory on LLVM. I'd look into setting your triple and target to something more recent.

Triple should probably be "arm-linux-gnueabi" at least, or maybe "arm-none-eabi" if you're targeting bare metal; the CPU would probably be OK at default for either of those, but otherwise would normally be something implementing at least ARMv6 (arm1176jzf-s in RPi), probably ARMv7 (something starting with "cortex").

> So the question is how to preserve LR register in the best way? My
> current idea is to write a MachineFunctionPass which would add LR
> register spill instruction to stack or some other memory and add LR
> reload instruction on return.

The backend should preserve LR through to the return instruction automatically since it's a fundamental part of any calling convention on ARM. I had a quick look and couldn't even see a way to break it while tweaking purely calling convention knobs, so I suspect your CPU issue above is to blame.

If fixing that doesn't resolve the issue, could you tell us which bits of the calling convention you have customized for ARM? It'd hopefully help to narrow down where things might be going wrong, because I'm pretty perplexed.

Cheers.

Tim.
Alexander Mitin via llvm-dev
2019-Jul-17 19:29 UTC
[llvm-dev] Custom calling convention & ARM target
Hi Tim,

Thank you for your reply.

Actually, I already played with various target triples, including what sys::getProcessTriple() returns when I compiled on a Raspberry Pi 3 device. Yes, changing the triple to armv7-unknown-linux-gnueabi changes the emitted return instruction to 'bx lr'. But this is not the issue.

Let me describe it using an example I prepared to demonstrate the problem. Currently, LLVM contains the GHC calling convention (aka cc10), and this CC is very similar to what we are trying to implement. The difference is that our CC has a simpler argument specification (only pointers) and could have a prologue/epilogue.

I wrote a simple example which is a sort of interpreter implemented as threaded code. The C version of it is here:
https://gist.github.com/amitin/7df4fbb806c0b48eb5bcaf614e5d93cd#file-test-c

There are three handler functions which invoke each other sequentially (the order doesn't actually matter). A starter function initializes and runs the handler functions. Execution continues until a terminator function is encountered, which returns the execution flow to the starter function. Note that the code is heavily simplified to get the idea across. However, it works and can be compiled using the cmake script which I included in the same gist.

I compiled it into LLVM IR using
clang -S -O3 -emit-llvm --target=armv7-unknown-linux-gnueabihf test.c

Then I modified the resulting IR to use cc10 for the handler functions and simplified it a bit, see
https://gist.github.com/amitin/7df4fbb806c0b48eb5bcaf614e5d93cd#file-test-ll

Next, I compiled it into an asm file using the command llc test.ll:
https://gist.github.com/amitin/7df4fbb806c0b48eb5bcaf614e5d93cd#file-test-s

LLVM is really good at tail call elimination, so it replaced all calls between handlers with plain branches.

Now, getting back to the problem: note that the handlers call 'puts', so the LR register gets changed.
See https://gist.github.com/amitin/7df4fbb806c0b48eb5bcaf614e5d93cd#file-test-s-L31

Thus, when the execution flow reaches 'terminatorFunc', it will branch to an unknown location. See
https://gist.github.com/amitin/7df4fbb806c0b48eb5bcaf614e5d93cd#file-test-s-L73

The same IR code works fine for x86_64. You can verify it by changing the triple to x86_64 (uncomment line 5 of test.ll, comment out line 4, then compile it with llc).

I should mention that I tried all of the above using LLVM 8.0.

Thank you kindly for any insights you can provide.

Kind Regards,
Alexander Mitin