Alexander Mitin via llvm-dev
2019-Jul-17 19:29 UTC
[llvm-dev] Custom calling convention & ARM target
Hi Tim,

Thank you for your reply.

Actually, I already played with various target triples, including what sys::getProcessTriple() returns when I tried to compile it on a Raspberry Pi 3 device. Yes, changing the triple to armv7-unknown-linux-gnueabi changes the emitted return instruction to 'bx lr'. But this is not the issue.

Let me describe it based on an example I prepared to demonstrate the problem. Currently, LLVM contains the GHC calling convention (aka cc10), and this CC is very similar to what we are trying to implement. The difference is that our CC has a simpler argument specification (only pointers) and could have a prologue/epilogue.

I wrote a simple example which is a sort of interpreter implemented as threaded code. The C version of it is here:
https://gist.github.com/amitin/7df4fbb806c0b48eb5bcaf614e5d93cd#file-test-c

There are three handler functions which invoke each other sequentially (the order doesn't actually matter). A starter function initializes and runs the handler functions. It runs until a terminator function is encountered, which returns the execution flow to the starter function. Note that the code is really simplified for you to get the idea. However, it works and can be compiled using the cmake script which I included in the same gist.

I compiled it into LLVM IR using

    clang -S -O3 -emit-llvm --target=armv7-unknown-linux-gnueabihf test.c

Then I modified the resulting IR in order to use cc10 for the handler functions and simplified it a bit, see
https://gist.github.com/amitin/7df4fbb806c0b48eb5bcaf614e5d93cd#file-test-ll

Next, I compiled it into an asm file using the command

    llc test.ll

https://gist.github.com/amitin/7df4fbb806c0b48eb5bcaf614e5d93cd#file-test-s

LLVM is really good at tail call elimination, so it replaced all calls between handlers with plain branches.

Now getting back to the problem, note that the handlers call 'puts', so the LR register gets changed.
See https://gist.github.com/amitin/7df4fbb806c0b48eb5bcaf614e5d93cd#file-test-s-L31

Thus, when the execution flow reaches 'terminatorFunc' it will branch to an unknown location.
See https://gist.github.com/amitin/7df4fbb806c0b48eb5bcaf614e5d93cd#file-test-s-L73

The same IR code works fine for x86_64. You can verify it by changing the triple to x86_64 (uncomment the test.ll:5 line and comment test.ll:4, then compile it with llc).

I think I have to mention that I tried all of the above using LLVM v.8.0.

Thank you kindly for any insights you can provide.

On Tue, Jul 16, 2019 at 9:20 PM Tim Northover <t.p.northover at gmail.com> wrote:
>
> On Tue, 16 Jul 2019 at 18:54, Alexander Mitin via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
> > For ARM we are having troubles with the LR register.
> > The problem is that when there is a return from a function using our CC
> > the existing LLVM machinery emits 'mov pc, lr' instruction which looks fine.
>
> It's actually pretty suspicious. You'd only realistically use that
> sequence on a very, very old CPU which puts you deep into barely
> tested territory on LLVM. I'd look into setting your triple and target
> to something more recent.
>
> Triple should probably be "arm-linux-gnueabi" at least, or maybe
> "arm-none-eabi" if you're targeting bare metal; the CPU would probably
> be OK at default for either of those, but otherwise would normally be
> something implementing at least ARMv6 (arm1176jzf-s in RPi), probably
> ARMv7 (something starting with "cortex").
>
> > So the question is how to preserve LR register in the best way? My
> > current idea is to write a MachineFunctionPass which would add LR
> > register spill instruction to stack or some other memory and add LR
> > reload instruction on return.
>
> The backend should preserve LR through to the return instruction
> automatically since it's a fundamental part of any calling convention
> on ARM. I had a quick look and couldn't even see a way to break it
> while tweaking purely calling convention knobs, so I suspect your CPU
> issue above is to blame.
>
> If fixing that doesn't resolve the issue, could you tell us which bits
> of the calling convention you have customized for ARM? It'd hopefully
> help to narrow down where things might be going wrong, because I'm
> pretty perplexed.
>
> Cheers.
>
> Tim.

Kind Regards,
Alexander Mitin
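For readers without access to the gist, the threaded-code layout described in the message above might look roughly like the following C sketch. This is not the code from test.c: the names (`struct vm`, `next`, `run_program`) and the trace buffer are hypothetical, and the handlers record their id instead of calling 'puts' so that the result can be checked.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interpreter state; none of these names come from the gist. */
struct vm;
typedef void (*handler_fn)(struct vm *);

struct vm {
    const handler_fn *ip;   /* points at the next handler to run */
    int trace[8];           /* ids of the handlers that have run */
    size_t steps;           /* number of handlers run so far */
};

/* Fetch the next handler and transfer control to it. The call is the last
 * action, so an optimizing compiler may turn it into a plain branch. */
static void next(struct vm *vm) {
    handler_fn fn = *vm->ip++;
    fn(vm);
}

static void record(struct vm *vm, int id) {
    assert(vm->steps < sizeof vm->trace / sizeof vm->trace[0]);
    vm->trace[vm->steps++] = id;
}

/* Three handlers; each does its work and then hands off to the next one. */
static void handler0(struct vm *vm) { record(vm, 0); next(vm); }
static void handler1(struct vm *vm) { record(vm, 1); next(vm); }
static void handler2(struct vm *vm) { record(vm, 2); next(vm); }

/* The terminator simply returns, which unwinds back to the starter. */
static void terminator(struct vm *vm) { (void)vm; }

/* Starter: sets up the handler sequence and runs it to completion. */
size_t run_program(struct vm *vm) {
    static const handler_fn program[] = {
        handler0, handler1, handler2, handler1, terminator
    };
    vm->ip = program;
    vm->steps = 0;
    next(vm);
    return vm->steps;
}
```

Calling `run_program` on a caller-owned `struct vm` runs four handlers and then the terminator. In plain C the hand-offs are ordinary calls unless the compiler happens to optimize them, which is exactly the guarantee a dedicated calling convention is meant to provide.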
Tim Northover via llvm-dev
2019-Jul-18 06:44 UTC
[llvm-dev] Custom calling convention & ARM target
Hi,

On Wed, 17 Jul 2019 at 23:30, Alexander Mitin <amitin at instantiations.com> wrote:
> Now getting back to the problem, note that the handlers call to 'puts' so LR
> register gets changed.
> See https://gist.github.com/amitin/7df4fbb806c0b48eb5bcaf614e5d93cd#file-test-s-L31

Sure, but this is completely normal, ARM's BL instruction will *always* change LR. If LLVM wasn't used to dealing with this issue then no code would ever work on ARM.

What normally happens is that the call gets marked as clobbering LR which triggers ARMFrameLowering.cpp to save it in the prologue and restore it in the epilogue.

You mention above that you've patterned your changes on the GHC convention, which suppresses prologue and epilogue. That's probably where I'd start to look for the problem. But don't trust what's there already: I'm not sure how functional the GHC convention is on ARM, it seems like it could only work if it guaranteed *every* call was a tail call.

> The same IR code works fine for x86_64. You can verify it by changing
> the triple to x86_64 (uncomment the test.ll:5 line and comment
> test.ll:4 then compile it with llc).

x86_64 has a completely different call/return sequence that automatically involves the stack so that's not surprising.

Cheers.

Tim.
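As a small illustration of the save/restore behaviour described above, here is a hedged C sketch contrasting a leaf function with one that makes ordinary (non-tail) calls. The function names are hypothetical, and the instruction sequences in the comments are the typical shape for 32-bit ARM under the standard calling convention, not output taken from this thread; the exact register lists depend on the compiler, options, and inlining decisions.

```c
#include <assert.h>
#include <stddef.h>

/* Leaf function: it makes no calls, so LR still holds the caller's return
 * address at the end and a plain 'bx lr' is enough to return. */
size_t leaf_length(const char *s) {
    size_t n = 0;
    while (s[n] != '\0') {
        n++;
    }
    return n;
}

/* Non-leaf function: each call below is typically a 'bl', which overwrites
 * LR. With the standard convention the prologue therefore saves LR
 * (something like 'push {r4, lr}') and the epilogue restores it, often
 * straight into PC (something like 'pop {r4, pc}'). */
size_t nonleaf_total(const char *a, const char *b) {
    assert(a != NULL && b != NULL);
    size_t total = leaf_length(a);   /* call clobbers LR */
    total += leaf_length(b);         /* call clobbers LR again */
    return total;
}
```

A convention that suppresses the prologue and epilogue removes exactly that save and restore, so any non-tail call inside such a function leaves LR pointing into the function itself rather than at its caller.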
Alexander Mitin via llvm-dev
2019-Jul-18 13:35 UTC
[llvm-dev] Custom calling convention & ARM target
Hi.

> What normally happens is that the call gets marked as clobbering LR
> which triggers ARMFrameLowering.cpp to save it in the prologue and
> restore it in the epilogue.

Do you mean the callee should save/restore LR? If so, then this is not as good as it could be. Let me clarify things a bit more.

We don't want it to be like that:

    handlerFunc0: ; our CC
        push lr
        ; do something
        pop lr
        b handlerFuncX ; branch to the next handler

We need it like this:

    starterFunc: ; cdecl or whatever standard CC
        mov lr, offset of label1
        push lr ; or store lr somewhere else
        bl handlerFuncX ; or even just b handlerFuncX
    label1:
        ; do something on return
        ; ..

    terminatorFunc: ; our CC
        pop lr ; or reload lr from some other memory
        bx lr

Also this code would look good:

    handlerFunc0: ; our CC
        ; do something
        push lr
        bl someFuncWithStandardCC
        pop lr
        ; do something else
        b handlerFuncX ; branch to the next handler

I wasn't expecting that it would work out-of-the-box and I don't think it's a bug in LLVM :) I'm just looking for guidance on the best way to implement it.

> You mention above that you've patterned your changes on the GHC
> convention, which suppresses prologue and epilogue. That's probably
> where I'd start to look for the problem. But don't trust what's there
> already: I'm not sure how functional the GHC convention is on ARM, it
> seems like it could only work if it guaranteed *every* call was a tail
> call.

For our implementation we don't suppress the prologue/epilogue. However, no callee-saved registers are allowed, in order to reduce memory-register operations. This CC is very specific and it was never intended to be used widely. And you are correct that it would only work if it guaranteed every call was a tail call, and that's the idea behind how the interpreter of our VA Smalltalk virtual machine works. It's much faster than any implementation in the C language.

Kind Regards,
Alexander Mitin
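The control flow asked for above (the starter stores its return point once, handlers only branch onward, the terminator reloads that return point) can be imitated in portable C with setjmp/longjmp. This is only an analogy, assuming a `jmp_buf` stands in for "lr stored somewhere else"; it is not the custom calling convention itself and not how the VA Smalltalk VM is implemented, and all names are hypothetical.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

struct vm;
typedef void (*handler_fn)(struct vm *);

/* Hypothetical state. 'return_ctx' plays the role of the saved LR. */
struct vm {
    jmp_buf return_ctx;     /* where the terminator should resume */
    const handler_fn *ip;   /* next handler to run */
    int steps;              /* number of handlers executed */
};

/* Hand control to the next handler; no handler saves a return point. */
static void next(struct vm *vm) {
    handler_fn fn = *vm->ip++;
    fn(vm);
}

static void handler(struct vm *vm) {
    vm->steps++;
    next(vm);
}

/* Terminator: reload the return point saved by the starter and jump there,
 * bypassing every handler in between. */
static void terminator(struct vm *vm) {
    longjmp(vm->return_ctx, 1);
}

/* Starter: save the return point once, then enter the handler chain. */
int starter(struct vm *vm) {
    static const handler_fn program[] = {
        handler, handler, handler, terminator
    };
    assert(vm != NULL);
    vm->ip = program;
    vm->steps = 0;
    if (setjmp(vm->return_ctx) == 0) {
        next(vm);           /* leaves via longjmp in the terminator */
    }
    /* Execution resumes here, the analogue of 'label1' above. */
    return vm->steps;
}
```

Unlike the assembly above, the C handlers are ordinary calls unless the compiler optimizes them, so the stack may grow with each handler until `longjmp` discards it. The state lives in a caller-owned struct rather than in locals of `starter`, which keeps its values well defined after the jump.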