Alexey Perevalov
2015-Apr-21 15:54 UTC
[LLVMdev] question about alignment of structures on the stack (arm 32)
Hello Tim, thanks for the response.

----------------------------------------
> Date: Mon, 20 Apr 2015 11:45:03 -0700
> Subject: Re: [LLVMdev] question about alignment of structures on the stack (arm 32)
> From: t.p.northover at gmail.com
> To: alexey.perevalov at hotmail.com
> CC: llvmdev at cs.uiuc.edu
>
> On 20 April 2015 at 11:09, Alexey Perevalov
> <alexey.perevalov at hotmail.com> wrote:
>> And before the printf call I see the argument preparation, and one of the most interesting instructions:
>>
>> orr r3, r2, #4 ;for address of range.length
>
> This is certainly odd, and I can't reproduce the behaviour here. Even
> if the stack itself is 8-byte aligned (it's not on iOS), that struct
> would usually only be 4-byte aligned. LLVM shouldn't be using "orr"
> there.

Yes, you're right, it's odd ). Sorry, I didn't describe my environment clearly.

I'm using a MachO loader (https://github.com/LubosD/darling/). I'm trying to make it work on ARM.
The scenario is to load a MachO binary (e.g. compiled in Xcode); that binary invokes a function from
an ELF library which implements libobjc2 and CoreFoundation.

In MachO on ARM the stack is 4-byte aligned. Code produced for ELF expects 8-byte alignment.
So in 50% of cases, when a call is made from MachO to ELF, the stack pointer register contains an address that is not 8-byte aligned.

Even a trivial call such as NSLog(@"Test string") from MachO leads to -[NSString getCharacters:]
------
-(void)getCharacters:(unichar *)unicode {
    NSRange range={0,[self length]};
    [self getCharacters:unicode range:range];
}
------
where "range" is copied by value, and the address of the second field of "range" is computed incorrectly: it comes out equal to the address of the structure itself, because of orr r3, r2, #4.

The minimal example, I think, is:

    #include <stdio.h>

    typedef struct {
        int a;
        char b;
    } MyStruct;

    int main(void) {
        MyStruct mStruct = {11, 100};
        printf("%p, %p\n", &mStruct.a, &mStruct.b);
        return 0;
    }

Compile it with clang
----
clang version 3.3 (tags/RELEASE_33/final)
Target: armv7l-unknown-linux-gnueabi
Thread model: posix
-----

and we get the following assembly code:

    main:
        push {r11, lr}
        mov r11, sp
        sub sp, sp, #24
        mov r0, #0
        str r0, [r11, #-4]
        add r1, sp, #8
        movw r2, :lower16:.Lmain.mStruct
        movt r2, :upper16:.Lmain.mStruct
        vldr d16, [r2]
        vstr d16, [sp, #8]
        orr r2, r1, #4
        movw r3, :lower16:.L.str
        movt r3, :upper16:.L.str
        str r0, [sp, #4]
        mov r0, r3
        bl printf
        ldr r1, [sp, #4]
        str r0, [sp]
        mov r0, r1
        mov sp, r11
        pop {r11, pc}

r2 is set to r1 plus 4 (but the addition is optimized into orr). I think you know it better than me ;)
And if the address of mStruct is a multiple of 4 but not a multiple of 8, I get r2 equal to r1.

Since I can't modify the MachO binaries, I'm looking for a way to avoid orr and use the add instruction here.
Maybe it will not solve all of my problems, given the differences in ABI, but I suppose it's the easiest way.

I found the -mstack-alignment= option and tried the value 4 there for the ELF build, but orr is still used.
BTW, for x86_64 it worked, both on Linux and Mac.

Another way, I think, is to realign the stack inside every ELF function; there could be a performance penalty.
I tried -mstackrealign, but it didn't lead to an 8-byte aligned stack, I mean sp wasn't aligned to 0x.....0/8, and neither was the address of the structure on the stack.
I also tried -mstrict-align.
So I assume there should be patches for llvm somewhere which could do it )

I have not yet tested __attribute__((pcs("aapcs")))/-target-abi; maybe there is a magic pcs attribute that I could apply to the dangerous functions, but I would prefer to solve the problem in general.

>
> Do you have a self-contained example (code, compiler version & command
> line flags)?
>
> Cheers.
>
> Tim.

Best regards,
Alexey
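A minimal sketch of why the orr form only works for an 8-byte aligned base, written in C rather than ARM assembly. The function names are hypothetical and exist only for illustration; the addresses used in the discussion below are made-up values, not ones taken from the failing program.

```c
#include <assert.h>
#include <stdint.h>

/* Models "orr rD, rN, #4": sets bit 2 of the base address.
 * Hypothetical helper, for illustration only. */
uintptr_t field_addr_orr(uintptr_t base)
{
    assert((base & 3u) == 0);   /* the struct is at least 4-byte aligned */
    return base | 4u;
}

/* Models "add rD, rN, #4": a real base + 4. */
uintptr_t field_addr_add(uintptr_t base)
{
    assert((base & 3u) == 0);
    return base + 4u;
}

/* Returns 1 when both forms agree, which happens exactly when
 * bit 2 of base is clear, i.e. when base is 8-byte aligned. */
int orr_is_safe(uintptr_t base)
{
    return field_addr_orr(base) == field_addr_add(base);
}
```

For a base such as 0x1008 (a multiple of 8) both helpers give 0x100c. For 0x100c (a multiple of 4 only) bit 2 is already set, so the orr form returns the base unchanged, which matches the symptom described above where the second field's address equals the address of the structure.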
Tim Northover
2015-Apr-21 16:15 UTC
[LLVMdev] question about alignment of structures on the stack (arm 32)
> I'm using MachO loader (https://github.com/LubosD/darling/). I'm trying to make it work on ARM.
> The scenario is to load MachO binary (e.g. compiled in xCode) that binary is invoking function from
> ELF library which implements libobjc2 and CoreFoundation.
>
> in MachO on the ARM stack is 4-bytes aligned. Code produced for ELF expects 8-bytes alignment.
> So in 50% cases when call made from MachO to ELF stack pointer register contains not a 8-bytes aligned address.

Ah, that could do it. I see that LLVM does indeed make use of stack
alignment in this case. Regardless, this approach is going to go
really badly.

By default almost all ELF platforms use an ABI called AAPCS (either
hard or soft float). iOS uses an older ABI called APCS. You can't mix
code from these two worlds in any kind of non-trivial case without a
translation layer.

You've discovered one issue: AAPCS requires 8-byte alignment for sp,
APCS only requires 4. It's the first of many without a more thorough
approach to the interface between the two.

> I not yet tested some __attribute__((pcs("aapcs")))/-target-abi, maybe there is magic pcs attribute, and I could apply it for dangerous function, but I would prefer to solve that problem in general.

I don't think so; __attribute__((pcs("apcs"))) might work, if it
existed. But it doesn't. You might find it's fairly easy to add it in
Clang, but I worry about the assumptions being made in the backend.

Either way, I'd recommend against trying to hack just this one stack
alignment issue.

Tim.
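The sp alignment mismatch described above can be sketched as plain integer arithmetic. This is an illustration only, under the assumption that a translation layer would need to test and re-establish 8-byte alignment at the boundary; the function names are hypothetical, and a real stub would be written in assembly and would also have to deal with arguments already placed on the stack.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if sp satisfies the given power-of-two alignment, else 0.
 * Hypothetical helper, for illustration only. */
int sp_is_aligned(uintptr_t sp, uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (sp & (alignment - 1)) == 0;
}

/* Rounds sp down to the given power-of-two alignment. The ARM stack
 * grows downward, so rounding down never overlaps the caller's data. */
uintptr_t sp_align_down(uintptr_t sp, uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return sp & ~(alignment - 1);
}
```

A value such as 0x7ffc passes the 4-byte check but fails the 8-byte one, and rounding it down for 8-byte alignment gives 0x7ff8. Fixing sp this way addresses only the one difference named above, not the other ABI differences.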
Alexey Perevalov
2015-Apr-23 13:03 UTC
[LLVMdev] question about alignment of structures on the stack (arm 32)
----------------------------------------
> Date: Tue, 21 Apr 2015 09:15:02 -0700
> Subject: Re: [LLVMdev] question about alignment of structures on the stack (arm 32)
> From: t.p.northover at gmail.com
> To: alexey.perevalov at hotmail.com
> CC: llvmdev at cs.uiuc.edu
>
>> I'm using a MachO loader (https://github.com/LubosD/darling/). I'm trying to make it work on ARM.
>> The scenario is to load a MachO binary (e.g. compiled in Xcode); that binary invokes a function from
>> an ELF library which implements libobjc2 and CoreFoundation.
>>
>> In MachO on ARM the stack is 4-byte aligned. Code produced for ELF expects 8-byte alignment.
>> So in 50% of cases, when a call is made from MachO to ELF, the stack pointer register contains an address that is not 8-byte aligned.
>
> Ah, that could do it. I see that LLVM does indeed make use of stack
> alignment in this case. Regardless, this approach is going to go
> really badly.
>
> By default almost all ELF platforms use an ABI called AAPCS (either
> hard or soft float). iOS uses an older ABI called APCS. You can't mix
> code from these two worlds in any kind of non-trivial case without a
> translation layer.

Do you mean a translation layer in the loader? If so, the loader could replace every ELF invocation with a stub function invocation; the stub would adjust the stack and so on. But the stub would then have to know the signature of the function being invoked, otherwise arguments on the stack could be lost, and I think that is the compiler's responsibility. Or did you mean something like virtual machines?

>
> You've discovered one issue: AAPCS requires 8-byte alignment for sp,
> APCS only requires 4. It's the first of many without a more thorough
> approach to the interface between the two.

I have seen https://developer.apple.com/library/ios/documentation/Xcode/Conceptual/iPhoneOSABIReference/Articles/ARMv6FunctionCallingConventions.html. The document says the ARMv6 calling convention is similar to ARMv7 and refers to AAPCS, but it also describes some discrepancies, like:

"The stack is 4-byte aligned at the point of function calls."
I hit bugs here due to stack alignment, but as I wrote before, I think realignment, or removing orr and using add instead, could solve it.

"Large data types (larger than 4 bytes) are 4-byte aligned."
I haven't tested this case yet, but I think there could be the same pitfalls as with orr r0, r0, 4.

"Register R7 is used as a frame pointer"
If I understood correctly, it's for debugging purposes only, but the disassembly of my CoreFoundation (ELF) shows r7 usage. The frame pointer on my system is r11.

"Register R9 has special usage"
The document says r9 can be used since iOS 3.0, and I found it used in my CoreFoundation. So I don't think it could be a problem.

>
>> I have not yet tested __attribute__((pcs("aapcs")))/-target-abi; maybe there is a magic pcs attribute that I could apply to the dangerous functions, but I would prefer to solve the problem in general.
>
> I don't think so; __attribute__((pcs("apcs"))) might work, if it
> existed. But it doesn't. You might find it's fairly easy to add it in
> Clang, but I worry about the assumptions being made in the backend.
>

I tried -mstack-alignment=8 -mstackrealign for x86_64 and found it working, for example:

    - subq $32, %rsp
    + andq $-8, %rsp
    + subq $24, %rsp
    ...

and of course the function body was modified as well, but for ARM nothing happened.
I tried to understand what goes wrong in llvm, but there are too many layers of abstraction. Maybe that code exists, but the condition in ARMBaseRegisterInfo::canRealignStack prevents its generation.

BTW, I built llvm/clang 3.6 (it was impossible to build the latest version from HEAD) and something changed ;)

    - str r1, [sp, #20]
    - str r2, [sp, #16]
    - add r1, sp, #16
    - orr r2, r1, #4
    + str r1, [sp, #16]
    + str r2, [sp, #12]
    + add r1, sp, #12
    + add r2, r1, #4

add instead of orr.
Unfortunately, I haven't yet put clang 3.6 into my chroot to build with it (I'm not using cross compilation).
But if somebody could point me to the relevant source code or name the patch, I'd appreciate it very much.

> Either way, I'd recommend against trying to hack just this one stack
> alignment issue.
>
> Tim.
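The untested "large data types are 4-byte aligned" case could be probed with a small C sketch like the one below. The struct and function names are hypothetical. The idea is to build it with each toolchain and compare the numbers: following the rules quoted above, one would expect a 64-bit member to land at offset 8 in an AAPCS build and at offset 4 under the iOS convention, but that expectation is an assumption to be checked, not a measured result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* A 1-byte member followed by a 64-bit member: in practice the offset
 * of 'wide' equals the alignment the ABI gives to long long. */
struct AlignProbe { char c; long long wide; };

/* A layout that changes if 64-bit types are only 4-byte aligned. */
struct Mixed { int a; long long wide; };

size_t wide_alignment(void)    { return offsetof(struct AlignProbe, wide); }
size_t mixed_wide_offset(void) { return offsetof(struct Mixed, wide); }
size_t mixed_size(void)        { return sizeof(struct Mixed); }

/* Prints the three values so two builds can be compared side by side. */
void print_layout(void)
{
    /* An offset is always a multiple of the member's alignment. */
    assert(mixed_wide_offset() % wide_alignment() == 0);
    printf("alignment of long long: %zu\n", wide_alignment());
    printf("offset of Mixed.wide:   %zu\n", mixed_wide_offset());
    printf("sizeof(struct Mixed):   %zu\n", mixed_size());
}
```

If the two builds print different offsets for Mixed.wide, any such struct passed by pointer across the MachO/ELF boundary would be read with the wrong layout, independently of the stack pointer alignment.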