thr3ads.net - llvm dev - [LLVMdev] question about alignment of structures on the stack (arm 32) [Apr 2015]


Alexey Perevalov

2015-Apr-21 15:54 UTC

[LLVMdev] question about alignment of structures on the stack (arm 32)

Hello Tim, thanks for the response.

----------------------------------------
> Date: Mon, 20 Apr 2015 11:45:03 -0700
> Subject: Re: [LLVMdev] question about alignment of structures on the stack (arm 32)
> From: t.p.northover at gmail.com
> To: alexey.perevalov at hotmail.com
> CC: llvmdev at cs.uiuc.edu
>
> On 20 April 2015 at 11:09, Alexey Perevalov
> <alexey.perevalov at hotmail.com> wrote:
>> And before printf call I see an argument preparation, and one of the
>> most interesting instruction
>>
>> orr r3, r2, #4 ;for address of range.length
>
> This is certainly odd, and I can't reproduce the behaviour here. Even
> if the stack itself is 8-byte aligned (it's not on iOS), that struct
> would usually only be 4-byte aligned. LLVM shouldn't be using "orr"
> there.
Yes, you're right, it's odd ).

Sorry, I didn't clearly describe my environment.
I'm using a MachO loader (https://github.com/LubosD/darling/) and I'm trying
to make it work on ARM.
The scenario is to load a MachO binary (e.g. compiled in Xcode) which
calls functions from an ELF library that implements libobjc2 and CoreFoundation.

In MachO on ARM the stack is 4-byte aligned. Code produced for ELF expects
8-byte alignment.
So in 50% of cases, when a call is made from MachO to ELF, the stack pointer
register contains an address that is not 8-byte aligned.
Even a trivial call such as
NSLog(@"Test string") from MachO
leads to -[NSString getCharacters:]
------
-(void)getCharacters:(unichar *)unicode {
   NSRange range={0,[self length]};
   [self getCharacters:unicode range:range];
}

------
Here "range" is copied by value, and the address of the second field of
"range" is computed incorrectly: it comes out equal to the address of the
structure itself, because of orr r3, r2, #4.

I think the minimal example is:
#include <stdio.h>

typedef struct
{
    int a;
    char b;
} MyStruct;


int main(void) {
    MyStruct mStruct = {11, 100};
    printf("%p, %p\n", (void *)&mStruct.a, (void *)&mStruct.b);
    return 0;
}
Compile it with clang:
----
clang version 3.3 (tags/RELEASE_33/final)
Target: armv7l-unknown-linux-gnueabi
Thread model: posix
-----
And we get the following assembly code:
main:
    push    {r11, lr}
    mov    r11, sp
    sub    sp, sp, #24
    mov    r0, #0
    str    r0, [r11, #-4]
    add    r1, sp, #8
    movw    r2, :lower16:.Lmain.mStruct
    movt    r2, :upper16:.Lmain.mStruct
    vldr    d16, [r2]
    vstr    d16, [sp, #8]
    orr    r2, r1, #4
    movw    r3, :lower16:.L.str
    movt    r3, :upper16:.L.str
    str    r0, [sp, #4]
    mov    r0, r3
    bl    printf
    ldr    r1, [sp, #4]
    str    r0, [sp]
    mov    r0, r1
    mov    sp, r11
    pop    {r11, pc}

r2 is computed as r1 plus 4 (but here the plus is optimized into orr). I think
you know it better than me ;)
And if the address of mStruct is a multiple of 4 but not of 8, r2 comes out the
same as r1.

Since I can't modify the MachO binaries, I'm looking for a way to avoid orr
and have the add instruction used here.
Maybe it will not solve all of my problems, given the differences in ABI, but I
think it's the easiest way.
I found the -mstack-alignment= option and tried the value 4 for the ELF build,
but orr is still used. BTW, for x86_64 it worked, both on Linux and Mac.

Another way, I think, is to realign the stack inside every ELF function, which
could carry a performance penalty. I tried -mstackrealign, but it didn't lead
to an 8-byte aligned stack; I mean sp wasn't aligned to 0x.....0/8,
and neither was the address of the structure on the stack. I also tried
-mstrict-align.
So I assume there should be patches for llvm somewhere that could do it )

I haven't yet tested __attribute__((pcs("aapcs")))/-target-abi; maybe
there is a magic pcs attribute that I could apply to the dangerous functions,
but I would prefer to solve the problem in general.
>
> Do you have a self-contained example (code, compiler version & command
> line flags)?
>
> Cheers.
>
> Tim.
Best regards,

Alexey

Tim Northover

2015-Apr-21 16:15 UTC

[LLVMdev] question about alignment of structures on the stack (arm 32)

> I'm using MachO loader (https://github.com/LubosD/darling/). I'm trying
> to make it work on ARM.
> The scenario is to load MachO binary (e.g. compiled in xCode) that binary is
> invoking function from
> ELF library which implements libobjc2 and CoreFoundation.
>
> in MachO on the ARM stack is 4-bytes aligned. Code produced for ELF expects
> 8-bytes alignment.
> So in 50% cases when call made from MachO to ELF stack pointer register contains
> not a 8-bytes aligned address.

Ah, that could do it. I see that LLVM does indeed make use of stack
alignment in this case. Regardless, this approach is going to go
really badly.

By default almost all ELF platforms use an ABI called AAPCS (either
hard or soft float). iOS uses an older ABI called APCS. You can't mix
code from these two worlds in any kind of non-trivial case without a
translation layer.

You've discovered one issue: AAPCS requires 8-byte alignment for sp,
APCS only requires 4. It's the first of many without a more thorough
approach to the interface between the two.

> I not yet tested some __attribute__((pcs("aapcs")))/-target-abi, maybe
> there is magic pcs attribute, and I could apply it for dangerous function,
> but I would prefer to solve that problem in general.

I don't think so; __attribute__((pcs("apcs"))) might work, if it
existed. But it doesn't. You might find it's fairly easy to add it in
Clang, but I worry about the assumptions being made in the backend.

Either way, I'd recommend against trying to hack just this one stack
alignment issue.

Tim.

Alexey Perevalov

2015-Apr-23 13:03 UTC

[LLVMdev] question about alignment of structures on the stack (arm 32)

----------------------------------------
> Date: Tue, 21 Apr 2015 09:15:02 -0700
> Subject: Re: [LLVMdev] question about alignment of structures on the stack (arm 32)
> From: t.p.northover at gmail.com
> To: alexey.perevalov at hotmail.com
> CC: llvmdev at cs.uiuc.edu
>
>> I'm using MachO loader (https://github.com/LubosD/darling/). I'm trying
>> to make it work on ARM.
>> The scenario is to load MachO binary (e.g. compiled in xCode) that binary is
>> invoking function from
>> ELF library which implements libobjc2 and CoreFoundation.
>>
>> in MachO on the ARM stack is 4-bytes aligned. Code produced for ELF expects
>> 8-bytes alignment.
>> So in 50% cases when call made from MachO to ELF stack pointer register contains
>> not a 8-bytes aligned address.
>
> Ah, that could do it. I see that LLVM does indeed make use of stack
> alignment in this case. Regardless, this approach is going to go
> really badly.

>
> By default almost all ELF platforms use an ABI called AAPCS (either
> hard or soft float). iOS uses an older ABI called APCS. You can't mix
> code from these two worlds in any kind of non-trivial case without a
> translation layer.

Do you mean a translation layer in the loader? If so, the loader could replace
every call into ELF code with a call to a stub function, and the stub would
adjust the stack and so on. But the stub would then have to know the signature
of the function being called, otherwise arguments passed on the stack could be
lost; I think that is the compiler's responsibility.
Or did you mean something like a virtual machine?

>
> You've discovered one issue: AAPCS requires 8-byte alignment for sp,
> APCS only requires 4. It's the first of many without a more thorough
> approach to the interface between the two.

I have seen
https://developer.apple.com/library/ios/documentation/Xcode/Conceptual/iPhoneOSABIReference/Articles/ARMv6FunctionCallingConventions.html.
The document says the ARMv6 calling convention is similar to ARMv7 and refers to
AAPCS, but it also describes some discrepancies, such as:

        The stack is 4-byte aligned at the point of function calls.
This is where I ran into bugs, due to stack alignment, but as I wrote before, I
think realignment, or removing orr and using add instead, could solve it.

        Large data types (larger than 4 bytes) are 4-byte aligned.
I haven't tested this case yet, but I think there could be the same pitfalls
here as with orr r0, r0, 4.

        Register R7 is used as a frame pointer
If I understood correctly, it's for debugging purposes only, but the
disassembly of my CoreFoundation (ELF) shows r7 being used. The frame pointer
on my system is r11.

        Register R9 has special usage
The document says r9 can be used since iOS 3.0, and I found it used in my
CoreFoundation, so I don't think it will be a problem.


>
>> I not yet tested some __attribute__((pcs("aapcs")))/-target-abi, maybe
>> there is magic pcs attribute, and I could apply it for dangerous function,
>> but I would prefer to solve that problem in general.
>
> I don't think so; __attribute__((pcs("apcs"))) might work, if it
> existed. But it doesn't. You might find it's fairly easy to add it in
> Clang, but I worry about the assumptions being made in the backend.
>
I tried -mstack-alignment=8 -mstackrealign for x86_64 and found that it works,
for example:
-    subq    $32, %rsp
+    andq    $-8, %rsp
+    subq    $24, %rsp
...
and of course the function body is modified as well,
but for ARM nothing happened. I tried to understand what goes wrong in llvm, but
there are too many layers of abstraction.
Maybe that code exists, but a condition in ARMBaseRegisterInfo::canRealignStack
prevents it from being generated.


BTW, I built llvm/clang 3.6 (it was impossible to build the latest version from
HEAD) and something changed ;)
-    str    r1, [sp, #20]
-    str    r2, [sp, #16]
-    add    r1, sp, #16
-    orr    r2, r1, #4
+    str    r1, [sp, #16]
+    str    r2, [sp, #12]
+    add    r1, sp, #12
+    add    r2, r1, #4
add is now used instead of orr. Unfortunately, I haven't yet put clang 3.6 into
my chroot to build with (I'm not cross-compiling).
But if somebody could point me to the relevant source code or name the patch,
I would appreciate it.



> Either way, I'd recommend against trying to hack just this one stack
> alignment issue.
>
> Tim.
