Moritz Angermann via llvm-dev
2017-Dec-01  10:30 UTC
[llvm-dev] Some strange i64 behavior with arm 32bit. (Raspberry Pi)
Hi Tim,
thanks for the swift response!

@debug is defined in the same module, which makes this all the more confusing.

The target information from the working example is:
target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n32-S64"
target triple = "armv6kz--linux-gnueabihf"

From the GHC-produced module:
target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n32-S64"
target triple = "arm-unknown-linux-gnueabihf"

However, there is one more thing I could think of: ARM does allow mixed mode,
I believe. Since the code from the GHC-produced module is called from outside
of the module, could the endianness be set there prior to entering the
function?

The working module contains main directly and is not called from a main
function in a different module.

I've also tried to define a regular C function with the same code and called
that from within the ghccc function, with the same (incorrect) results.

Any further ideas I could explore?

Cheers,
Moritz

> On Dec 1, 2017, at 4:26 PM, Tim Northover <t.p.northover at gmail.com> wrote:
>
> Hi Moritz,
>> If someone could offer some hint, where to look further for debugging this, I'd very much appreciate the advice!
>> I'm a bit lost right now how to figure out why I end up getting swapped words.
>
> If one file was compiled for big-endian ARM and the other for
> little-endian that could be the result. I'm not aware of any other
> possible cause and from local tests I don't think the "ghccc" alone
> explains the difference.
>
> So maybe some glitch in how GHC was configured on your system? What's
> the triple at the top of the GHC module and the module containing the
> definition of @debug?
>
> Cheers.
>
> Tim.
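For reference, the two datalayout strings quoted in this message are identical, and their leading `e` marks the module as little-endian (`E` would mean big-endian). A minimal sketch of how the relevant fields can be read off such a string is below. `parse_datalayout` is a hypothetical helper written for illustration, not an LLVM API, and it only decodes the handful of fields discussed in this thread.

```python
# Sketch only: decode a few fields of an LLVM datalayout string.
# parse_datalayout is a hypothetical helper, not part of LLVM.

DATALAYOUT = "e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n32-S64"

def parse_datalayout(layout):
    info = {
        "endian": None,              # "little" or "big"
        "pointer_bits": None,        # pointer size in bits
        "i64_abi_align_bits": None,  # ABI alignment of i64 in bits
        "stack_align_bits": None,    # natural stack alignment in bits
    }
    for spec in layout.split("-"):
        if spec == "e":
            info["endian"] = "little"
        elif spec == "E":
            info["endian"] = "big"
        elif spec.startswith("p"):
            # p[n]:<size>:<abi>[:<pref>]
            info["pointer_bits"] = int(spec.split(":")[1])
        elif spec.startswith("i64:"):
            # i64:<abi>[:<pref>]
            info["i64_abi_align_bits"] = int(spec.split(":")[1])
        elif spec.startswith("S"):
            # S<size>
            info["stack_align_bits"] = int(spec[1:])
    return info

info = parse_datalayout(DATALAYOUT)
print(info)
```

Read this way, both modules declare little-endian data, 32-bit pointers, 64-bit alignment for i64, and a 64-bit (8-byte) natural stack alignment.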
Moritz Angermann via llvm-dev
2017-Dec-03  07:26 UTC
[llvm-dev] Some strange i64 behavior with arm 32bit. (Raspberry Pi)
Alright, so after some more debugging (injecting print statements at the
LLVM IR level), I came across the following:
GHC has the following code for the C into STG and back bridge: `StgRun`,
which is defined in https://github.com/ghc/ghc/blob/master/rts/StgCRun.c;
the resulting LLVM IR ends up being:
```
; Function Attrs: nounwind
define hidden %struct.StgRegTable* @StgRun(i8* ()* ()*, %struct.StgRegTable*) local_unnamed_addr #0 {
  %3 = tail call %struct.StgRegTable* asm sideeffect "stmfd sp!, {r4-r11, ip, lr}\0A\09vstmdb sp!, {d8-d11}\0A\09sub sp, sp, $3\0A\09mov r4, $2\0A\09bx $1\0A\09.globl StgReturn\0A\09.type StgReturn, %function\0AStgReturn:\0A\09add sp, sp, $3\0A\09mov $0, r7\0A\09vldmia sp!, {d8-d11}\0A\09ldmfd sp!, {r4-r11, ip, lr}\0A\09", "=r,r,r,i,~{r4},~{r5},~{r6},~{r7},~{r8},~{r9},~{r10},~{r12},~{lr}"(i8* ()* ()* %0, %struct.StgRegTable* %1, i32 8192) #1, !srcloc !3
  ret %struct.StgRegTable* %3
}
```
For better readability, the inline assembly reads:
  stmfd sp!, {r4-r11, ip, lr}
  vstmdb sp!, {d8-d11}
  sub sp, sp, $3
  mov r4, $2
  bx $1
.globl StgReturn
.type StgReturn, %function
StgReturn:
  add sp, sp, $3
  mov $0, r7
  vldmia sp!, {d8-d11}
  ldmfd sp!, {r4-r11, ip, lr}
This results in the following assembly being emitted (for
arm-unknown-linux-gnueabihf):
```
00000074 <StgRun>:
  74:   e92d4ff0        push    {r4, r5, r6, r7, r8, r9, sl, fp, lr}
  78:   e28db01c        add     fp, sp, #28, 0
  7c:   e92d5ff0        push    {r4, r5, r6, r7, r8, r9, sl, fp, ip, lr}
  80:   ed2d8b08        vpush   {d8-d11}
  84:   e24dda02        sub     sp, sp, #8192   ; 0x2000
  88:   e1a04001        mov     r4, r1
  8c:   e12fff10        bx      r0
00000090 <StgReturn>:
  90:   e28dda02        add     sp, sp, #8192   ; 0x2000
  94:   e1a00007        mov     r0, r7
  98:   ecbd8b08        vpop    {d8-d11}
  9c:   e8bd5ff0        pop     {r4, r5, r6, r7, r8, r9, sl, fp, ip, lr}
  a0:   e8bd8ff0        pop     {r4, r5, r6, r7, r8, r9, sl, fp, pc}
```
By adding extra printf statements, I found that when a `printf` call is
placed after the assembly and before the `ret`, the generated code looks
slightly different:
```
00000074 <StgRun>:
  74:   e92d4ff0        push    {r4, r5, r6, r7, r8, r9, sl, fp, lr}
  78:   e28db01c        add     fp, sp, #28, 0
  7c:   e24dd004        sub     sp, sp, #4, 0
  80:   e92d5ff0        push    {r4, r5, r6, r7, r8, r9, sl, fp, ip, lr}
  84:   ed2d8b08        vpush   {d8-d11}
  88:   e24dda02        sub     sp, sp, #8192   ; 0x2000
  8c:   e1a04001        mov     r4, r1
  90:   e12fff10        bx      r0
00000094 <StgReturn>:
  94:   e28dda02        add     sp, sp, #8192   ; 0x2000
  98:   e1a00007        mov     r0, r7
  9c:   ecbd8b08        vpop    {d8-d11}
  a0:   e8bd5ff0        pop     {r4, r5, r6, r7, r8, r9, sl, fp, ip, lr}
  a4:   e58d0000        str     r0, [sp]
  a8:   e3a00002        mov     r0, #2, 0
  ac:   ebfffffe        bl      44 <.LdebugEnd>
  b0:   e59d0000        ldr     r0, [sp]
  b4:   e24bd01c        sub     sp, fp, #28, 0
  b8:   e8bd8ff0        pop     {r4, r5, r6, r7, r8, r9, sl, fp, pc}
```
and we can see that an additional `sp = sp - 4` was added.
With the log statement in StgRun, subsequent log statements so far work.
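The effect of that extra `sub sp, sp, #4` on alignment can be checked with a little arithmetic over the two listings. The sketch below assumes that sp is 8-byte aligned on entry to StgRun (which the AAPCS requires at public interfaces) and simply adds up the bytes subtracted from sp before the `bx r0`. `sp_misalignment` is a hypothetical helper for illustration, and the result says nothing about whether the misalignment causes the observed behaviour.

```python
# Sketch only: how far is sp from 8-byte alignment at the `bx r0`?
# Assumption: sp is 8-byte aligned when StgRun is entered.

WORD = 4  # bytes per core register on 32-bit ARM
DREG = 8  # bytes per VFP d-register

def sp_misalignment(adjustments, align=8):
    """Return sp modulo `align` after subtracting each adjustment in bytes."""
    sp = 0  # stands for an aligned entry value; only sp mod align matters
    for nbytes in adjustments:
        sp -= nbytes
    return sp % align

# Listing without printf: push 9 regs, push 10 regs, vpush d8-d11, sub 8192
without_printf = [9 * WORD, 10 * WORD, 4 * DREG, 8192]

# Listing with printf: same, plus the extra `sub sp, sp, #4` after the prologue push
with_printf = [9 * WORD, 4, 10 * WORD, 4 * DREG, 8192]

print(sp_misalignment(without_printf))
print(sp_misalignment(with_printf))
```

Under that assumption the first listing reaches the `bx r0` with sp 4 bytes off an 8-byte boundary, because the prologue pushes an odd number of registers (36 bytes) and everything after it is a multiple of 8. The second listing is aligned again thanks to the extra 4 bytes.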
Now I wonder
  a) could I write this logic in LLVM IR directly,
     without having to resort to assembly?
  b) could I force LLVM to emit 32 instead of 28 somehow, to make sure
     my sp is 8-byte aligned?
Of course I'm happy to take any other ideas as well.
Cheers,
 Moritz
Moritz Angermann via llvm-dev
2017-Dec-03  13:52 UTC
[llvm-dev] Some strange i64 behavior with arm 32bit. (Raspberry Pi)
Ok...
After some more digging, it turned out that the underlying issue was a bug in
my code generator. For the record, I'll just note down the issue.
My code generator generated /unpacked/ structs for simplicity, and because I
thought (incorrectly) that we (GHC) generated GEP accessors. We don't! GHC
computes absolute offsets into those structs. Generating /unpacked/ structs is
therefore futile: { i32, i64 }, for example, does not guarantee that the i64
is at offset +4, since there might be padding. All I needed to change was to
generate packed instead of unpacked structs.
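To make the padding concrete, here is a minimal sketch that computes field offsets for { i32, i64 } in both forms. It assumes the usual layout rule (each field of an unpacked struct is placed at the next multiple of its ABI alignment, while a packed struct has no padding) and takes the 8-byte i64 alignment from the `i64:64` entry of the datalayout. `field_offsets` is a hypothetical helper, not something GHC or LLVM provides.

```python
# Sketch only: byte offsets of struct fields, unpacked vs. packed.
# Assumption: i64 has 8-byte ABI alignment (i64:64 in the datalayout).

def field_offsets(fields, packed=False):
    """fields: list of (size_bytes, abi_align_bytes); returns each field's offset."""
    offsets = []
    pos = 0
    for size, align in fields:
        if not packed:
            # Round pos up to the next multiple of the field's alignment.
            pos = (pos + align - 1) // align * align
        offsets.append(pos)
        pos += size
    return offsets

I32 = (4, 4)
I64 = (8, 8)

print(field_offsets([I32, I64]))               # unpacked { i32, i64 }
print(field_offsets([I32, I64], packed=True))  # packed <{ i32, i64 }>
```

With these assumptions the unpacked struct places the i64 at offset 8, while the packed one places it at offset 4, which is the offset the absolute-offset computation expects.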
However, I still believe that the code gen for the C to STG bridge should add
a `sub sp, sp, 4` line to the inline assembly *if* it emits the
`vstmdb sp!, {d8-d11}` part, to ensure that the stack is 8-byte aligned.
Thank you.
Cheers,
 Moritz