thr3ads.net - llvm dev - [llvm-dev] Possible bug in x86 frame lowering with SSE instructions? [Oct 2020]

Jonathan Smith via llvm-dev

2020-Oct-26 22:51 UTC

[llvm-dev] Possible bug in x86 frame lowering with SSE instructions?

Hello, everyone.

I'm looking for some insight into a bug I encountered while testing
some custom IR passes on Solaris (x86) and Linux. I don't know if it's
a bug with the x86 backend or the way the frame is set up by Solaris
-- or if I'm simply doing something I shouldn't be doing. The bug
manifests even if I don't run any of my passes, so I'm certain those
aren't the issue.

Given the following test C code:

    int main(int argc, char **argv) {
      int x[10] = {1,2,3};
      return 0;
    }

I compile it to IR with the following arguments:

    clang --target=i386-sun-solaris -S -emit-llvm -Xclang -disable-O0-optnone -x c -c array-test.c -o array-test.ll

This yields the following IR:

    target datalayout = "e-m:e-p:32:32-p270:32:32-p271:32:32-p272:64:64-f64:32:64-f80:32-n8:16:32-S128"
    target triple = "i386-sun-solaris"

    ; Function Attrs: noinline nounwind
    define dso_local i32 @main(i32 %0, i8** %1) #0 {
      %3 = alloca i32, align 4
      %4 = alloca i32, align 4
      %5 = alloca i8**, align 4
      %6 = alloca [10 x i32], align 4
      store i32 0, i32* %3, align 4
      store i32 %0, i32* %4, align 4
      store i8** %1, i8*** %5, align 4
      %7 = bitcast [10 x i32]* %6 to i8*
      call void @llvm.memset.p0i8.i32(i8* align 4 %7, i8 0, i32 40, i1 false)
      %8 = bitcast i8* %7 to [10 x i32]*
      %9 = getelementptr inbounds [10 x i32], [10 x i32]* %8, i32 0, i32 0
      store i32 1, i32* %9, align 4
      %10 = getelementptr inbounds [10 x i32], [10 x i32]* %8, i32 0, i32 1
      store i32 2, i32* %10, align 4
      %11 = getelementptr inbounds [10 x i32], [10 x i32]* %8, i32 0, i32 2
      store i32 3, i32* %11, align 4
      ret i32 0
    }

    ; Function Attrs: argmemonly nounwind willreturn writeonly
    declare void @llvm.memset.p0i8.i32(i8* nocapture writeonly, i8, i32, i1 immarg) #1

    attributes #0 = { noinline nounwind
      "correctly-rounded-divide-sqrt-fp-math"="false"
      "disable-tail-calls"="false"
      "frame-pointer"="all"
      "less-precise-fpmad"="false"
      "min-legal-vector-width"="0"
      "no-infs-fp-math"="false"
      "no-jump-tables"="false"
      "no-nans-fp-math"="false"
      "no-signed-zeros-fp-math"="false"
      "no-trapping-math"="true"
      "stack-protector-buffer-size"="8"
      "target-cpu"="pentium4"
      "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87"
      "unsafe-fp-math"="false"
      "use-soft-float"="false" }
    attributes #1 = { argmemonly nounwind willreturn writeonly }

Normally, I would run custom passes at this point via opt. But the
error I'm getting occurs with or without this step.

Without changing anything else, I run this IR through llc with the
following arguments:

    llc --x86-asm-syntax=intel --filetype=asm array-test.ll -o=array-test.s

This results in the following assembly:

            .text
            .intel_syntax noprefix
            .file   "/home/user/code/array-test.ll"
            .globl  main                            # -- Begin function main
            .p2align        4, 0x90
            .type   main,@function
    main:                                   # @main
    # %bb.0:
            push    ebp
            mov     ebp, esp
            sub     esp, 56
            mov     dword ptr [ebp - 4], 0
            xorps   xmm0, xmm0
            movaps  xmmword ptr [ebp - 56], xmm0
            movaps  xmmword ptr [ebp - 40], xmm0
            mov     dword ptr [ebp - 20], 0
            mov     dword ptr [ebp - 24], 0
            mov     dword ptr [ebp - 56], 1
            mov     dword ptr [ebp - 52], 2
            mov     dword ptr [ebp - 48], 3
            xor     eax, eax
            add     esp, 56
            pop     ebp
            ret
    .Lfunc_end0:
            .size   main, .Lfunc_end0-main
                                            # -- End function
            .ident  "clang version 12.0.0 (https://github.com/llvm/llvm-project.git 62dbbcf6d7c67b02fd540a5a1e55c494bf88adea)"
            .section        ".note.GNU-stack","",@progbits

Other than the target being i386-sun-solaris, this is the exact same
code that is generated if I target i386-pc-linux-gnu.

If I run this on Linux (Ubuntu 18.04 in this case), there are no
problems. If I run it on Solaris, however, a segfault occurs on the
first `movaps` instruction. I believe the cause is that the stack is
4-byte aligned on Solaris whereas it's 8-byte aligned on Linux, so
the 56- and 40-byte offsets for the array stores just happen to work
on Linux -- while they end up being 8 bytes off on Solaris.
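
To make the alignment argument concrete, here is a minimal sketch in C of the address arithmetic behind the two `movaps` stores. It only models `push ebp; mov ebp, esp` followed by an access at `[ebp - offset]`. The function names and the sample stack addresses are hypothetical, chosen for illustration; the one hard fact relied on is that `movaps` with a memory operand requires a 16-byte-aligned address and faults otherwise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Models `push ebp; mov ebp, esp`: the push moves esp down by 4 and
   ebp then takes that value. */
static uint32_t ebp_after_prologue(uint32_t entry_esp) {
    return entry_esp - 4u;
}

/* True if the frame slot at [ebp - offset] is a multiple of `align`.
   `align` must be a power of two. */
static bool slot_is_aligned(uint32_t entry_esp, uint32_t offset,
                            uint32_t align) {
    assert(align != 0 && (align & (align - 1u)) == 0);
    uint32_t addr = ebp_after_prologue(entry_esp) - offset;
    return (addr & (align - 1u)) == 0;
}
```

With a hypothetical entry `esp` of 0xFFFFD00C, `slot_is_aligned(0xFFFFD00C, 56, 16)` and `slot_is_aligned(0xFFFFD00C, 40, 16)` both hold, so both stores are fine. With an entry `esp` of 0xFFFFD004, which is only 4-byte aligned, `[ebp - 56]` lands on 0xFFFFCFC8, 8 bytes past a 16-byte boundary, and the first `movaps` would fault.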

Running llc with --stackrealign fixes the problem:

    main:                                   # @main
    # %bb.0:
            push    ebp
            mov     ebp, esp
            and     esp, -16
            sub     esp, 64
            mov     dword ptr [esp + 12], 0
            xorps   xmm0, xmm0
            movaps  xmmword ptr [esp + 16], xmm0
            movaps  xmmword ptr [esp + 32], xmm0
            mov     dword ptr [esp + 52], 0
            mov     dword ptr [esp + 48], 0
            mov     dword ptr [esp + 16], 1
            mov     dword ptr [esp + 20], 2
            mov     dword ptr [esp + 24], 3
            xor     eax, eax
            mov     esp, ebp
            pop     ebp
            ret
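
The effect of the realigning prologue can be sketched the same way. The C function below is an illustrative model of `push ebp; mov ebp, esp; and esp, -16; sub esp, 64` from the listing above, not anything taken from LLVM; the function name and the sample addresses are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Returns esp after the realigning prologue, given esp at function entry. */
static uint32_t esp_after_realign(uint32_t entry_esp) {
    uint32_t esp = entry_esp - 4u;  /* push ebp (mov ebp, esp leaves esp as is) */
    esp &= ~UINT32_C(15);           /* and esp, -16: round down to a multiple of 16 */
    esp -= 64u;                     /* sub esp, 64: 64 is a multiple of 16 */
    return esp;
}
```

Because `esp` is rounded down to a multiple of 16 before the frame is allocated, `[esp + 16]` and `[esp + 32]` are 16-byte aligned whatever the entry value was. For example, hypothetical entry values 0xFFFFD004 and 0xFFFFD00C both yield 0xFFFFCFC0. This is also why the locals are now addressed through `esp` rather than `ebp`: the distance between the two registers is no longer a compile-time constant.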

Running clang with -fomit-frame-pointer also fixes the problem, but I
have no idea why. Adding --stack-alignment=16 does *not* fix the
problem. If I explicitly add the -O0 flag to llc, the memset is no
longer lowered to `movaps` stores (the choice appears to be made in
`X86TargetLowering::getOptimalMemOpType()`); a call to `memset` is
emitted instead:

    main:                                   # @main
    # %bb.0:
            push    ebp
            mov     ebp, esp
            push    esi
            sub     esp, 68
            mov     eax, dword ptr [ebp + 12]
            mov     ecx, dword ptr [ebp + 8]
            xor     edx, edx
            mov     dword ptr [ebp - 8], 0
            lea     esi, [ebp - 48]
            mov     dword ptr [esp], esi
            mov     dword ptr [esp + 4], 0
            mov     dword ptr [esp + 8], 40
            mov     dword ptr [ebp - 52], eax       # 4-byte Spill
            mov     dword ptr [ebp - 56], ecx       # 4-byte Spill
            mov     dword ptr [ebp - 60], edx       # 4-byte Spill
            call    memset
            mov     dword ptr [ebp - 48], 1
            mov     dword ptr [ebp - 44], 2
            mov     dword ptr [ebp - 40], 3
            mov     eax, dword ptr [ebp - 60]       # 4-byte Reload
            add     esp, 68
            pop     esi
            pop     ebp
            ret

I've spent the better part of ten hours trying to debug the X86
backend code (and I am, admittedly, not the best at knowing where to
look). I determined the `X86FrameLowering::emitPrologue()` function
will *only* emit the proper offset adjustment if
`X86RegisterInfo::needsStackRealignment()` returns `true`, and the
only thing that seems to force it to return `true` is if
--stackrealign is used (which sets the "stackrealign" function
attribute on `main`).
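
As a way of reasoning about that observation, here is a toy model in C of a realignment decision. It is explicitly not LLVM's code: the function name, its parameters, and the rule itself are simplifying assumptions. The only input taken from this thread is the `S128` component of the datalayout string above, which declares a natural stack alignment of 128 bits (16 bytes).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy rule (an assumption, not LLVM source): realign when it is forced,
   or when some frame object wants more alignment than the stack is
   assumed to provide on entry. Alignments are in bytes. */
static bool needs_realign(unsigned max_object_align,
                          unsigned assumed_stack_align,
                          bool force_realign) {
    assert(max_object_align != 0 && assumed_stack_align != 0);
    return force_realign || max_object_align > assumed_stack_align;
}
```

Under this model, a 16-byte slot on a stack assumed to be 16-byte aligned gives `needs_realign(16, 16, false) == false`, and only forcing it (as --stackrealign does) flips the answer. If the backend behaves roughly like this, it would also be consistent with --stack-alignment=16 not helping: that option raises the assumed alignment rather than making the prologue enforce it.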

I don't know if this is truly a bug in the X86 backend (an assumption
about the ABI on Linux vs. Solaris? Maybe? I'm truly guessing...) or
if this is a result of me using -disable-O0-optnone in Clang without
-O0 in llc.

Any insight would be helpful, and thanks for reading my rather verbose message.

Wang, Pengfei via llvm-dev

2020-Oct-27 06:21 UTC

Hi Jonathan,

It seems the trunk code solves this problem: https://godbolt.org/z/Y1Wdbj
I took a look at the x86 ABI:
https://gitlab.com/x86-psABIs/i386-ABI/-/tree/hjl/x86/1.1#
It says: "In other words, the value (%esp + 4) is always a multiple of 16
(32 or 64) when control is transferred to the function entry point."
So if the OS follows the ABI, ESP should always be 0xXXXXXXXC on entry
to a function, and it becomes 0xXXXXXXX8 after "push ebp", which
happens to be aligned to 8.
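
The quoted ABI rule can be checked mechanically. The following C sketch encodes the `(%esp + 4)` condition and the low four bits of `esp` after `push ebp`. The function names and the sample addresses are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The quoted rule: (%esp + 4) is a multiple of 16 at function entry. */
static bool entry_esp_follows_abi(uint32_t entry_esp) {
    return ((entry_esp + 4u) & 15u) == 0;
}

/* Low four bits of esp after `push ebp`, which subtracts 4 from esp. */
static unsigned low_nibble_after_push(uint32_t entry_esp) {
    return (unsigned)((entry_esp - 4u) & 15u);
}
```

For a hypothetical entry `esp` of 0xFFFFD00C the rule holds and the low nibble after the push is 0x8. Since 8 - 56 = -48 and 8 - 40 = -32 are both multiples of 16, the `[ebp - 56]` and `[ebp - 40]` slots from the original listing come out 16-byte aligned under this ABI. An entry value such as 0xFFFFD004 fails the check.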

Thanks
Pengfei

Jonathan Smith via llvm-dev

2020-Oct-27 09:52 UTC

Interesting. Thank you.

I'm still curious to know which commit fixed this problem, although it
sounds like there is also a problem with how Solaris implements the
ABI.

I suppose it's time for me to go hunting through commits.

