John Kåre Alsaker via llvm-dev
2015-Aug-16 18:25 UTC
[llvm-dev] [LLVMdev] Adding a stack probe function attribute
I started to implement inlining of the stack probe function based on
Microsoft's inlined stack probes in
https://github.com/Microsoft/llvm/tree/MS.
Do we know why the stack pointer cannot be updated in a loop (which
results in ideal code)? I noticed that was commented in Microsoft's
code.
I suspect this is due to debug or unwinding information, since it is
allowed on Windows x86-32.
I ran into two issues while implementing this.
The epilog was inserted into the wrong basic block. This is because
the basic blocks to insert epilogs into is calculated before
emitPrologue is called.
I fixed this by adding the following code after the emitPrologue call
in PEI::insertPrologEpilogCode:
RestoreBlocks.clear();
calculateSets(Fn);
This doesn't seem like a very nice solution. It's also unclear to me
how Microsoft's branch handles this.
The other issue is with code generation. All the tests passed but the
one were segmented stacks are used and stack probes are required.
I don't see what is actually wrong here, and would like some help.
Here is the output:
# After Prologue/Epilogue Insertion & Frame Finalization
# Machine code for function test_large: Post SSA
Frame Objects:
fi#0: size=40000, align=4, at location [SP-40000]
BB#4:
%R11<def> = LEA64r %RSP, 1, %noreg, -40040, %noreg
CMP64rm %R11, %noreg, 1, %noreg, 40, %GS, %EFLAGS<imp-def>
JA_1 <BB#0>, %EFLAGS<imp-use>
Successors according to CFG: BB#3 BB#0
BB#3:
Predecessors according to CFG: BB#4
%R10<def> = MOV64ri 40040
%R11<def> = MOV64ri 32
CALL64pcrel32 <es:__morestack>, %RSP<imp-use>
MORESTACK_RET
Successors according to CFG: BB#0
BB#0: derived from LLVM BB %0
Predecessors according to CFG: BB#3 BB#4
%EAX<def> = MOV32ri 40040; flags: FrameSetup
%RDX<def> = MOV64rr %RAX; flags: FrameSetup
%RCX<def> = MOV64rr %RSP; flags: FrameSetup
Successors according to CFG: BB#1
BB#1: derived from LLVM BB %0
Predecessors according to CFG: BB#0 BB#1
OR64mi8 %RCX, 1, %noreg, 0, %noreg, 0, %EFLAGS<imp-def>;
flags: FrameSetup
%RCX<def,tied1> = SUB64ri32 %RCX<tied0>, 4096,
%EFLAGS<imp-def>; flags: FrameSetup
%RDX<def,tied1> = SUB64ri32 %RDX<tied0>, 4096,
%EFLAGS<imp-def>; flags: FrameSetup
JAE_1 <BB#1>, %EFLAGS<imp-use>; flags: FrameSetup
Successors according to CFG: BB#2 BB#1
BB#2: derived from LLVM BB %0
Predecessors according to CFG: BB#1
%RSP<def,tied1> = SUB64rr %RSP<tied0>, %RAX,
%EFLAGS<imp-def>;
flags: FrameSetup
SEH_StackAlloc 40040; flags: FrameSetup
SEH_EndPrologue; flags: FrameSetup
%RCX<def> = LEA64r %RSP, 1, %noreg, 40, %noreg
%EDX<def> = MOV32r0 %EFLAGS<imp-def,dead>
CALL64pcrel32 <ga:@dummy_use>, <regmask>,
%RSP<imp-use>,
%RCX<imp-use>, %EDX<imp-use,kill>, %RSP<imp-def>
SEH_Epilogue
%RSP<def,tied1> = ADD64ri32 %RSP<tied0>, 40040,
%EFLAGS<imp-def,dead>
RETQ
# End machine code for function test_large.
*** Bad machine code: Using an undefined physical register ***
- function: test_large
- basic block: BB#1 (0x40b1d70)
- instruction: OR64mi8- operand 0: %RCX
*** Bad machine code: Using an undefined physical register ***
- function: test_large
- basic block: BB#1 (0x40b1d70)
- instruction: %RCX<def,tied1> = SUB64ri32- operand 1: %RCX<tied0>
*** Bad machine code: Using an undefined physical register ***
- function: test_large
- basic block: BB#1 (0x40b1d70)
- instruction: %RDX<def,tied1> = SUB64ri32- operand 1: %RDX<tied0>
*** Bad machine code: Using an undefined physical register ***
- function: test_large
- basic block: BB#2 (0x40b1e20)
- instruction: %RSP<def,tied1> = SUB64rr- operand 2: %RAX
LLVM ERROR: Found 4 machine code errors.
Andy Ayers via llvm-dev
2015-Aug-17 17:46 UTC
[llvm-dev] [LLVMdev] Adding a stack probe function attribute
The stack pointer can't ever safely point beyond the valid range. I
don't 100% know the exact reason but suspect the OS may verify this on
context save/restore, which can happen unpredictably. This can be tricky to
guarantee if you're modifying the stack pointer in flight without some extra
math here and there, at which point you might as well use another register and
defer the update like we do.
I'm also not sure what you mean by updating the stack pointer being
"allowed" for x86-32 -- if you look at the implementation of __chkstk
for x86-32, which we provide in sources with VS, it does not modify ESP in the
loop -- it uses EAX to probe the stack.
[from] c:\Program Files (x86)\Microsoft Visual Studio
12.0\VC\crt\src\intel\chkstk.asm
mov eax, esp ; current TOS
and eax, not ( _PAGESIZE_ - 1) ; Round down to current page boundary
cs10:
cmp ecx, eax ; Is new TOS
jb short cs20 ; in probed page?
mov eax, ecx ; yes.
pop ecx
xchg esp, eax ; update esp
mov eax, dword ptr [eax] ; get return address
mov dword ptr [esp], eax ; and put it at new TOS
ret
; Find next lower page and probe
cs20:
sub eax, _PAGESIZE_ ; decrease by PAGESIZE
test dword ptr [eax],eax ; probe page.
jmp short cs10
Another common pattern which we haven't optimized for yet is to probe and
zero the newly grown stack region.
We have the same issue with epilogs that you saw. We've worked around it for
now by generating epilogs first, but this seems a bit hacky. I'd like to see
us fix this by delaying the inline expansion of a stack probe until after
prolog/epilog generation. That also removes the need for us to update the prolog
block in ::emitProlog, which is a bit clunky. But we haven't gotten around
to this yet.
-----Original Message-----
From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of John
Kåre Alsaker via llvm-dev
Sent: Sunday, August 16, 2015 11:26 AM
To: Reid Kleckner <rnk at google.com>
Cc: llvm-dev at lists.llvm.org
Subject: Re: [llvm-dev] [LLVMdev] Adding a stack probe function attribute
I started to implement inlining of the stack probe function based on
Microsoft's inlined stack probes in
https://na01.safelinks.protection.outlook.com/?url=https%3a%2f%2fgithub.com%2fMicrosoft%2fllvm%2ftree%2fMS.&data=01%7c01%7candya%40microsoft.com%7c7f3665f51a1a45a612c008d2a668163e%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=pm63Kdg2E%2bprrwKZKpcGpQNEYjwBaY7%2beZrX8eYfyYo%3d
Do we know why the stack pointer cannot be updated in a loop (which results in
ideal code)? I noticed that was commented in Microsoft's code.
I suspect this is due to debug or unwinding information, since it is allowed on
Windows x86-32.
I ran into two issues while implementing this.
The epilog was inserted into the wrong basic block. This is because the basic
blocks to insert epilogs into is calculated before emitPrologue is called.
I fixed this by adding the following code after the emitPrologue call in
PEI::insertPrologEpilogCode:
RestoreBlocks.clear();
calculateSets(Fn);
This doesn't seem like a very nice solution. It's also unclear to me how
Microsoft's branch handles this.
The other issue is with code generation. All the tests passed but the one were
segmented stacks are used and stack probes are required.
I don't see what is actually wrong here, and would like some help.
Here is the output:
# After Prologue/Epilogue Insertion & Frame Finalization # Machine code for
function test_large: Post SSA Frame Objects:
fi#0: size=40000, align=4, at location [SP-40000]
BB#4:
%R11<def> = LEA64r %RSP, 1, %noreg, -40040, %noreg
CMP64rm %R11, %noreg, 1, %noreg, 40, %GS, %EFLAGS<imp-def>
JA_1 <BB#0>, %EFLAGS<imp-use>
Successors according to CFG: BB#3 BB#0
BB#3:
Predecessors according to CFG: BB#4
%R10<def> = MOV64ri 40040
%R11<def> = MOV64ri 32
CALL64pcrel32 <es:__morestack>, %RSP<imp-use>
MORESTACK_RET
Successors according to CFG: BB#0
BB#0: derived from LLVM BB %0
Predecessors according to CFG: BB#3 BB#4
%EAX<def> = MOV32ri 40040; flags: FrameSetup
%RDX<def> = MOV64rr %RAX; flags: FrameSetup
%RCX<def> = MOV64rr %RSP; flags: FrameSetup
Successors according to CFG: BB#1
BB#1: derived from LLVM BB %0
Predecessors according to CFG: BB#0 BB#1
OR64mi8 %RCX, 1, %noreg, 0, %noreg, 0, %EFLAGS<imp-def>;
flags: FrameSetup
%RCX<def,tied1> = SUB64ri32 %RCX<tied0>, 4096,
%EFLAGS<imp-def>; flags: FrameSetup
%RDX<def,tied1> = SUB64ri32 %RDX<tied0>, 4096,
%EFLAGS<imp-def>; flags: FrameSetup
JAE_1 <BB#1>, %EFLAGS<imp-use>; flags: FrameSetup
Successors according to CFG: BB#2 BB#1
BB#2: derived from LLVM BB %0
Predecessors according to CFG: BB#1
%RSP<def,tied1> = SUB64rr %RSP<tied0>, %RAX,
%EFLAGS<imp-def>;
flags: FrameSetup
SEH_StackAlloc 40040; flags: FrameSetup
SEH_EndPrologue; flags: FrameSetup
%RCX<def> = LEA64r %RSP, 1, %noreg, 40, %noreg
%EDX<def> = MOV32r0 %EFLAGS<imp-def,dead>
CALL64pcrel32 <ga:@dummy_use>, <regmask>,
%RSP<imp-use>, %RCX<imp-use>, %EDX<imp-use,kill>,
%RSP<imp-def>
SEH_Epilogue
%RSP<def,tied1> = ADD64ri32 %RSP<tied0>, 40040,
%EFLAGS<imp-def,dead>
RETQ
# End machine code for function test_large.
*** Bad machine code: Using an undefined physical register ***
- function: test_large
- basic block: BB#1 (0x40b1d70)
- instruction: OR64mi8- operand 0: %RCX
*** Bad machine code: Using an undefined physical register ***
- function: test_large
- basic block: BB#1 (0x40b1d70)
- instruction: %RCX<def,tied1> = SUB64ri32- operand 1: %RCX<tied0>
*** Bad machine code: Using an undefined physical register ***
- function: test_large
- basic block: BB#1 (0x40b1d70)
- instruction: %RDX<def,tied1> = SUB64ri32- operand 1: %RDX<tied0>
*** Bad machine code: Using an undefined physical register ***
- function: test_large
- basic block: BB#2 (0x40b1e20)
- instruction: %RSP<def,tied1> = SUB64rr- operand 2: %RAX
LLVM ERROR: Found 4 machine code errors.
_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org
https://na01.safelinks.protection.outlook.com/?url=http%3a%2f%2fllvm.cs.uiuc.edu&data=01%7c01%7candya%40microsoft.com%7c7f3665f51a1a45a612c008d2a668163e%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=iQjl5g5NZHMOzJ4%2fhFEN6ECfAnFBR5Evud7KBAWO%2b7Y%3d
https://na01.safelinks.protection.outlook.com/?url=http%3a%2f%2flists.llvm.org%2fcgi-bin%2fmailman%2flistinfo%2fllvm-dev%0a&data=01%7c01%7candya%40microsoft.com%7c7f3665f51a1a45a612c008d2a668163e%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=BiZb%2bhINDhjdc0YshtClHXKkOS10wY7Gjw6LWjzozP0%3d
John Kåre Alsaker via llvm-dev
2015-Aug-30 20:16 UTC
[llvm-dev] [LLVMdev] Adding a stack probe function attribute
I've submited my patch to inline stack probes here: http://reviews.llvm.org/D12483 On Mon, Aug 17, 2015 at 7:46 PM, Andy Ayers <andya at microsoft.com> wrote:> The stack pointer can't ever safely point beyond the valid range. I don't 100% know the exact reason but suspect the OS may verify this on context save/restore, which can happen unpredictably. This can be tricky to guarantee if you're modifying the stack pointer in flight without some extra math here and there, at which point you might as well use another register and defer the update like we do. > > I'm also not sure what you mean by updating the stack pointer being "allowed" for x86-32 -- if you look at the implementation of __chkstk for x86-32, which we provide in sources with VS, it does not modify ESP in the loop -- it uses EAX to probe the stack.I was thinking of the fact that __chkstk modified SP on x86-32, but not on x86-64.> We have the same issue with epilogs that you saw. We've worked around it for now by generating epilogs first, but this seems a bit hacky. I'd like to see us fix this by delaying the inline expansion of a stack probe until after prolog/epilog generation. That also removes the need for us to update the prolog block in ::emitProlog, which is a bit clunky. But we haven't gotten around to this yet.Hm.. I did try just generating epilogs first, but I ran into test failures.