John Kåre Alsaker via llvm-dev
2015-Aug-16 18:25 UTC
[llvm-dev] [LLVMdev] Adding a stack probe function attribute
I started to implement inlining of the stack probe function based on Microsoft's inlined stack probes in https://github.com/Microsoft/llvm/tree/MS. Do we know why the stack pointer cannot be updated in a loop (which results in ideal code)? I noticed that was commented in Microsoft's code. I suspect this is due to debug or unwinding information, since it is allowed on Windows x86-32. I ran into two issues while implementing this. The epilog was inserted into the wrong basic block. This is because the basic blocks to insert epilogs into is calculated before emitPrologue is called. I fixed this by adding the following code after the emitPrologue call in PEI::insertPrologEpilogCode: RestoreBlocks.clear(); calculateSets(Fn); This doesn't seem like a very nice solution. It's also unclear to me how Microsoft's branch handles this. The other issue is with code generation. All the tests passed but the one were segmented stacks are used and stack probes are required. I don't see what is actually wrong here, and would like some help. Here is the output: # After Prologue/Epilogue Insertion & Frame Finalization # Machine code for function test_large: Post SSA Frame Objects: fi#0: size=40000, align=4, at location [SP-40000] BB#4: %R11<def> = LEA64r %RSP, 1, %noreg, -40040, %noreg CMP64rm %R11, %noreg, 1, %noreg, 40, %GS, %EFLAGS<imp-def> JA_1 <BB#0>, %EFLAGS<imp-use> Successors according to CFG: BB#3 BB#0 BB#3: Predecessors according to CFG: BB#4 %R10<def> = MOV64ri 40040 %R11<def> = MOV64ri 32 CALL64pcrel32 <es:__morestack>, %RSP<imp-use> MORESTACK_RET Successors according to CFG: BB#0 BB#0: derived from LLVM BB %0 Predecessors according to CFG: BB#3 BB#4 %EAX<def> = MOV32ri 40040; flags: FrameSetup %RDX<def> = MOV64rr %RAX; flags: FrameSetup %RCX<def> = MOV64rr %RSP; flags: FrameSetup Successors according to CFG: BB#1 BB#1: derived from LLVM BB %0 Predecessors according to CFG: BB#0 BB#1 OR64mi8 %RCX, 1, %noreg, 0, %noreg, 0, %EFLAGS<imp-def>; flags: FrameSetup %RCX<def,tied1> = SUB64ri32 %RCX<tied0>, 4096, %EFLAGS<imp-def>; flags: FrameSetup %RDX<def,tied1> = SUB64ri32 %RDX<tied0>, 4096, %EFLAGS<imp-def>; flags: FrameSetup JAE_1 <BB#1>, %EFLAGS<imp-use>; flags: FrameSetup Successors according to CFG: BB#2 BB#1 BB#2: derived from LLVM BB %0 Predecessors according to CFG: BB#1 %RSP<def,tied1> = SUB64rr %RSP<tied0>, %RAX, %EFLAGS<imp-def>; flags: FrameSetup SEH_StackAlloc 40040; flags: FrameSetup SEH_EndPrologue; flags: FrameSetup %RCX<def> = LEA64r %RSP, 1, %noreg, 40, %noreg %EDX<def> = MOV32r0 %EFLAGS<imp-def,dead> CALL64pcrel32 <ga:@dummy_use>, <regmask>, %RSP<imp-use>, %RCX<imp-use>, %EDX<imp-use,kill>, %RSP<imp-def> SEH_Epilogue %RSP<def,tied1> = ADD64ri32 %RSP<tied0>, 40040, %EFLAGS<imp-def,dead> RETQ # End machine code for function test_large. *** Bad machine code: Using an undefined physical register *** - function: test_large - basic block: BB#1 (0x40b1d70) - instruction: OR64mi8- operand 0: %RCX *** Bad machine code: Using an undefined physical register *** - function: test_large - basic block: BB#1 (0x40b1d70) - instruction: %RCX<def,tied1> = SUB64ri32- operand 1: %RCX<tied0> *** Bad machine code: Using an undefined physical register *** - function: test_large - basic block: BB#1 (0x40b1d70) - instruction: %RDX<def,tied1> = SUB64ri32- operand 1: %RDX<tied0> *** Bad machine code: Using an undefined physical register *** - function: test_large - basic block: BB#2 (0x40b1e20) - instruction: %RSP<def,tied1> = SUB64rr- operand 2: %RAX LLVM ERROR: Found 4 machine code errors.
Andy Ayers via llvm-dev
2015-Aug-17 17:46 UTC
[llvm-dev] [LLVMdev] Adding a stack probe function attribute
The stack pointer can't ever safely point beyond the valid range. I don't 100% know the exact reason but suspect the OS may verify this on context save/restore, which can happen unpredictably. This can be tricky to guarantee if you're modifying the stack pointer in flight without some extra math here and there, at which point you might as well use another register and defer the update like we do. I'm also not sure what you mean by updating the stack pointer being "allowed" for x86-32 -- if you look at the implementation of __chkstk for x86-32, which we provide in sources with VS, it does not modify ESP in the loop -- it uses EAX to probe the stack. [from] c:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\crt\src\intel\chkstk.asm mov eax, esp ; current TOS and eax, not ( _PAGESIZE_ - 1) ; Round down to current page boundary cs10: cmp ecx, eax ; Is new TOS jb short cs20 ; in probed page? mov eax, ecx ; yes. pop ecx xchg esp, eax ; update esp mov eax, dword ptr [eax] ; get return address mov dword ptr [esp], eax ; and put it at new TOS ret ; Find next lower page and probe cs20: sub eax, _PAGESIZE_ ; decrease by PAGESIZE test dword ptr [eax],eax ; probe page. jmp short cs10 Another common pattern which we haven't optimized for yet is to probe and zero the newly grown stack region. We have the same issue with epilogs that you saw. We've worked around it for now by generating epilogs first, but this seems a bit hacky. I'd like to see us fix this by delaying the inline expansion of a stack probe until after prolog/epilog generation. That also removes the need for us to update the prolog block in ::emitProlog, which is a bit clunky. But we haven't gotten around to this yet. -----Original Message----- From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of John Kåre Alsaker via llvm-dev Sent: Sunday, August 16, 2015 11:26 AM To: Reid Kleckner <rnk at google.com> Cc: llvm-dev at lists.llvm.org Subject: Re: [llvm-dev] [LLVMdev] Adding a stack probe function attribute I started to implement inlining of the stack probe function based on Microsoft's inlined stack probes in https://na01.safelinks.protection.outlook.com/?url=https%3a%2f%2fgithub.com%2fMicrosoft%2fllvm%2ftree%2fMS.&data=01%7c01%7candya%40microsoft.com%7c7f3665f51a1a45a612c008d2a668163e%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=pm63Kdg2E%2bprrwKZKpcGpQNEYjwBaY7%2beZrX8eYfyYo%3d Do we know why the stack pointer cannot be updated in a loop (which results in ideal code)? I noticed that was commented in Microsoft's code. I suspect this is due to debug or unwinding information, since it is allowed on Windows x86-32. I ran into two issues while implementing this. The epilog was inserted into the wrong basic block. This is because the basic blocks to insert epilogs into is calculated before emitPrologue is called. I fixed this by adding the following code after the emitPrologue call in PEI::insertPrologEpilogCode: RestoreBlocks.clear(); calculateSets(Fn); This doesn't seem like a very nice solution. It's also unclear to me how Microsoft's branch handles this. The other issue is with code generation. All the tests passed but the one were segmented stacks are used and stack probes are required. I don't see what is actually wrong here, and would like some help. Here is the output: # After Prologue/Epilogue Insertion & Frame Finalization # Machine code for function test_large: Post SSA Frame Objects: fi#0: size=40000, align=4, at location [SP-40000] BB#4: %R11<def> = LEA64r %RSP, 1, %noreg, -40040, %noreg CMP64rm %R11, %noreg, 1, %noreg, 40, %GS, %EFLAGS<imp-def> JA_1 <BB#0>, %EFLAGS<imp-use> Successors according to CFG: BB#3 BB#0 BB#3: Predecessors according to CFG: BB#4 %R10<def> = MOV64ri 40040 %R11<def> = MOV64ri 32 CALL64pcrel32 <es:__morestack>, %RSP<imp-use> MORESTACK_RET Successors according to CFG: BB#0 BB#0: derived from LLVM BB %0 Predecessors according to CFG: BB#3 BB#4 %EAX<def> = MOV32ri 40040; flags: FrameSetup %RDX<def> = MOV64rr %RAX; flags: FrameSetup %RCX<def> = MOV64rr %RSP; flags: FrameSetup Successors according to CFG: BB#1 BB#1: derived from LLVM BB %0 Predecessors according to CFG: BB#0 BB#1 OR64mi8 %RCX, 1, %noreg, 0, %noreg, 0, %EFLAGS<imp-def>; flags: FrameSetup %RCX<def,tied1> = SUB64ri32 %RCX<tied0>, 4096, %EFLAGS<imp-def>; flags: FrameSetup %RDX<def,tied1> = SUB64ri32 %RDX<tied0>, 4096, %EFLAGS<imp-def>; flags: FrameSetup JAE_1 <BB#1>, %EFLAGS<imp-use>; flags: FrameSetup Successors according to CFG: BB#2 BB#1 BB#2: derived from LLVM BB %0 Predecessors according to CFG: BB#1 %RSP<def,tied1> = SUB64rr %RSP<tied0>, %RAX, %EFLAGS<imp-def>; flags: FrameSetup SEH_StackAlloc 40040; flags: FrameSetup SEH_EndPrologue; flags: FrameSetup %RCX<def> = LEA64r %RSP, 1, %noreg, 40, %noreg %EDX<def> = MOV32r0 %EFLAGS<imp-def,dead> CALL64pcrel32 <ga:@dummy_use>, <regmask>, %RSP<imp-use>, %RCX<imp-use>, %EDX<imp-use,kill>, %RSP<imp-def> SEH_Epilogue %RSP<def,tied1> = ADD64ri32 %RSP<tied0>, 40040, %EFLAGS<imp-def,dead> RETQ # End machine code for function test_large. *** Bad machine code: Using an undefined physical register *** - function: test_large - basic block: BB#1 (0x40b1d70) - instruction: OR64mi8- operand 0: %RCX *** Bad machine code: Using an undefined physical register *** - function: test_large - basic block: BB#1 (0x40b1d70) - instruction: %RCX<def,tied1> = SUB64ri32- operand 1: %RCX<tied0> *** Bad machine code: Using an undefined physical register *** - function: test_large - basic block: BB#1 (0x40b1d70) - instruction: %RDX<def,tied1> = SUB64ri32- operand 1: %RDX<tied0> *** Bad machine code: Using an undefined physical register *** - function: test_large - basic block: BB#2 (0x40b1e20) - instruction: %RSP<def,tied1> = SUB64rr- operand 2: %RAX LLVM ERROR: Found 4 machine code errors. _______________________________________________ LLVM Developers mailing list llvm-dev at lists.llvm.org https://na01.safelinks.protection.outlook.com/?url=http%3a%2f%2fllvm.cs.uiuc.edu&data=01%7c01%7candya%40microsoft.com%7c7f3665f51a1a45a612c008d2a668163e%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=iQjl5g5NZHMOzJ4%2fhFEN6ECfAnFBR5Evud7KBAWO%2b7Y%3d https://na01.safelinks.protection.outlook.com/?url=http%3a%2f%2flists.llvm.org%2fcgi-bin%2fmailman%2flistinfo%2fllvm-dev%0a&data=01%7c01%7candya%40microsoft.com%7c7f3665f51a1a45a612c008d2a668163e%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=BiZb%2bhINDhjdc0YshtClHXKkOS10wY7Gjw6LWjzozP0%3d
John Kåre Alsaker via llvm-dev
2015-Aug-30 20:16 UTC
[llvm-dev] [LLVMdev] Adding a stack probe function attribute
I've submited my patch to inline stack probes here: http://reviews.llvm.org/D12483 On Mon, Aug 17, 2015 at 7:46 PM, Andy Ayers <andya at microsoft.com> wrote:> The stack pointer can't ever safely point beyond the valid range. I don't 100% know the exact reason but suspect the OS may verify this on context save/restore, which can happen unpredictably. This can be tricky to guarantee if you're modifying the stack pointer in flight without some extra math here and there, at which point you might as well use another register and defer the update like we do. > > I'm also not sure what you mean by updating the stack pointer being "allowed" for x86-32 -- if you look at the implementation of __chkstk for x86-32, which we provide in sources with VS, it does not modify ESP in the loop -- it uses EAX to probe the stack.I was thinking of the fact that __chkstk modified SP on x86-32, but not on x86-64.> We have the same issue with epilogs that you saw. We've worked around it for now by generating epilogs first, but this seems a bit hacky. I'd like to see us fix this by delaying the inline expansion of a stack probe until after prolog/epilog generation. That also removes the need for us to update the prolog block in ::emitProlog, which is a bit clunky. But we haven't gotten around to this yet.Hm.. I did try just generating epilogs first, but I ran into test failures.