On Tue, Jul 28, 2015 at 6:34 PM, Reid Kleckner <rnk at google.com> wrote:
> On Tue, Jul 28, 2015 at 2:25 AM, John Kåre Alsaker
> <john.mailinglists at gmail.com> wrote:
>>
>> On Tue, Jul 28, 2015 at 12:44 AM, Reid Kleckner <rnk at google.com> wrote:
>> > Yeah, the function attributes section of LangRef is a reasonable
>> > place to put stuff like this:
>> > http://llvm.org/docs/LangRef.html#function-attributes
>> I'll see if I can't sneak something in there.
>>
>> >
>> > I think we should add this. I also know that LLILAC needs something
>> > like this as well. I propose the following:
>> > - Add a string attribute called "stack-probe-symbol"="foo".
>> > - The presence of this attribute indicates that stack probes should
>> >   be emitted, even on non-Windows OSs.
>> > - (future work) For LLILAC, if this attribute is present but the
>> >   string is empty, this can be a signal that the check must be
>> >   emitted inline, either as a sequence of stores or a loop.
>> >
>> > This also addresses David's concern with the hardcoded __probestack
>> > symbol name.
>> First of all, LLVM should be free to choose how it does stack probes,
>> it could call ___chkstk_ms, ___chkstk_ms, __chkstk, _alloca, _chkstk,
>> __probestack or any other stack probe function it knows about, it
>> could unroll and inline it for smaller allocation amounts, it could
>> inline the function entirely or it could do nothing, for platforms
>> which does stack overflow checks in hardware.
>>
>> I don't see why hardcoding __probestack is different from every other
>> hardcoded thing in LLVM. Furthermore since calls to it can be elided
>> it is not useful for clients to specify their own function, so they
>> would just point it to whatever the platform stack probing function
>> would be (replicating the ugly logic in
>> X86FrameLowering::emitStackProbeCall). If LLVM in the future always
>> inlined the call, the stack probe function would never be called and
>> the attribute argument is useless.
>
>
> The difference between __probestack and __chkstk etc is that we are happy to
> call into existing interfaces that are somehow guaranteed by the
> environment. Sometimes we do invent our own in compiler-rt for obscure cases
> like i128 division, but it's rare. After years of adapting to fit
> pre-existing interfaces, we are naturally very cautious to define our own.

The code does need to go somewhere though.

> Since not everyone uses compiler-rt, I worry about a situation where people
> fight over the definition of __probestack

Wouldn't this be resolved by defining what __probestack does?

> , or where users want to override
> __probestack to call into their runtime, rather than dealing with signals.

As I said before, calls to __probestack are not guaranteed to be
emitted, so clients can't rely on it doing anything other than probing
the stack. Also clients must always deal with guard page faults. Those
will usually happen outside of __probestack, since functions with
large stack frames are rare.

John Kåre Alsaker via llvm-dev
2015-Aug-16 18:25 UTC
[llvm-dev] [LLVMdev] Adding a stack probe function attribute
I started to implement inlining of the stack probe function based on
Microsoft's inlined stack probes in
https://github.com/Microsoft/llvm/tree/MS.

Do we know why the stack pointer cannot be updated in a loop (which
results in ideal code)? I noticed that was commented in Microsoft's
code. I suspect this is due to debug or unwinding information, since
it is allowed on Windows x86-32.

I ran into two issues while implementing this.

The epilog was inserted into the wrong basic block. This is because
the basic blocks to insert epilogs into are calculated before
emitPrologue is called. I fixed this by adding the following code
after the emitPrologue call in PEI::insertPrologEpilogCode:

    RestoreBlocks.clear();
    calculateSets(Fn);

This doesn't seem like a very nice solution. It's also unclear to me
how Microsoft's branch handles this.

The other issue is with code generation. All the tests passed except
the one where segmented stacks are used and stack probes are required.
I don't see what is actually wrong here, and would like some help.

Here is the output:

# After Prologue/Epilogue Insertion & Frame Finalization
# Machine code for function test_large: Post SSA
Frame Objects:
  fi#0: size=40000, align=4, at location [SP-40000]

BB#4:
        %R11<def> = LEA64r %RSP, 1, %noreg, -40040, %noreg
        CMP64rm %R11, %noreg, 1, %noreg, 40, %GS, %EFLAGS<imp-def>
        JA_1 <BB#0>, %EFLAGS<imp-use>
    Successors according to CFG: BB#3 BB#0

BB#3:
    Predecessors according to CFG: BB#4
        %R10<def> = MOV64ri 40040
        %R11<def> = MOV64ri 32
        CALL64pcrel32 <es:__morestack>, %RSP<imp-use>
        MORESTACK_RET
    Successors according to CFG: BB#0

BB#0: derived from LLVM BB %0
    Predecessors according to CFG: BB#3 BB#4
        %EAX<def> = MOV32ri 40040; flags: FrameSetup
        %RDX<def> = MOV64rr %RAX; flags: FrameSetup
        %RCX<def> = MOV64rr %RSP; flags: FrameSetup
    Successors according to CFG: BB#1

BB#1: derived from LLVM BB %0
    Predecessors according to CFG: BB#0 BB#1
        OR64mi8 %RCX, 1, %noreg, 0, %noreg, 0, %EFLAGS<imp-def>; flags: FrameSetup
        %RCX<def,tied1> = SUB64ri32 %RCX<tied0>, 4096, %EFLAGS<imp-def>; flags: FrameSetup
        %RDX<def,tied1> = SUB64ri32 %RDX<tied0>, 4096, %EFLAGS<imp-def>; flags: FrameSetup
        JAE_1 <BB#1>, %EFLAGS<imp-use>; flags: FrameSetup
    Successors according to CFG: BB#2 BB#1

BB#2: derived from LLVM BB %0
    Predecessors according to CFG: BB#1
        %RSP<def,tied1> = SUB64rr %RSP<tied0>, %RAX, %EFLAGS<imp-def>; flags: FrameSetup
        SEH_StackAlloc 40040; flags: FrameSetup
        SEH_EndPrologue; flags: FrameSetup
        %RCX<def> = LEA64r %RSP, 1, %noreg, 40, %noreg
        %EDX<def> = MOV32r0 %EFLAGS<imp-def,dead>
        CALL64pcrel32 <ga:@dummy_use>, <regmask>, %RSP<imp-use>, %RCX<imp-use>, %EDX<imp-use,kill>, %RSP<imp-def>
        SEH_Epilogue
        %RSP<def,tied1> = ADD64ri32 %RSP<tied0>, 40040, %EFLAGS<imp-def,dead>
        RETQ

# End machine code for function test_large.

*** Bad machine code: Using an undefined physical register ***
- function: test_large
- basic block: BB#1 (0x40b1d70)
- instruction: OR64mi8
- operand 0: %RCX

*** Bad machine code: Using an undefined physical register ***
- function: test_large
- basic block: BB#1 (0x40b1d70)
- instruction: %RCX<def,tied1> = SUB64ri32
- operand 1: %RCX<tied0>

*** Bad machine code: Using an undefined physical register ***
- function: test_large
- basic block: BB#1 (0x40b1d70)
- instruction: %RDX<def,tied1> = SUB64ri32
- operand 1: %RDX<tied0>

*** Bad machine code: Using an undefined physical register ***
- function: test_large
- basic block: BB#2 (0x40b1e20)
- instruction: %RSP<def,tied1> = SUB64rr
- operand 2: %RAX

LLVM ERROR: Found 4 machine code errors.
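
For anyone reading the dump, the loop in BB#1 can be modeled outside of LLVM. The following is only a minimal standalone C++ sketch of what that loop is meant to do, not code from the patch. The helper name probeOffsets and its parameters are hypothetical, and it assumes the JAE_1 tests the carry flag left by the SUB on RDX, which is the last instruction before it to write EFLAGS.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of LLVM: models the probe loop in BB#1.
// Returns how far below the incoming stack pointer each probed byte lies.
std::vector<uint64_t> probeOffsets(uint64_t frameSize, uint64_t pageSize) {
  std::vector<uint64_t> touched;
  uint64_t remaining = frameSize; // RDX, copied from RAX in BB#0
  uint64_t offset = 0;            // distance of RCX below RSP
  bool borrow = false;
  do {
    touched.push_back(offset);     // OR64mi8 [RCX], 0 touches the page
    offset += pageSize;            // SUB64ri32 RCX, pageSize
    borrow = remaining < pageSize; // carry flag produced by the next SUB
    remaining -= pageSize;         // SUB64ri32 RDX, pageSize (wraps on borrow)
  } while (!borrow);               // JAE_1 BB#1: loop while there was no borrow
  return touched;
}
```

With the values from the dump, probeOffsets(40040, 4096) gives ten probes, the last one 36864 bytes below the incoming stack pointer, after which BB#2 subtracts the full frame size from RSP.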