Hi!

I'm a GSoC student this year, working on implementing split stacks on LLVM.

TL;DR: I'm facing some problems trying to get LLVM to generate the code I want; please help me out if you can spare some time. It involves the SelectionDAG, MachineInstr and liveness analysis portions.

I'm currently trying to implement alloca correctly. It essentially boils down to checking whether the current stack block has enough space to hold the alloca'ed block of memory. If it does, we go the conventional way (bumping RSP); otherwise we call into a function that allocates the memory from the heap [1]. The stack pointer is not modified in the second case.

I am trying to implement this by:

a. Custom lowering DYNAMIC_STACKALLOC when segmented stacks are enabled.

b. Creating an X86ISD::SEG_ALLOCA node in LowerDYNAMIC_STACKALLOC if segmented stacks are enabled. (Right now all LowerDYNAMIC_STACKALLOC on x86 does is check for Windows and lower the call to X86ISD::WIN_ALLOCA.)

c. Having EmitLoweredSegAlloca do the checks (calling the external function if needed) and, in both cases, write the pointer to the allocated memory to RAX. If the function is called, nothing extra needs to be done, since the return value stays in RAX. If the stack pointer was changed, I do a (X86::MOV64rr, X86::RAX).addReg(X86::RSP).

d. Setting the value of the node to RAX in LowerDYNAMIC_STACKALLOC, making the last part of the function effectively look like this:

  // Reg is RAX or EAX, based on the subtarget
  Chain = DAG.getNode(X86ISD::SEG_ALLOCA, dl, NodeTys, Chain, Flag);
  Flag = Chain.getValue(1);
  Chain = DAG.getCopyFromReg(Chain, dl, Reg, SPTy).getValue(1);

  SDValue Ops1[2] = { Chain.getValue(0), Chain };
  return DAG.getMergeValues(Ops1, 2, dl);

First, I would like some feedback on this implementation in general.

Second, the problem I'm facing: in the final assembly generated, the move instruction to RAX in (c) is absent. I suspect this has something to do with the liveness analysis pass. With -debug-only=liveintervals, I see this

  304L %vreg5<def> = COPY %RAX<kill>; GR64:%vreg5

in the basic block I jump to after allocating the memory (both after bumping the SP and after calling the runtime). Perhaps this causes the pass to think the assignment to RAX is not needed and can be removed? My guess is that it has something to do with the DAG.getCopyFromReg. How can I fix this?

Third, the comments say DYNAMIC_STACKALLOC is supposed to evaluate to the new stack pointer. However, since the actual update is done inside the lowering, I am setting its value to a pointer to the new block of memory. Now that I'm violating this assumption, what should I change? Could this be the reason for the missing MOV in my original problem?

[1] There is some bookkeeping done to make sure we don't leak memory, but I'll skip that detail to keep my email short.

--
Sanjoy Das
http://playingwithpointers.com
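For readers following the thread, the run-time behaviour described in the message above (use the current stack block if the request fits, otherwise fall back to a heap allocation and leave the stack pointer alone) might be sketched in plain C++ roughly as follows. This is only a model of the decision, not code from the patch: `StackBlock`, `AllocaResult`, `runtime_alloca_from_heap` and `seg_alloca` are hypothetical names, alignment is ignored, and the leak-prevention bookkeeping mentioned in footnote [1] is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical model of one stack block (the stack grows downwards).
struct StackBlock {
    std::uintptr_t sp;     // current stack pointer
    std::uintptr_t limit;  // lowest usable address in this block
};

// Hypothetical stand-in for the runtime function that allocates from the
// heap; the real runtime entry point is not named in the message.
void *runtime_alloca_from_heap(std::size_t size) {
    return std::malloc(size);
}

struct AllocaResult {
    void *ptr;       // pointer to the allocated memory
    bool bumped_sp;  // true if the memory came from the current stack block
};

AllocaResult seg_alloca(StackBlock &blk, std::size_t size) {
    assert(blk.sp >= blk.limit && "stack pointer must lie inside the block");
    if (blk.sp - blk.limit >= size) {
        // Enough room: the conventional path, bump the stack pointer.
        blk.sp -= size;
        return {reinterpret_cast<void *>(blk.sp), true};
    }
    // Not enough room: allocate from the heap, stack pointer unchanged.
    return {runtime_alloca_from_heap(size), false};
}
```

In both branches the caller receives a single pointer, which is the value the lowered node has to produce regardless of which path was taken.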
Rafael Ávila de Espíndola
2011-Jun-17 20:12 UTC
[LLVMdev] Custom lowering DYNAMIC_STACKALLOC
On 11-06-17 10:31 AM, Sanjoy Das wrote:
> Hi!
>
> I'm a GSoC student this year, working on implementing split stacks on LLVM.
>
> TL;DR: I'm facing some problems trying to get LLVM to generate the code
> I want, please help me out if you can spare some time. It involves the
> SelectionDAG, MachineInstr and liveness analysis portions.
>
> I'm currently trying to implement alloca correctly. It essentially boils
> down to checking if the current stack block has enough space to hold the
> alloca'ed block of memory. If yes, going the conventional way (bumping
> the RSP); otherwise calling into a function that allocates the memory
> from the heap [1]. The stack pointer is not modified in the second case.
>
> I am trying to implement this by:
>
> a. Custom lowering DYNAMIC_STACKALLOC in case segmented stacks are enabled.
>
> b. Creating a X86ISD::SEG_ALLOCA node in LowerDYNAMIC_STACKALLOC if
> segmented stacks are enabled. (Right now all LowerDYNAMIC_STACKALLOC on
> x86 does is check for Windows and lower the call to X86ISD::WIN_ALLOCA).
>
> c. Having EmitLoweredSegAlloca do the checks, (calling the external
> function if needed) and, in both the cases, write the pointer to the
> allocated memory to RAX. If the function is called nothing extra needs
> to be done, since the return value stays at RAX. If the stack pointer
> was changed, I do a (X86::MOV64rr, X86::RAX).addReg(X86::RSP).
>
> d. Setting the value of the node to RAX in LowerDYNAMIC_STACKALLOC,
> making the last part of the function effectively look like this:
>
> // Reg is RAX or EAX, based on the subtarget
> Chain = DAG.getNode(X86ISD::SEG_ALLOCA, dl, NodeTys, Chain, Flag);
> Flag = Chain.getValue(1);
> Chain = DAG.getCopyFromReg(Chain, dl, Reg, SPTy).getValue(1);
>
> SDValue Ops1[2] = { Chain.getValue(0), Chain };
> return DAG.getMergeValues(Ops1, 2, dl);
>
> Firstly, I would also like some feedback on this implementation in general.
>
> Secondly, the problem I'm facing: in the final assembly generated, the
> move instruction to RAX, in (c) is absent. I suspected this has
> something to do with the liveness analysis pass. With
> -debug-only=liveintervals, I see this
>
> 304L %vreg5<def> = COPY %RAX<kill>; GR64:%vreg5

Is SEG_ALLOCA marked as writing to RAX?

Is this code in github? It has been a long time since I looked at selection dags, but I could take a look.

btw, have you got -view-isel-dags (and the other view dags options) working? They are really handy for debugging this stuff.

Cheers,
Rafael
Hi!

> Is SEG_ALLOCA marked as writing to RAX?

It has RAX in its Defs list.

> Is this code in github? It has been a long time since I looked at
> selection dags, but I could take a look.

It is up at https://github.com/sanjoy/llvm/tree/segmented-stacks

> btw, have you got -view-isel-dags (and the other view dags options)
> working? They are really handy for debugging this stuff.

I have, but perhaps not as extensively as they can be used. I'll give them a try again.

--
Sanjoy Das
http://playingwithpointers.com
On Jun 17, 2011, at 7:31 AM, Sanjoy Das wrote:
> c. Having EmitLoweredSegAlloca do the checks, (calling the external
> function if needed) and, in both the cases, write the pointer to the
> allocated memory to RAX. If the function is called nothing extra needs
> to be done, since the return value stays at RAX. If the stack pointer
> was changed, I do a (X86::MOV64rr, X86::RAX).addReg(X86::RSP).

Try not to use physical registers before register allocation unless you absolutely have to.

If you call a function, immediately copy the return value to a virtual register. Same thing for the stack pointer; copy it to a virtual register (using COPY, not MOV64rr).

You'll probably need a PHI as well.

/jakob
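The data flow suggested in this reply can be illustrated with a small toy model. To be clear, this is plain C++ and not LLVM API: `Machine`, `copyFromPhys`, `phi`, `runSegAlloca` and the block names are all hypothetical, and the only purpose is to show the shape of the idea. Each path snapshots its physical register into a fresh virtual register straight away, and the join block merges the two virtual registers with a PHI keyed on the predecessor block.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Toy model only; none of these names are LLVM API.
enum class Phys { RSP, RAX };

struct Incoming {
    int vreg;          // virtual register holding the incoming value
    std::string pred;  // predecessor block that value arrives from
};

struct Machine {
    std::map<Phys, std::uint64_t> phys;  // physical register file
    std::map<int, std::uint64_t> vregs;  // virtual registers, defined once each

    // vreg = COPY phys: snapshot the physical register immediately.
    void copyFromPhys(int vreg, Phys reg) {
        assert(vregs.count(vreg) == 0 && "virtual registers are defined once");
        vregs[vreg] = phys.at(reg);
    }

    // vreg = PHI [v0, pred0], [v1, pred1]: select by the block we came from.
    void phi(int vreg, const std::vector<Incoming> &in,
             const std::string &cameFrom) {
        assert(vregs.count(vreg) == 0 && "virtual registers are defined once");
        for (const Incoming &i : in) {
            if (i.pred == cameFrom) {
                vregs[vreg] = vregs.at(i.vreg);
                return;
            }
        }
        assert(false && "no incoming value for this predecessor");
    }
};

std::uint64_t runSegAlloca(Machine &m, bool enoughSpace, std::uint64_t size,
                           std::uint64_t heapPtr) {
    std::string cameFrom;
    if (enoughSpace) {                 // "bumpMBB": bump the stack pointer
        m.phys[Phys::RSP] -= size;
        m.copyFromPhys(1, Phys::RSP);  // vreg1 = COPY RSP
        cameFrom = "bumpMBB";
    } else {                           // "callMBB": runtime call returns in RAX
        m.phys[Phys::RAX] = heapPtr;   // models the call's return value
        m.copyFromPhys(2, Phys::RAX);  // vreg2 = COPY RAX
        cameFrom = "callMBB";
    }
    // "continueMBB": the PHI comes first and merges the two virtual registers.
    m.phi(3, {{1, "bumpMBB"}, {2, "callMBB"}}, cameFrom);
    m.phys[Phys::RAX] = 0xdead;        // later clobbers of RAX do not matter
    return m.vregs.at(3);
}
```

Because the result lives in a virtual register from the moment it is produced, nothing downstream depends on RAX still holding the pointer, which is the property the advice above is after.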