Reid Kleckner
2013-Jul-25 21:38 UTC
[LLVMdev] Proposing a new 'alloca' parameter attribute to implement the Microsoft C++ ABI
Hi LLVM folks,

To properly implement pass-by-value in the Microsoft C++ ABI, we need to be able
to take the address of an outgoing call argument slot. This is
http://llvm.org/PR5064 .

Problem
-------

On Windows, C structs are pushed right onto the stack in line with the other
arguments. In LLVM, we use byval to model this, and it works for C structs.
However, C++ records are also passed this way, and reusing byval for C++ records
breaks C++ object identity rules.

In order to implement the ABI properly, we need a way to get the address of the
argument slot *before* we start the call, so that we can either construct the
object in place on the stack or at least call its copy constructor.

This is further complicated by the possibility of nested calls passing arguments
by value. A good general case to think about is a binary tree of calls that take
two arguments by value and return by value:

    struct A { int a; };
    A foo(A, A);
    foo(foo(A(), A()), foo(A(), A()));

To complete the outer call to foo, we have to adjust the stack for its outgoing
arguments before the inner calls to foo, and arrange for the sret pointers to
point to those slots.

To make this even more complicated, C++ methods are typically callee cleanup
(thiscall), but free functions are caller cleanup (cdecl).

Features
--------

A few weeks ago, I sat down with some folks at Google and we came up with this
proposal, which tries to add the minimum set of LLVM IL features to make this
possible.

1. Allow alloca instructions to use llvm.stacksave values to indicate scoping.

This creates an SSA dependence between the alloca instruction and the
stackrestore instruction that prevents optimizers from accidentally reordering
them in ways that don't verify. llvm.stacksave in this case is taking on a role
similar to CALLSEQ_START in the selection dag.

LLVM can also apply this to dynamic allocas from inline functions to ensure that
optimizers don't move them.

2. Add an 'alloca' attribute for parameters.

Only an alloca value can be passed to a parameter with this attribute. It
cannot be bitcasted or GEPed. An alloca can only be passed in this way once.
It can be passed as a normal pointer to any number of other functions.

Aside from allocas bounded by llvm.stacksave and llvm.stackrestore calls, there
can be no allocas between the creation of an alloca passed with this attribute
and its associated call.

3. Add a stackrestore field to call and invoke instructions.

This models calling conventions which do their own cleanup, and ensures that
even after optimizations have perturbed the IR, we don't consider the allocas to
be live. For caller cleanup conventions, while the callee may have called
destructors on its arguments, the allocas can be considered live until the stack
restore.

Example
-------

A single call to foo, assuming it is stdcall, would be lowered something like:

    %res = alloca %struct.A
    %base = llvm.stacksave()
    %arg1 = alloca %struct.A, stackbase %base
    %arg2 = alloca %struct.A, stackbase %base
    call @A_ctor(%arg1)
    call @A_ctor(%arg2)
    call x86_stdcallcc @foo(%res sret, %arg1 alloca, %arg2 alloca), stackrestore %base

If control does not flow through a call or invoke with a stackrestore field,
then manual calls to llvm.stackrestore must be emitted before another call or
invoke can use an 'alloca' argument. The manual stack restore call ends the
lifetime of the allocas. This is necessary to handle unwind edges from argument
expression evaluation as well as the case where foo is not callee cleanup.

Implementation
--------------

By starting out with the stack save and restore intrinsics, we can hopefully
approach a slow but working implementation sooner rather than later. The work
should mostly be in the verifier, the IR, its parser, and the x86 backend.

I don't plan to start working on this immediately, but over the long run this
will be really important to support well.

---

That's all! Please send feedback! This is admittedly a really complicated
feature and I'm sorry for inflicting it on the LLVM community, but it's
obviously beyond my control.
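The object identity problem described in the proposal can be illustrated with a small C++ sketch. It is only an illustration under stated assumptions: `SelfRef` and both helper functions are hypothetical names that do not appear in the thread, and the byte copy merely models what a byval-style copy would do, it is not how LLVM implements byval. The type stores its own address, so an instance is only consistent when a constructor has run at the address where the object finally lives.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical type whose invariant is tied to its own address.
struct SelfRef {
    int value;
    const SelfRef *self;  // expected to always equal `this`

    explicit SelfRef(int v) : value(v), self(this) {}
    SelfRef(const SelfRef &other) : value(other.value), self(this) {}
};

// Pass-by-value through the copy constructor: the parameter object is
// constructed at its final address, so the invariant holds in the callee.
bool invariant_holds_in_callee(SelfRef arg) {
    return arg.self == &arg;
}

// Model of a byval-style copy (an assumption for illustration): the bytes
// are copied into a fresh slot and no constructor runs, so the stored
// pointer still names the source object rather than the slot.
bool invariant_holds_after_byte_copy(const SelfRef &src) {
    alignas(SelfRef) unsigned char slot[sizeof(SelfRef)];
    std::memcpy(slot, &src, sizeof(SelfRef));

    // Read the stored pointer back out of the raw bytes without treating
    // the slot as a live SelfRef object.
    const SelfRef *stored = nullptr;
    std::memcpy(&stored, slot + offsetof(SelfRef, self), sizeof(stored));
    return stored == reinterpret_cast<const SelfRef *>(slot);
}
```

Calling `invariant_holds_in_callee` with an existing `SelfRef` runs the copy constructor on the argument slot, while `invariant_holds_after_byte_copy` shows the stale pointer that results when the slot is filled by a plain memory copy. This is why the frontend needs the slot's address before the call starts.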
Duncan Sands
2013-Jul-29 13:00 UTC
[LLVMdev] Proposing a new 'alloca' parameter attribute to implement the Microsoft C++ ABI
Hi Reid,

On 25/07/13 23:38, Reid Kleckner wrote:
> In order to implement the ABI properly, we need a way to get the address of the
> argument slot *before* we start the call, so that we can either construct the
> object in place on the stack or at least call its copy constructor.

What does GCC do?

Ciao, Duncan.
Anton Korobeynikov
2013-Jul-29 13:30 UTC
[LLVMdev] Proposing a new 'alloca' parameter attribute to implement the Microsoft C++ ABI
>> object in place on the stack or at least call its copy constructor.
>
> what does GCC do?

Nothing. It does not support the MSVC ABI.

--
With best regards, Anton Korobeynikov
Faculty of Mathematics and Mechanics, Saint Petersburg State University
Rafael Espíndola
2013-Jul-30 18:07 UTC
[LLVMdev] Proposing a new 'alloca' parameter attribute to implement the Microsoft C++ ABI
How do you handle this during codegen? One problem is avoiding stack
changes (like spills). Another is coordinating things that are using
allocas and those that are not but end up on the stack. Consider

    void foo(int arg1, int arg2, int arg3, ....CXXTypeWithCopyConstructor argn, int argp1...)

You will need an alloca for argn, but the ABI also requires it to be
next to the plain integers that didn't fit in registers, no? This is
part of the reason my suggestion was to have a single opaque object
representing the frame being constructed and a getelementptr-like
abstraction to get pointers out of it.
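The idea of a single opaque frame object with a getelementptr-like accessor can be sketched in plain C++ as an analogy. Everything here is an assumption made for illustration: `Tracked` stands in for CXXTypeWithCopyConstructor, and `OutgoingFrame`, `FrameStorage`, `slot`, and `build_frame` are hypothetical names, not anything proposed as LLVM IR. The point is only that the whole argument area is reserved in one piece, and each argument is then constructed in place at an offset inside it, so the C++ object ends up adjacent to the plain integers.

```cpp
#include <cstddef>
#include <new>

// Hypothetical stand-in for CXXTypeWithCopyConstructor.
struct Tracked {
    int payload;
    const Tracked *self;  // records where the object was constructed

    explicit Tracked(int p) : payload(p), self(this) {}
    Tracked(const Tracked &o) : payload(o.payload), self(this) {}
};

// Hypothetical layout of the stack-passed part of one call's arguments:
// a plain integer, the C++ object, and another plain integer, side by side.
struct OutgoingFrame {
    int arg3;
    Tracked argn;
    int argp1;
};

// Raw storage for the whole frame, reserved in one piece.
struct FrameStorage {
    alignas(OutgoingFrame) unsigned char bytes[sizeof(OutgoingFrame)];

    // GEP-like accessor: address of a field inside the unconstructed frame.
    void *slot(std::size_t offset) { return bytes + offset; }
};

// Fill the frame field by field. The C++ object is copy-constructed
// directly in its final slot; the integers are simply stored.
const Tracked *build_frame(FrameStorage &frame, const Tracked &src) {
    new (frame.slot(offsetof(OutgoingFrame, arg3))) int(3);
    Tracked *argn = new (frame.slot(offsetof(OutgoingFrame, argn))) Tracked(src);
    new (frame.slot(offsetof(OutgoingFrame, argp1))) int(5);
    return argn;
}
```

In this model the layout decision lives in one place (`OutgoingFrame`), which is the property the suggestion is after; the alloca-based proposal instead leaves the placement of each slot to the backend.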
Reid Kleckner
2013-Jul-30 18:32 UTC
[LLVMdev] Proposing a new 'alloca' parameter attribute to implement the Microsoft C++ ABI
On Tue, Jul 30, 2013 at 11:07 AM, Rafael Espíndola <rafael.espindola at gmail.com> wrote:
> How do you handle this during codegen? One problem is avoiding stack
> changes (like spills)

I'm not sure I understand your question, but my plan is to basically use a
frame pointer when there is a call with an argument using the 'alloca'
attribute. It'll be slow but functional.

Later the backend can be optimized to be clever about spilling through an
SP-based memory operand in the presence of stack adjustments. I don't yet have
a concrete plan for this, and it will require more familiarity with the backend
than I currently have.

> Another is coordinating things that are using
> allocas and those that are not but end up on the stack. Consider
>
> void foo(int arg1, int arg2, int arg3, ....CXXTypeWithCopyConstructor
> argn, int argp1...)
>
> You will need an alloca for argn, but the ABI also requires it to be
> next to the plain integers that didn't fit in registers, no? This is
> part of the reason my suggestion was to have a single opaque object
> representing the frame being constructed and a getelementptr-like
> abstraction to get pointers out of it.

This proposal puts this complexity in the backend. The backend will lay out
the outgoing argument slots as required by the ABI, and each alloca pointer
will be resolved to point to the appropriate outgoing argument slot.

The verifier will be changed to reject IR with a live alloca between a call
site with an alloca-attributed argument and the creation of that alloca. This
will work, however:

    %s1 = stacksave
    %1 = alloca stackbase %s1
    %s2 = stacksave
    %2 = alloca stackbase %s2
    call @foo(%2 alloca)
    stackrestore %s2
    call @foo(%1 alloca)
    stackrestore %s1

Because the %2 alloca is dead due to the stack restore before the second foo
call.

I should also mention how this interacts with regparms. The win64 CC has 4
regparms, and if one of them is a struct, it is passed indirectly. Users can
easily handle that in the frontend, and the backend could reject the alloca
attribute on parameters that should be in registers. I need to double-check
what happens for fastcall on x86_32.
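The liveness rule in the stacksave/stackrestore example can be modeled with a toy checker. This is a sketch under stated assumptions, not LLVM's verifier: `StackModel` and its member names are hypothetical, stack save tokens are modeled as a depth into a list of live allocas, and the check is simplified to calls that take a single 'alloca' argument, as in the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Toy model of the proposed stack discipline (illustration only).
class StackModel {
public:
    using Token = std::size_t;  // models the value returned by llvm.stacksave
    using AllocaId = int;

    // Models llvm.stacksave: remember how many allocas are live right now.
    Token save() const { return live_.size(); }

    // Models an alloca: it becomes the most recently created live slot.
    AllocaId alloca_slot() {
        live_.push_back(next_id_);
        return next_id_++;
    }

    // Models llvm.stackrestore: every alloca created after the matching
    // save is dead afterwards.
    void restore(Token token) {
        if (token < live_.size()) live_.resize(token);
    }

    bool is_live(AllocaId id) const {
        return std::find(live_.begin(), live_.end(), id) != live_.end();
    }

    // Simplified rule for a call with one 'alloca' argument: the slot must
    // be live and no later alloca may still be live above it.
    bool may_pass_as_alloca_arg(AllocaId id) const {
        return !live_.empty() && live_.back() == id;
    }

private:
    std::vector<AllocaId> live_;
    AllocaId next_id_ = 0;
};
```

Replaying the example: after both allocas are created, only the second may be passed; once the inner restore runs, the second is dead and the first becomes eligible for its call.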