thr3ads.net - llvm dev - [LLVMdev] Proposing a new 'alloca' parameter attribute to implement the Microsoft C++ ABI [Jul 2013]

Reid Kleckner

2013-Jul-25 21:38 UTC

[LLVMdev] Proposing a new 'alloca' parameter attribute to implement the Microsoft C++ ABI

Hi LLVM folks,

To properly implement pass-by-value in the Microsoft C++ ABI, we need to be able
to take the address of an outgoing call argument slot.  This is
http://llvm.org/PR5064 .

Problem
-------

On Windows, C structs are pushed right onto the stack in line with the other
arguments.  In LLVM, we use byval to model this, and it works for C structs.
However, C++ records are also passed this way, and reusing byval for C++ records
breaks C++ object identity rules.

In order to implement the ABI properly, we need a way to get the address of the
argument slot *before* we start the call, so that we can either construct the
object in place on the stack or at least call its copy constructor.
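
As an illustration that is not part of the original post, the identity problem can be sketched in a few lines of C++. The names `SelfAware`, `identityHolds` and `identityHoldsAfterByteCopy` are hypothetical. The type records its own address when constructed, so a parameter produced by the copy constructor stays consistent, while a raw byte copy (roughly what byval amounts to) leaves a stale address behind.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical type: remembers the address it was constructed at.
struct SelfAware {
  SelfAware() : self(this) {}
  SelfAware(const SelfAware &) : self(this) {}  // re-anchors to the new object
  const SelfAware *self;
};

// The by-value parameter is created by a constructor, so inside the callee
// its recorded address matches its real address.
bool identityHolds(SelfAware a) { return a.self == &a; }

// Models a byval-style copy: the bytes land at a new address, but no
// constructor runs, so the recorded address still names the source object.
bool identityHoldsAfterByteCopy(const SelfAware &src) {
  alignas(SelfAware) unsigned char slot[sizeof(SelfAware)];
  std::memcpy(slot, &src, sizeof(SelfAware));
  const SelfAware *recorded;
  // SelfAware is standard-layout, so its only member sits at offset 0.
  std::memcpy(&recorded, slot, sizeof(recorded));
  return static_cast<const void *>(recorded) == static_cast<const void *>(slot);
}
```

Calling `identityHolds` with any `SelfAware` yields true, whereas `identityHoldsAfterByteCopy` yields false because the copied bytes still point at the original object.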

This is further complicated by the possibility of nested calls passing arguments
by value.  A good general case to think about is a binary tree of calls that take
two arguments by value and return by value:

  struct A { int a; };
  A foo(A, A);
  foo(foo(A(), A()), foo(A(), A()));

To complete the outer call to foo, we have to adjust the stack for its outgoing
arguments before the inner calls to foo, and arrange for the sret pointers to
point to those slots.
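
The source-level counterpart of "the sret pointer points at the outer argument slot" can be observed with a small C++ sketch. This is an illustration, not code from the post: `A` is given a hypothetical counting copy constructor and an `int` constructor, and `evalTree` is a made-up helper. Under C++17's guaranteed copy elision, each inner result initializes the outer parameter directly, so the copy constructor never runs.

```cpp
#include <cassert>

static int copies = 0;  // counts copy-constructor calls

// Variant of the post's struct A with a non-trivial copy constructor.
struct A {
  explicit A(int v) : a(v) {}
  A(const A &other) : a(other.a) { ++copies; }
  int a;
};

// Takes two arguments by value and returns by value, as in the post.
A foo(A x, A y) { return A(x.a + y.a); }

// Evaluates the binary tree of calls from the post with concrete leaves.
int evalTree() {
  // Each inner call returns a prvalue that becomes the outer parameter
  // object itself: the inner return slot is the outer argument slot.
  A r = foo(foo(A(1), A(2)), foo(A(3), A(4)));
  return r.a;
}
```

If the objects had to be materialized elsewhere and then copied into the argument area, `copies` would be non-zero, which is exactly the behavior the ABI work is trying to avoid.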

To make this even more complicated, C++ methods are typically callee
cleanup (thiscall), but free functions are caller cleanup (cdecl).

Features
--------

A few weeks ago, I sat down with some folks at Google and we came up with this
proposal, which tries to add the minimum set of LLVM IL features to make this
possible.

1. Allow alloca instructions to use llvm.stacksave values to indicate scoping.

This creates an SSA dependence between the alloca instruction and the
stackrestore instruction that prevents optimizers from accidentally reordering
them in ways that don't verify.  llvm.stacksave in this case is taking on a role
similar to CALLSEQ_START in the selection dag.

LLVM can also apply this to dynamic allocas from inline functions to ensure that
optimizers don't move them.

2. Add an 'alloca' attribute for parameters.

Only an alloca value can be passed to a parameter with this attribute.  It
cannot be bitcasted or GEPed.  An alloca can only be passed in this way once.
It can be passed as a normal pointer to any number of other functions.

Aside from allocas bounded by llvm.stacksave and llvm.stackrestore calls, there
can be no allocas between the creation of an alloca passed with this attribute
and its associated call.

3. Add a stackrestore field to call and invoke instructions.

This models calling conventions which do their own cleanup, and ensures that
even after optimizations have perturbed the IR, we don't consider the allocas to
be live.  For caller cleanup conventions, while the callee may have called
destructors on its arguments, the allocas can be considered live until the stack
restore.

Example
-------

A single call to foo, assuming it is stdcall, would be lowered something like:

%res = alloca %struct.A
%base = llvm.stacksave()
%arg1 = alloca %struct.A, stackbase %base
%arg2 = alloca %struct.A, stackbase %base
call @A_ctor(%arg1)
call @A_ctor(%arg2)
call x86_stdcallcc @foo(%res sret, %arg1 alloca, %arg2 alloca), stackrestore %base

If control does not flow through a call or invoke with a stackrestore field,
then manual calls to llvm.stackrestore must be emitted before another call or
invoke can use an 'alloca' argument.  The manual stack restore call ends the
lifetime of the allocas.  This is necessary to handle unwind edges from argument
expression evaluation as well as the case where foo is not callee cleanup.
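
The difference between the two cleanup styles can be pictured with a toy stack model in C++. This is a sketch under loud assumptions, not part of the proposal: `SimStack`, `Cleanup`, `call` and `lowerCall` are hypothetical names, the stack is just a counter, and the order of slots in the model says nothing about how a real backend lays out arguments.

```cpp
#include <cassert>
#include <cstddef>

// Toy model of a downward-growing stack; all names here are hypothetical.
struct SimStack {
  std::size_t sp = 1024;                         // simulated stack pointer
  std::size_t save() const { return sp; }        // models llvm.stacksave
  std::size_t allocate(std::size_t bytes) {      // models alloca
    sp -= bytes;
    return sp;
  }
  void restore(std::size_t base) { sp = base; }  // models llvm.stackrestore
};

enum class Cleanup { Callee, Caller };

// Models a call whose 'alloca' arguments live between sp and base.
// A callee-cleanup call pops them itself, which is what the proposed
// 'stackrestore %base' field on the call instruction expresses.
void call(SimStack &s, std::size_t base, Cleanup cc) {
  if (cc == Cleanup::Callee) s.restore(base);
}

// Mirrors the shape of the example: save, two allocas, call, restore.
bool lowerCall(Cleanup cc) {
  const std::size_t kSizeOfA = 4;  // assumed size of struct A { int a; }
  SimStack s;
  std::size_t base = s.save();
  std::size_t arg1 = s.allocate(kSizeOfA);
  std::size_t arg2 = s.allocate(kSizeOfA);
  // In the model the two slots are adjacent and start right below base.
  bool adjacent = (arg2 + kSizeOfA == arg1) && (arg1 + kSizeOfA == base);
  call(s, base, cc);
  if (cc == Cleanup::Caller) s.restore(base);  // manual llvm.stackrestore
  return adjacent && s.sp == base;
}
```

In both modes the stack pointer ends up back at `base`; the only difference is whether the call itself or a separate restore step is responsible for that.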

Implementation
--------------

By starting out with the stack save and restore intrinsics, we can hopefully
approach a slow but working implementation sooner rather than later.  The work
should mostly be in the verifier, the IR, its parser, and the x86 backend.

I don't plan to start working on this immediately, but over the long run
this will be really important to support well.

---

That's all!  Please send feedback!  This is admittedly a really complicated
feature and I'm sorry for inflicting it on the LLVM community, but it's
obviously beyond my control.

Duncan Sands

2013-Jul-29 13:00 UTC

Hi Reid,

On 25/07/13 23:38, Reid Kleckner wrote:
> In order to implement the ABI properly, we need a way to get the address of the
> argument slot *before* we start the call, so that we can either construct the
> object in place on the stack or at least call its copy constructor.

What does GCC do?

Ciao, Duncan.

Anton Korobeynikov

2013-Jul-29 13:30 UTC

>> object in place on the stack or at least call its copy constructor.
>
> what does GCC do?

Nothing. It does not support MSVC ABI.

-- 
With best regards, Anton Korobeynikov
Faculty of Mathematics and Mechanics, Saint Petersburg State University

Rafael Espíndola

2013-Jul-30 18:07 UTC

How do you handle this during codegen? One problem is avoiding stack
changes (like spills). Another is coordinating things that are using
allocas and those that are not but end up on the stack. Consider

void foo(int arg1, int arg2, int arg3, ....CXXTypeWithCopyConstructor argn, int argp1...)

You will need an alloca for argn, but the ABI also requires it to be
next to the plain integers that didn't fit in registers, no? This is
part of the reason my suggestion was to have a single opaque object
representing the frame being constructed and a getelementptr-like
abstraction to get pointers out of it.
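
One way to picture the "single frame object" idea at the source level is the C++ sketch below. It is only an analogy and not an IR design from the thread: `Tracked`, `OutgoingFrame`, `slotForArgN` and `buildFrame` are hypothetical names, and the field layout is invented for the example. The stack-passed integers and the C++ object share one block, an accessor hands out the address of the object's slot, and the object is constructed directly in that slot.

```cpp
#include <cassert>
#include <new>

// Hypothetical C++ type with a non-trivial copy constructor.
struct Tracked {
  Tracked() : self(this) {}
  Tracked(const Tracked &) : self(this) {}
  const Tracked *self;
};

// Hypothetical outgoing-argument frame: the stack-passed ints and the C++
// object share one block, so their relative layout is fixed in one place.
struct OutgoingFrame {
  int arg4;
  alignas(Tracked) unsigned char argn[sizeof(Tracked)];  // raw slot
  int argp1;
};

// GEP-like accessor: yields the address of a slot without constructing it.
void *slotForArgN(OutgoingFrame &frame) { return frame.argn; }

bool buildFrame() {
  OutgoingFrame frame;
  frame.arg4 = 4;
  Tracked *obj = new (slotForArgN(frame)) Tracked();  // construct in place
  frame.argp1 = 5;
  bool inPlace = obj->self == obj &&
                 static_cast<void *>(obj) == static_cast<void *>(frame.argn);
  obj->~Tracked();  // the frame owner ends the object's lifetime explicitly
  return inPlace;
}
```

The point of the analogy is that the object never exists anywhere except its final slot, and the slot's position relative to the plain integers is decided by the frame description rather than by separate allocations.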



Reid Kleckner

2013-Jul-30 18:32 UTC

On Tue, Jul 30, 2013 at 11:07 AM, Rafael Espíndola <rafael.espindola at gmail.com> wrote:
> How do you handle this during codegen? One problem is avoiding stack
> changes (like spills)

I'm not sure I understand your question, but my plan is to basically use a
frame pointer when there is a call with an argument using the 'alloca'
attribute.  It'll be slow but functional.

Later the backend can be optimized to be clever about spilling through an
SP-based memory operand in the presence of stack adjustments.  I don't yet
have a concrete plan for this, and it will require more familiarity with
the backend than I currently have.

> Another is coordinating things that are using
> allocas and those that are not but end up in the stack. Consider
>
> void foo(int arg1, int arg2, int arg3, ....CXXTypeWithCopyConstructor
> argn, int argp1...)
>
> You will need an alloca for argn, but the ABI also requires it to be
> next to the plain integers that didn't fit in registers, no? This is
> part of the reason my suggestion was to have a single opaque object
> representing the frame being constructed and a getelementpointer like
> abstraction to get pointers out of it.

This proposal puts this complexity in the backend.  The backend will lay
out the outgoing argument slots as required by the ABI, and each alloca pointer
will be resolved to point to the appropriate outgoing argument slot.

The verifier will be changed to reject IR with a live alloca between a call
site with an alloca-attributed argument and the creation of that alloca.

This will work, however:

%s1 = stacksave
%1 = alloca stackbase %s1
%s2 = stacksave
%2 = alloca stackbase %s2
call @foo(%2 alloca)
stackrestore %s2
call @foo(%1 alloca)
stackrestore %s1

This works because the %2 alloca is already dead, due to the stack restore,
by the time of the second foo call.
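
The liveness rule can be sketched as a toy checker in C++. This is not the LLVM verifier and none of it is real LLVM API: `Op`, `Inst` and `verify` are hypothetical, ids are assumed to be non-negative, each call takes a single 'alloca' argument, and a stack restore is assumed to retire its save. The checker accepts a call only if its argument is the newest alloca still live and has not been passed this way before.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Toy IR: just enough to express the rule. All names are hypothetical.
enum class Op { StackSave, Alloca, CallAllocaArg, StackRestore };
struct Inst {
  Op op;
  int id;  // StackSave/Alloca: value defined; otherwise: value used
};

// Returns true if every alloca-attributed call argument is the most
// recently created alloca that is still live, and is passed this way once.
bool verify(const std::vector<Inst> &insts) {
  struct Live { bool isSave; int id; };
  std::vector<Live> stack;  // saves and live allocas, oldest first
  std::set<int> passed;     // allocas already used as 'alloca' arguments
  for (const Inst &inst : insts) {
    switch (inst.op) {
    case Op::StackSave:
      stack.push_back({true, inst.id});
      break;
    case Op::Alloca:
      stack.push_back({false, inst.id});
      break;
    case Op::CallAllocaArg: {
      // Find the newest live alloca, skipping save markers.
      int top = -1;
      for (auto it = stack.rbegin(); it != stack.rend(); ++it)
        if (!it->isSave) { top = it->id; break; }
      if (top != inst.id || !passed.insert(inst.id).second)
        return false;
      break;
    }
    case Op::StackRestore:
      // Everything allocated after the matching save dies.
      while (!stack.empty() &&
             !(stack.back().isSave && stack.back().id == inst.id))
        stack.pop_back();
      if (stack.empty()) return false;  // restore of an unknown save
      stack.pop_back();
      break;
    }
  }
  return true;
}
```

Fed the sequence from the example (save 1, alloca 1, save 2, alloca 2, call with alloca 2, restore 2, call with alloca 1, restore 1), the checker accepts it. Dropping the restore of save 2 makes the second call fail, because alloca 2 is still live above alloca 1.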

I should also mention how this interacts with regparms.  The win64 CC has
4 regparms, and if one of them is a struct, it is passed indirectly.  Users
can easily handle that in the frontend, and the backend could reject the
alloca attribute on parameters that should be in registers.  I need to
double-check what happens for fastcall on x86_32.