thr3ads.net - llvm dev - [llvm-dev] [RFC] Introducing an explicit calling convention [Jan 2019]

Frej Drejhammar via llvm-dev

2019-Jan-15 08:20 UTC

[llvm-dev] [RFC] Introducing an explicit calling convention

Hi All,

TLDR: Allow calling conventions to be defined on-the-fly for functions
in LLVM IR. Comments are requested on the mechanism and syntax.

Summary
=======
This is a proposal for adding a mechanism by which LLVM can be used to
generate code fragments adhering to an arbitrary calling
convention. Intended use cases are: generating code intended to be
called from the shadow of a stackmap or patchpoint; generating the
target function of a statepoint; or simply for generating a piece of
shell-code during reverse engineering or binary patching.

Motivation
==========
The LLVM assembly language provides stackmaps, patchpoints, and
statepoints which all provide the user with the value or storage
location of operands given to the respective intrinsic. All three
intrinsics emit the information in a special stackmap section [3] of
the produced object file. These three intrinsics are useful to
the implementer of a JIT-compiler for an interpreted language [2] (the
author's use case), as stackmaps can be used to incrementally extend
blocks of native code, and a statepoint can be used both as a mechanism
to call native code and as a landing-pad for reentry from
native code. Other uses, such as inserting a stackmap and later
overwriting its shadow with a call to a logging function, are also
possible.

The information in the stackmap section can be seen as a custom
calling convention which is unique for this particular
location. Unfortunately there is currently no way to define the
details of an LLVM calling convention dynamically, as LLVM only allows
the user to choose among a fixed set of predefined conventions.

Approach
========
This proposal adds a new calling convention called 'explicitcc', which
can be applied to void functions. A function using the explicit
calling convention requires that each element of the argument list has
a parameter attribute 'hwreg(metadata)' specifying the register from
which the argument gets its value. An 'explicit' function can have an
optional 'noclobber(metadata)' function attribute to tell the compiler
which registers are to be treated as callee-saved. Additionally, a new
'@llvm.experimental.retwr(...)' (standing for return with registers)
intrinsic is introduced. By giving each parameter to retwr a hwreg
attribute, it allows the 'explicit' function to return to its caller
with a defined register state.

Only parameters passed in registers are considered, as the
llvm.addressofreturnaddress intrinsic can be used to calculate the
location of values on the caller's stack.

Example
=======
The following is a function which exchanges the values of rcx and rdx
without clobbering rax and rbx.

define explicitcc void @example(i64 hwreg(metadata !1) %a,
                                i64 hwreg(metadata !2) %b)
                                noclobber(metadata !0) {
  call void (...) @llvm.experimental.retwr(i64 hwreg(metadata !2) %a,
                                           i64 hwreg(metadata !1) %b)
  ret void
}

!0 = !{!"rax", !"rbx"}
!1 = !{!"rcx"}
!2 = !{!"rdx"}
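
To make the intended effect of the example concrete, here is a small
Python model of the register state before and after the call. It is
only a sketch and not LLVM code: the dict-based register file and the
name run_example are assumptions made for illustration, and the model
ignores everything except the four registers named in the example.

```python
# Toy model of the explicitcc example above; this is not LLVM code.
# A register file is modelled as a dict from register name to value.

NOCLOBBER = ("rax", "rbx")   # mirrors !0
ARG_REGS = ("rcx", "rdx")    # %a comes from !1 (rcx), %b from !2 (rdx)
RET_REGS = ("rdx", "rcx")    # retwr places %a in !2 (rdx), %b in !1 (rcx)


def run_example(regs):
    """Return the register state seen by the caller after the call."""
    # On entry each parameter takes its value from its hwreg register.
    a, b = (regs[name] for name in ARG_REGS)
    out = dict(regs)
    # retwr: each operand is returned in the register its hwreg names.
    out[RET_REGS[0]] = a
    out[RET_REGS[1]] = b
    # Registers listed in noclobber must hold their entry values.
    for name in NOCLOBBER:
        out[name] = regs[name]
    return out


state = run_example({"rax": 1, "rbx": 2, "rcx": 3, "rdx": 4})
print(state)
```

In this model rcx and rdx come back exchanged while rax and rbx are
untouched, which is the behaviour the example describes.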

Open Questions
==============
Are parameter attributes the best way to encode the register
information? The metadata reference requires adding a pointer to the
ISD::ArgFlagsTy struct, thus growing it by 50%. Is this acceptable?
An alternative could be to instead have a function attribute which
points to a metadata tuple with the explicit registers.
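
The 50% figure can be illustrated with a rough size calculation. The
sketch below uses Python's ctypes with two stand-in structs; their
field names and the 16-byte starting layout are assumptions chosen so
that an 8-byte pointer matches the stated growth, and they do not
reproduce the real ISD::ArgFlagsTy definition.

```python
import ctypes

# Hypothetical stand-in for the current struct: 16 bytes of packed flags.
BASE_FIELDS = [("word0", ctypes.c_uint64), ("word1", ctypes.c_uint64)]


class FlagsOnly(ctypes.Structure):
    _fields_ = BASE_FIELDS


class FlagsWithMetadata(ctypes.Structure):
    # Same flags plus one pointer, standing in for the metadata reference.
    _fields_ = BASE_FIELDS + [("hwreg_md", ctypes.c_void_p)]


def growth_percent(old, new):
    """Size increase from old to new, as a whole percentage of old."""
    old_size = ctypes.sizeof(old)
    return 100 * (ctypes.sizeof(new) - old_size) // old_size


print(ctypes.sizeof(FlagsOnly), ctypes.sizeof(FlagsWithMetadata),
      growth_percent(FlagsOnly, FlagsWithMetadata))
```

The result depends on the pointer size and struct alignment rules of
the platform the script runs on.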

References
==========
[1] https://llvm.org/docs/StackMaps.html#stack-map-format

[2] https://dl.acm.org/citation.cfm?id=2633450

[3] https://llvm.org/docs/StackMaps.html#stack-map-section

Regards,

--Frej Drejhammar

David Chisnall via llvm-dev

2019-Jan-15 11:21 UTC

On 15/01/2019 08:20, Frej Drejhammar via llvm-dev wrote:
> Only parameters passed in registers are considered as the
> llvm.addressofreturnaddress intrinsic can be used to calculate the
> location of values on the callers stack.

I'm not convinced by this part of the proposal.  It is valid for x86,
for calls that use a normal call instruction and so the return address
is pushed onto the stack, but on pretty much any other architecture this
is not the case: the return address is in a register and it's the
responsibility of the callee prolog to spill it (or to not spill it, if
this is a leaf function), so it may well not be at the start of the return
area.  It's also unlikely to interact well with safe stack and similar
techniques.

Perhaps more importantly, this works only for calling conventions where
the caller is responsible for cleaning up the stack.  Some calling
conventions (e.g. Windows' stdcall) require the callee to clean up the
stack; supporting these would require that we be able to find on-stack
parameters and move the stack pointer from IR.

Supporting any of these probably requires more explicit prolog and 
epilog instructions in the IR.

I'm not opposed to this in principle, and actually I'd quite like to 
move in this direction and remove our reliance on undocumented and 
inconsistent conventions between the back end and the front end for 
conveying information about ABIs.  For example, returning two 32-bit 
integers or a pair of pointers on x86-32 requires returning the result 
in a single i64 in LLVM IR (on platforms where small structs are 
returned in registers, not on Linux), which is not particularly helpful 
for analysis or consistent with any other architecture.  Given that 
front ends have to be aware of calling conventions, it would be nice if 
they could express them in the same way that the ABI references do...
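
As a rough illustration of the packing described above, the following
Python sketch shows what a front end effectively has to express when a
pair of 32-bit values travels as one i64. The function names are made
up for this example, and placing the first value in the low half is an
assumption about the layout.

```python
MASK32 = 0xFFFFFFFF


def pack_pair(first, second):
    """Pack two 32-bit values into one 64-bit integer.

    Assumption: the first value occupies the low 32 bits.
    """
    return ((second & MASK32) << 32) | (first & MASK32)


def unpack_pair(packed):
    """Recover the two 32-bit values from the packed 64-bit integer."""
    return packed & MASK32, (packed >> 32) & MASK32


packed = pack_pair(1, 2)
print(hex(packed), unpack_pair(packed))
```

Once packed, the IR sees a single integer, so the fact that the value
is really a pair is no longer visible to analyses.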

David

David Greene via llvm-dev

2019-Jan-15 17:23 UTC

David Chisnall via llvm-dev <llvm-dev at lists.llvm.org> writes:
> I'm not opposed to this in principle, and actually I'd quite like to
> move in this direction and remove our reliance on undocumented and
> inconsistent conventions between the back end and the front end for
> conveying information about ABIs.  For example, returning two 32-bit
> integers or a pair of pointers on x86-32 requires returning the result
> in a single i64 in LLVM IR (on platforms where small structs are
> returned in registers, not on Linux), which is not particularly
> helpful for analysis or consistent with any other architecture.  Given
> that front ends have to be aware of calling conventions, it would be
> nice if they could express them in the same way that the ABI
> references do...

+1.  Coordinating ABI semantics between the frontend and LLVM is tricky.
It would be super helpful to have a formal way of expressing ABI
semantics in the IR.

                           -David

Philip Reames via llvm-dev

2019-Jan-15 23:02 UTC

I generally support the goal - we have the same problem described in a 
bit of detail just below - but I'm really not sure about the framing of 
the solution here.

Our variation on the problem is that we have many distinct calling
conventions which are slight variations of each other. We're adapting a
legacy collection of hand-written assembly stubs, each of which had a
slightly different calling convention.  The typical difference is that
one stub might kill a register that another preserves.  At the moment,
we've solved this through a mixture of a bunch of custom calling
conventions declared downstream, and normalizing stubs where possible.
The former is tedious; the latter is rather error prone. The key thing
for us is that variation between stubs is primarily small and mostly in
the callee saved lists.

On the framing piece, the ABI is really a property of the callee, not of 
the arguments.  As such, I think this really deserves to either be a 
first class syntax for spelling a calling convention, or an attribute.  
I'd suggest framing the description of the calling convention as "like 
this existing calling convention XXX, but w/o this callee saved 
register" or "like this existing calling convention XYZ, but with one 
argument register defined".  Possible spellings might include:

declare "C" {i64, i64} @example(i64 %a, i64 %b)
  ccoverride(noclobber={rax, rbx}, arg_reg_order={rcx, rdx},
             ret_arg_regs={rdx, rcx})

declare "C" {i64, i64} @example(i64 %a, i64 %b)
  ccoverride="(noclobber={rax, rbx}, arg_reg_order={rcx, rdx},
               ret_arg_regs={rdx, rcx})"

(The second one abuses String attributes horribly, but would be the 
easiest to implement.)
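
The "like convention XXX, but ..." framing can be sketched as a base
description plus overrides. The Python below is only a model of that
idea: CallConv, cc_override and BASE are hypothetical names, BASE is a
simplified set of integer registers loosely based on the System V
x86-64 convention, and the sketch assumes that each override replaces
the corresponding base field wholesale.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CallConv:
    arg_regs: tuple          # registers used for arguments, in order
    ret_regs: tuple          # registers used for return values, in order
    callee_saved: frozenset  # registers the callee must preserve


# Simplified, illustrative base convention (integer registers only).
BASE = CallConv(
    arg_regs=("rdi", "rsi", "rdx", "rcx", "r8", "r9"),
    ret_regs=("rax", "rdx"),
    callee_saved=frozenset({"rbx", "rbp", "r12", "r13", "r14", "r15"}),
)


def cc_override(base, noclobber=None, arg_reg_order=None, ret_arg_regs=None):
    """Derive a convention from base; fields left as None are inherited."""
    changes = {}
    if noclobber is not None:
        changes["callee_saved"] = frozenset(noclobber)
    if arg_reg_order is not None:
        changes["arg_regs"] = tuple(arg_reg_order)
    if ret_arg_regs is not None:
        changes["ret_regs"] = tuple(ret_arg_regs)
    return replace(base, **changes)


derived = cc_override(BASE, noclobber=["rax", "rbx"],
                      arg_reg_order=["rcx", "rdx"],
                      ret_arg_regs=["rdx", "rcx"])
print(derived)
```

Whether an override should replace a field or merely add to or remove
from it is one of the design choices such a syntax would have to settle.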

An alternate way of framing this would be to provide a clean interface 
for plugging in an externally defined calling convention w/o needing to 
rebuild LLVM.  This would require a custom driver, but would avoid the 
need to build LLVM.  This would solve our problem cleanly - and is 
probably what I'd get around to implementing someday - but I'm not sure 
how it matches your original use case.

Philip


Alex Rosenberg via llvm-dev

2019-Jan-16 01:50 UTC

One way to see if this mechanism is suitably flexible would be to check if it’s
possible to implement #pragma parameter from 68K-based Mac and Palm compilers.
This syntax allowed specifying specific registers to be used for arguments in a
similar way.

Alex

Frej Drejhammar via llvm-dev

2019-Jan-16 11:44 UTC

David Chisnall via llvm-dev <llvm-dev at lists.llvm.org> writes:
> On 15/01/2019 08:20, Frej Drejhammar via llvm-dev wrote:
>> Only parameters passed in registers are considered as the
>> llvm.addressofreturnaddress intrinsic can be used to calculate the
>> location of values on the callers stack.
>
> I'm not convinced by this part of the proposal.  It is valid for x86,
> for calls that use a normal call instruction and so the return address
> is pushed onto the stack, but on pretty much any other architecture
> this is not the case: [...]

You're right, that only works for x86. A cleaner way would be to require
that if the stack pointer is associated with a parameter, the value of
that parameter should be the caller's stack pointer. The explicit CC
would then have to take this into account while producing its prologue,
but that shouldn't be worse than what we are already doing on return
[see below].

> Perhaps more importantly, this works only for calling conventions
> where the caller is responsible for cleaning up the stack.  Some
> calling conventions (e.g. Windows' stdcall) require the callee to
> clean up the stack, supporting these would require that we be able to
> find on-stack parameters and move the stack pointer from IR.

In our prototype implementation, used for an Erlang JIT, we have call
chains (not tail calls) of explicitcc functions where the leaf does a
direct return back to the initial call. For that we have implemented a
stack-adjustment function attribute which allows us to adjust the stack
pointer on return (from the stackmap we know the size of each activation
record on the stack and our runtime system knows the call tree).
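
A minimal sketch of the arithmetic behind such an adjustment, in
Python. This does not describe the prototype's actual mechanism: the
function names are invented, the stack is assumed to grow downwards,
and each frame size is assumed to be given in bytes and to already
include any return-address slot.

```python
def stack_adjustment(frame_sizes):
    """Total bytes to pop so that every listed activation record is removed.

    frame_sizes lists the size in bytes of each record between the leaf
    and the initial call, as would be read from the stackmap.
    """
    if any(size < 0 for size in frame_sizes):
        raise ValueError("frame sizes must be non-negative")
    return sum(frame_sizes)


def sp_after_direct_return(leaf_sp, frame_sizes):
    """Stack pointer value after the leaf returns directly.

    Assumes a downward-growing stack, so popping adds to the pointer.
    """
    return leaf_sp + stack_adjustment(frame_sizes)


print(hex(sp_after_direct_return(0x7FFF0000, [32, 48, 16])))
```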

--Frej

Frej Drejhammar via llvm-dev

2019-Jan-16 11:53 UTC

Philip Reames <listmail at philipreames.com> writes:
> An alternate way of framing this would be to provide a clean interface
> for plugging in an externally defined calling convention w/o needing
> to rebuild LLVM. This would require a custom driver, but would avoid
> the need to build LLVM. This would solve our problem cleanly - and is
> probably what I'd get around to implementing someday - but I'm not
> sure how it matches your original use case.

I like this solution a lot. It would provide everything I need for my
use case and would sidestep the discussions about the syntax and exactly
what it should express. As far as I understand, it could also be used by
the ABI-lowering library suggested by Manuel Jacob and David Chisnall in
[1,2].

I'll wait a day or so to see if I get more suggestions before writing an
updated/alternative proposal.

--Frej

[1] http://lists.llvm.org/pipermail/llvm-dev/2019-January/129184.html
[2] http://lists.llvm.org/pipermail/llvm-dev/2019-January/129207.html

David Greene via llvm-dev

2019-Jan-17 18:58 UTC

Philip Reames via llvm-dev <llvm-dev at lists.llvm.org> writes:
> On the framing piece, the ABI is really a property of the callee, not
> of the arguments.

The *calling convention* might be so, but the ABI is definitely not.
For a given language, the ABI is a property of the target.  The ABI
covers a lot more ground than just the calling convention.  With more
complex argument types, the "calling convention" depends on a lot of
ABI information beyond caller-save/callee-save/argument/return register
specifications.  It has to know what the types look like in memory.

> As such, I think this really deserves to either be a first class
> syntax for spelling a calling convention, or an attribute.  I'd suggest
> framing the description of the calling convention as "like this
> existing calling convention XXX, but w/o this callee saved register" or
> "like this existing calling convention XYZ, but with one argument
> register defined".  Possibly spellings might include:
>
> declare "C" {i64, i64} @example(i64 %a, i64 %b)
> ccoverride(noclobber={rax, rbx}, arg_reg_order={rcx, rdx},
> ret_arg_regs={rdx, rcx})
> declare "C" {i64, i64} @example(i64 %a, i64 %b)
> ccoverride="(noclobber={rax, rbx}, arg_reg_order={rcx, rdx},
> ret_arg_regs={rdx, rcx})"
>
> (The second one abuses String attributes horribly, but would be the
> easiest to implement.)

This might work for simple scalar arguments but quickly breaks down in
the presence of aggregates.  Describing, for example, the layout of C
struct types and their mapping to registers in the calling convention is
non-trivial.

> An alternate way of framing this would be to provide a clean interface
> for plugging in an externally defined calling convention w/o needing
> to rebuild LLVM.  This would require a custom driver, but would avoid
> the need to build LLVM.  This would solve our problem cleanly - and is
> probably what I'd get around to implementing someday - but I'm not
> sure how it matches your original use case.

This is an interesting idea.  Keeping in mind what Manuel Jacob[1] and
David Chisnall[2] have said, what I've really wanted in my work is
something that, given a source-level function signature, a set of
argument Values and an optional Value in which to store the returned
value, would tell me how to generate the LLVM IR to pass the arguments
and store the return value.  I've also wanted something that, given a
source type, would generate an LLVM type that correctly implements the
ABI layout.  Finally, I'd want something that, given a source type, a
Value of the corresponding LLVM type and a source field access specifier,
would generate the IR to read from or write to the field.

StructType kinda-sorta provides some of the latter two, but not really.

The above is written with a C/C++ lens and other languages may need
more/different things from an ABI library.

                         -David

[1] http://lists.llvm.org/pipermail/llvm-dev/2019-January/129184.html
[2] http://lists.llvm.org/pipermail/llvm-dev/2019-January/129207.html
