Alex Bradbury via llvm-dev

2017-Aug-09 16:11 UTC

[llvm-dev] [RFC] The future of the va_arg instruction

# The future of the va_arg instruction

## Summary
LLVM IR currently defines a va_arg instruction, which can be used to access
a vararg. Few Clang targets make use of it, and it has a number of
limitations. This RFC hopes to promote discussion on its future: how 'smart'
should va_arg be? Should we be aiming to transition all targets and LLVM
frontends to using it?

## Background on va_arg
The va_arg instruction is described in the language reference here
<http://llvm.org/docs/LangRef.html#int-varargs> and here
<http://llvm.org/docs/LangRef.html#i-va-arg>. When it's possible to use
va_arg, it frees the frontend from worrying about manipulation of the
target-specific va_list struct. This also has the potential to make analysis
of the IR more straightforward. However, va_arg can't currently be used
with an aggregate type (such as a struct). The difficulty of adding support
for aggregates is discussed later in this email.
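
For readers less familiar with the source-level construct, here is a minimal
C sketch (the function name sum_ints is illustrative and not part of the RFC).
Each use of the C va_arg macro below is what a frontend such as Clang has to
lower, either to the IR va_arg instruction or to explicit va_list
manipulation.

```c
#include <assert.h>
#include <stdarg.h>

/* Sums `count` int arguments taken from the variadic part of the call. */
int sum_ints(int count, ...) {
    assert(count >= 0);
    va_list ap;
    va_start(ap, count);
    int total = 0;
    for (int i = 0; i < count; i++) {
        total += va_arg(ap, int); /* each call fetches the next vararg */
    }
    va_end(ap);
    return total;
}
```

A call such as sum_ints(3, 1, 2, 3) walks the three trailing arguments in
order; nothing in the C source says how va_list is laid out on the target.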

Which Clang targets generate va_arg?
* PNaCl always uses va_arg, even for aggregates. Their ExpandVarArgs pass
replaces it with appropriate loads and stores.
* AArch64/Darwin generates va_arg if possible. When not possible, as for
aggregates or illegal vector types, it generates the usual va_list
manipulation code. It is not used for other AArch64 platforms.
* A few other targets such as MSP430, Lanai and AVR seem to use it due to
DefaultABIInfo.

Which in-tree backends support va_arg?
* AArch64, ARM, Hexagon, Lanai, MSP430, Mips, PPC, Sparc, WebAssembly, X86,
XCore

It's worth noting there has been some relevant prior discussion; see these
messages from Will Dietz and Renato Golin
<http://lists.llvm.org/pipermail/llvm-dev/2011-August/042505.html>
<http://lists.llvm.org/pipermail/llvm-dev/2011-August/042509.html>.

## Options for the future of va_arg

Option 1: Discourage use of va_arg and aim to remove it in the future
  * Most targets' frontends have to directly manipulate va_list in at least
  some cases. You could argue we'd be better off having varargs
  handled in a uniform manner, even if va_list manipulation is more explicit
  and target specific.

Option 2: Status quo
  * va_arg is there. Most backends can at least expand it, though it's not
  clear how heavily tested this is.
  * There's still a question of what the recommendation should be for
  frontends. If we keep va_arg as-is, would it be beneficial to
  modify Clang to use it when possible, while falling back to explicit
  manipulation if necessary, as on Darwin/AArch64? Alternatively, casting may
  allow va_arg to be used for a wider variety of types.

Option 3: Teach va_arg to handle aggregates
  * In this option, va_arg might reasonably be expected to handle a struct,
  but would not be expected to have detailed ABI-specific knowledge. e.g. it
  won't automagically know whether a value of a certain size/type is passed
  indirectly or not. In a sense, this would put support for aggregates passed
  as varargs on par with aggregates passed in named arguments.
  * Casting would be necessary in the same cases where casting is required
  for named args.
  * Support for aggregates could be implemented via a new module-level
  pass, much like PNaCl.
  * Alternatively, the conversion from the va_arg instruction to
  SelectionDAG could be modified. It might be desirable to convert the va_arg
  instruction to a number of loads and a new node that is responsible only for
  manipulating the va_list struct.

Option 4: Expect va_arg to handle all ABI details
  * In this more extreme option, va_arg with any type would be expected to
  generate ABI-compliant code. e.g. a va_arg with i128 would "do the right
  thing", regardless of whether an i128 is passed indirectly or not for the
  given ABI.
  * This would be nice, but probably only makes sense as part of a larger
  effort to reduce the ABI lowering burden on frontends. This sort of effort
  has been discussed many times, and is not a small project.
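
The difference between Options 3 and 4 largely comes down to who knows
whether a value is passed indirectly. The C sketch below is only an
illustration: wide128 stands in for i128, and read_wide128 and its
passed_indirectly flag are hypothetical names. Under Option 3 the frontend
would have to supply that knowledge (for example by asking for a pointer and
loading through it); under Option 4 the va_arg lowering itself would decide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint64_t lo, hi; } wide128; /* stand-in for i128 */

/* Reads a wide128 vararg from `slot`. The caller must already know whether
   the ABI stores the value itself in the slot (direct) or stores a pointer
   to the value there (indirect). */
wide128 read_wide128(const unsigned char *slot, int passed_indirectly) {
    assert(slot != NULL);
    wide128 v;
    if (passed_indirectly) {
        const wide128 *p;
        memcpy(&p, slot, sizeof p); /* the slot holds an address */
        v = *p;
    } else {
        memcpy(&v, slot, sizeof v); /* the slot holds the bytes themselves */
    }
    return v;
}
```

The same bytes in the save area mean different things depending on the flag,
which is why a type-only va_arg cannot pick the right path without ABI
knowledge.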

## Next steps
I'd really appreciate any input on the issues here. Do people have strong
feelings about the future direction of va_arg? Will GlobalISel have any effect
on the relative difficulty or desirability of these options?

Thanks,

Alex

Martin J. O'Riordan via llvm-dev

2017-Aug-09 17:08 UTC

[llvm-dev] [RFC] The future of the va_arg instruction

I don't feel strongly about it, though since it is really an ABI issue I
think it lives at a higher level than LLVM IR (Front-End language semantics).

We don't use 'va_arg' in our TableGen descriptions, but we do have
special handling for 'ISD::VAARG' during lowering to handle various
vector lengths for which we don’t have native register support, but which should
still be extracted to and from a particular register class.  For example,
'v2i8' which we map to the lower half of a 32-bit SIMD register and
'v2i32' which we map to the lower half of a 128-bit SIMD register.  The
TTI (or TRI perhaps) would need to be able to describe these special register
interactions in another way to remove the need for custom handling of these
optimisations if a generic target agnostic implementation was preferred.

We also have optimisations for vectors that are larger than our registers can
handle, for which the default implementation does not provide an optimal
solution.

I think the memory load/store handling could be made generic, but the optimal
destination/source register(s) is not so straight-forward.

Curiously, I have a group of test failures to do with 'va_arg' and
aggregates that I haven't solved.  I always assumed they were my fault, but
perhaps not, from what you describe.

	MartinO


Friedman, Eli via llvm-dev

2017-Aug-09 18:38 UTC

[llvm-dev] [RFC] The future of the va_arg instruction

On 8/9/2017 9:11 AM, Alex Bradbury via llvm-dev wrote:
> Option 3: Teach va_arg to handle aggregates
>   * In this option, va_arg might reasonably be expected to handle a struct,
>   but would not be expected to have detailed ABI-specific knowledge. e.g. it
>   won't automagically know whether a value of a certain size/type is passed
>   indirectly or not. In a sense, this would put support for aggregates passed
>   as varargs on par with aggregates passed in named arguments.
>   * Casting would be necessary in the same cases casting is required
>   for named args
>   * Support for aggregates could be implemented via a new module-level
>   pass, much like PNaCl.
>   * Alternatively, the conversion from the va_arg instruction to
>   SelectionDAG could be modified. It might be desirable to convert the vaarg
>   instruction to a number of loads and a new node that is responsible only for
>   manipulating the va_list struct.

We could automatically split va_arg on an LLVM struct type into a series
of va_arg calls for each of the elements of the struct.  Not sure that
actually helps anyone much, though.

Anything more requires full type information, which isn't currently
encoded into IR; for example, on x86-64, to properly lower va_arg on a
struct, you need to figure out whether the struct would be passed in
integer registers, floating-point registers, or memory.

> ## Next steps
> I'd really appreciate any input on the issues here. Do people have strong
> feelings about the future direction of va_arg? Will GlobalISel have any effect
> on the relative difficulty or desirability of these options?

For GlobalISel, the important bit is the mostly orthogonal question of
*when* we lower va_arg.  If we do it sometime before isel, we save a bit
of implementation work.

-Eli

-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux
Foundation Collaborative Project

Alex Bradbury via llvm-dev

2017-Aug-09 19:34 UTC

[llvm-dev] [RFC] The future of the va_arg instruction

On 9 August 2017 at 19:38, Friedman, Eli <efriedma at codeaurora.org> wrote:
> On 8/9/2017 9:11 AM, Alex Bradbury via llvm-dev wrote:
>>
>> Option 3: Teach va_arg to handle aggregates
>>   * In this option, va_arg might reasonably be expected to handle a struct,
>>   but would not be expected to have detailed ABI-specific knowledge. e.g. it
>>   won't automagically know whether a value of a certain size/type is passed
>>   indirectly or not. In a sense, this would put support for aggregates passed
>>   as varargs on par with aggregates passed in named arguments.
>>   * Casting would be necessary in the same cases casting is required
>>   for named args
>>   * Support for aggregates could be implemented via a new module-level
>>   pass, much like PNaCl.
>>   * Alternatively, the conversion from the va_arg instruction to
>>   SelectionDAG could be modified. It might be desirable to convert the vaarg
>>   instruction to a number of loads and a new node that is responsible
>>   only for manipulating the va_list struct.
>
> We could automatically split va_arg on an LLVM struct type into a series of
> va_arg calls for each of the elements of the struct.  Not sure that actually
> helps anyone much, though.

If converting va_arg {i8, i8} to two va_arg i8, you'd ideally ensure
this results in loading the two i8 values from the same slot in the
vararg save area. Of course, when passing structs directly for named
arguments, we currently rely on the frontend coercing structs for
cases like this. As such, the naive conversion shouldn't be any worse
than the status quo for named arguments.
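
To make the "same slot" point concrete, here is a small C model. Everything
in it is an assumption made for illustration: a pointer-style va_list, an
8-byte slot size, and the names slot_va_list, pair_from_two_slots and
pair_from_one_slot. It is not how any particular backend lowers va_arg.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { PAIR_SLOT_SIZE = 8 }; /* assumed size of one vararg slot */

typedef struct { int8_t first, second; } i8_pair;

/* Hypothetical pointer-style va_list: a save area plus a byte offset. */
typedef struct {
    const unsigned char *area;
    size_t offset;
} slot_va_list;

/* Scalar va_arg i8: reads one byte, then skips a whole slot. */
int8_t slot_va_arg_i8(slot_va_list *ap) {
    assert(ap != NULL && ap->area != NULL);
    int8_t v = (int8_t)ap->area[ap->offset];
    ap->offset += PAIR_SLOT_SIZE;
    return v;
}

/* Naive split: two scalar va_args, so the pair is read from two slots. */
i8_pair pair_from_two_slots(slot_va_list *ap) {
    i8_pair p;
    p.first = slot_va_arg_i8(ap);
    p.second = slot_va_arg_i8(ap);
    return p;
}

/* Ideal split: both bytes come from one slot, which is consumed once. */
i8_pair pair_from_one_slot(slot_va_list *ap) {
    assert(ap != NULL && ap->area != NULL);
    i8_pair p;
    p.first = (int8_t)ap->area[ap->offset];
    p.second = (int8_t)ap->area[ap->offset + 1];
    ap->offset += PAIR_SLOT_SIZE;
    return p;
}
```

Which of the two readers matches the caller depends on how the frontend
coerced the struct at the call site, which is the same dependency that
already exists for named arguments.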

Best,

Alex

Alex Bradbury via llvm-dev

2017-Aug-14 09:26 UTC

[llvm-dev] [RFC] The future of the va_arg instruction

On 9 August 2017 at 19:38, Friedman, Eli <efriedma at codeaurora.org> wrote:
> On 8/9/2017 9:11 AM, Alex Bradbury via llvm-dev wrote:
>>
>> Option 3: Teach va_arg to handle aggregates
>>   * In this option, va_arg might reasonably be expected to handle a struct,
>>   but would not be expected to have detailed ABI-specific knowledge. e.g. it
>>   won't automagically know whether a value of a certain size/type is passed
>>   indirectly or not. In a sense, this would put support for aggregates passed
>>   as varargs on par with aggregates passed in named arguments.
>>   * Casting would be necessary in the same cases casting is required
>>   for named args
>>   * Support for aggregates could be implemented via a new module-level
>>   pass, much like PNaCl.
>>   * Alternatively, the conversion from the va_arg instruction to
>>   SelectionDAG could be modified. It might be desirable to convert the vaarg
>>   instruction to a number of loads and a new node that is responsible
>>   only for manipulating the va_list struct.
>
> We could automatically split va_arg on an LLVM struct type into a series of
> va_arg calls for each of the elements of the struct.  Not sure that actually
> helps anyone much, though.
>
> Anything more requires full type information, which isn't currently encoded
> into IR; for example, on x86-64, to properly lower va_arg on a struct, you
> need to figure out whether the struct would be passed in integer registers,
> floating-point registers, or memory.

I've been thinking more about this. Firstly, if anyone has insight
into any cases where the va_arg instruction actually provides better
optimisation opportunities, please do share. The va_arg IR instruction
has been supported in LLVM for over a decade, but Clang doesn't
generate it for the vast majority of the "top tier" targets. I'm
trying to determine if it just needs more love, or if perhaps it
wasn't really the right thing to express at the IR level. Is the main
motivation of va_arg to allow such argument access to be specified
concisely in IR, or is there a particular way it makes life easier for
optimisations or analysis (and if so, which ones and at which point in
compilation)?

va_arg really does three things:
* Calculates how to load a value of the given type
* Increments the appropriate fields in the va_list struct
* Loads a value of the given type
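
As a rough illustration of those three steps, here is a C model of the
simplest case, a pointer-style va_list. The type ptr_va_list, the 8-byte
slot size and the function names are assumptions for this sketch, not
anything defined by LLVM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { SLOT_SIZE = 8 }; /* assumed size of one vararg slot */

/* Hypothetical pointer-style va_list: a save area plus a byte offset. */
typedef struct {
    const unsigned char *area;
    size_t offset;
    size_t size;
} ptr_va_list;

/* Rounds n up to the next multiple of SLOT_SIZE. */
static size_t round_up_to_slot(size_t n) {
    return (n + SLOT_SIZE - 1) / SLOT_SIZE * SLOT_SIZE;
}

int32_t ptr_va_arg_i32(ptr_va_list *ap) {
    /* 1. Calculate how to load the value: here, simply the current slot. */
    size_t at = ap->offset;
    assert(at + sizeof(int32_t) <= ap->size);
    /* 2. Increment the va_list state past the slot the value occupies. */
    ap->offset += round_up_to_slot(sizeof(int32_t));
    /* 3. Load a value of the given type. */
    int32_t value;
    memcpy(&value, ap->area + at, sizeof value);
    return value;
}
```

With a va_list this simple, step 1 is trivial. The difficulty discussed in
this thread comes from targets where step 1 depends on the type and on how
many registers have already been consumed.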

The problem I see is that it's fairly difficult to specialise its behaviour
depending on the target. In one of the many previous threads about ABI
lowering, I think someone commented that in LLVM it happens both too
early and too late (in the frontend, and on the SelectionDAG). This
seems to be the case here: to support targets with a more complex
va_list struct featuring separate save areas for GPRs and FPRs,
splitting a va_arg into multiple operations (one per element of an
aggregate) doesn't seem like it could work without heroic gymnastics
in the backend.
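
One way to see why a per-element split is awkward with separate save areas
is the C sketch below. It assumes an all-or-nothing rule, similar in spirit
to the x86-64 System V ABI, under which an aggregate is read from registers
only if every part of it fits. The names split_va_state and
classify_aggregate are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Toy va_list state with separate register save areas. */
typedef struct {
    size_t gpr_used, gpr_total; /* integer register slots */
    size_t fpr_used, fpr_total; /* floating-point register slots */
} split_va_state;

typedef enum { FROM_REGISTERS, FROM_OVERFLOW } arg_source;

/* Decides where an aggregate needing `gprs` integer slots and `fprs`
   floating-point slots is read from. Assumed rule: if any part does not
   fit in the remaining registers, the whole aggregate comes from the
   overflow (memory) area and no register slots are consumed. */
arg_source classify_aggregate(split_va_state *st, size_t gprs, size_t fprs) {
    assert(st->gpr_used <= st->gpr_total && st->fpr_used <= st->fpr_total);
    if (st->gpr_used + gprs <= st->gpr_total &&
        st->fpr_used + fprs <= st->fpr_total) {
        st->gpr_used += gprs;
        st->fpr_used += fprs;
        return FROM_REGISTERS;
    }
    return FROM_OVERFLOW;
}
```

If all integer slots are used but floating-point slots remain, an
{i64, double} aggregate is classified as FROM_OVERFLOW as a whole. Two
independent scalar va_args would instead take the double from the FPR save
area, disagreeing with the caller under this assumed rule.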

Converting the va_arg instruction to a new GETVAARG SelectionDAG node
plus a series of LOADs seems like it may provide a straight-forward
path to supporting aggregates on targets that use a pointer for
va_list. Of course this ends up exposing loads plus offset generation
in the SelectionDAG, just hiding the va_list increment behind
GETVAARG. For such an approach to work, you must be able to load the
given type from a contiguous region of memory, which won't always be
true for targets with a more complex va_list struct.
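
A rough C analogue of that split is sketched here, assuming a pointer-style
va_list and an 8-byte slot. get_va_arg stands in for the proposed GETVAARG
node; cursor_va_list, point3 and load_point3 are hypothetical names used
only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical pointer-style va_list: a cursor into the vararg save area. */
typedef struct { const unsigned char *cursor; } cursor_va_list;

/* Stand-in for GETVAARG: returns the address of the next argument and
   advances the va_list past `size` bytes, rounded up to `slot`. */
const unsigned char *get_va_arg(cursor_va_list *ap, size_t size, size_t slot) {
    assert(slot > 0);
    const unsigned char *addr = ap->cursor;
    ap->cursor += (size + slot - 1) / slot * slot;
    return addr;
}

typedef struct { int32_t x, y, z; } point3;

/* The "series of LOADs": each field is loaded at a fixed offset from the
   address that get_va_arg returned. */
point3 load_point3(cursor_va_list *ap) {
    const unsigned char *base = get_va_arg(ap, sizeof(point3), 8);
    point3 p;
    memcpy(&p.x, base + offsetof(point3, x), sizeof p.x);
    memcpy(&p.y, base + offsetof(point3, y), sizeof p.y);
    memcpy(&p.z, base + offsetof(point3, z), sizeof p.z);
    return p;
}
```

Only get_va_arg touches the va_list state; the loads are ordinary memory
accesses. This works here precisely because the whole struct sits in one
contiguous region.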

Best,

Alex
