Alex Bradbury via llvm-dev
2017-Aug-09 16:11 UTC
[llvm-dev] [RFC] The future of the va_arg instruction
# The future of the va_arg instruction

## Summary

LLVM IR currently defines a va_arg instruction, which can be used to access a vararg. Few Clang targets make use of it, and it has a number of limitations. This RFC hopes to promote discussion on its future: how 'smart' should va_arg be? Should we be aiming to transition all targets and LLVM frontends to using it?

## Background on va_arg

The va_arg instruction is described in the language reference here <http://llvm.org/docs/LangRef.html#int-varargs> and here <http://llvm.org/docs/LangRef.html#i-va-arg>. When it's possible to use va_arg, it frees the frontend from worrying about manipulation of the target-specific va_list struct. This also has the potential to make analysis of the IR more straightforward. However, va_arg can't currently be used with an aggregate type (such as a struct). The difficulty of adding support for aggregates is discussed later in this email.

Which Clang targets generate va_arg?
* PNaCl always uses va_arg, even for aggregates. Their ExpandVarArgs pass replaces it with appropriate loads and stores.
* AArch64/Darwin generates va_arg if possible. When not possible, as for aggregates or illegal vector types, it generates the usual va_list manipulation code. It is not used for other AArch64 platforms.
* A few other targets such as MSP430, Lanai and AVR seem to use it due to DefaultABIInfo.

Which in-tree backends support va_arg?
* AArch64, ARM, Hexagon, Lanai, MSP430, Mips, PPC, Sparc, WebAssembly, X86, XCore

It's worth noting there has been some relevant prior discussion, see these messages from Will Dietz and Renato Golin <http://lists.llvm.org/pipermail/llvm-dev/2011-August/042505.html> <http://lists.llvm.org/pipermail/llvm-dev/2011-August/042509.html>.

## Options for the future of va_arg

Option 1: Discourage use of va_arg and aim to remove it in the future
* For most targets, frontends have to directly manipulate va_list in at least some cases. You could argue we'd be better off having varargs handled in a uniform manner, even if va_list manipulation is more explicit and target specific.

Option 2: Status quo
* va_arg is there. Most backends can at least expand it, though it's not clear how heavily tested this is.
* There's still a question of what the recommendation should be for frontends. If we keep va_arg as-is, would it be beneficial to modify Clang to use it when possible, while falling back to explicit manipulation if necessary, like on Darwin/AArch64? Alternatively, casting may allow va_arg to be used for a wider variety of types.

Option 3: Teach va_arg to handle aggregates
* In this option, va_arg might reasonably be expected to handle a struct, but would not be expected to have detailed ABI-specific knowledge. e.g. it won't automagically know whether a value of a certain size/type is passed indirectly or not. In a sense, this would put support for aggregates passed as varargs on par with aggregates passed in named arguments.
* Casting would be necessary in the same cases casting is required for named args.
* Support for aggregates could be implemented via a new module-level pass, much like PNaCl.
* Alternatively, the conversion from the va_arg instruction to SelectionDAG could be modified. It might be desirable to convert the va_arg instruction to a number of loads and a new node that is responsible only for manipulating the va_list struct.

Option 4: Expect va_arg to handle all ABI details
* In this more extreme option, va_arg with any type would be expected to generate ABI-compliant code. e.g. a va_arg with i128 would "do the right thing", regardless of whether an i128 is passed indirectly or not for the given ABI.
* This would be nice, but probably only makes sense as part of a larger effort to reduce the ABI lowering burden on frontends. This sort of effort has been discussed many times, and is not a small project.

## Next steps

I'd really appreciate any input on the issues here. Do people have strong feelings about the future direction of va_arg? Will GlobalISel have any effect on the relative difficulty or desirability of these options?

Thanks,
Alex
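For readers who want a source-level picture of what is being lowered, here is a minimal C sketch. The function name `sum_ints` is hypothetical and only for illustration. Each `va_arg(ap, int)` below is the point where a frontend either emits the IR va_arg instruction or, as most Clang targets do according to the RFC, open-codes the target-specific va_list manipulation.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical helper: sums `count` variadic int arguments. */
int sum_ints(int count, ...) {
    va_list ap;
    int total = 0;

    assert(count >= 0);          /* a negative count would be a caller bug */
    va_start(ap, count);         /* initialise the target-specific va_list */
    for (int i = 0; i < count; ++i) {
        total += va_arg(ap, int); /* fetch the next vararg as an int */
    }
    va_end(ap);
    return total;
}
```

A call such as `sum_ints(3, 1, 2, 3)` walks the va_list three times. Only scalar `int` values are fetched here, which is the case the IR instruction already handles; fetching a struct is where the limitations discussed above come in.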
Martin J. O'Riordan via llvm-dev
2017-Aug-09 17:08 UTC
[llvm-dev] [RFC] The future of the va_arg instruction
I don't feel strongly about it, though since it is really an ABI issue I think it lives at a higher level than LLVM IR (Front-End language semantics).

We don't use 'va_arg' in our TableGen descriptions, but we do have special handling for 'ISD::VAARG' during lowering to handle various vector lengths for which we don’t have native register support, but which should still be extracted to and from a particular register class. For example, 'v2i8' which we map to the lower half of a 32-bit SIMD register and 'v2i32' which we map to the lower half of a 128-bit SIMD register. The TTI (or TRI perhaps) would need to be able to describe these special register interactions in another way to remove the need for custom handling of these optimisations if a generic target-agnostic implementation was preferred.

We also have optimisations for vectors that are larger than our registers can handle, for which the default implementation does not provide an optimal solution. I think the memory load/store handling could be made generic, but choosing the optimal destination/source register(s) is not so straightforward.

Curiously, I have a group of test failures to do with 'va_arg' and aggregates that I haven't solved. I always assumed they were my fault, but perhaps not, from what you describe.

MartinO
Friedman, Eli via llvm-dev
2017-Aug-09 18:38 UTC
[llvm-dev] [RFC] The future of the va_arg instruction
On 8/9/2017 9:11 AM, Alex Bradbury via llvm-dev wrote:
> Option 3: Teach va_arg to handle aggregates
> * In this option, va_arg might reasonably be expected to handle a struct, but would not be expected to have detailed ABI-specific knowledge. e.g. it won't automagically know whether a value of a certain size/type is passed indirectly or not. In a sense, this would put support for aggregates passed as varargs on par with aggregates passed in named arguments.
> * Casting would be necessary in the same cases casting is required for named args
> * Support for aggregates could be implemented via a new module-level pass, much like PNaCl.
> * Alternatively, the conversion from the va_arg instruction to SelectionDAG could be modified. It might be desirable to convert the vaarg instruction to a number of loads and a new node that is responsible only for manipulating the va_list struct.

We could automatically split va_arg on an LLVM struct type into a series of va_arg calls for each of the elements of the struct. Not sure that actually helps anyone much, though.

Anything more requires full type information, which isn't currently encoded into IR; for example, on x86-64, to properly lower va_arg on a struct, you need to figure out whether the struct would be passed in integer registers, floating-point registers, or memory.

> ## Next steps
> I'd really appreciate any input on the issues here. Do people have strong feelings about the future direction of va_arg? Will GlobalISel have any effect on the relative difficulty or desirability of these options?

For GlobalISel, the important bit is the mostly orthogonal question of *when* we lower va_arg. If we do it sometime before isel, we save a bit of implementation work.

-Eli

--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
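To make the x86-64 point concrete, here is a rough C sketch of the kind of per-field information such a classification needs. It is a heavily simplified model loosely based on the SysV x86-64 calling convention, not Clang's or LLVM's actual logic: it assumes naturally aligned scalar fields that never straddle an 8-byte boundary, and it ignores vectors, long double, bit-fields, and C++ rules. The names `field`, `classify_eightbyte`, and `passed_in_memory` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified argument classes for one 8-byte chunk ("eightbyte"). */
enum arg_class { CLASS_NONE, CLASS_INTEGER, CLASS_SSE };

/* Hypothetical description of one scalar field of a struct. */
struct field {
    size_t offset;   /* byte offset of the field within the struct */
    size_t size;     /* size of the field in bytes */
    bool is_float;   /* true for float/double, false for integers/pointers */
};

/* In this simplified model, anything larger than two eightbytes goes to memory. */
bool passed_in_memory(size_t struct_size) {
    return struct_size > 16;
}

/* Classify eightbyte `index` (0 or 1) of a struct that is at most 16 bytes. */
enum arg_class classify_eightbyte(const struct field *fields, size_t n,
                                  size_t index) {
    enum arg_class cls = CLASS_NONE;
    for (size_t i = 0; i < n; ++i) {
        if (fields[i].offset / 8 != index)
            continue;               /* field lives in the other eightbyte */
        if (!fields[i].is_float)
            return CLASS_INTEGER;   /* any integer field makes it INTEGER */
        cls = CLASS_SSE;            /* only floating-point fields so far */
    }
    return cls;
}
```

Under these assumptions a `struct { double d; long l; }` yields SSE for the first eightbyte and INTEGER for the second, so its two halves would come from different register classes. An IR struct type such as `{ double, i64 }` happens to carry enough information for this toy model, but the field-level source types that the real rules depend on are in general not recoverable from IR, which is the gap described above.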
Alex Bradbury via llvm-dev
2017-Aug-09 19:34 UTC
[llvm-dev] [RFC] The future of the va_arg instruction
On 9 August 2017 at 19:38, Friedman, Eli <efriedma at codeaurora.org> wrote:
> On 8/9/2017 9:11 AM, Alex Bradbury via llvm-dev wrote:
>> Option 3: Teach va_arg to handle aggregates
>> * In this option, va_arg might reasonably be expected to handle a struct, but would not be expected to have detailed ABI-specific knowledge. e.g. it won't automagically know whether a value of a certain size/type is passed indirectly or not. In a sense, this would put support for aggregates passed as varargs on par with aggregates passed in named arguments.
>> * Casting would be necessary in the same cases casting is required for named args
>> * Support for aggregates could be implemented via a new module-level pass, much like PNaCl.
>> * Alternatively, the conversion from the va_arg instruction to SelectionDAG could be modified. It might be desirable to convert the vaarg instruction to a number of loads and a new node that is responsible only for manipulating the va_list struct.
>
> We could automatically split va_arg on an LLVM struct type into a series of va_arg calls for each of the elements of the struct. Not sure that actually helps anyone much, though.

If converting va_arg {i8, i8} to two va_arg i8, you'd ideally ensure this results in loading the two i8 values from the same slot in the vararg save area. Of course, when passing structs directly for named arguments, we currently rely on the frontend coercing structs for cases like this. As such, the naive conversion shouldn't be any worse than the status quo for named arguments.

Best,
Alex
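The slot concern can be illustrated with a small simulation in C. This is a sketch under stated assumptions, not a model of any real target: it assumes a pointer-style va_list, a 4-byte slot size, and a caller that packs the whole {i8, i8} struct into a single slot. All names (`sim_va_list`, `sim_va_arg_i8`, `sim_va_arg_pair`, `SLOT_SIZE`) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { SLOT_SIZE = 4 };  /* assumed slot size of the hypothetical target */

/* Simulated pointer-style va_list: just a cursor into the vararg save area. */
struct sim_va_list {
    const unsigned char *next;
};

struct pair {
    uint8_t a;
    uint8_t b;
};

/* Models a scalar `va_arg i8`: read one byte, then skip a whole slot. */
uint8_t sim_va_arg_i8(struct sim_va_list *ap) {
    uint8_t value = ap->next[0];
    ap->next += SLOT_SIZE;
    return value;
}

/* Models the desired `va_arg {i8, i8}`: both bytes come from the same slot. */
struct pair sim_va_arg_pair(struct sim_va_list *ap) {
    struct pair p;
    memcpy(&p, ap->next, sizeof p);  /* copy both fields out of one slot */
    ap->next += SLOT_SIZE;
    return p;
}
```

With a save area whose first slot holds the bytes 0x11, 0x22, `sim_va_arg_pair` returns both and consumes one slot, whereas two successive `sim_va_arg_i8` calls return 0x11 and then whatever starts the second slot. Under these assumptions, that mismatch is what a naive split into per-element va_arg operations would have to avoid.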
Alex Bradbury via llvm-dev
2017-Aug-14 09:26 UTC
[llvm-dev] [RFC] The future of the va_arg instruction
On 9 August 2017 at 19:38, Friedman, Eli <efriedma at codeaurora.org> wrote:
> On 8/9/2017 9:11 AM, Alex Bradbury via llvm-dev wrote:
>> Option 3: Teach va_arg to handle aggregates
>> * In this option, va_arg might reasonably be expected to handle a struct, but would not be expected to have detailed ABI-specific knowledge. e.g. it won't automagically know whether a value of a certain size/type is passed indirectly or not. In a sense, this would put support for aggregates passed as varargs on par with aggregates passed in named arguments.
>> * Casting would be necessary in the same cases casting is required for named args
>> * Support for aggregates could be implemented via a new module-level pass, much like PNaCl.
>> * Alternatively, the conversion from the va_arg instruction to SelectionDAG could be modified. It might be desirable to convert the vaarg instruction to a number of loads and a new node that is responsible only for manipulating the va_list struct.
>
> We could automatically split va_arg on an LLVM struct type into a series of va_arg calls for each of the elements of the struct. Not sure that actually helps anyone much, though.
>
> Anything more requires full type information, which isn't currently encoded into IR; for example, on x86-64, to properly lower va_arg on a struct, you need to figure out whether the struct would be passed in integer registers, floating-point registers, or memory.

I've been thinking more about this. Firstly, if anyone has insight into any cases where the va_arg instruction actually provides better optimisation opportunities, please do share. The va_arg IR instruction has been supported in LLVM for over a decade, but Clang doesn't generate it for the vast majority of the "top tier" targets. I'm trying to determine if it just needs more love, or if perhaps it wasn't really the right thing to express at the IR level. Is the main motivation of va_arg to allow such argument access to be specified concisely in IR, or is there a particular way it makes life easier for optimisations or analysis (and if so, which ones, and at which point in compilation)?

va_arg really does three things:
* Calculates how to load a value of the given type
* Increments the appropriate fields in the va_list struct
* Loads a value of the given type

The problem I see is that it's fairly difficult to specialise its behaviour depending on the target. In one of the many previous threads about ABI lowering, I think someone commented that in LLVM it happens both too early and too late (in the frontend, and on the SelectionDAG). This seems to be the case here: to support targets with a more complex va_list struct featuring separate save areas for GPRs and FPRs, splitting a va_arg into multiple operations (one per element of an aggregate) doesn't seem like it could work without heroic gymnastics in the backend.

Converting the va_arg instruction to a new GETVAARG SelectionDAG node plus a series of LOADs seems like it may provide a straightforward path to supporting aggregates on targets that use a pointer for va_list. Of course, this ends up exposing loads plus offset generation in the SelectionDAG, just hiding the va_list increment behind GETVAARG. For such an approach to work, you must be able to load the given type from a contiguous region of memory, which won't always be true for targets with a more complex va_list struct.

Best,
Alex
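The three steps, and the contiguity problem with separate save areas, can be sketched as a C simulation. The struct layout below is modelled on the SysV x86-64 va_list, but everything here is a hypothetical stand-in rather than the real type: the bounds `GP_END` and `FP_END` assume six 8-byte GPR slots followed by eight 16-byte FPR slots, and the helper names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { GP_END = 48, FP_END = 176 };  /* assumed save-area bounds */

/* Simulated va_list with separate cursors for GPR and FPR save areas. */
struct sim_va_list {
    unsigned gp_offset;                      /* next GPR slot in reg_save_area */
    unsigned fp_offset;                      /* next FPR slot in reg_save_area */
    const unsigned char *overflow_arg_area;  /* arguments passed on the stack */
    const unsigned char *reg_save_area;      /* saved GPRs, then saved FPRs */
};

/* Steps 1 and 2 for an integer-class value: find its address, bump a cursor. */
const unsigned char *next_gp_slot(struct sim_va_list *ap) {
    const unsigned char *addr;
    if (ap->gp_offset < GP_END) {
        addr = ap->reg_save_area + ap->gp_offset;
        ap->gp_offset += 8;
    } else {
        addr = ap->overflow_arg_area;
        ap->overflow_arg_area += 8;
    }
    return addr;
}

/* Steps 1 and 2 for a floating-point value: a different cursor and stride. */
const unsigned char *next_fp_slot(struct sim_va_list *ap) {
    const unsigned char *addr;
    if (ap->fp_offset < FP_END) {
        addr = ap->reg_save_area + ap->fp_offset;
        ap->fp_offset += 16;
    } else {
        addr = ap->overflow_arg_area;
        ap->overflow_arg_area += 8;
    }
    return addr;
}

/* Step 3: load a value of the requested type from the computed address. */
int64_t sim_va_arg_i64(struct sim_va_list *ap) {
    int64_t value;
    memcpy(&value, next_gp_slot(ap), sizeof value);
    return value;
}

double sim_va_arg_double(struct sim_va_list *ap) {
    double value;
    memcpy(&value, next_fp_slot(ap), sizeof value);
    return value;
}
```

In this model an integer and a double fetched back to back come from offsets 0 and 48 of the save area, so the two halves of something like `{ i64, double }` are not adjacent in memory. A GETVAARG node that returns a single base address followed by plain loads would fit the pointer-style case, but not this one without copying the pieces into a temporary first.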