Alex Bradbury via llvm-dev
2017-Aug-14  09:26 UTC
[llvm-dev] [RFC] The future of the va_arg instruction
On 9 August 2017 at 19:38, Friedman, Eli <efriedma at codeaurora.org> wrote:
> On 8/9/2017 9:11 AM, Alex Bradbury via llvm-dev wrote:
>>
>> Option 3: Teach va_arg to handle aggregates
>> * In this option, va_arg might reasonably be expected to handle a struct,
>> but would not be expected to have detailed ABI-specific knowledge. e.g. it
>> won't automagically know whether a value of a certain size/type is passed
>> indirectly or not. In a sense, this would put support for aggregates passed
>> as varargs on par with aggregates passed in named arguments.
>> * Casting would be necessary in the same cases casting is required for
>> named args
>> * Support for aggregates could be implemented via a new module-level pass,
>> much like PNaCl.
>> * Alternatively, the conversion from the va_arg instruction to
>> SelectionDAG could be modified. It might be desirable to convert the
>> va_arg instruction to a number of loads and a new node that is responsible
>> only for manipulating the va_list struct.
>
>
> We could automatically split va_arg on an LLVM struct type into a series of
> va_arg calls for each of the elements of the struct. Not sure that actually
> helps anyone much, though.
>
> Anything more requires full type information, which isn't currently encoded
> into IR; for example, on x86-64, to properly lower va_arg on a struct, you
> need to figure out whether the struct would be passed in integer registers,
> floating-point registers, or memory.

I've been thinking more about this. Firstly, if anyone has insight into any cases where the va_arg instruction actually provides better optimisation opportunities, please do share. The va_arg IR instruction has been supported in LLVM for over a decade, but Clang doesn't generate it for the vast majority of the "top tier" targets. I'm trying to determine if it just needs more love, or if perhaps it wasn't really the right thing to express at the IR level. Is the main motivation of va_arg to allow such argument access to be specified concisely in IR, or is there a particular way it makes life easier for optimisations or analysis (and if so, which ones and at which point in compilation)?

va_arg really does three things:
* Calculates how to load a value of the given type
* Increments the appropriate fields in the va_list struct
* Loads a value of the given type

The problem I see is that it's fairly difficult to specialise its behaviour depending on the target. In one of the many previous threads about ABI lowering, I think someone commented that in LLVM it happens both too early and too late (in the frontend, and on the SelectionDAG). This seems to be the case here: to support targets with a more complex va_list struct featuring separate save areas for GPRs and FPRs, splitting a va_arg into multiple operations (one per element of an aggregate) doesn't seem like it could work without heroic gymnastics in the backend.

Converting the va_arg instruction to a new GETVAARG SelectionDAG node plus a series of LOADs seems like it may provide a straightforward path to supporting aggregates on targets that use a pointer for va_list. Of course, this ends up exposing loads plus offset generation in the SelectionDAG, just hiding the va_list increment behind GETVAARG. For such an approach to work, you must be able to load the given type from a contiguous region of memory, which won't always be true for targets with a more complex va_list struct.

Best,
Alex
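To make the three steps concrete for the simple case mentioned above (a target whose va_list is just a pointer), here is a minimal C sketch. It does not use the real `<stdarg.h>` machinery; it simulates a hypothetical target where va_list is a plain `char *` into one contiguous argument area and every vararg occupies whole 8-byte slots. The names `sim_va_list`, `get_va_arg_addr`, and `sim_va_arg` are illustrative only; `get_va_arg_addr` merely loosely mirrors the proposed GETVAARG node by hiding the va_list update and returning an address, leaving the actual loads to the caller.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical target (assumption): va_list is a plain pointer into one
   contiguous argument area, and every vararg occupies whole 8-byte slots. */
#define SLOT ((uintptr_t)8)

typedef char *sim_va_list;

/* Round x up to a multiple of a; a must be a power of two. */
static uintptr_t align_up(uintptr_t x, uintptr_t a) {
  return (x + a - 1) & ~(a - 1);
}

/* Steps 1 and 2: work out where the value lives, then advance the va_list.
   Loosely analogous to the proposed GETVAARG node: it hides the va_list
   update and hands back an address that ordinary loads can use. */
static const char *get_va_arg_addr(sim_va_list *ap, size_t size, size_t align) {
  uintptr_t a = align < SLOT ? SLOT : (uintptr_t)align;
  uintptr_t addr = align_up((uintptr_t)*ap, a);
  *ap = (char *)(addr + align_up(size, SLOT)); /* skip whole slots */
  return (const char *)addr;
}

/* Step 3: load a value of the given size from the computed address.
   On this target an aggregate is just a larger contiguous load. */
static void sim_va_arg(sim_va_list *ap, void *out, size_t size, size_t align) {
  memcpy(out, get_va_arg_addr(ap, size, align), size);
}

/* Example aggregate used to show that structs need no special casing here. */
struct pair { int32_t a, b; double c; };
```

Because the whole argument area is contiguous on such a target, an aggregate such as `struct pair` goes through the same address computation as a scalar; only the final load gets larger (or could be split into one load per field).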
Friedman, Eli via llvm-dev
2017-Aug-14  20:12 UTC
On 8/14/2017 2:26 AM, Alex Bradbury wrote:
> On 9 August 2017 at 19:38, Friedman, Eli <efriedma at codeaurora.org> wrote:
>> On 8/9/2017 9:11 AM, Alex Bradbury via llvm-dev wrote:
>>> Option 3: Teach va_arg to handle aggregates
>>> * In this option, va_arg might reasonably be expected to handle a struct,
>>> but would not be expected to have detailed ABI-specific knowledge. e.g. it
>>> won't automagically know whether a value of a certain size/type is passed
>>> indirectly or not. In a sense, this would put support for aggregates passed
>>> as varargs on par with aggregates passed in named arguments.
>>> * Casting would be necessary in the same cases casting is required for
>>> named args
>>> * Support for aggregates could be implemented via a new module-level pass,
>>> much like PNaCl.
>>> * Alternatively, the conversion from the va_arg instruction to
>>> SelectionDAG could be modified. It might be desirable to convert the
>>> va_arg instruction to a number of loads and a new node that is responsible
>>> only for manipulating the va_list struct.
>>
>> We could automatically split va_arg on an LLVM struct type into a series of
>> va_arg calls for each of the elements of the struct. Not sure that actually
>> helps anyone much, though.
>>
>> Anything more requires full type information, which isn't currently encoded
>> into IR; for example, on x86-64, to properly lower va_arg on a struct, you
>> need to figure out whether the struct would be passed in integer registers,
>> floating-point registers, or memory.
>
> I've been thinking more about this. Firstly, if anyone has insight into
> any cases where the va_arg instruction actually provides better
> optimisation opportunities, please do share. The va_arg IR instruction
> has been supported in LLVM for over a decade, but Clang doesn't
> generate it for the vast majority of the "top tier" targets. I'm
> trying to determine if it just needs more love, or if perhaps it
> wasn't really the right thing to express at the IR level. Is the main
> motivation of va_arg to allow such argument access to be specified
> concisely in IR, or is there a particular way it makes life easier for
> optimisations or analysis (and if so, which ones and at which point in
> compilation)?

We don't have any optimizations that touch va_arg, as far as I know. It's an instruction mostly because it got added when LLVM was first written, and nobody has bothered to try to get rid of it.

> va_arg really does three things:
> * Calculates how to load a value of the given type
> * Increments the appropriate fields in the va_list struct
> * Loads a value of the given type
>
> The problem I see is that it's fairly difficult to specialise its behaviour
> depending on the target. In one of the many previous threads about ABI
> lowering, I think someone commented that in LLVM it happens both too
> early and too late (in the frontend, and on the SelectionDAG). This
> seems to be the case here: to support targets with a more complex
> va_list struct featuring separate save areas for GPRs and FPRs,
> splitting a va_arg into multiple operations (one per element of an
> aggregate) doesn't seem like it could work without heroic gymnastics
> in the backend.
>
> Converting the va_arg instruction to a new GETVAARG SelectionDAG node
> plus a series of LOADs seems like it may provide a straightforward
> path to supporting aggregates on targets that use a pointer for
> va_list. Of course, this ends up exposing loads plus offset generation
> in the SelectionDAG, just hiding the va_list increment behind
> GETVAARG. For such an approach to work, you must be able to load the
> given type from a contiguous region of memory, which won't always be
> true for targets with a more complex va_list struct.

Really, IMO, we shouldn't have a va_arg instruction at all, but deprecating it is too much work to be worthwhile. :)

If we are going to keep it around, though, we should really do the lowering in IR, before we hit SelectionDAG. Like you explained, it's just a bunch of load and store operations, so there isn't any reason to wait, and transforming IR is much easier than lowering in SelectionDAG.

-Eli

--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Alex Bradbury via llvm-dev
2017-Aug-17  10:27 UTC
On 14 August 2017 at 21:12, Friedman, Eli <efriedma at codeaurora.org> wrote:
> We don't have any optimizations that touch va_arg, as far as I know. It's
> an instruction mostly because it got added when LLVM was first written, and
> nobody has bothered to try to get rid of it.

I couldn't find any optimisations that directly touch it either, and it doesn't sound like people are rushing forwards with examples where generating IR with explicit va_list manipulation results in pessimised codegen.

>> va_arg really does three things:
>> * Calculates how to load a value of the given type
>> * Increments the appropriate fields in the va_list struct
>> * Loads a value of the given type
>>
>> The problem I see is that it's fairly difficult to specialise its behaviour
>> depending on the target. In one of the many previous threads about ABI
>> lowering, I think someone commented that in LLVM it happens both too
>> early and too late (in the frontend, and on the SelectionDAG). This
>> seems to be the case here: to support targets with a more complex
>> va_list struct featuring separate save areas for GPRs and FPRs,
>> splitting a va_arg into multiple operations (one per element of an
>> aggregate) doesn't seem like it could work without heroic gymnastics
>> in the backend.
>>
>> Converting the va_arg instruction to a new GETVAARG SelectionDAG node
>> plus a series of LOADs seems like it may provide a straightforward
>> path to supporting aggregates on targets that use a pointer for
>> va_list. Of course, this ends up exposing loads plus offset generation
>> in the SelectionDAG, just hiding the va_list increment behind
>> GETVAARG. For such an approach to work, you must be able to load the
>> given type from a contiguous region of memory, which won't always be
>> true for targets with a more complex va_list struct.
>
>
> Really, IMO, we shouldn't have a va_arg instruction at all, but deprecating
> it is too much work to be worthwhile. :)
>
> If we are going to keep it around, though, we should really do the lowering
> in IR, before we hit SelectionDAG. Like you explained, it's just a bunch of
> load and store operations, so there isn't any reason to wait, and
> transforming IR is much easier than lowering in SelectionDAG.

I agree. It seems there's an argument that va_arg could be much more useful in the future, as part of an IR-level ABI lowering. Until that exists, it's perhaps not a big deal either way.

I'm CCing Tim Northover, who committed the Clang AArch64 Darwin ABI lowering and perhaps has a view on whether there's much value in using va_arg when possible. va_list manipulation doesn't produce that much noise in the IR when va_list is just a pointer. I suspect it's noisier when va_list is a struct, but there's not a clear path for expanding va_arg to handle aggregates for those cases outside of an IR-level transform.

I'm also adding in Will Dietz, who has been involved in previous discussions around this topic.

Best,
Alex
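For contrast with the pointer case, the following C sketch simulates the struct-based va_list of the x86-64 System V ABI, where both the classification problem and the contiguous-memory problem discussed in this thread show up. The field layout (`gp_offset`, `fp_offset`, `overflow_arg_area`, `reg_save_area`) and the 48/176 register-save-area limits follow the ABI's description of va_arg; `sysv_va_list`, `struct mixed`, and `va_arg_mixed` are hypothetical names. The classification of `struct mixed` (one INTEGER eightbyte, one SSE eightbyte) is hard-coded rather than computed, which is precisely the type information the IR does not carry. This is a simplified sketch, not Clang's actual lowering.

```c
#include <stdint.h>
#include <string.h>

/* Mirrors the layout of the x86-64 SysV va_list element (hypothetical name). */
struct sysv_va_list {
  unsigned int gp_offset;  /* next GPR slot in reg_save_area: 0..48 */
  unsigned int fp_offset;  /* next XMM slot in reg_save_area: 48..176 */
  void *overflow_arg_area; /* next argument passed on the stack */
  void *reg_save_area;     /* 6 GPRs (48 bytes) followed by 8 XMM regs (128 bytes) */
};

/* Classified as INTEGER + SSE: needs one GPR and one XMM register. */
struct mixed { int64_t i; double d; };

static struct mixed va_arg_mixed(struct sysv_va_list *l) {
  struct mixed m;
  /* ABI check: gp_offset > 48 - num_gp*8 or fp_offset > 176 - num_fp*16,
     with num_gp = 1 and num_fp = 1 hard-coded for this type. */
  if (l->gp_offset > 48 - 8 || l->fp_offset > 176 - 16) {
    /* Not enough registers left: the whole struct is on the stack,
       contiguous, so a single load (memcpy) suffices. */
    char *p = (char *)(((uintptr_t)l->overflow_arg_area + 7) & ~(uintptr_t)7);
    memcpy(&m, p, sizeof m);
    l->overflow_arg_area = p + ((sizeof m + 7) & ~(size_t)7);
    return m;
  }
  /* Register case: the two halves live in different save areas, so they
     are loaded separately and reassembled in a temporary. */
  char *rsa = l->reg_save_area;
  memcpy(&m.i, rsa + l->gp_offset, sizeof m.i);
  memcpy(&m.d, rsa + l->fp_offset, sizeof m.d);
  l->gp_offset += 8;
  l->fp_offset += 16;
  return m;
}
```

When registers remain, the two fields come from different save areas and must be reassembled, so no single contiguous load (and hence no GETVAARG-plus-loads scheme) can fetch the struct; only the stack fallback is contiguous. This is also why the IR produced for such targets carries noticeably more va_list manipulation than for a pointer va_list.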