Alex Bradbury via llvm-dev

2017-Aug-14 09:26 UTC

[llvm-dev] [RFC] The future of the va_arg instruction

On 9 August 2017 at 19:38, Friedman, Eli <efriedma at codeaurora.org> wrote:
> On 8/9/2017 9:11 AM, Alex Bradbury via llvm-dev wrote:
>>
>> Option 3: Teach va_arg to handle aggregates
>>    * In this option, va_arg might reasonably be expected to handle a
>>    struct, but would not be expected to have detailed ABI-specific
>>    knowledge. e.g. it won't automagically know whether a value of a
>>    certain size/type is passed indirectly or not. In a sense, this would
>>    put support for aggregates passed as varargs on par with aggregates
>>    passed in named arguments.
>>    * Casting would be necessary in the same cases casting is required
>>    for named args
>>    * Support for aggregates could be implemented via a new module-level
>>    pass, much like PNaCl.
>>    * Alternatively, the conversion from the va_arg instruction to
>>    SelectionDAG could be modified. It might be desirable to convert the
>>    vaarg instruction to a number of loads and a new node that is
>>    responsible only for manipulating the va_list struct.
>
> We could automatically split va_arg on an LLVM struct type into a series of
> va_arg calls for each of the elements of the struct.  Not sure that actually
> helps anyone much, though.
>
> Anything more requires full type information, which isn't currently encoded
> into IR; for example, on x86-64, to properly lower va_arg on a struct, you
> need to figure out whether the struct would be passed in integer registers,
> floating-point registers, or memory.

I've been thinking more about this. Firstly, if anyone has insight into
any cases where the va_arg instruction actually provides better
optimisation opportunities, please do share. The va_arg IR instruction
has been supported in LLVM for over a decade, but Clang doesn't
generate it for the vast majority of the "top tier" targets. I'm
trying to determine if it just needs more love, or if perhaps it
wasn't really the right thing to express at the IR level. Is the main
motivation of va_arg to allow such argument access to be specified
concisely in IR, or is there a particular way it makes life easier for
optimisations or analysis (and if so, which ones and at which point in
compilation?).

va_arg really does three things:
* Calculates how to load a value of the given type
* Increments the appropriate fields in the va_list struct
* Loads a value of the given type
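
The three steps above can be sketched in C for the simplest case, a target whose va_list is a single pointer into a contiguous save area. This is a toy model and not any real target's ABI: `toy_va_list`, `toy_va_arg`, `toy_sum_i32_then_i64` and the 8-byte slot size are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical va_list: one pointer to the next unread argument. */
typedef struct {
    const unsigned char *next;
} toy_va_list;

enum { TOY_SLOT_SIZE = 8 }; /* assumed: arguments occupy whole 8-byte slots */

/* Round n up to a multiple of the slot size. */
static size_t toy_round_up(size_t n) {
    return (n + TOY_SLOT_SIZE - 1) / TOY_SLOT_SIZE * TOY_SLOT_SIZE;
}

/* Read one argument of `size` bytes into `out`. */
static void toy_va_arg(toy_va_list *ap, void *out, size_t size) {
    assert(size > 0);
    /* 1. Calculate how to load: here, straight from the current pointer. */
    const unsigned char *src = ap->next;
    /* 2. Increment the va_list past the slots this value occupies. */
    ap->next += toy_round_up(size);
    /* 3. Load the value. */
    memcpy(out, src, size);
}

/* Example consumer: reads an int32_t, then an int64_t, and adds them. */
static int64_t toy_sum_i32_then_i64(const unsigned char *save_area) {
    toy_va_list ap = { save_area };
    int32_t a;
    int64_t b;
    toy_va_arg(&ap, &a, sizeof a);
    toy_va_arg(&ap, &b, sizeof b);
    return a + b;
}
```

In this model step 1 is trivial and only step 2 touches the va_list, which is why the pointer case is the easy one.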

The problem I see is it's fairly difficult to specialise its behaviour
depending on the target. In one of the many previous threads about ABI
lowering, I think someone commented that in LLVM it happens both too
early and too late (in the frontend, and on the SelectionDAG). This
seems to be the case here: to support targets with a more complex
va_list struct featuring separate save areas for GPRs and FPRs,
splitting a va_arg into multiple operations (one per element of an
aggregate) doesn't seem like it could work without heroic gymnastics
in the backend.
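
To make the "more complex va_list struct" case concrete, here is a minimal sketch with a field layout modelled on the System V x86-64 va_list. The names `sketch_va_list`, `int_and_double` and `sketch_va_arg_pair` are hypothetical, and the code only handles one specific aggregate; it is meant to show why the pieces of a struct cannot always be read from one contiguous region.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Field layout modelled on the System V x86-64 va_list. */
typedef struct {
    unsigned gp_offset;                     /* next unread GPR slot in reg_save_area */
    unsigned fp_offset;                     /* next unread FPR slot in reg_save_area */
    const unsigned char *overflow_arg_area; /* next stack-passed argument */
    const unsigned char *reg_save_area;     /* saved GPRs followed by saved FPRs */
} sketch_va_list;

static const unsigned GP_SLOT = 8u;   /* one saved GPR */
static const unsigned GP_END = 48u;   /* 6 GPRs * 8 bytes */
static const unsigned FP_SLOT = 16u;  /* one saved vector register */
static const unsigned FP_END = 176u;  /* GP_END + 8 vector registers * 16 bytes */

/* An aggregate whose first half goes in a GPR and second half in an FPR. */
typedef struct {
    int64_t i;
    double d;
} int_and_double;

static int_and_double sketch_va_arg_pair(sketch_va_list *ap) {
    int_and_double out;
    if (ap->gp_offset + GP_SLOT <= GP_END && ap->fp_offset + FP_SLOT <= FP_END) {
        /* Both halves were passed in registers: two loads from two
           different save areas, and two different fields to bump. */
        memcpy(&out.i, ap->reg_save_area + ap->gp_offset, sizeof out.i);
        memcpy(&out.d, ap->reg_save_area + ap->fp_offset, sizeof out.d);
        ap->gp_offset += GP_SLOT;
        ap->fp_offset += FP_SLOT;
    } else {
        /* Not enough registers left: the whole struct is on the stack. */
        memcpy(&out, ap->overflow_arg_area, sizeof out);
        ap->overflow_arg_area += sizeof out;
    }
    return out;
}
```

Deciding that the first half is an integer-class value and the second a floating-point one is exactly the ABI classification that a per-element split of va_arg would not have available.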

Converting the va_arg instruction to a new GETVAARG SelectionDAG node
plus a series of LOADs seems like it may provide a straightforward
path to supporting aggregates on targets that use a pointer for
va_list. Of course this ends up exposing loads plus offset generation
in the SelectionDAG, just hiding the va_list increment behind
GETVAARG. For such an approach to work, you must be able to load the
given type from a contiguous region of memory, which won't always be
true for targets with a more complex va_list struct.

Best,

Alex

Friedman, Eli via llvm-dev

2017-Aug-14 20:12 UTC

On 8/14/2017 2:26 AM, Alex Bradbury wrote:
> On 9 August 2017 at 19:38, Friedman, Eli <efriedma at codeaurora.org> wrote:
>> On 8/9/2017 9:11 AM, Alex Bradbury via llvm-dev wrote:
>>> Option 3: Teach va_arg to handle aggregates
>>>     * In this option, va_arg might reasonably be expected to handle a
>>>     struct, but would not be expected to have detailed ABI-specific
>>>     knowledge. e.g. it won't automagically know whether a value of a
>>>     certain size/type is passed indirectly or not. In a sense, this
>>>     would put support for aggregates passed as varargs on par with
>>>     aggregates passed in named arguments.
>>>     * Casting would be necessary in the same cases casting is required
>>>     for named args
>>>     * Support for aggregates could be implemented via a new module-level
>>>     pass, much like PNaCl.
>>>     * Alternatively, the conversion from the va_arg instruction to
>>>     SelectionDAG could be modified. It might be desirable to convert the
>>>     vaarg instruction to a number of loads and a new node that is
>>>     responsible only for manipulating the va_list struct.
>>
>> We could automatically split va_arg on an LLVM struct type into a series of
>> va_arg calls for each of the elements of the struct.  Not sure that actually
>> helps anyone much, though.
>>
>> Anything more requires full type information, which isn't currently encoded
>> into IR; for example, on x86-64, to properly lower va_arg on a struct, you
>> need to figure out whether the struct would be passed in integer registers,
>> floating-point registers, or memory.
> I've been thinking more about this. Firstly, if anyone has insight in
> to any cases where the va_arg instruction actually provides better
> optimisation opportunities, please do share. The va_arg IR instruction
> has been supported in LLVM for over a decade, but Clang doesn't
> generate it for the vast majority of the "top tier" targets. I'm
> trying to determine if it just needs more love, or if perhaps it
> wasn't really the right thing to express at the IR level. Is the main
> motivation of va_arg to allow such argument access to be specified
> concisely in IR, or is there a particular way it makes life easier for
> optimisations or analysis (and if so, which ones and at which point in
> compilation?).
We don't have any optimizations that touch va_arg, as far as I know.  
It's an instruction mostly because it got added when LLVM was first 
written, and nobody has bothered to try to get rid of it.
> va_arg really does three things:
> * Calculates how to load a value of the given type
> * Increments the appropriate fields in the va_list struct
> * Loads a value of the given type
>
> The problem I see is it's fairly difficult to specialise its behaviour
> depending on the target. In one of the many previous threads about ABI
> lowering, I think someone commented that in LLVM it happens both too
> early and too late (in the frontend, and on the SelectionDAG). This
> seems to be the case here, to support targets with a more complex
> va_list struct featuring separate save areas for GPRs and FPRs,
> splitting a va_arg in to multiple operations (one per element of an
> aggregate) doesn't seem like it could work without heroic gymnastics
> in the backend.
>
> Converting the va_arg instruction to a new GETVAARG SelectionDAG node
> plus a series of LOADs seems like it may provide a straight-forward
> path to supporting aggregates on targets that use a pointer for
> va_list. Of course this ends up exposing loads plus offset generation
> in the SelectionDAG, just hiding the va_list increment behind
> GETVAARG. For such an approach to work, you must be able to load the
> given type from a contiguous region of memory, which won't always be
> true for targets with a more complex va_list struct.
Really, IMO, we shouldn't have a va_arg instruction at all, but 
deprecating it is too much work to be worthwhile. :)

If we are going to keep it around, though, we should really do the 
lowering in IR, before we hit SelectionDAG.  Like you explained, it's 
just a bunch of load and store operations, so there isn't any reason to 
wait, and transforming IR is much easier than lowering in SelectionDAG.

-Eli

-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux
Foundation Collaborative Project

Alex Bradbury via llvm-dev

2017-Aug-17 10:27 UTC

On 14 August 2017 at 21:12, Friedman, Eli <efriedma at codeaurora.org> wrote:
> We don't have any optimizations that touch va_arg, as far as I know. It's
> an instruction mostly because it got added when LLVM was first written, and
> nobody has bothered to try to get rid of it.
I couldn't find any optimisations that directly touch it either, and
it doesn't sound like people are rushing forwards with examples where
generating IR with explicit va_list manipulation results in pessimised
codegen.
>> va_arg really does three things:
>> * Calculates how to load a value of the given type
>> * Increments the appropriate fields in the va_list struct
>> * Loads a value of the given type
>>
>> The problem I see is it's fairly difficult to specialise its behaviour
>> depending on the target. In one of the many previous threads about ABI
>> lowering, I think someone commented that in LLVM it happens both too
>> early and too late (in the frontend, and on the SelectionDAG). This
>> seems to be the case here, to support targets with a more complex
>> va_list struct featuring separate save areas for GPRs and FPRs,
>> splitting a va_arg in to multiple operations (one per element of an
>> aggregate) doesn't seem like it could work without heroic gymnastics
>> in the backend.
>>
>> Converting the va_arg instruction to a new GETVAARG SelectionDAG node
>> plus a series of LOADs seems like it may provide a straight-forward
>> path to supporting aggregates on targets that use a pointer for
>> va_list. Of course this ends up exposing loads plus offset generation
>> in the SelectionDAG, just hiding the va_list increment behind
>> GETVAARG. For such an approach to work, you must be able to load the
>> given type from a contiguous region of memory, which won't always be
>> true for targets with a more complex va_list struct.
>
> Really, IMO, we shouldn't have a va_arg instruction at all, but deprecating
> it is too much work to be worthwhile. :)
>
> If we are going to keep it around, though, we should really do the lowering
> in IR, before we hit SelectionDAG.  Like you explained, it's just a bunch of
> load and store operations, so there isn't any reason to wait, and
> transforming IR is much easier than lowering in SelectionDAG.
I agree. It seems there's an argument that va_arg could be much more
useful in the future, as part of an IR-level ABI lowering. Until that
exists it's perhaps not a big deal either way. I'm CCing Tim Northover
who committed the Clang AArch64 Darwin ABI lowering, and perhaps has a
view on whether there's much value in using va_arg when possible.

va_list manipulation doesn't produce that much noise in the IR when
va_list is just a pointer. I suspect it's more noisy when va_list is a
struct, but there's not a clear path for expanding va_arg to handle
aggregates for those cases outside of an IR-level transform. I'm also
adding in Will Dietz, who has been involved in previous discussions
around this topic.

Best,

Alex
