thr3ads.net - llvm dev - [LLVMdev] [cfe-commits] Fix handling of ARM homogenous aggregates [Mar 2012]

Tim Northover

2012-Mar-30 19:58 UTC

[LLVMdev] [cfe-commits] Fix handling of ARM homogenous aggregates

Hi,

(Forward from cfe-commits, where some backend stuff has come up).

This is an issue I've been thinking about quite a bit recently, and I agree
that the biggest problem is the one below:

> * The big thing still missing here is that there is no logic to check how
> many VFP registers have already been used for other arguments.  When deciding
> whether to pass an argument as a homogeneous aggregate, one of the criteria is
> that the entire aggregate has to fit into the remaining unused argument
> registers, right?

I tend to think that if every front-end has to implement the entire VFP PCS to
decide how to pass an HFA, something has gone wrong. So I've come to the
conclusion that the real flaw is LLVM not exposing enough information to the
target-dependent backend code for it to do the right thing. By the time the
target is involved, all that remains of any composite type is:
  * The fields completely separated if it was naturally by value: {float, float}
    just gives you two "float" parameters, for example.
  * An i32 plus the ByValSize and ByValAlign if it was a byval pointer, e.g.
    "{float, float}* byval".

Even in the first case there's no indication of where a composite type
begins and ends. The latter could be bludgeoned to mean "this is an HFA,
put it in VFP regs", but it would be unspeakably ugly.
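
To make the quoted criterion concrete (the whole aggregate has to fit in the remaining VFP registers), here is a rough sketch of that check. It is a deliberately simplified model and not code from LLVM or Clang: it only handles HFAs of single-precision floats in s0-s15, and the names VfpAllocator and allocateFloatHFA are invented for illustration.

```cpp
#include <array>
#include <cstddef>

// Hypothetical, simplified model of VFP argument allocation.
// Assumption: only HFAs of 1 to 4 single-precision floats are considered,
// so every candidate needs a run of consecutive S registers (s0-s15).
struct VfpAllocator {
    std::array<bool, 16> used{};  // used[n] is true once sN has been assigned
    bool spilled = false;         // set once a candidate has gone to the stack

    // Precondition: 1 <= numElements <= 4.
    // Returns the index of the first S register assigned, or -1 if the
    // aggregate has to be passed on the stack instead.
    int allocateFloatHFA(std::size_t numElements) {
        if (!spilled) {
            for (std::size_t first = 0; first + numElements <= used.size(); ++first) {
                bool allFree = true;
                for (std::size_t i = 0; i < numElements; ++i)
                    allFree = allFree && !used[first + i];
                if (allFree) {
                    for (std::size_t i = 0; i < numElements; ++i)
                        used[first + i] = true;
                    return static_cast<int>(first);
                }
            }
        }
        // The aggregate does not fit as a whole: it goes to the stack and no
        // later argument may use the VFP registers that were still free.
        spilled = true;
        used.fill(true);
        return -1;
    }
};
```

The point of the sketch is that the decision depends on state accumulated over all earlier arguments, which is exactly the bookkeeping a front-end would otherwise have to duplicate.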

I believe that if the original LLVM Type* pointer is exposed to TargetLowering
(perhaps as part of InputArg/OutputArg), then LLVM itself can decide what to do
with both Small Structures and HFAs in a sane manner: writing a front-end which
adheres to the PCS would be much easier for any source language. The worry is
the apparent layering violation by passing a Type* further down. But I'd
argue that the TargetLowering functions involved are constructing a DAG from
nothing rather than transforming an existing DAG; giving them LLVM source-level
information is justifiable.
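
As a rough illustration of what the backend could do once it can see the original type, the sketch below classifies a type as an HFA by walking its fields. ToyType is a stand-in invented for this example and is not an LLVM class; arrays and vector types are left out, and the 1-to-4 element limit is the only PCS rule modelled.

```cpp
#include <cstddef>
#include <vector>

// Toy stand-in for a source-level type; NOT part of LLVM.
struct ToyType {
    enum Kind { Float, Double, Int32, Struct } kind;
    std::vector<ToyType> fields;  // only meaningful when kind == Struct
};

// Walks the leaves of t, checking that they are all the same floating-point
// kind and counting them. Returns false as soon as a mismatch is found.
static bool collectLeaves(const ToyType &t, ToyType::Kind &base,
                          std::size_t &count, bool &seenLeaf) {
    if (t.kind == ToyType::Struct) {
        for (const ToyType &field : t.fields)
            if (!collectLeaves(field, base, count, seenLeaf))
                return false;
        return true;
    }
    if (t.kind != ToyType::Float && t.kind != ToyType::Double)
        return false;  // an integer leaf disqualifies the aggregate
    if (seenLeaf && t.kind != base)
        return false;  // mixing float and double is not homogeneous
    base = t.kind;
    seenLeaf = true;
    ++count;
    return true;
}

// True if t is a struct whose leaves are 1 to 4 floats, or 1 to 4 doubles.
bool isHFA(const ToyType &t) {
    if (t.kind != ToyType::Struct)
        return false;
    ToyType::Kind base = ToyType::Float;
    std::size_t count = 0;
    bool seenLeaf = false;
    return collectLeaves(t, base, count, seenLeaf) && count >= 1 && count <= 4;
}
```

With only the flattened parameters, a nested {{float, float}, {float, float}} and four unrelated float arguments look identical; with the type available, the first can be recognised as a single four-element HFA.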

Given that, the simpler implementation is via byval pointers, but they have some
issues with efficiency (phases like ScalarRepl can't get to work replacing
getelementptrs with extracts since the implicit alloca happens during DAG
construction -- just look at what happens to mips small structs now). With more
work, the truly natural equivalence would be possible and a front-end could
simply "call void @foo({float, float} %val)" and everything would work.

Of course, while the second approach is nice in isolation, it may not exactly
fit in with what other backends do.

Any thoughts?

Tim.

-- IMPORTANT NOTICE: The contents of this email and any attachments are
confidential and may also be privileged. If you are not the intended recipient,
please notify the sender immediately and do not disclose the contents to any
other person, use it for any purpose, or store or copy the information in any
medium.  Thank you.

Patrik Hägglund H

2012-Apr-04 11:41 UTC

[LLVMdev] [cfe-commits] Fix handling of ARM homogenous aggregates

Hi Tim,

> So I've come to the conclusion that the real flaw is LLVM
> not exposing enough information to the target-dependent
> backend code for it to do the right thing.

We also had this problem. You might find this patch useful as a starting point:
http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-March/048266.html

/Patrik Hägglund

-----Original Message-----
From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On
Behalf Of Tim Northover
Sent: den 30 mars 2012 21:58
To: Bob Wilson; Anton Korobeynikov
Cc: James Molloy; cfe-commits; llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] [cfe-commits] Fix handling of ARM homogenous aggregates


Tim Northover

2012-Apr-04 12:27 UTC

[LLVMdev] [cfe-commits] Fix handling of ARM homogenous aggregates

On Wednesday 04 Apr 2012 12:41:49 Patrik Hägglund H wrote:
> Hi Tim,
> 
> > So I've come to the conclusion that the real flaw is LLVM
> > not exposing enough information to the target-dependent
> > backend code for it to do the right thing.
> 
> We also had this problem. You might find this patch useful as a starting
> point: http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-March/048266.html

Thanks. I'd considered using MachineFunction fiddling purely from a
LowerFormalArguments perspective (I hadn't noticed the subtlety that
LowerCall needs this info to be passed in).

Doesn't this mean you have to replicate all the machinations of 
SelectionDAGBuilder to work out which argument you're dealing with at any 
given moment, though? I'm thinking of how it splits structs and implicitly 
adds sret parameters in particular, though there may be more I don't know of.

How does this information get handled by the TableGen calling-conv code in 
your situation? The only way I can think of is a custom CCState which gets 
told about each argument as it passes by and allows CCCustom functions to 
access its special information (or, possibly, a CCIf with a cast).

    CCCustom<"TellCCStateAboutArg">,
    [...]
    CCIf<"cast<MyCCState>(State).isPointerArg()", CCAssignToReg<[P1, P2]>>,

Putting that information in the InputArg/OutputArg and incorporating it into
the CCAssignFn interface allows a more straightforward implementation in the
targets, in my view (for both our uses). It's also information that's readily
available when InputArg/OutputArgs are being constructed. In your case:

    CCIf<"SourceTy->isPointerTy()", CCAssignToReg<[P1, P2]>>;

I've got a patch which implements it for ARM and X86 (though HFAs don't use
the feature yet; I'm still musing on the best interface to present there --
"HFA* byval" for target simplicity or "HFA" for user simplicity). I'll see if
I can clean it up for other targets and send it for comparison.

The main issue with my approach is that split struct args are still tricky:
the pieces all get identical types, and another custom CCState is needed to
handle them en masse (to find out where we are in the struct). Optimal for that
case might be an extra flag similar to isSplit(), but for structs.
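
A minimal sketch of what such a flag could look like, assuming a tiny model in which each original argument is described only by how many scalar pieces it flattens to. PieceFlags, OrigArg and flattenArgs are hypothetical names for this example and do not correspond to existing LLVM structures.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-piece flags, loosely analogous to isSplit() but for structs.
struct PieceFlags {
    bool inStruct = false;     // piece came from a by-value struct argument
    bool structStart = false;  // first piece of that struct
    bool structEnd = false;    // last piece of that struct
};

// Toy description of one source-level argument.
struct OrigArg {
    std::size_t numPieces;  // number of scalar pieces after flattening
    bool isStruct;          // whether the argument was a struct
};

// Flattens the argument list while recording where each struct begins and ends.
std::vector<PieceFlags> flattenArgs(const std::vector<OrigArg> &args) {
    std::vector<PieceFlags> pieces;
    for (const OrigArg &arg : args) {
        for (std::size_t i = 0; i < arg.numPieces; ++i) {
            PieceFlags flags;
            flags.inStruct = arg.isStruct;
            flags.structStart = arg.isStruct && i == 0;
            flags.structEnd = arg.isStruct && i + 1 == arg.numPieces;
            pieces.push_back(flags);
        }
    }
    return pieces;
}
```

With the start and end markers, two adjacent {float, float} arguments remain distinguishable from one {float, float, float, float} even though every piece has the same type.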

Thoughts?

Tim.
