Tim Northover
2012-Mar-30 19:58 UTC
[LLVMdev] [cfe-commits] Fix handling of ARM homogenous aggregates
Hi,

(Forward from cfe-commits, where some backend stuff has come up).

This is an issue I've been thinking about quite a bit recently, and I agree that the biggest problem is the one below:

> * The big thing still missing here is that there is no logic to check how
>   many VFP registers have already been used for other arguments. When
>   deciding whether to pass an argument as a homogeneous aggregate, one of
>   the criteria is that the entire aggregate has to fit into the remaining
>   unused argument registers, right?

I tend to think that if every front-end has to implement the entire VFP PCS to decide how to pass an HFA, something has gone wrong.

So I've come to the conclusion that the real flaw is LLVM not exposing enough information to the target-dependent backend code for it to do the right thing. By the time the target is involved, all that remains of any composite type is:

  * The fields completely separated if it was naturally by value. {float, float} just gives you two "float" parameters for example.
  * i32, the ByValSize and ByValAlign if it was a byval pointer: e.g. "{float, float}* byval".

Even in the first case there's no indication of where a composite type begins and ends. The latter could be bludgeoned to mean "this is an HFA, put it in VFP regs", but it would be unspeakably ugly.

I believe that if the LLVM original Type* pointer is exposed to TargetLowering (perhaps as part of InputArg/OutputArg), then LLVM itself can decide what to do with both Small Structures and HFAs in a sane manner: writing a front-end which adheres to the PCS would be much easier for any source language.

The worry is the apparent layering violation by passing a Type* further down. But I'd argue that the TargetLowering functions involved are constructing a DAG from nothing rather than transforming an existing DAG; giving them LLVM source-level information is justifiable.

Given that, the simpler implementation is via byval pointers, but they have some issues with efficiency (phases like ScalarRepl can't get to work replacing getelementptrs with extracts since the implicit alloca happens during DAG construction -- just look at what happens to mips small structs now).

With more work, the truly natural equivalence would be possible and a front-end could simply "call void @foo({float, float} %val)" and everything would work.

Of course, while the second approach is nice in isolation, it may not exactly fit in with what other backends do.

Any thoughts?

Tim.
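
The "does the whole aggregate still fit in the unused VFP registers" check quoted above can be sketched roughly as follows. This is a simplified standalone model, not LLVM or Clang code: the names `BaseTy`, `Hfa`, and `VfpAllocator` are hypothetical, and the rules encoded (sixteen single-precision argument slots s0-s15, doubles taking two aligned slots, and no further VFP allocation once one candidate has gone to the stack) are my reading of the AAPCS VFP variant rather than anything stated in this thread.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model types; none of these exist in LLVM.
enum class BaseTy { Float, Double };

struct Hfa {
  BaseTy base;     // element type shared by every member
  unsigned count;  // number of members, 1 to 4
};

class VfpAllocator {
  uint16_t used = 0;       // bit i set means s<i> is already taken
  bool exhausted = false;  // set once any candidate has gone to the stack

public:
  // Returns the index of the first s-register given to the aggregate,
  // or -1 if the aggregate has to be passed on the stack.
  int allocate(const Hfa &arg) {
    assert(arg.count >= 1 && arg.count <= 4);
    const unsigned width = arg.base == BaseTy::Double ? 2 : 1;
    const unsigned slots = width * arg.count;
    if (!exhausted) {
      // Stepping by `width` keeps doubles on even-numbered s-registers.
      for (unsigned start = 0; start + slots <= 16; start += width) {
        const uint16_t mask =
            static_cast<uint16_t>(((1u << slots) - 1u) << start);
        if ((used & mask) == 0) {
          used |= mask;
          return static_cast<int>(start);
        }
      }
    }
    // Assumed rule: after the first failure no later argument uses VFP regs.
    exhausted = true;
    return -1;
  }
};
```

Under this model a lone float followed by a four-double aggregate leaves s1 free, so a later float is back-filled into s1, while a second four-double aggregate no longer fits and goes to the stack. That kind of state is exactly what a front-end would have to track for itself if the backend cannot see where an aggregate begins and ends.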
Patrik Hägglund H
2012-Apr-04 11:41 UTC
[LLVMdev] [cfe-commits] Fix handling of ARM homogenous aggregates
Hi Tim,

> So I've come to the conclusion that the real flaw is LLVM
> not exposing enough information to the target-dependent
> backend code for it to do the right thing.

We also had this problem. You might find this patch useful as a starting point:
http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-March/048266.html

/Patrik Hägglund
Tim Northover
2012-Apr-04 12:27 UTC
[LLVMdev] [cfe-commits] Fix handling of ARM homogenous aggregates
On Wednesday 04 Apr 2012 12:41:49 Patrik Hägglund H wrote:
> Hi Tim,
>
> > So I've come to the conclusion that the real flaw is LLVM
> > not exposing enough information to the target-dependent
> > backend code for it to do the right thing.
>
> We also had this problem. You might find this patch useful as a starting
> point: http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-March/048266.html

Thanks. I'd considered using MachineFunction fiddling purely from a LowerFormalArguments perspective (I hadn't noticed the subtlety that LowerCall needs this info to be passed in).

Doesn't this mean you have to replicate all the machinations of SelectionDAGBuilder to work out which argument you're dealing with at any given moment, though? I'm thinking of how it splits structs and implicitly adds sret parameters in particular, though there may be more I don't know of.

How does this information get handled by the TableGen calling-conv code in your situation? The only way I can think of is a custom CCState which gets told about each argument as it passes by and allows CCCustom functions to access its special information (or, possibly, a CCIf with a cast).

    CCCustom<"TellCCStateAboutArg">,
    [...]
    CCIf<"cast<MyCCState>(State).isPointerArg()",
         CCAssignToReg<[P1, P2]>>,

Putting that information in the InputArg/OutputArg and incorporating it into the CCAssignFn interface allows a more straightforward implementation in the targets, in my view (for both our uses). It's also information that's readily available when InputArg/OutputArgs are being constructed. In your case:

    CCIf<"SourceTy->isPointerTy()", CCAssignToReg<[P1, P2]>>;

I've got a patch which implements it for ARM and X86 (though HFAs don't use the feature yet; I'm still musing on the best interface to present there -- "HFA* byval" for target simplicity or "HFA" for user simplicity). I'll see if I can clean it up for other targets and send it for comparison.

The main issue with my approach is that split struct args are still tricky: every piece gets an identical type, so another custom CCState is needed to handle them en masse (to find out where we are in the struct). Optimal for that case might be an extra flag similar to isSplit(), but for structs.

Thoughts?

Tim.
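
To illustrate the "extra flag similar to isSplit(), but for structs" idea, here is a minimal standalone sketch. It does not use any LLVM types: `Kind`, `SourceArg`, `Piece`, `flatten`, `structLengths`, and the `structStart`/`structEnd` flags are all hypothetical names chosen for illustration. It only shows that if each flattened piece records where its struct starts and ends, later code can recover struct boundaries even when adjacent pieces have identical types.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration; none of these are LLVM types.
enum class Kind { Int, Float, Pointer };

struct SourceArg {
  bool isStruct;             // true if the IR-level argument was a struct
  std::vector<Kind> fields;  // one entry for a scalar, several for a struct
};

struct Piece {
  Kind kind;
  bool structStart = false;  // hypothetical flag: first piece of a split struct
  bool structEnd = false;    // hypothetical flag: last piece of a split struct
};

// Flatten source arguments into per-field pieces, but remember where each
// struct begins and ends.
std::vector<Piece> flatten(const std::vector<SourceArg> &args) {
  std::vector<Piece> out;
  for (const SourceArg &arg : args) {
    for (std::size_t i = 0; i < arg.fields.size(); ++i) {
      Piece p;
      p.kind = arg.fields[i];
      p.structStart = arg.isStruct && i == 0;
      p.structEnd = arg.isStruct && i + 1 == arg.fields.size();
      out.push_back(p);
    }
  }
  return out;
}

// Recover the length of every struct using only the flags on the pieces.
std::vector<std::size_t> structLengths(const std::vector<Piece> &pieces) {
  std::vector<std::size_t> lengths;
  std::size_t current = 0;
  bool inside = false;
  for (const Piece &p : pieces) {
    assert(!(p.structStart && inside));  // structs are not nested here
    if (p.structStart) {
      inside = true;
      current = 0;
    }
    if (inside)
      ++current;
    if (p.structEnd) {
      lengths.push_back(current);
      inside = false;
    }
  }
  return lengths;
}
```

With a scalar float, two back-to-back {float, float} structs, and a pointer, the flattened list holds five float pieces in a row plus the pointer, yet the flags are enough to tell that there are two structs of two members each. Without them the five floats are indistinguishable, which is the difficulty described above.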