Alex Bradbury via llvm-dev
2017-Mar-08 09:02 UTC
[llvm-dev] Current preferred approach for handling 'byval' struct arguments
On 7 March 2017 at 17:58, Reid Kleckner <rnk at google.com> wrote:
> Today, the vast majority of targets in Clang coerce aggregates passed this
> way into appropriate word-sized types. They all use their own custom
> heuristics to compute the LLVM types used for the coercions. It's terrible,
> but this is the current consensus.
>
> I would like to improve the situation so that passing LLVM aggregates
> directly does the right thing when the LLVM struct type and C struct type
> are effectively the same, so that custom frontend lowering is required for
> hard cases involving things like _Complex and unions.

Thanks for the response, Reid. Looking more closely, it appears that the relationship between the target's ABI, whether aggregates are represented as byval structs, and whether these are coerced to something else in Clang's ABI handling is more complex than I first described.

For instance, looking at the X86 code in clang/lib/CodeGen/TargetInfo.cpp, I see that small aggregates are coerced as long as all argument registers are known to be used (i.e. we know the backend will place the aggregate on the stack, as demanded by the calling convention). ARM will also coerce structures below a certain size, however the call lowering code in ARMISelLowering still has logic to split a byval aggregate between the stack and registers (why not

I have to say, looking at AArch64ISelLowering and the Clang code, it's not immediately obvious to me where aggregates get split between the stack and registers (which is quite clear in MipsTargetLowering::passByValArg). What am I missing here?

It seems to me there are a few possibilities for targets where the ABI indicates aggregates may be passed in registers:

* Clang always passes aggregates as byval, and LLVM call lowering may assign some or all of the aggregate to registers. Seemingly nobody does this.
* Clang's ABI lowering code is aware of how many argument registers have been used. If they have been exhausted, then leave the aggregate as byval. If the aggregate will be partially in registers and partially on the stack, then coerce to two arguments: one byval and one direct. Seemingly nobody does this.
* Split responsibilities between the Clang ABI lowering and the LLVM backend lowering. If an aggregate is below a certain size, then coerce it and pass it direct. Depending on the ABI, the LLVM backend must still handle the possibility that a byval aggregate is passed partially in registers and partially on the stack. This seems to be more common.

Best,
Alex
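The third option above (split responsibilities) can be sketched roughly as follows. This is a toy model in C, not code from Clang or any LLVM backend: the word size, register count, size threshold and all function and type names are hypothetical, chosen only to illustrate who decides what.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical target parameters; not taken from any real ABI. */
enum { WORD_BYTES = 4, NUM_ARG_REGS = 8, MAX_DIRECT_WORDS = 2 };

typedef enum { PASS_DIRECT_COERCED, PASS_BYVAL } FrontendChoice;

/* Round a byte size up to a whole number of words. */
static size_t size_in_words(size_t size_bytes) {
    return (size_bytes + WORD_BYTES - 1) / WORD_BYTES;
}

/* Frontend side: small aggregates are coerced to word-sized values and
 * passed direct; anything larger is left as byval for the backend. */
FrontendChoice classify_aggregate(size_t size_bytes) {
    return size_in_words(size_bytes) <= MAX_DIRECT_WORDS ? PASS_DIRECT_COERCED
                                                         : PASS_BYVAL;
}

typedef struct {
    size_t words_in_regs;
    size_t words_on_stack;
} ByvalSplit;

/* Backend side: a byval aggregate takes whatever argument registers are
 * still free, and the remainder goes on the stack. */
ByvalSplit split_byval(size_t size_bytes, size_t regs_used) {
    assert(regs_used <= NUM_ARG_REGS);
    size_t words = size_in_words(size_bytes);
    size_t free_regs = NUM_ARG_REGS - regs_used;
    ByvalSplit split;
    split.words_in_regs = words < free_regs ? words : free_regs;
    split.words_on_stack = words - split.words_in_regs;
    return split;
}
```

With these made-up parameters, a 16-byte aggregate arriving after six registers are already in use would end up with two words in registers and two on the stack, which is the kind of split that MipsTargetLowering::passByValArg has to deal with.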
Alex Bradbury via llvm-dev
2017-Mar-18 15:37 UTC
[llvm-dev] Current preferred approach for handling 'byval' struct arguments
On 8 March 2017 at 09:02, Alex Bradbury <asb at asbradbury.org> wrote:
> On 7 March 2017 at 17:58, Reid Kleckner <rnk at google.com> wrote:
>> Today, the vast majority of targets in Clang coerce aggregates passed this
>> way into appropriate word-sized types. They all use their own custom
>> heuristics to compute the LLVM types used for the coercions. It's terrible,
>> but this is the current consensus.
>>
>> I would like to improve the situation so that passing LLVM aggregates
>> directly does the right thing when the LLVM struct type and C struct type
>> are effectively the same, so that custom frontend lowering is required for
>> hard cases involving things like _Complex and unions.
>
> Thanks for the response Reid. Looking more closely, it appears that
> the relationship between the target's ABI and whether aggregates are
> represented as byval structs, and whether these are coerced to
> something else in Clang's ABI handling is more complex than I first
> described.

There are also tradeoffs for passing large scalar values, which I thought I'd share here in case it's useful for someone else (and indeed, if anyone has extra input).

In the RISC-V calling convention, large scalars (larger than 2 GPRs) are passed indirect, just like large aggregates, for example an i128 or a long double on a 32-bit platform. It's tempting to let the frontend emit i128 and fp128 arguments/return values, however making the argument indirect is somewhat easier in the frontend. This is because by the time you get to the LLVM CC code, the type has already been legalised and so converted to a series of word-sized values. You can detect that the arguments were formed by splitting a larger value and fix it up so it all works properly (see CC_SystemZ_I128Indirect in SystemZCallingConv.h), but this is more hassle than just having the frontend pass it indirect in the first place.

When return values follow the same rules, meaning an implicit parameter has to be generated, it's even more complex (in fact it's not immediately obvious to me how to do that in a tablegen-based calling convention implementation).

Best,
Alex
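To make the "pass indirect" shape concrete, here is a hand-written C sketch of what the lowered call looks like conceptually. It is not output from Clang: `wide128` is a hypothetical two-word stand-in for an i128, and `add_one_lowered` and `call_add_one` are invented names. The point is only that both the argument and the return value travel through pointers, with the return slot supplied as an implicit first parameter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a scalar wider than two registers. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} wide128;

/* Lowered form of a source-level "wide128 add_one(wide128 x)":
 * the result is written through an implicit pointer parameter and the
 * argument is read through a pointer to a caller-owned copy. */
void add_one_lowered(wide128 *result, const wide128 *x) {
    assert(result != NULL && x != NULL);
    uint64_t lo = x->lo + 1;           /* wraps to 0 on overflow */
    uint64_t hi = x->hi + (lo == 0);   /* propagate the carry */
    result->lo = lo;
    result->hi = hi;
}

/* Caller side: make a temporary copy, pass its address, and provide
 * storage for the returned value. */
wide128 call_add_one(wide128 v) {
    wide128 tmp = v;   /* the copy that by-value semantics require */
    wide128 out;
    add_one_lowered(&out, &tmp);
    return out;
}
```

When the frontend emits this shape itself, the backend's calling convention code only ever sees pointer-sized arguments, which is why it is the easier of the two routes described above.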
Alex Bradbury via llvm-dev
2017-Mar-18 16:10 UTC
[llvm-dev] Current preferred approach for handling 'byval' struct arguments
On 18 March 2017 at 15:37, Alex Bradbury <asb at asbradbury.org> wrote:
> There are also tradeoffs for passing large scalar values, which I
> thought I'd share here in case it's useful for someone else (and
> indeed, if anyone has extra input).
>
> In the RISC-V calling convention, large scalars (larger than 2 GPRs)
> are passed indirect, just like large aggregates. e.g. an i128 or a
> long double on a 32-bit platform. It's tempting to let the frontend
> emit i128 and fp128 arguments/return values, however making the
> argument indirect is somewhat easier in the frontend. This is because
> by the time you get to the LLVM CC code the type has already been
> legalised and so converted to a series of word-sized values. You can
> detect that the arguments were formed by splitting a larger value and
> fix it up so it all works properly (see CC_SystemZ_I128Indirect in
> SystemZCallingConv.h) but this is more hassle than just having the
> frontend pass it indirect in the first place. When return values have
> the same rules, meaning an implicit parameter has to be generated it's
> even more complex (in fact it's not immediately obvious to me how to
> do that in a tablegen-based calling convention implementation).

Sorry to respond to myself again so soon, but as usual I've spotted another issue. Libcalls (e.g. fp128 equality) will be emitted in TargetLowering, which won't do the necessary pass-indirect conversion for you. Therefore the following C would generate something sensible: `long double f_fp_scalar_3(long double x) { return x; }`. But if your code tries to do anything with fp128 values (e.g. `fcmp une fp128 %1, %2` gets generated), you're stuck.

I think I now understand why the SystemZ backend made the choices it did.

Best,
Alex
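For contrast with `f_fp_scalar_3`, a function as small as the one below is enough to hit the problem. The name `f_fp_scalar_ne` is made up for illustration. On a target where long double is fp128 and there is no hardware support, the `!=` would be expected to become an `fcmp une` in IR and then a call to a soft-float comparison routine (conventionally something like `__netf2` in compiler-rt or libgcc, though which routine is used depends on the target). That call is created in the backend, after the frontend's pass-indirect lowering has already run.

```c
/* Hypothetical example: the comparison, not the argument passing,
 * is what ends up as a backend-generated libcall on soft-fp128 targets. */
int f_fp_scalar_ne(long double a, long double b) {
    return a != b;
}
```

On a host with hardware or natively supported long double this compiles to an ordinary comparison, so the sketch only shows the source-level trigger, not the failing lowering itself.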
Reid Kleckner via llvm-dev
2017-Mar-20 15:54 UTC
[llvm-dev] Current preferred approach for handling 'byval' struct arguments
On Sat, Mar 18, 2017 at 8:37 AM, Alex Bradbury <asb at asbradbury.org> wrote:
> In the RISC-V calling convention, large scalars (larger than 2 GPRs)
> are passed indirect, just like large aggregates. e.g. an i128 or a
> long double on a 32-bit platform. It's tempting to let the frontend
> emit i128 and fp128 arguments/return values, however making the
> argument indirect is somewhat easier in the frontend.

This is a great example of why the responsibilities are currently weirdly split between the frontend and backend. The more ABI lowering you do in the frontend, the more information is available for the mid-level optimizer to hack on. If the backend were responsible for creating a temporary i128 value in memory and taking its address, the mid-level would never have an opportunity to optimize those memory loads and stores, or to realize that the callee never modifies its argument, making the copy unnecessary.

Of course, there are many drawbacks to the current split of responsibilities, so it's definitely a tradeoff.