John McCall via llvm-dev
2016-Mar-02 20:03 UTC
[llvm-dev] RFC: Implementing the Swift calling convention in LLVM and Clang
> On Mar 2, 2016, at 11:33 AM, Renato Golin <renato.golin at linaro.org> wrote:
> On 2 March 2016 at 18:48, John McCall <rjmccall at apple.com> wrote:
>> The frontend will not tell the backend explicitly which parameters will be
>> in registers; it will just pass a bunch of independent scalar values, and
>> the backend will assign them to registers or the stack as appropriate.
>
> I'm assuming you already have code in the back-end that does that in
> the way you want, as you said earlier you may want to use variable
> number of registers for PCS.
>
>> Our intent is to completely bypass all of the passing-structures-in-registers
>> code in the backend by simply not exposing the backend to any parameters
>> of aggregate type. The frontend will turn a struct into (say) an i32, a float,
>> and an i8; if the first two get passed in registers and the last gets passed
>> on the stack, so be it.
>
> How do you differentiate the @foo's below?
>
> struct A { i32, float };
> struct B { float, i32 };
>
> define @foo (A, i32) -> @foo(i32, float, i32);
>
> and
>
> define @foo (i32, B) -> @foo(i32, float, i32);

We don’t need to. We don't use the intermediary convention’s rules for aggregates. The Swift rule for aggregate arguments is literally “if it’s too complex according to <foo>, pass it indirectly; otherwise, expand it into a sequence of scalar values and pass them separately”. If that means it’s partially passed in registers and partially on the stack, that’s okay; we might need to re-assemble it in the callee, but the first part of the rule limits how expensive that can ever get.

>> The only difficulty with this plan is that, when we have multiple results, we
>> don’t have a choice but to return a struct type. To the extent that backends
>> try to infer that the function actually needs to be sret, instead of just trying
>> to find a way to return all the components of the struct type in appropriate
>> registers, that will be sub-optimal for us. If that’s a pervasive problem, then
>> we probably just need to introduce a swift calling convention in LLVM.
>
> Oh, yeah, some back-ends will fiddle with struct return. Not all
> languages have single-value-return restrictions, but I think that ship
> has sailed already for IR.
>
> That's another reason to try and pass all by pointer at the end of the
> parameter list, instead of receive as an argument and return.

That’s pretty sub-optimal compared to just returning in registers. Also, most backends do have the ability to return small structs in multiple registers already.

>> A direct result is something that’s returned in registers. An indirect
>> result is something that’s returned by storing it in an implicit out-parameter.
>
> Oh, I see. In that case, any assumption on the variable would have to
> be invalidated, maybe use global volatile variables, or special
> built-ins, so that no optimisation tries to get away with it. But that
> would mess up your optimal code, especially if they have to get passed
> in registers.

I don’t understand what you mean here. The out-parameter is still explicit in LLVM IR. Nothing about this is novel, except that C frontends generally won’t combine indirect results with direct results. Worst case, if pervasive LLVM assumptions prevent us from combining the sret attribute with a direct result, we just won’t use the sret attribute.

>> Oh, sorry, I forgot to talk about that. Yes, the frontend already rearranges
>> these arguments to the end, which means the optimizer’s default behavior
>> of silently dropping extra call arguments ends up doing the right thing.
>
> Excellent!
>
>> I’m reluctant to say that the convention always requires these arguments.
>> If we have to do that, we can, but I’d rather not; it would involve generating
>> a lot of unnecessary IR and would probably create unnecessary
>> code-generation differences, and I don’t think it would be sufficient for
>> error results anyway.
>
> This should be ok for internal functions, but maybe not for global /
> public interfaces. The ARM ABI has specific behaviour guarantees for
> public interfaces (like large alignment) that would be prohibitively
> bad for all functions, but ok for public ones.
>
> If hells break loose, you could enforce that for public interfaces only.
>
>> We don’t want checking or setting the error result to actually involve memory
>> access.
>
> And even though most of those access could be optimised away, there's
> no guarantee.

Right. The backend isn’t great about removing memory operations that survive to it.

> Another option would be to have a special built-in to recognise
> context/error variables, and plug in a late IR pass to clean up
> everything. But I'd only recommend that if we can't find another way
> around.
>
>> The ability to call a non-throwing function as a throwing function means
>> we’d have to provide this extra explicit result on every single function with
>> the Swift convention, because the optimizer is definitely not going to
>> gracefully handle result-type mismatches; so even a function as simple as
>>   func foo() -> Int32
>> would have to be lowered into IR as
>>   define { i32, i8* } @foo(i8*)
>
> Indeed, very messy.
>
> I'm going on a tangent, here, may be all rubbish, but...
>
> C++ handles exception handling with the exception being thrown
> allocated in library code, not the program. If, like C++, Swift can
> only handle one exception at a time, why can't the error variable be a
> global?
>
> The ARM back-end accepts the -rreserve-r9 option, and others seem to
> have similar options, so you could use that to force your global
> variable to live on the platform register.
>
> That way, all your error handling built-ins deal with that global
> variable, which the back-end knows is on registers. You will need a
> special DAG node, but I'm assuming you already have/want one. You also
> drop any problem with arguments and PCS, at least for the error part.

Swift does not run in an independent environment; it has to interact with existing C code. That existing code does not reserve any registers globally for this use. Even if that were feasible, we don’t actually want to steal a register globally from all the C code on the system that probably never interacts with Swift.

John.
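[Editor's sketch, not part of the original message.] The aggregate rule John describes above can be modelled in a few lines of Python. This is only an illustration under stated assumptions: `flatten`, `expand_aggregate`, `lower_signature`, and the `MAX_SCALARS` threshold are hypothetical names invented here (the thread leaves the real complexity test as `<foo>`), and `"ptr"` is just a placeholder for "passed indirectly". It shows why the two `@foo` declarations in Renato's question need not be told apart: both expand to the same scalar sequence.

```python
# Toy model of the rule: "if it's too complex, pass it indirectly;
# otherwise expand it into scalars and pass them separately".
# MAX_SCALARS is a made-up threshold standing in for the unspecified <foo>.
MAX_SCALARS = 4


def flatten(ty):
    """Return the scalar leaves of a type; tuples model structs."""
    if isinstance(ty, tuple):
        leaves = []
        for field in ty:
            leaves.extend(flatten(field))
        return leaves
    return [ty]


def expand_aggregate(ty):
    """Apply the rule to one parameter type."""
    scalars = flatten(ty)
    if len(scalars) > MAX_SCALARS:
        return ["ptr"]  # too complex: pass indirectly (placeholder type)
    return scalars      # otherwise: independent scalar values


def lower_signature(params):
    """Lower a parameter list to the scalar list the backend would see."""
    lowered = []
    for p in params:
        lowered.extend(expand_aggregate(p))
    return lowered


A = ("i32", "float")  # struct A { i32, float }
B = ("float", "i32")  # struct B { float, i32 }

foo_a = lower_signature([A, "i32"])    # @foo(A, i32)
foo_b = lower_signature(["i32", B])    # @foo(i32, B)
big = lower_signature([("i32",) * 8])  # exceeds the toy threshold
```

In this model both `foo_a` and `foo_b` come out as `i32, float, i32`; where each scalar ends up (register or stack) is left entirely to the backend, as the message says.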
Renato Golin via llvm-dev
2016-Mar-03 10:00 UTC
[llvm-dev] RFC: Implementing the Swift calling convention in LLVM and Clang
On 2 March 2016 at 20:03, John McCall <rjmccall at apple.com> wrote:
> We don’t need to. We don't use the intermediary convention’s rules for aggregates.
> The Swift rule for aggregate arguments is literally “if it’s too complex according to
> <foo>, pass it indirectly; otherwise, expand it into a sequence of scalar values and
> pass them separately”. If that means it’s partially passed in registers and partially
> on the stack, that’s okay; we might need to re-assemble it in the callee, but the
> first part of the rule limits how expensive that can ever get.

Right. My worry is, then, how this plays out with ARM's AAPCS.

As you said below, you *have* to interoperate with C code, so you will *have* to interoperate with AAPCS on ARM.

AAPCS's rules on aggregates are not simple, but they also allow part of it in registers, part on the stack. I'm guessing you won't have the same exact rules, but similar ones, which may prove harder to implement than the former.

> That’s pretty sub-optimal compared to just returning in registers. Also, most
> backends do have the ability to return small structs in multiple registers already.

Yes, but not all of them can return more than two, which may constrain you if you have both error and context values in a function call, in addition to the return value.

> I don’t understand what you mean here. The out-parameter is still explicit in
> LLVM IR. Nothing about this is novel, except that C frontends generally won’t
> combine indirect results with direct results.

Sorry, I had understood this, but your reply (for some reason) made me think it was a hidden contract, not an explicit argument. Ignore me, then. :)

> Right. The backend isn’t great about removing memory operations that survive to it.

Precisely!

> Swift does not run in an independent environment; it has to interact with
> existing C code. That existing code does not reserve any registers globally
> for this use. Even if that were feasible, we don’t actually want to steal a
> register globally from all the C code on the system that probably never
> interacts with Swift.

So, as Reid said, usage of built-ins might help you here.

Relying on LLVM's ability to not mess up your fiddling with variable arguments seems unstable. Adding specific attributes to functions or arguments seem too invasive.

So a solution would be to add a built-in in the beginning of the function to mark those arguments as special.

Instead of alloca %a + load -> store + return, you could have llvm.swift.error.load(%a) -> llvm.swift.error.return(%a), which survives most of middle-end passes intact, and a late pass then change the function to return a composite type, either a structure or a larger type, that will be lowered in more than one register.

This makes sure error propagation won't be optimised away, and that you can receive the error in any register (or even stack), but will always return it in the same registers (ex. on ARM, R1 for i32, R2+R3 for i64, etc).

I understand this might be far off what you guys did, and I'm not trying to re-write history, just brainstorming a bit.

IMO, both David and Richard are right. This is likely not a huge deal for the CC code, but we'd be silly not to take this opportunity to make it less fragile overall.

cheers,
--renato
John McCall via llvm-dev
2016-Mar-03 17:36 UTC
[llvm-dev] RFC: Implementing the Swift calling convention in LLVM and Clang
> On Mar 3, 2016, at 2:00 AM, Renato Golin <renato.golin at linaro.org> wrote:
>
> On 2 March 2016 at 20:03, John McCall <rjmccall at apple.com> wrote:
>> We don’t need to. We don't use the intermediary convention’s rules for aggregates.
>> The Swift rule for aggregate arguments is literally “if it’s too complex according to
>> <foo>, pass it indirectly; otherwise, expand it into a sequence of scalar values and
>> pass them separately”. If that means it’s partially passed in registers and partially
>> on the stack, that’s okay; we might need to re-assemble it in the callee, but the
>> first part of the rule limits how expensive that can ever get.
>
> Right. My worry is, then, how this plays out with ARM's AAPCS.
>
> As you said below, you *have* to interoperate with C code, so you will
> *have* to interoperate with AAPCS on ARM.

I’m not sure of your point here. We don’t use the Swift CC to call C functions. It does not matter, at all, whether the frontend lowering of an aggregate under the Swift CC resembles the frontend lowering of the same aggregate under AAPCS. I brought up interoperation with C code as a counterpoint to the idea of globally reserving a register.

> AAPCS's rules on aggregates are not simple, but they also allow part
> of it in registers, part on the stack. I'm guessing you won't have the
> same exact rules, but similar ones, which may prove harder to
> implement than the former.

>> That’s pretty sub-optimal compared to just returning in registers. Also, most
>> backends do have the ability to return small structs in multiple registers already.
>
> Yes, but not all of them can return more than two, which may constrain
> you if you have both error and context values in a function call, in
> addition to the return value.

We do actually use a different swiftcc calling convention in IR. I don’t see any serious interop problems here. The “intermediary” convention is just the original basis of swiftcc on the target.

>> I don’t understand what you mean here. The out-parameter is still explicit in
>> LLVM IR. Nothing about this is novel, except that C frontends generally won’t
>> combine indirect results with direct results.
>
> Sorry, I had understood this, but your reply (for some reason) made me
> think it was a hidden contract, not an explicit argument. Ignore me,
> then. :)
>
>> Right. The backend isn’t great about removing memory operations that survive to it.
>
> Precisely!
>
>> Swift does not run in an independent environment; it has to interact with
>> existing C code. That existing code does not reserve any registers globally
>> for this use. Even if that were feasible, we don’t actually want to steal a
>> register globally from all the C code on the system that probably never
>> interacts with Swift.
>
> So, as Reid said, usage of built-ins might help you here.
>
> Relying on LLVM's ability to not mess up your fiddling with variable
> arguments seems unstable. Adding specific attributes to functions or
> arguments seem too invasive.

I’m not sure why you say that. We already do have parameter ABI override attributes with target-specific behavior in LLVM IR: sret and inreg.

I can understand being uneasy with adding new swiftcc-specific attributes, though. It would be reasonable to make this more general. Attributes can be parameterized; maybe we could just say something like abi(“context”), and leave it to the CC to interpret that?

Having that sort of ability might make some special cases easier for C lowering, too, come to think of it. Imagine an x86 ABI that, based on type information otherwise erased by the conversion to LLVM IR, sometimes returns a float in an SSE register and sometimes on the x86 stack. It would be very awkward to express that today, but some sort of abi(“x87”) attribute would make it easy.

> So a solution would be to add a built-in
> in the beginning of the function to mark those arguments as special.
>
> Instead of alloca %a + load -> store + return, you could have
> llvm.swift.error.load(%a) -> llvm.swift.error.return(%a), which
> survives most of middle-end passes intact, and a late pass then change
> the function to return a composite type, either a structure or a
> larger type, that will be lowered in more than one register.
>
> This makes sure error propagation won't be optimised away, and that
> you can receive the error in any register (or even stack), but will
> always return it in the same registers (ex. on ARM, R1 for i32, R2+R3
> for i64, etc).
>
> I understand this might be far off what you guys did, and I'm not
> trying to re-write history, just brainstorming a bit.
>
> IMO, both David and Richard are right. This is likely not a huge deal
> for the CC code, but we'd be silly not to take this opportunity to
> make it less fragile overall.

The lowering required for this would be very similar to the lowering that Manman’s patch does for swift-error: the backend basically does special value propagation. The main difference is that it’s completely opaque to the middle-end by default instead of looking like a load or store that ordinary memory optimizations can handle. That seems like a loss, since those optimizations would actually do the right thing.

John.
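[Editor's sketch, not part of the original message.] John's closing point, that an error slot expressed as ordinary loads and stores benefits from existing memory optimizations while an opaque built-in does not, can be illustrated with a toy store-to-load forwarding pass. Everything here is an assumption made for illustration: the tuple-based mini-IR, the `forward_stores` function, and the `error.set` / `error.get` names are invented and do not correspond to any LLVM pass or intrinsic.

```python
# Toy straight-line IR; each instruction is a tuple:
#   ("store", slot, value)   write value to a memory slot
#   ("load", dest, slot)     read a slot into dest
#   ("opaque", name, arg)    a call the optimizer cannot see through
def forward_stores(instrs):
    """Replace a load with a copy when the slot's last store is known."""
    known = {}  # slot -> value most recently stored there
    out = []
    for ins in instrs:
        op = ins[0]
        if op == "store":
            _, slot, value = ins
            known[slot] = value
            out.append(ins)
        elif op == "load" and ins[2] in known:
            _, dest, slot = ins
            out.append(("copy", dest, known[slot]))  # no memory access needed
        elif op == "opaque":
            known.clear()  # conservatively assume it may touch any slot
            out.append(ins)
        else:
            out.append(ins)
    return out


# Error slot modelled as plain memory: the generic pass understands it.
as_memory = [("store", "err", "null"), ("load", "%e", "err")]

# Error slot modelled with hypothetical opaque built-ins: nothing to forward.
as_builtin = [("opaque", "error.set", "null"), ("opaque", "error.get", "%e")]

opt_memory = forward_stores(as_memory)
opt_builtin = forward_stores(as_builtin)
```

In the memory form the load is rewritten to a copy of the stored value; in the built-in form the sequence passes through unchanged, so a dedicated pass would be needed to get the same effect.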