John McCall via llvm-dev
2016-Mar-02 18:48 UTC
[llvm-dev] RFC: Implementing the Swift calling convention in LLVM and Clang
> On Mar 2, 2016, at 1:33 AM, Renato Golin <renato.golin at linaro.org> wrote:
>
> On 2 March 2016 at 01:14, John McCall via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>> Hi, all.
>> - We sometimes want to return more values in registers than the convention normally does, and we want to be able to use both integer and floating-point registers. For example, we want to return a value of struct A, above, purely in registers. For the most part, I don’t think this is a problem to layer on to an existing IR convention: C frontends will generally use explicit sret arguments when the convention requires them, and so the Swift lowering will produce result types that don’t have legal interpretations as direct results under the C convention. But we can use a different IR convention if it’s necessary to disambiguate Swift’s desired treatment from the target's normal attempts to retroactively match the C convention.
>
> Is this a back-end decision, or do you expect the front-end to tell the back-end (via annotation) which parameters will be in regs? Unless you also have back-end patches, I don't think the latter is going to work well. For example, the ARM back-end has a huge section related to passing structures in registers, which conforms to the ARM EABI, not necessarily your Swift ABI.
>
> Not to mention that this creates the versioning problem, where two different LLVM releases can produce slightly different PCS register usage (due to new features or bugs), and thus require re-compilation of all libraries. This, however, is not a problem for your current request, just a comment.

The frontend will not tell the backend explicitly which parameters will be in registers; it will just pass a bunch of independent scalar values, and the backend will assign them to registers or the stack as appropriate.
Our intent is to completely bypass all of the passing-structures-in-registers code in the backend by simply not exposing the backend to any parameters of aggregate type. The frontend will turn a struct into (say) an i32, a float, and an i8; if the first two get passed in registers and the last gets passed on the stack, so be it.

The only difficulty with this plan is that, when we have multiple results, we don’t have a choice but to return a struct type. To the extent that backends try to infer that the function actually needs to be sret, instead of just trying to find a way to return all the components of the struct type in appropriate registers, that will be sub-optimal for us. If that’s a pervasive problem, then we probably just need to introduce a swift calling convention in LLVM.

>> - We sometimes have both direct results and indirect results. It would be nice to take advantage of the sret convention even in the presence of direct results on targets that do use a different (profitable) ABI treatment for it. I don’t know how well-supported this is in LLVM.
>
> I'm not sure what you mean by direct or indirect results here. But if this is a language feature, as long as the IR semantics is correct, I don't see any problem.

A direct result is something that’s returned in registers. An indirect result is something that’s returned by storing it in an implicit out-parameter. I would like to be able to form calls like this:

  %temp = alloca %my_big_struct_type
  call i32 @my_swift_function(sret %my_big_struct_type* %temp)

This doesn’t normally happen today in LLVM IR because when C frontends use an sret result, they set the direct IR result to void. Like I said, I don’t think this is a serious problem, but I wanted to float the idea before assuming that.

>> - We want a special “context” treatment for a certain argument. A pointer-sized value is passed in an integer register; the same value should be present in that register after the call.
>> In some cases, the caller may pass a context argument to a function that doesn’t expect one, and this should not trigger undefined behavior. Both of these rules suggest that the context argument be passed in a register which is normally callee-save.
>
> I think it's going to be harder to get all opts to behave in the way you want them to. And may also require back-end changes to make sure those registers are saved in the right frame, or reserved from register allocation, or popped back after the call, etc.

I don’t expect the optimizer to be a problem, but I just realized that the main reason is something I didn’t talk about in my first post. See below. That this will require some support from the backend is a given.

>> The Clang impact is relatively minor; it is focused on allowing the Swift runtime to define functions that use the convention. It adds a new calling convention attribute, a few new parameter attributes constrained to that calling convention, and some relatively un-invasive call lowering code in IR generation.
>
> This sounds like a normal change to support language perks, no big deal. But I'm not a Clang expert, nor have I seen the code.
>
>> - Using sret together with a direct result may or may not “just work”. I certainly don’t see a reason why it shouldn’t work in the middle-end. Obviously, some targets can’t support it, but we can avoid doing this on those targets.
>
> All sret problems I've seen were back-end related (ABI conformance). But I wasn't paying attention to the middle-end.
>
>> - Opting in to the two argument treatments requires new parameter attributes. We discussed using separate calling conventions; unfortunately, error and context arguments can appear either separately or together, so we’d really need several new conventions for all the valid combinations.
>> Furthermore, calling a context-free function with an ignored context argument could turn into a call to a function using a mismatched calling convention, which LLVM IR generally treats as undefined behavior. Also, it wasn’t obvious that just a calling convention would be sufficient for the error treatment; see the next bullet.
>
> Why not treat context and error like C's default arguments? Or like named arguments in Python?
>
> Surely the front-end can easily re-order the arguments (according to some ABI) and make sure every function that may be called with context/error has it as the last arguments, and default them to null. You can then later do an inter-procedural pass to clean it up for all static functions that are never called with those arguments, etc.

Oh, sorry, I forgot to talk about that. Yes, the frontend already rearranges these arguments to the end, which means the optimizer’s default behavior of silently dropping extra call arguments ends up doing the right thing.

I’m reluctant to say that the convention always requires these arguments. If we have to do that, we can, but I’d rather not; it would involve generating a lot of unnecessary IR and would probably create unnecessary code-generation differences, and I don’t think it would be sufficient for error results anyway.

>> - The “error” treatment requires some way to (1) pass and receive the value in the caller and (2) receive and change the value in the callee. The best way we could think of to represent this was to pretend that the argument is actually passed indirectly; the value is “passed” by storing to the pointer and “received” by loading from it. To simplify backend lowering, we require the argument to be a special kind of swifterror alloca that can only be loaded, stored, and passed as a swifterror argument; in the callee, swifterror arguments have similar restrictions. This ends up being fairly invasive in the backend, unfortunately.
>
> I think this logic is too high-level for the back-end to deal with. This looks like a simple run of the mill pointer argument that can be null (and is by default), but if it's not, the callee can change the object pointed by but not the pointer itself, ie, "void foo(exception * const Error = null)". I don't understand why you need this argument to be of a special kind of SDNode.

We don’t want checking or setting the error result to actually involve memory access.

An alternative to the pseudo-indirect-result approach would be to model the result as an explicit result. That would really mess up the IR, though. The ability to call a non-throwing function as a throwing function means we’d have to provide this extra explicit result on every single function with the Swift convention, because the optimizer is definitely not going to gracefully handle result-type mismatches; so even a function as simple as

  func foo() -> Int32

would have to be lowered into IR as

  define { i32, i8* } @foo(i8*)

John.
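The two ways of modelling the error result compared in this message can be approximated in plain C. This is only a rough analogue under stated assumptions: the names `struct result_and_error`, `foo_explicit`, `foo_slot`, and `failing_slot` are invented for illustration, the value 7 is a placeholder, and C cannot express the actual goal of the proposal, which is that the error slot would live in a register rather than in memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Option 1: explicit extra result, the C analogue of
 *   define { i32, i8* } @foo(i8*)
 * Every function using the convention would need this shape,
 * even one that can never fail. */
struct result_and_error {
    int32_t value;
    void *error; /* NULL means "no error" */
};

struct result_and_error foo_explicit(void *error_in) {
    /* A non-throwing function just hands the incoming error value back. */
    struct result_and_error r = { 7, error_in };
    return r;
}

/* Option 2: pseudo-indirect error slot. The value is "passed" by storing
 * to the slot and "received" by loading from it. */
int32_t foo_slot(void **error_slot) {
    assert(error_slot != NULL);
    /* Non-throwing: the slot is left untouched. */
    return 7;
}

int32_t failing_slot(void **error_slot, void *error_object) {
    assert(error_slot != NULL);
    *error_slot = error_object; /* the analogue of throwing */
    return 0;
}
```

A caller using the slot form initializes the slot to NULL, makes the call, and then checks whether the slot changed. With the explicit form, the signature of every function changes, which is the IR clutter the message objects to.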
John McCall via llvm-dev
2016-Mar-02 19:01 UTC
[llvm-dev] RFC: Implementing the Swift calling convention in LLVM and Clang
> On Mar 2, 2016, at 10:48 AM, John McCall via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>> On Mar 2, 2016, at 1:33 AM, Renato Golin <renato.golin at linaro.org> wrote:
>> On 2 March 2016 at 01:14, John McCall via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>> Hi, all.
>>> - We sometimes want to return more values in registers than the convention normally does, and we want to be able to use both integer and floating-point registers. For example, we want to return a value of struct A, above, purely in registers. For the most part, I don’t think this is a problem to layer on to an existing IR convention: C frontends will generally use explicit sret arguments when the convention requires them, and so the Swift lowering will produce result types that don’t have legal interpretations as direct results under the C convention. But we can use a different IR convention if it’s necessary to disambiguate Swift’s desired treatment from the target's normal attempts to retroactively match the C convention.
>>
>> Is this a back-end decision, or do you expect the front-end to tell the back-end (via annotation) which parameters will be in regs? Unless you also have back-end patches, I don't think the latter is going to work well. For example, the ARM back-end has a huge section related to passing structures in registers, which conforms to the ARM EABI, not necessarily your Swift ABI.
>>
>> Not to mention that this creates the versioning problem, where two different LLVM releases can produce slightly different PCS register usage (due to new features or bugs), and thus require re-compilation of all libraries. This, however, is not a problem for your current request, just a comment.
>
> The frontend will not tell the backend explicitly which parameters will be in registers; it will just pass a bunch of independent scalar values, and the backend will assign them to registers or the stack as appropriate.
>
> Our intent is to completely bypass all of the passing-structures-in-registers code in the backend by simply not exposing the backend to any parameters of aggregate type. The frontend will turn a struct into (say) an i32, a float, and an i8; if the first two get passed in registers and the last gets passed on the stack, so be it.
>
> The only difficulty with this plan is that, when we have multiple results, we don’t have a choice but to return a struct type. To the extent that backends try to infer that the function actually needs to be sret, instead of just trying to find a way to return all the components of the struct type in appropriate registers, that will be sub-optimal for us. If that’s a pervasive problem, then we probably just need to introduce a swift calling convention in LLVM.

Also, just a quick question. I’m happy to continue to talk about the actual design and implementation of LLVM IR on this point, and I’d be happy to put out the actual patch we’re initially proposing. Obviously, all of this code needs to go through the normal LLVM/Clang code review processes. But before we continue with that, I just want to clarify one important point: assuming that the actual implementation ends up satisfying your technical requirements, do you have any objections to the general idea of supporting the Swift CC in mainline LLVM?

John.
Renato Golin via llvm-dev
2016-Mar-02 19:04 UTC
[llvm-dev] RFC: Implementing the Swift calling convention in LLVM and Clang
On 2 March 2016 at 19:01, John McCall <rjmccall at apple.com> wrote:
> Also, just a quick question. I’m happy to continue to talk about the actual design and implementation of LLVM IR on this point, and I’d be happy to put out the actual patch we’re initially proposing. Obviously, all of this code needs to go through the normal LLVM/Clang code review processes. But before we continue with that, I just want to clarify one important point: assuming that the actual implementation ends up satisfying your technical requirements, do you have any objections to the general idea of supporting the Swift CC in mainline LLVM?

I personally don't. I think we should treat Swift as any other language that we support, and if we can't use existing mechanisms in the back-end to lower Swift, then we need to expand the back-end to support that.

That being said, if the Swift support starts to bit-rot (if, for instance, Apple stops supporting it in the future), it will be harder to clean up the back-end from its CC. But that, IMHO, is a very far-fetched future and a small price to pay.

cheers,
--renato
Renato Golin via llvm-dev
2016-Mar-02 19:33 UTC
[llvm-dev] RFC: Implementing the Swift calling convention in LLVM and Clang
On 2 March 2016 at 18:48, John McCall <rjmccall at apple.com> wrote:
> The frontend will not tell the backend explicitly which parameters will be in registers; it will just pass a bunch of independent scalar values, and the backend will assign them to registers or the stack as appropriate.

I'm assuming you already have code in the back-end that does that in the way you want, as you said earlier you may want to use variable number of registers for PCS.

> Our intent is to completely bypass all of the passing-structures-in-registers code in the backend by simply not exposing the backend to any parameters of aggregate type. The frontend will turn a struct into (say) an i32, a float, and an i8; if the first two get passed in registers and the last gets passed on the stack, so be it.

How do you differentiate the @foo's below?

  struct A { i32, float };
  struct B { float, i32 };

  define @foo (A, i32) -> @foo(i32, float, i32);

and

  define @foo (i32, B) -> @foo(i32, float, i32);

> The only difficulty with this plan is that, when we have multiple results, we don’t have a choice but to return a struct type. To the extent that backends try to infer that the function actually needs to be sret, instead of just trying to find a way to return all the components of the struct type in appropriate registers, that will be sub-optimal for us. If that’s a pervasive problem, then we probably just need to introduce a swift calling convention in LLVM.

Oh, yeah, some back-ends will fiddle with struct return. Not all languages have single-value-return restrictions, but I think that ship has sailed already for IR.

That's another reason to try and pass all by pointer at the end of the parameter list, instead of receive as an argument and return.

> A direct result is something that’s returned in registers. An indirect result is something that’s returned by storing it in an implicit out-parameter.

Oh, I see.
In that case, any assumption on the variable would have to be invalidated, maybe use global volatile variables, or special built-ins, so that no optimisation tries to get away with it. But that would mess up your optimal code, especially if they have to get passed in registers.

> Oh, sorry, I forgot to talk about that. Yes, the frontend already rearranges these arguments to the end, which means the optimizer’s default behavior of silently dropping extra call arguments ends up doing the right thing.

Excellent!

> I’m reluctant to say that the convention always requires these arguments. If we have to do that, we can, but I’d rather not; it would involve generating a lot of unnecessary IR and would probably create unnecessary code-generation differences, and I don’t think it would be sufficient for error results anyway.

This should be ok for internal functions, but maybe not for global / public interfaces. The ARM ABI has specific behaviour guarantees for public interfaces (like large alignment) that would be prohibitively bad for all functions, but ok for public ones.

If all hell breaks loose, you could enforce that for public interfaces only.

> We don’t want checking or setting the error result to actually involve memory access.

And even though most of those accesses could be optimised away, there's no guarantee.

Another option would be to have a special built-in to recognise context/error variables, and plug in a late IR pass to clean up everything. But I'd only recommend that if we can't find another way around.

> The ability to call a non-throwing function as a throwing function means we’d have to provide this extra explicit result on every single function with the Swift convention, because the optimizer is definitely not going to gracefully handle result-type mismatches; so even a function as simple as
>   func foo() -> Int32
> would have to be lowered into IR as
>   define { i32, i8* } @foo(i8*)

Indeed, very messy.
I'm going on a tangent, here, may be all rubbish, but...

C++ handles exception handling with the exception being thrown allocated in library code, not the program. If, like C++, Swift can only handle one exception at a time, why can't the error variable be a global?

The ARM back-end accepts the -rreserve-r9 option, and others seem to have similar options, so you could use that to force your global variable to live on the platform register.

That way, all your error handling built-ins deal with that global variable, which the back-end knows is on registers. You will need a special DAG node, but I'm assuming you already have/want one. You also drop any problem with arguments and PCS, at least for the error part.

cheers,
--renato
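The question about the two @foo signatures earlier in this message can be made concrete with a small sketch. Under a purely structural flattening, `(A, i32)` and `(i32, B)` do reduce to the same scalar sequence `(i32, float, i32)`. The descriptor type `struct param_desc` and the function `flatten_params` below are hypothetical names invented for this illustration; they are not part of LLVM or Swift.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar kinds that survive flattening (illustrative subset only). */
enum scalar_kind { SK_I32, SK_FLOAT };

/* Hypothetical descriptor: a parameter is a short list of scalar fields.
 * A plain scalar is a list of length 1. */
struct param_desc {
    const enum scalar_kind *fields;
    size_t num_fields;
};

/* Append every scalar field of every parameter to `out`, in order.
 * Returns the number of scalars written, or 0 if `out` is too small. */
size_t flatten_params(const struct param_desc *params, size_t num_params,
                      enum scalar_kind *out, size_t out_cap) {
    assert(params != NULL && out != NULL);
    size_t n = 0;
    for (size_t p = 0; p < num_params; p++) {
        for (size_t f = 0; f < params[p].num_fields; f++) {
            if (n == out_cap)
                return 0;
            out[n++] = params[p].fields[f];
        }
    }
    return n;
}
```

Flattening `{ {i32, float}, {i32} }` and `{ {i32}, {float, i32} }` with this helper yields identical three-element sequences, so the backend sees one and the same scalar signature for both source-level declarations.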
Tian, Xinmin via llvm-dev
2016-Mar-02 19:49 UTC
[llvm-dev] Proposal for function vectorization and loop vectorization with function calls
Proposal for function vectorization and loop vectorization with function calls
==============================================================================
Intel Corporation (3/2/2016)

This is a proposal for initial work towards a Clang and LLVM implementation of vectorizing a function annotated with OpenMP 4.5's "#pragma omp declare simd" (named SIMD-enabled function) and its associated clauses based on the VectorABI [2]. On the caller side, we propose to improve the LLVM LoopVectorizer such that the code that calls the SIMD-enabled function can be vectorized. On the callee side, we propose to add Clang FE support for the "#pragma omp declare simd" syntax and a new pass to transform the SIMD-enabled function body into a SIMD loop. This newly created loop can then be fed to the LLVM LoopVectorizer (or its future enhancement) for vectorization. This work leverages LLVM's existing LoopVectorizer.

Problem Statement
=================
Currently, if a loop calls a user-defined function or a 3rd party library function, the loop can't be vectorized unless the function is inlined. In the example below the LoopVectorizer fails to vectorize the k loop due to its function call to "dowork" because "dowork" is an external function. Note that inlining the "dowork" function may result in vectorization for some of the cases, but that is not a generally applicable solution. Also, there may be reasons why the compiler may not (or can't) inline the "dowork" function call. Therefore, there is value in being able to vectorize the loop with a call to the "dowork" function in it.
    #include <stdio.h>

    extern float dowork(float *a, int k);

    float a[4096];
    int main()
    { int k;
    #pragma clang loop vectorize(enable)
      for (k = 0; k < 4096; k++) {
        a[k] = k * 0.5;
        a[k] = dowork(a, k);
      }
      printf("passed %f\n", a[1024]);
    }

    sh-4.1$ clang -c -O2 -Rpass=loop-vectorize -Rpass-missed=loop-vectorize -Rpass-analysis=loop-vectorize loopvec.c
    loopvec.c:15:12: remark: loop not vectorized: call instruction cannot be vectorized [-Rpass-analysis]
        a[k] = dowork(a, k);
               ^
    loopvec.c:13:3: remark: loop not vectorized: use -Rpass-analysis=loop-vectorize for more info (Force=true) [-Rpass-missed=loop-vectorize]
      for (k = 0; k < 4096; k++) {
      ^
    loopvec.c:13:3: warning: loop not vectorized: failed explicitly specified loop vectorization [-Wpass-failed]
    1 warning generated.

New functionality of Vectorization
==================================
New functionalities and enhancements are proposed to address the issues stated above, which include:
a) Vectorize a function annotated by the programmer using OpenMP* SIMD extensions;
b) Enhance LLVM's LoopVectorizer to vectorize a loop containing a call to a SIMD-enabled function.

For example, when writing:

    #include <stdio.h>

    #pragma omp declare simd uniform(a) linear(k)
    extern float dowork(float *a, int k);

    float a[4096];
    int main()
    { int k;
    #pragma clang loop vectorize(enable)
      for (k = 0; k < 4096; k++) {
        a[k] = k * 0.5;
        a[k] = dowork(a, k);
      }
      printf("passed %f\n", a[1024]);
    }

the programmer asserts that
a) there will be a vector version of "dowork" available for the compiler to use (to link with, with the appropriate signature, explained below) when vectorizing the k loop; and that
b) no loop-carried backward dependencies are introduced by the "dowork" call that prevent the vectorization of the k loop.

The expected vector loop (shown as pseudo code, ignoring leftover iterations) resulting from LLVM's LoopVectorizer is

    ... ...
    vectorized_for (k = 0; k < 4096; k += VL) {
      a[k:VL] = {k, k+1, k+2, ..., k+VL-1} * 0.5;
      a[k:VL] = _ZGVbN4ul_dowork(a, k);
    }
    ... ...
In this example "_ZGVbN4ul_dowork" is a special name mangling where:

  _ZGV is a prefix based on the C/C++ name mangling rule suggested by the GCC community,
  'b' indicates "xmm" (assume we vectorize here to 128-bit xmm vector registers),
  'N' indicates that the function is vectorized without a mask ('M' indicates that the function is vectorized with a mask),
  '4' is VL (assume we vectorize here for length 4),
  'u' indicates that the first parameter has the "uniform" property,
  'l' indicates that the second argument has the "linear" property.

More details (including the name mangling scheme) can be found in reference [2].

References
==========
1. OpenMP SIMD language extensions: http://www.openmp.org/mp-documents/openmp-4.5.pdf
2. VectorABI Documentation:
   https://www.cilkplus.org/sites/default/files/open_specifications/Intel-ABI-Vector-Function-2012-v0.9.5.pdf
   https://sourceware.org/glibc/wiki/libmvec?action=AttachFile&do=view&target=VectorABI.txt
   [[Note: VectorABI was reviewed at the X86-64 System V Application Binary Interface mailing list. The discussion was recorded at https://groups.google.com/forum/#!topic/x86-64-abi/LmppCfN1rZ4 ]]
3. The first paper on SIMD extensions and implementations: "Compiling C/C++ SIMD Extensions for Function and Loop Vectorization on Multicore-SIMD Processors" by Xinmin Tian, Hideki Saito, Milind Girkar, Serguei Preis, Sergey Kozhukhov, et al., IPDPS Workshops 2012, pages 2349--2358
   [[Note: the first implementation and the paper were done before VectorABI was finalized with the GCC community and Red Hat. The latest VectorABI version for OpenMP 4.5 is ready to be published]]

Proposed Implementation
=======================
1. Clang FE parses "#pragma omp declare simd [clauses]" and generates mangled names including these prefixes as vector signatures. These mangled name prefixes are recorded as function attributes in an LLVM function attribute group.
   Note that it may be possible to have several mangled names associated with the same function, which correspond to several desired vectorized versions. Clang FE generates all function attributes for expected vector variants to be generated by the back-end. E.g.,

    #pragma omp declare simd uniform(a) linear(k)
    float dowork(float *a, int k)
    {
      a[k] = sinf(a[k]) + 9.8f;
      return a[k];
    }

    define __stdcall f32 @_dowork(f32* %a, i32 %k) #0
    ... ...
    attributes #0 = { nounwind uwtable "_ZGVbM4ul_" "_ZGVbN4ul_" ...}

2. A new vector function generation pass is introduced to generate vector variants of the original scalar function based on VectorABI (see [2, 3]). For example, one vector variant is generated for the "_ZGVbN4ul_" attribute as follows (pseudo code):

    define __stdcall <4 x f32> @_ZGVbN4ul_dowork(f32* %a, i32 %k) #0 {
      #pragma clang loop vectorize(enable)
      for (int %t = %k; %t < %k + 4; %t++) {
        %a[%t] = sinf(%a[%t]) + 9.8f;
      }
      vec_load xmm0, %a[k:VL]
      return xmm0;
    }

   The body of the function is wrapped inside a loop having VL iterations, which correspond to the vector lanes.

   The LLVM LoopVectorizer will vectorize the generated %t loop, and is expected to produce the following vectorized code, eliminating the loop (pseudo code):

    define __stdcall <4 x f32> @_ZGVbN4ul_dowork(f32* %a, i32 %k) #0 {
      vec_load xmm1, %a[k:VL]
      xmm2 = call __svml_sinf(xmm1)
      xmm0 = vec_add xmm2, [9.8f, 9.8f, 9.8f, 9.8f]
      store %a[k:VL], xmm0
      return xmm0;
    }

   [[Note: Vectorizer support for the Short Vector Math Library (SVML) functions will be a separate proposal.]]

3. The LLVM LoopVectorizer is enhanced to
   a) identify loops with calls that have been annotated with "#pragma omp declare simd" by checking function attribute groups;
   b) analyze each call instruction and its parameters in the loop, to determine if each parameter has the following properties:
      * uniform
      * linear + stride
      * vector
      * aligned
      * called inside a conditional branch or not
      ... ...
      Based on these properties, the signature of the vectorized call is generated; and
   c) perform signature matching to obtain the suitable vector variant among the signatures available for the called function. If no such signature is found, the call cannot be vectorized.

Note that a similar enhancement can and should be made also to LLVM's SLP vectorizer.

For example:

    #pragma omp declare simd uniform(a) linear(k)
    extern float dowork(float *a, int k);
    ... ...
    #pragma clang loop vectorize(enable)
    for (k = 0; k < 4096; k++) {
      a[k] = k * 0.5;
      a[k] = dowork(a, k);
    }
    ... ...

Step a: the "dowork" function is marked as a SIMD-enabled function

    attributes #0 = { nounwind uwtable "_ZGVbM4ul_" "_ZGVbN4ul_" ...}

Step b: 1) 'a' is uniform, as it is the base address of array 'a'
        2) 'k' is linear, as 'k' is the induction variable with stride=1
        3) SIMD "dowork" is called unconditionally in the candidate k loop.
        4) it is compiled for SSE4.1 with the Vector Length VL=4.
        Based on these properties, the signature is "_ZGVbN4ul_"

[[Notes: A conditional call in the loop needs masking support; see references [1][2][3] for the implementation details.]]

Step c: Check if the signature "_ZGVbN4ul_" exists in function attribute #0; if yes, the suitable vectorized version is found and will be linked with.

The loop below is expected to be produced by the LoopVectorizer:

    ... ...
    vectorized_for (k = 0; k < 4096; k += 4) {
      a[k:4] = {k, k+1, k+2, k+3} * 0.5;
      a[k:4] = _ZGVbN4ul_dowork(a, k);
    }
    ... ...

[[Note: Vectorizer support for the Short Vector Math Library (SVML) functions will be a separate proposal.]]

GCC and ICC Compatibility
=========================
With this proposal the callee function and the loop containing a call to it can each be compiled and vectorized by a different compiler, including Clang+LLVM with its LoopVectorizer as outlined above, GCC and ICC. The vectorized loop will then be linked with the vectorized callee function.
Of course, each of these compilers can also be used to compile both the loop and the callee function.

Current Implementation Status and Plan
======================================
1. Clang FE is done by the Intel Clang FE team according to #1. Note: the Clang FE syntax processing patch is implemented and under community review (http://reviews.llvm.org/D10599). In general, the review feedback from the Clang community is very positive.
2. A new pass for function vectorization is implemented to support #2 and is being prepared for LLVM community review.
3. Work is in progress to teach LLVM's LoopVectorizer to vectorize a loop with user-defined function calls according to #3.

Call for Action
===============
1. Please review this proposal and provide constructive feedback on its direction and key ideas.
2. Feel free to ask any technical questions related to this proposal and to read the associated references.
3. Help is also highly welcome and appreciated in the development and upstreaming process.
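Steps b and c of the proposed LoopVectorizer change (derive a signature from the call-site properties, then look it up among the callee's attribute strings) can be sketched in a few lines of C. This is a simplified illustration, not the proposed pass: `build_signature`, `find_variant`, and `variant_name` are hypothetical helper names, and the sketch only covers the signature shape seen in the attribute strings "_ZGVbM4ul_" and "_ZGVbN4ul_" used in the proposal's examples.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Step b: build "_ZGV<isa><mask><vlen><params>_" into buf.
 * `masked` selects 'M' (masked) or 'N' (unmasked); `params` holds one
 * letter per parameter, e.g. "ul" for uniform + linear.
 * Returns 0 on success, -1 if buf is too small. */
int build_signature(char *buf, size_t cap, char isa, int masked,
                    unsigned vlen, const char *params) {
    assert(buf != NULL && params != NULL);
    int n = snprintf(buf, cap, "_ZGV%c%c%u%s_",
                     isa, masked ? 'M' : 'N', vlen, params);
    return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}

/* Step c: search the attribute strings recorded on the callee.
 * Returns the index of the matching variant, or -1 if there is none,
 * in which case the call cannot be vectorized. */
int find_variant(const char *const *attrs, size_t num_attrs,
                 const char *sig) {
    for (size_t i = 0; i < num_attrs; i++) {
        if (strcmp(attrs[i], sig) == 0)
            return (int)i;
    }
    return -1;
}

/* The symbol to call is the signature followed by the scalar name. */
int variant_name(char *buf, size_t cap, const char *sig,
                 const char *scalar_name) {
    int n = snprintf(buf, cap, "%s%s", sig, scalar_name);
    return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}
```

For the k loop in the example, calling `build_signature` with isa 'b', no mask, VL 4, and parameters "ul" gives "_ZGVbN4ul_", which matches the second attribute string, and `variant_name` then yields the symbol "_ZGVbN4ul_dowork". Asking for a variant that was never declared, such as a masked VL 8 version, finds no match.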
John McCall via llvm-dev
2016-Mar-02 20:03 UTC
[llvm-dev] RFC: Implementing the Swift calling convention in LLVM and Clang
> On Mar 2, 2016, at 11:33 AM, Renato Golin <renato.golin at linaro.org> wrote:
> On 2 March 2016 at 18:48, John McCall <rjmccall at apple.com> wrote:
>> The frontend will not tell the backend explicitly which parameters will be in registers; it will just pass a bunch of independent scalar values, and the backend will assign them to registers or the stack as appropriate.
>
> I'm assuming you already have code in the back-end that does that in the way you want, as you said earlier you may want to use variable number of registers for PCS.
>
>> Our intent is to completely bypass all of the passing-structures-in-registers code in the backend by simply not exposing the backend to any parameters of aggregate type. The frontend will turn a struct into (say) an i32, a float, and an i8; if the first two get passed in registers and the last gets passed on the stack, so be it.
>
> How do you differentiate the @foo's below?
>
>   struct A { i32, float };
>   struct B { float, i32 };
>
>   define @foo (A, i32) -> @foo(i32, float, i32);
>
> and
>
>   define @foo (i32, B) -> @foo(i32, float, i32);

We don’t need to. We don't use the intermediary convention’s rules for aggregates. The Swift rule for aggregate arguments is literally “if it’s too complex according to <foo>, pass it indirectly; otherwise, expand it into a sequence of scalar values and pass them separately”. If that means it’s partially passed in registers and partially on the stack, that’s okay; we might need to re-assemble it in the callee, but the first part of the rule limits how expensive that can ever get.

>> The only difficulty with this plan is that, when we have multiple results, we don’t have a choice but to return a struct type. To the extent that backends try to infer that the function actually needs to be sret, instead of just trying to find a way to return all the components of the struct type in appropriate registers, that will be sub-optimal for us.
>> If that’s a pervasive problem, then we probably just need to introduce a swift calling convention in LLVM.
>
> Oh, yeah, some back-ends will fiddle with struct return. Not all languages have single-value-return restrictions, but I think that ship has sailed already for IR.
>
> That's another reason to try and pass all by pointer at the end of the parameter list, instead of receive as an argument and return.

That’s pretty sub-optimal compared to just returning in registers. Also, most backends do have the ability to return small structs in multiple registers already.

>> A direct result is something that’s returned in registers. An indirect result is something that’s returned by storing it in an implicit out-parameter.
>
> Oh, I see. In that case, any assumption on the variable would have to be invalidated, maybe use global volatile variables, or special built-ins, so that no optimisation tries to get away with it. But that would mess up your optimal code, especially if they have to get passed in registers.

I don’t understand what you mean here. The out-parameter is still explicit in LLVM IR. Nothing about this is novel, except that C frontends generally won’t combine indirect results with direct results. Worst case, if pervasive LLVM assumptions prevent us from combining the sret attribute with a direct result, we just won’t use the sret attribute.

>> Oh, sorry, I forgot to talk about that. Yes, the frontend already rearranges these arguments to the end, which means the optimizer’s default behavior of silently dropping extra call arguments ends up doing the right thing.
>
> Excellent!
>
>> I’m reluctant to say that the convention always requires these arguments. If we have to do that, we can, but I’d rather not; it would involve generating a lot of unnecessary IR and would probably create unnecessary code-generation differences, and I don’t think it would be sufficient for error results anyway.
>
> This should be ok for internal functions, but maybe not for global / public interfaces. The ARM ABI has specific behaviour guarantees for public interfaces (like large alignment) that would be prohibitively bad for all functions, but ok for public ones.
>
> If all hell breaks loose, you could enforce that for public interfaces only.
>
>> We don’t want checking or setting the error result to actually involve memory access.
>
> And even though most of those accesses could be optimised away, there's no guarantee.

Right. The backend isn’t great about removing memory operations that survive to it.

> Another option would be to have a special built-in to recognise context/error variables, and plug in a late IR pass to clean up everything. But I'd only recommend that if we can't find another way around.
>
>> The ability to call a non-throwing function as a throwing function means we’d have to provide this extra explicit result on every single function with the Swift convention, because the optimizer is definitely not going to gracefully handle result-type mismatches; so even a function as simple as
>>   func foo() -> Int32
>> would have to be lowered into IR as
>>   define { i32, i8* } @foo(i8*)
>
> Indeed, very messy.
>
> I'm going on a tangent, here, may be all rubbish, but...
>
> C++ handles exception handling with the exception being thrown allocated in library code, not the program. If, like C++, Swift can only handle one exception at a time, why can't the error variable be a global?
>
> The ARM back-end accepts the -rreserve-r9 option, and others seem to have similar options, so you could use that to force your global variable to live on the platform register.
>
> That way, all your error handling built-ins deal with that global variable, which the back-end knows is on registers. You will need a special DAG node, but I'm assuming you already have/want one.
> You also drop any problem with arguments and PCS, at least for the error part.

Swift does not run in an independent environment; it has to interact with existing C code. That existing code does not reserve any registers globally for this use. Even if that were feasible, we don’t actually want to steal a register globally from all the C code on the system that probably never interacts with Swift.

John.
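The combination of a direct result with an indirect result, discussed earlier in this message, has a familiar C shape: a function that returns a small value normally while also filling in a caller-allocated out-parameter. The sketch below is only an analogue of the `call i32 @my_swift_function(sret %my_big_struct_type* %temp)` form from the thread. The struct layout, the fill byte 0xAB, and the return value 42 are placeholders chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical large aggregate standing in for %my_big_struct_type. */
struct my_big_struct_type {
    uint8_t bytes[256];
};

/* Indirect result: written through the explicit out-parameter.
 * Direct result: the int32_t return value, which a C frontend using
 * sret would normally have replaced with void. */
int32_t my_swift_function(struct my_big_struct_type *out) {
    assert(out != NULL);
    memset(out->bytes, 0xAB, sizeof out->bytes);
    return 42;
}
```

The caller allocates the temporary (the `alloca` in the IR form), passes its address, and receives both results from one call. As the message notes, nothing here is novel at the IR level; the open question is only whether backends accept the sret attribute together with a non-void return.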