thr3ads.net - llvm dev - [llvm-dev] RFC: Implementing the Swift calling convention in LLVM and Clang [Mar 2016]


John McCall via llvm-dev

2016-Mar-02 18:48 UTC

[llvm-dev] RFC: Implementing the Swift calling convention in LLVM and Clang

> On Mar 2, 2016, at 1:33 AM, Renato Golin <renato.golin at linaro.org> wrote:
>
> On 2 March 2016 at 01:14, John McCall via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>> Hi, all.
>>  - We sometimes want to return more values in registers than the
>> convention normally does, and we want to be able to use both integer and
>> floating-point registers.  For example, we want to return a value of struct A,
>> above, purely in registers.  For the most part, I don’t think this is a problem
>> to layer on to an existing IR convention: C frontends will generally use
>> explicit sret arguments when the convention requires them, and so the Swift
>> lowering will produce result types that don’t have legal interpretations as
>> direct results under the C convention.  But we can use a different IR convention
>> if it’s necessary to disambiguate Swift’s desired treatment from the
>> target's normal attempts to retroactively match the C convention.
> 
> Is this a back-end decision, or do you expect the front-end to tell
> the back-end (via annotation) which parameters will be in regs? Unless
> you also have back-end patches, I don't think the latter is going to
> work well. For example, the ARM back-end has a huge section related to
> passing structures in registers, which conforms to the ARM EABI, not
> necessarily your Swift ABI.
> 
> Not to mention that this creates the versioning problem, where two
> different LLVM releases can produce slightly different PCS register
> usage (due to new features or bugs), and thus require re-compilation
> of all libraries. This, however, is not a problem for your current
> request, just a comment.
The frontend will not tell the backend explicitly which parameters will be
in registers; it will just pass a bunch of independent scalar values, and
the backend will assign them to registers or the stack as appropriate.

Our intent is to completely bypass all of the passing-structures-in-registers
code in the backend by simply not exposing the backend to any parameters
of aggregate type.  The frontend will turn a struct into (say) an i32, a float,
and an i8; if the first two get passed in registers and the last gets passed
on the stack, so be it.
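
As a rough C analogy of the lowering described above (only a sketch, not actual Swift frontend output; `struct A`'s fields and both function names are made up for illustration), the call site explodes the aggregate into scalars and the callee only ever sees scalars:

```c
#include <stdint.h>

/* Hypothetical aggregate; the field types mirror the "(say) an i32, a float,
   and an i8" example from the mail. */
struct A {
    int32_t i;
    float   f;
    int8_t  c;
};

/* What the backend would see: three independent scalar parameters.
   Whether each one lands in a register or on the stack is the backend's call. */
float callee_expanded(int32_t i, float f, int8_t c) {
    return (float)i + f + (float)c;
}

/* What the frontend would do at the call site: explode the struct. */
float call_with_struct(struct A a) {
    return callee_expanded(a.i, a.f, a.c);
}
```

Calling `call_with_struct` with an `A` value forwards its three fields one by one; no parameter of aggregate type reaches `callee_expanded`.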

The only difficulty with this plan is that, when we have multiple results, we
don’t have a choice but to return a struct type.  To the extent that backends
try to infer that the function actually needs to be sret, instead of just trying
to find a way to return all the components of the struct type in appropriate
registers, that will be sub-optimal for us.  If that’s a pervasive problem, then
we probably just need to introduce a swift calling convention in LLVM.
>>  - We sometimes have both direct results and indirect results.  It
>> would be nice to take advantage of the sret convention even in the presence of
>> direct results on targets that do use a different (profitable) ABI treatment for
>> it.  I don’t know how well-supported this is in LLVM.
> 
> I'm not sure what you mean by direct or indirect results here. But if
> this is a language feature, as long as the IR semantics is correct, I
> don't see any problem.
A direct result is something that’s returned in registers.  An indirect
result is something that’s returned by storing it in an implicit out-parameter.
I would like to be able to form calls like this:

  %temp = alloca %my_big_struct_type
  call i32 @my_swift_function(sret %my_big_struct_type* %temp)

This doesn’t normally happen today in LLVM IR because when C frontends
use an sret result, they set the direct IR result to void.
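
At the source level the same shape can be sketched in C (an illustration only; the struct layout and the function name are assumptions, and C gives no control over whether the out-parameter gets the sret ABI treatment): one result comes back directly, the other is stored through an out-parameter.

```c
#include <stdint.h>

/* Hypothetical large aggregate standing in for %my_big_struct_type. */
struct my_big_struct {
    int64_t words[8];
};

/* Indirect result: written through the out-parameter (the role sret plays).
   Direct result: the int32_t return value. */
int32_t my_function(struct my_big_struct *out) {
    for (int n = 0; n < 8; n++) {
        out->words[n] = n * 10;
    }
    return 42;
}
```

A caller allocates the temporary, passes its address, and uses both the returned `int32_t` and the filled-in struct.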

Like I said, I don’t think this is a serious problem, but I wanted to float the
idea before assuming that.
>>  - We want a special “context” treatment for a certain argument.  A
>> pointer-sized value is passed in an integer register; the same value should be
>> present in that register after the call.  In some cases, the caller may pass a
>> context argument to a function that doesn’t expect one, and this should not
>> trigger undefined behavior.  Both of these rules suggest that the context
>> argument be passed in a register which is normally callee-save.
> 
> I think it's going to be harder to get all opts to behave in the way
> you want them to. And may also require back-end changes to make sure
> those registers are saved in the right frame, or reserved from
> register allocation, or popped back after the call, etc.
I don’t expect the optimizer to be a problem, but I just realized that the main
reason is something I didn’t talk about in my first post.  See below.

That this will require some support from the backend is a given.
>> The Clang impact is relatively minor; it is focused on allowing the
>> Swift runtime to define functions that use the convention.  It adds a new
>> calling convention attribute, a few new parameter attributes constrained to that
>> calling convention, and some relatively un-invasive call lowering code in IR
>> generation.
> 
> This sounds like a normal change to support language perks, no big
> deal. But I'm not a Clang expert, nor I've seen the code.
> 
> 
>>  - Using sret together with a direct result may or may not “just
>> work”.  I certainly don’t see a reason why it shouldn’t work in the
>> middle-end.  Obviously, some targets can’t support it, but we can avoid doing
>> this on those targets.
> 
> All sret problems I've seen were back-end related (ABI conformance).
> But I wasn't paying attention to the middle-end.
> 
> 
>>  - Opting in to the two argument treatments requires new parameter
>> attributes.  We discussed using separate calling conventions; unfortunately,
>> error and context arguments can appear either separately or together, so we’d
>> really need several new conventions for all the valid combinations.
>> Furthermore, calling a context-free function with an ignored context argument
>> could turn into a call to a function using a mismatched calling convention,
>> which LLVM IR generally treats as undefined behavior.  Also, it wasn’t obvious
>> that just a calling convention would be sufficient for the error treatment; see
>> the next bullet.
> 
> Why not treat context and error like C's default arguments? Or like
> named arguments in Python?
> 
> Surely the front-end can easily re-order the arguments (according to
> some ABI) and make sure every function that may be called with
> context/error has it as the last arguments, and default them to null.
> You can then later do an inter-procedural pass to clean it up for all
> static functions that are never called with those arguments, etc.
Oh, sorry, I forgot to talk about that.  Yes, the frontend already rearranges
these arguments to the end, which means the optimizer’s default behavior
of silently dropping extra call arguments ends up doing the right thing.

I’m reluctant to say that the convention always requires these arguments.
If we have to do that, we can, but I’d rather not; it would involve generating
a lot of unnecessary IR and would probably create unnecessary
code-generation differences, and I don’t think it would be sufficient for
error results anyway.
>>  - The “error” treatment requires some way to (1) pass and receive the
>> value in the caller and (2) receive and change the value in the callee.  The
>> best way we could think of to represent this was to pretend that the argument is
>> actually passed indirectly; the value is “passed” by storing to the pointer and
>> “received” by loading from it.  To simplify backend lowering, we require the
>> argument to be a special kind of swifterror alloca that can only be loaded,
>> stored, and passed as a swifterror argument; in the callee, swifterror arguments
>> have similar restrictions.  This ends up being fairly invasive in the backend,
>> unfortunately.
> 
> I think this logic is too high-level for the back-end to deal with.
> This looks like a simple run of the mill pointer argument that can be
> null (and is by default), but if it's not, the callee can change the
> object pointed by but not the pointer itself, ie, "void foo(exception
> * const Error = null)". I don't understand why you need this argument
> to be of a special kind of SDNode.
We don’t want checking or setting the error result to actually involve memory
access.
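
For reference, the pseudo-indirect model can be pictured in plain C as an ordinary out-pointer. This is only a sketch under assumptions: `struct error`, `checked_div`, and `call_failed` are invented names, and nothing in C expresses the actual goal, which is for the backend to keep the slot in a register rather than in memory.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical error object; the real Swift error representation is not
   described in this thread. */
struct error { int code; };

static struct error division_by_zero = { 1 };

/* Callee: "receives" the error slot as a pointer and "sets" it by storing. */
int32_t checked_div(int32_t a, int32_t b, struct error **error_slot) {
    if (b == 0) {
        *error_slot = &division_by_zero;
        return 0;
    }
    return a / b;
}

/* Caller: initialises the slot to null, calls, then loads the slot to check. */
int call_failed(int32_t a, int32_t b) {
    struct error *slot = NULL;
    (void)checked_div(a, b, &slot);
    return slot != NULL;
}
```

The store in the callee and the load in the caller are exactly the operations that the restricted swifterror alloca is meant to let the backend lower without touching memory.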

An alternative to the pseudo-indirect-result approach would be to model
the result as an explicit result.  That would really mess up the IR, though.
The ability to call a non-throwing function as a throwing function means
we’d have to provide this extra explicit result on every single function with
the Swift convention, because the optimizer is definitely not going to
gracefully handle result-type mismatches; so even a function as simple as
  func foo() -> Int32
would have to be lowered into IR as
  define { i32, i8* } @foo(i8*)
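
In C terms (a sketch only; the struct name and the returned value are made up), that signature corresponds to every function carrying the error value in and out, even when it never throws:

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors the { i32, i8* } result type above: the ordinary result plus an
   error pointer. */
struct result_and_error {
    int32_t value;
    void   *error;
};

/* A function that never throws still has to accept and return the error
   value so that its type matches what a throwing caller expects. */
struct result_and_error foo(void *incoming_error) {
    struct result_and_error r = { 7, incoming_error };
    return r;
}
```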

John.

John McCall via llvm-dev

2016-Mar-02 19:01 UTC


[llvm-dev] RFC: Implementing the Swift calling convention in LLVM and Clang

> On Mar 2, 2016, at 10:48 AM, John McCall via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>> On Mar 2, 2016, at 1:33 AM, Renato Golin <renato.golin at linaro.org> wrote:
>> On 2 March 2016 at 01:14, John McCall via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>>> Hi, all.
>>> - We sometimes want to return more values in registers than the
>>> convention normally does, and we want to be able to use both integer and
>>> floating-point registers.  For example, we want to return a value of struct A,
>>> above, purely in registers.  For the most part, I don’t think this is a problem
>>> to layer on to an existing IR convention: C frontends will generally use
>>> explicit sret arguments when the convention requires them, and so the Swift
>>> lowering will produce result types that don’t have legal interpretations as
>>> direct results under the C convention.  But we can use a different IR convention
>>> if it’s necessary to disambiguate Swift’s desired treatment from the
>>> target's normal attempts to retroactively match the C convention.
>>
>> Is this a back-end decision, or do you expect the front-end to tell
>> the back-end (via annotation) which parameters will be in regs? Unless
>> you also have back-end patches, I don't think the latter is going to
>> work well. For example, the ARM back-end has a huge section related to
>> passing structures in registers, which conforms to the ARM EABI, not
>> necessarily your Swift ABI.
>>
>> Not to mention that this creates the versioning problem, where two
>> different LLVM releases can produce slightly different PCS register
>> usage (due to new features or bugs), and thus require re-compilation
>> of all libraries. This, however, is not a problem for your current
>> request, just a comment.
>
> The frontend will not tell the backend explicitly which parameters will be
> in registers; it will just pass a bunch of independent scalar values, and
> the backend will assign them to registers or the stack as appropriate.
>
> Our intent is to completely bypass all of the passing-structures-in-registers
> code in the backend by simply not exposing the backend to any parameters
> of aggregate type.  The frontend will turn a struct into (say) an i32, a float,
> and an i8; if the first two get passed in registers and the last gets passed
> on the stack, so be it.
>
> The only difficulty with this plan is that, when we have multiple results, we
> don’t have a choice but to return a struct type.  To the extent that backends
> try to infer that the function actually needs to be sret, instead of just trying
> to find a way to return all the components of the struct type in appropriate
> registers, that will be sub-optimal for us.  If that’s a pervasive problem, then
> we probably just need to introduce a swift calling convention in LLVM.
Also, just a quick question.  I’m happy to continue to talk about the actual
design and implementation of LLVM IR on this point, and I’d be happy to
put out the actual patch we’re initially proposing.  Obviously, all of this code
needs to go through the normal LLVM/Clang code review processes.  But
before we continue with that, I just want to clarify one important point:
assuming that the actual implementation ends up satisfying your technical
requirements, do you have any objections to the general idea of supporting
the Swift CC in mainline LLVM?

John.

Renato Golin via llvm-dev

2016-Mar-02 19:04 UTC


[llvm-dev] RFC: Implementing the Swift calling convention in LLVM and Clang

On 2 March 2016 at 19:01, John McCall <rjmccall at apple.com> wrote:
> Also, just a quick question.  I’m happy to continue to talk about the actual
> design and implementation of LLVM IR on this point, and I’d be happy to
> put out the actual patch we’re initially proposing.  Obviously, all of this code
> needs to go through the normal LLVM/Clang code review processes.  But
> before we continue with that, I just want to clarify one important point:
> assuming that the actual implementation ends up satisfying your technical
> requirements, do you have any objections to the general idea of supporting
> the Swift CC in mainline LLVM?
I personally don't. I think we should treat Swift as any other
language that we support, and if we can't use existing mechanisms in
the back-end to lower Swift, then we need to expand the back-end to
support that.

That being said, if the Swift support starts to bit-rot (if, for
instance, Apple stops supporting it in the future), it will be harder
to clean up the back-end from its CC. But that, IMHO, is a very
far-fetched future and a small price to pay.

cheers,
--renato

Renato Golin via llvm-dev

2016-Mar-02 19:33 UTC


[llvm-dev] RFC: Implementing the Swift calling convention in LLVM and Clang

On 2 March 2016 at 18:48, John McCall <rjmccall at apple.com> wrote:
> The frontend will not tell the backend explicitly which parameters will be
> in registers; it will just pass a bunch of independent scalar values, and
> the backend will assign them to registers or the stack as appropriate.
I'm assuming you already have code in the back-end that does that in
the way you want, as you said earlier you may want to use variable
number of registers for PCS.

> Our intent is to completely bypass all of the passing-structures-in-registers
> code in the backend by simply not exposing the backend to any parameters
> of aggregate type.  The frontend will turn a struct into (say) an i32, a float,
> and an i8; if the first two get passed in registers and the last gets passed
> on the stack, so be it.
How do you differentiate the @foo's below?

struct A { i32, float };
struct B { float, i32 };

define @foo (A, i32) -> @foo(i32, float, i32);

and

define @foo (i32, B) -> @foo(i32, float, i32);

> The only difficulty with this plan is that, when we have multiple results, we
> don’t have a choice but to return a struct type.  To the extent that backends
> try to infer that the function actually needs to be sret, instead of just trying
> to find a way to return all the components of the struct type in appropriate
> registers, that will be sub-optimal for us.  If that’s a pervasive problem, then
> we probably just need to introduce a swift calling convention in LLVM.
Oh, yeah, some back-ends will fiddle with struct return. Not all
languages have single-value-return restrictions, but I think that ship
has sailed already for IR.

That's another reason to try and pass all by pointer at the end of the
parameter list, instead of receive as an argument and return.

> A direct result is something that’s returned in registers.  An indirect
> result is something that’s returned by storing it in an implicit out-parameter.
Oh, I see. In that case, any assumption on the variable would have to
be invalidated, maybe use global volatile variables, or special
built-ins, so that no optimisation tries to get away with it. But that
would mess up your optimal code, especially if they have to get passed
in registers.

> Oh, sorry, I forgot to talk about that.  Yes, the frontend already rearranges
> these arguments to the end, which means the optimizer’s default behavior
> of silently dropping extra call arguments ends up doing the right thing.
Excellent!

> I’m reluctant to say that the convention always requires these arguments.
> If we have to do that, we can, but I’d rather not; it would involve generating
> a lot of unnecessary IR and would probably create unnecessary
> code-generation differences, and I don’t think it would be sufficient for
> error results anyway.
This should be ok for internal functions, but maybe not for global /
public interfaces. The ARM ABI has specific behaviour guarantees for
public interfaces (like large alignment) that would be prohibitively
bad for all functions, but ok for public ones.

If hells break loose, you could enforce that for public interfaces only.

> We don’t want checking or setting the error result to actually involve memory
> access.
And even though most of those access could be optimised away, there's
no guarantee.

Another option would be to have a special built-in to recognise
context/error variables, and plug in a late IR pass to clean up
everything. But I'd only recommend that if we can't find another way
around.

> The ability to call a non-throwing function as a throwing function means
> we’d have to provide this extra explicit result on every single function with
> the Swift convention, because the optimizer is definitely not going to
> gracefully handle result-type mismatches; so even a function as simple as
>   func foo() -> Int32
> would have to be lowered into IR as
>   define { i32, i8* } @foo(i8*)
Indeed, very messy.

I'm going on a tangent, here, may be all rubbish, but...

C++ handles exception handling with the exception being thrown
allocated in library code, not the program. If, like C++, Swift can
only handle one exception at a time, why can't the error variable be a
global?

The ARM back-end accepts the -rreserve-r9 option, and others seem to
have similar options, so you could use that to force your global
variable to live on the platform register.

That way, all your error handling built-ins deal with that global
variable, which the back-end knows is on registers. You will need a
special DAG node, but I'm assuming you already have/want one. You also
drop any problem with arguments and PCS, at least for the error part.

cheers,
--renato

Tian, Xinmin via llvm-dev

2016-Mar-02 19:49 UTC


[llvm-dev] Proposal for function vectorization and loop vectorization with function calls

Proposal for function vectorization and loop vectorization with function calls
==============================================================================
Intel Corporation (3/2/2016)

This is a proposal for an initial work towards Clang and LLVM implementation of
vectorizing a function annotated with OpenMP 4.5's "#pragma omp declare simd"
(named SIMD-enabled function) and its associated clauses based on the VectorABI
[2]. On the caller side, we propose to improve LLVM loopVectorizer such that
the code that calls the SIMD-enabled function can be vectorized. On the callee
side, we propose to add Clang FE support for "#pragma omp declare simd" syntax
and a new pass to transform the SIMD-enabled function body into a SIMD loop.
This newly created loop can then be fed to LLVM loopVectorizer (or its future
enhancement) for vectorization. This work does leverage LLVM's existing
LoopVectorizer.


Problem Statement
=================
Currently, if a loop calls a user-defined function or a 3rd party library
function, the loop can't be vectorized unless the function is inlined. In the
example below the LoopVectorizer fails to vectorize the k loop due to its
function call to "dowork" because "dowork" is an external function. Note that
inlining the "dowork" function may result in vectorization for some of the
cases, but that is not a generally applicable solution. Also, there may be
reasons why compiler may not (or can't) inline the "dowork" function call.
Therefore, there is value in being able to vectorize the loop with a call to
"dowork" function in it.

#include<stdio.h>
extern float dowork(float *a, int k);

float a[4096];
int main()
{ int k;
#pragma clang loop vectorize(enable)
  for (k = 0; k < 4096; k++) {
    a[k] = k * 0.5;
    a[k] = dowork(a, k);
  }
  printf("passed %f\n", a[1024]);
}

sh-4.1$ clang -c -O2 -Rpass=loop-vectorize -Rpass-missed=loop-vectorize
                     -Rpass-analysis=loop-vectorize loopvec.c
loopvec.c:15:12: remark: loop not vectorized: call instruction cannot be
      vectorized [-Rpass-analysis]
    a[k] = dowork(a, k);
           ^
loopvec.c:13:3: remark: loop not vectorized: use -Rpass-analysis=loop-vectorize
      for more info (Force=true) [-Rpass-missed=loop-vectorize]
  for (k = 0; k < 4096; k++) {
  ^
loopvec.c:13:3: warning: loop not vectorized: failed explicitly specified
                loop vectorization [-Wpass-failed]
1 warning generated.


New functionality of Vectorization
==================================
New functionalities and enhancements are proposed to address the issues
stated above which include: a) Vectorize a function annotated by the
programmer using OpenMP* SIMD extensions; b) Enhance LLVM's LoopVectorizer
to vectorize a loop containing a call to SIMD-enabled function.

For example, when writing:

#include<stdio.h>

#pragma omp declare simd uniform(a) linear(k)
extern float dowork(float *a, int k);

float a[4096];
int main()
{ int k;
#pragma clang loop vectorize(enable)
  for (k = 0; k < 4096; k++) {
    a[k] = k * 0.5;
    a[k] = dowork(a, k);
  }
  printf("passed %f\n", a[1024]);
}

the programmer asserts that
  a) there will be a vector version of "dowork" available for the compiler to
     use (link with, with appropriate signature, explained below) when
     vectorizing the k loop; and that
  b) no loop-carried backward dependencies are introduced by the "dowork"
     call that prevent the vectorization of the k loop.

The expected vector loop (shown as pseudo code, ignoring leftover iterations)
resulting from LLVM's LoopVectorizer is

  ... ...
  vectorized_for (k = 0; k < 4096; k += VL) {
    a[k:VL] = {k, k+1, ..., k+VL-1} * 0.5;
    a[k:VL] = _ZGVbN4ul_dowork(a, k);
  }
  ... ...

In this example "_ZGVbN4ul_dowork" is a special name mangling where:
 _ZGV is a prefix based on C/C++ name mangling rule suggested by GCC community,
 'b' indicates "xmm" (assume we vectorize here to 128bit xmm vector registers),
 'N' indicates that the function is vectorized without a mask; 'M' indicates
     that the function is vectorized with a mask,
 '4' is VL (assume we vectorize here for length 4),
 'u' indicates that the first parameter has the "uniform" property,
 'l' indicates that the second argument has the "linear" property.

More details (including name mangling scheme) can be found in the following
references [2].
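
To make the scheme concrete, here is a small C sketch that assembles such a name. It follows the component order used by the "_ZGVbN4ul_" attribute strings in this proposal (ISA letter, mask letter, vector length, parameter letters); the helper name `mangle_vector_name` and its parameters are hypothetical, and the full rules are in the VectorABI document [2].

```c
#include <stdio.h>

/* Builds "_ZGV" + ISA letter + mask letter + vector length + parameter
   letters + "_" + scalar function name.  Returns the number of characters
   the full name needs, as snprintf does. */
int mangle_vector_name(char *buf, size_t buf_size, char isa, int masked,
                       int vlen, const char *params, const char *name) {
    return snprintf(buf, buf_size, "_ZGV%c%c%d%s_%s",
                    isa, masked ? 'M' : 'N', vlen, params, name);
}
```

With `isa = 'b'`, `masked = 0`, `vlen = 4`, `params = "ul"` and `name = "dowork"`, the buffer receives `_ZGVbN4ul_dowork`.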

References
=========
1. OpenMP SIMD language extensions:
http://www.openmp.org/mp-documents/openmp-4.5.pdf

2. VectorABI Documentation:
https://www.cilkplus.org/sites/default/files/open_specifications/Intel-ABI-Vector-Function-2012-v0.9.5.pdf
https://sourceware.org/glibc/wiki/libmvec?action=AttachFile&do=view&target=VectorABI.txt

[[Note: VectorABI was reviewed at X86-64 System V Application Binary Interface
        mailing list. The discussion was recorded at
        https://groups.google.com/forum/#!topic/x86-64-abi/LmppCfN1rZ4 ]]

3. The first paper on SIMD extensions and implementations:
"Compiling C/C++ SIMD Extensions for Function and Loop Vectorizaion on
Multicore-SIMD Processors" by Xinmin Tian, Hideki Saito, Milind Girkar,
Serguei Preis, Sergey Kozhukhov, et al., IPDPS Workshops 2012, pages 2349--2358
[[Note: the first implementation and the paper were done before VectorABI was
        finalized with the GCC community and Redhat. The latest VectorABI
        version for OpenMP 4.5 is ready to be published]]


Proposed Implementation
=======================
1. Clang FE parses "#pragma omp declare simd [clauses]" and generates mangled
   name including these prefixes as vector signatures. These mangled name
   prefixes are recorded as function attributes in LLVM function attribute
   group. Note that it may be possible to have several mangled names associated
   with the same function, which correspond to several desired vectorized
   versions. Clang FE generates all function attributes for expected vector
   variants to be generated by the back-end. E.g.,

   #pragma omp declare simd uniform(a) linear(k)
   float dowork(float *a, int k)
   {
      a[k] = sinf(a[k]) + 9.8f;
      return a[k];
   }

   define __stdcall f32 @_dowork(f32* %a, i32 %k) #0
   ... ...
   attributes #0 = { nounwind uwtable "_ZGVbM4ul_" "_ZGVbN4ul_" ...}

2. A new vector function generation pass is introduced to generate vector
   variants of the original scalar function based on VectorABI (see [2, 3]).
   For example, one vector variant is generated for "_ZGVbN4ul_" attribute
   as follows (pseudo code):

   define __stdcall <4 x f32> @_ZGVbN4ul_dowork(f32* %a, i32 %k) #0
   {
     #pragma clang loop vectorize(enable)
     for (int %t = %k; %t < %k + 4; %t++) {
       %a[%t] = sinf(%a[%t]) + 9.8f;
     }
     vec_load xmm0, %a[%k:VL]
     return xmm0;
   }

   The body of the function is wrapped inside a loop having VL iterations,
   which correspond to the vector lanes.

   The LLVM LoopVectorizer will vectorize the generated %t loop, expected
   to produce the following vectorized code eliminating the loop (pseudo code):

   define __stdcall <4 x f32> @_ZGVbN4ul_dowork(f32* %a, i32 %k) #0
   {
     vec_load xmm1,  %a[k: VL]
     xmm2 = call __svml_sinf(xmm1)
     xmm0 = vec_add  xmm2, [9.8f, 9.8f, 9.8f, 9.8f]
     store %a[k:VL], xmm0
     return xmm0;
   }

   [[Note: Vectorizer support for the Short Vector Math Library (SVML)
           functions will be a separate proposal. ]]
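
   The loop-wrapping step can be pictured in ordinary C. This is a sketch
   under assumptions, not the pass itself: the scalar body is a stand-in
   (plain arithmetic instead of sinf, so it needs no math library), the
   name `dowork_vl4` is invented, and the vector result is returned through
   an array rather than in a vector register.

```c
#define VL 4

/* Stand-in scalar body (hypothetical; the proposal's example uses sinf). */
float dowork(float *a, int k) {
    a[k] = a[k] * 0.5f + 1.0f;
    return a[k];
}

/* Sketch of the generated variant: the scalar body wrapped in a loop of VL
   iterations, one per vector lane.  'a' is uniform (same for all lanes) and
   'k' is linear with stride 1, so lane n works on index k + n. */
void dowork_vl4(float *a, int k, float out[VL]) {
    for (int lane = 0; lane < VL; lane++) {
        out[lane] = dowork(a, k + lane);
    }
}
```

   A loop of this shape is what would then be handed to the LoopVectorizer.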

3. The LLVM LoopVectorizer is enhanced to
   a) identify loops with calls that have been annotated with
      "#pragma omp declare simd" by checking function attribute groups;
   b) analyze each call instruction and its parameters in the loop, to
      determine if each parameter has the following properties:
        * uniform
        * linear + stride
        * vector
        * aligned
        * called inside a conditional branch or not
          ... ...
      Based on these properties, the signature of the vectorized call is
      generated; and
   c) perform signature matching to obtain the suitable vector variant
      among the signatures available for the called function. If no such
      signature is found, the call cannot be vectorized.
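
   Signature matching as described in c) amounts to a lookup of the computed
   signature among the prefixes recorded for the callee. A minimal C sketch,
   assuming the prefixes are available as an array of strings (the function
   name `has_vector_variant` is hypothetical and is not an LLVM API):

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if 'wanted' (e.g. "_ZGVbN4ul_") is among the variant prefixes
   recorded for the callee, 0 otherwise. */
int has_vector_variant(const char *const *variants, size_t count,
                       const char *wanted) {
    for (size_t n = 0; n < count; n++) {
        if (strcmp(variants[n], wanted) == 0) {
            return 1;
        }
    }
    return 0;
}
```

   If the lookup fails, the call stays scalar and the loop is not vectorized.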

   Note that a similar enhancement can and should be made also to LLVM's
   SLP vectorizer.

   For example:

   #pragma omp declare simd uniform(a) linear(k)
   extern float dowork(float *a, int k);

   ... ...
   #pragma clang loop vectorize(enable)
   for (k = 0; k < 4096; k++) {
     a[k] = k * 0.5;
     a[k] = dowork(a, k);
   }
   ... ...

   Step a: "dowork" function is marked as SIMD-enabled function
           attributes #0 = { nounwind uwtable "_ZGVbM4ul_" "_ZGVbN4ul_" ...}

   Step b: 1) 'a' is uniform, as it is the base address of array 'a'
           2) 'k' is linear, as 'k' is the induction variable with stride=1
           3) SIMD "dowork" is called unconditionally in the candidate k loop.
           4) it is compiled for SSE4.1 with the Vector Length VL=4.
              based on these properties, the signature is "_ZGVbN4ul_"

   [[Notes: A conditional call in the loop needs masking support;
            for the implementation details see references [1][2][3] ]]

   Step c: Check if the signature "_ZGVbN4ul_" exists in function attribute #0;
           if yes the suitable vectorized version is found and will be linked
           with.

   The below loop is expected to be produced by the LoopVectorizer:
   ... ...
   vectorized_for (k = 0; k < 4096; k += 4) {
     a[k:4] = {k, k+1, k+2, k+3} * 0.5;
     a[k:4] = _ZGVbN4ul_dowork(a, k);
   }
   ... ...
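
   Written out as plain C, the vectorized k loop has roughly the following
   shape. This is a sketch under assumptions: `dowork_vl4` is a hypothetical
   stand-in for the vector variant selected through the "_ZGVbN4ul_"
   signature, its body is invented, and results come back through an array
   instead of a vector register. 4096 is a multiple of VL, so no leftover
   iterations arise here.

```c
#define VL 4
#define N 4096

/* Hypothetical stand-in for the vector variant the loop links against:
   processes elements k .. k+VL-1 and returns the results through 'out'. */
void dowork_vl4(float *a, int k, float out[VL]) {
    for (int lane = 0; lane < VL; lane++) {
        out[lane] = a[k + lane] + 1.0f;
    }
}

/* The k loop after vectorization: step by VL, compute the VL values of
   k * 0.5, then make a single call per VL original iterations. */
void vectorized_k_loop(float *a) {
    for (int k = 0; k < N; k += VL) {
        float out[VL];
        for (int lane = 0; lane < VL; lane++) {
            a[k + lane] = (k + lane) * 0.5f;
        }
        dowork_vl4(a, k, out);
        for (int lane = 0; lane < VL; lane++) {
            a[k + lane] = out[lane];
        }
    }
}
```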

[[Note: Vectorizer support for the Short Vector Math Library (SVML) functions
        will be a separate proposal. ]]


GCC and ICC Compatibility
=========================
With this proposal the callee function and the loop containing a call to it
can each be compiled and vectorized by a different compiler, including
Clang+LLVM with its LoopVectorizer as outlined above, GCC and ICC. The
vectorized loop will then be linked with the vectorized callee function.
Of course each of these compilers can also be used to compile both loop and
callee function.


Current Implementation Status and Plan
======================================
1. Clang FE is done by Intel Clang FE team according to #1. Note: Clang FE
   syntax process patch is implemented and under community review
   (http://reviews.llvm.org/D10599). In general, the review feedback is
   very positive from the Clang community.

2. A new pass for function vectorization is implemented to support #2 and
   to be prepared for LLVM community review.

3. Work is in progress to teach LLVM's LoopVectorizer to vectorize a loop
   with user-defined function calls according to #3.

Call for Action
===============
1. Please review this proposal and provide constructive feedback on its
   direction and key ideas.

2. Feel free to ask any technical questions related to this proposal and
   to read the associated references.

3. Help is also highly welcome and appreciated in the development and
   upstreaming process.

John McCall via llvm-dev

2016-Mar-02 20:03 UTC


[llvm-dev] RFC: Implementing the Swift calling convention in LLVM and Clang

> On Mar 2, 2016, at 11:33 AM, Renato Golin <renato.golin at linaro.org> wrote:
> On 2 March 2016 at 18:48, John McCall <rjmccall at apple.com> wrote:
>> The frontend will not tell the backend explicitly which parameters will be
>> in registers; it will just pass a bunch of independent scalar values, and
>> the backend will assign them to registers or the stack as appropriate.
> 
> I'm assuming you already have code in the back-end that does that in
> the way you want, as you said earlier you may want to use variable
> number of registers for PCS.
> 
> 
>> Our intent is to completely bypass all of the passing-structures-in-registers
>> code in the backend by simply not exposing the backend to any parameters
>> of aggregate type.  The frontend will turn a struct into (say) an i32, a float,
>> and an i8; if the first two get passed in registers and the last gets passed
>> on the stack, so be it.
> 
> How do you differentiate the @foo's below?
> 
> struct A { i32, float };
> struct B { float, i32 };
> 
> define @foo (A, i32) -> @foo(i32, float, i32);
> 
> and
> 
> define @foo (i32, B) -> @foo(i32, float, i32);
We don’t need to.  We don't use the intermediary convention’s rules for
aggregates.  The Swift rule for aggregate arguments is literally “if it’s too
complex according to <foo>, pass it indirectly; otherwise, expand it into a
sequence of scalar values and pass them separately”.  If that means it’s
partially passed in registers and partially on the stack, that’s okay; we might
need to re-assemble it in the callee, but the first part of the rule limits how
expensive that can ever get.
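
The two-way rule can be sketched as a tiny C classifier. Everything specific here is an assumption made for illustration: the thread leaves the complexity measure unspecified ("<foo>"), so the sketch counts scalar components against a made-up limit, and the names are invented.

```c
#include <stddef.h>

enum passing { PASS_INDIRECT, PASS_EXPANDED };

/* Hypothetical threshold standing in for the unspecified "<foo>" limit. */
#define MAX_EXPANDED_SCALARS 4

/* Decide how an aggregate with 'scalar_count' scalar components is passed. */
enum passing classify_aggregate(size_t scalar_count) {
    if (scalar_count > MAX_EXPANDED_SCALARS) {
        return PASS_INDIRECT;   /* too complex: pass a pointer instead */
    }
    return PASS_EXPANDED;       /* pass each scalar component separately */
}
```

Because the indirect case catches anything large, the number of scalars that ever has to be re-assembled in a callee is bounded by the threshold.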
>> The only difficulty with this plan is that, when we have multiple results, we
>> don’t have a choice but to return a struct type.  To the extent that backends
>> try to infer that the function actually needs to be sret, instead of just trying
>> to find a way to return all the components of the struct type in appropriate
>> registers, that will be sub-optimal for us.  If that’s a pervasive problem, then
>> we probably just need to introduce a swift calling convention in LLVM.
> 
> Oh, yeah, some back-ends will fiddle with struct return. Not all
> languages have single-value-return restrictions, but I think that ship
> has sailed already for IR.
> 
> That's another reason to try and pass all by pointer at the end of the
> parameter list, instead of receive as an argument and return.
That’s pretty sub-optimal compared to just returning in registers.  Also, most
backends do have the ability to return small structs in multiple registers already.
>> A direct result is something that’s returned in registers.  An indirect
>> result is something that’s returned by storing it in an implicit out-parameter.
> 
> Oh, I see. In that case, any assumption on the variable would have to
> be invalidated, maybe use global volatile variables, or special
> built-ins, so that no optimisation tries to get away with it. But that
> would mess up your optimal code, especially if they have to get passed
> in registers.
I don’t understand what you mean here.  The out-parameter is still explicit in
LLVM IR.  Nothing about this is novel, except that C frontends generally won’t
combine indirect results with direct results.  Worst case, if pervasive LLVM
assumptions prevent us from combining the sret attribute with a direct result,
we just won’t use the sret attribute.
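
As a rough C-level picture of combining the two kinds of result, consider the sketch below. The names Big and make_big are hypothetical, and whether the small return value really travels in a register depends on the target ABI; the point is only that the out-parameter is an ordinary explicit pointer argument, playing the role that an sret pointer plays in LLVM IR.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result that is too large to return in registers. */
struct Big { int64_t words[8]; };

/* Direct result: the small int32_t return value.
   Indirect result: the Big value, stored through the explicit
   out-parameter 'out' supplied by the caller. */
static int32_t make_big(struct Big *out, int64_t seed) {
    for (int k = 0; k < 8; ++k)
        out->words[k] = seed + k;
    return 8; /* number of words written */
}
```

A caller allocates the struct Big itself, passes its address, and reads the count from the normal return value.
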
>> Oh, sorry, I forgot to talk about that.  Yes, the frontend already rearranges
>> these arguments to the end, which means the optimizer’s default behavior
>> of silently dropping extra call arguments ends up doing the right thing.
> 
> Excellent!
> 
> 
>> I’m reluctant to say that the convention always requires these arguments.
>> If we have to do that, we can, but I’d rather not; it would involve generating
>> a lot of unnecessary IR and would probably create unnecessary
>> code-generation differences, and I don’t think it would be sufficient for
>> error results anyway.
> 
> This should be ok for internal functions, but maybe not for global /
> public interfaces. The ARM ABI has specific behaviour guarantees for
> public interfaces (like large alignment) that would be prohibitively
> bad for all functions, but ok for public ones.
> 
> If hells break loose, you could enforce that for public interfaces only.
> 
> 
>> We don’t want checking or setting the error result to actually involve memory
>> access.
> 
> And even though most of those access could be optimised away, there's
> no guarantee.
Right.  The backend isn’t great about removing memory operations that survive to it.
> Another option would be to have a special built-in to recognise
> context/error variables, and plug in a late IR pass to clean up
> everything. But I'd only recommend that if we can't find another way
> around.
> 
> 
>> The ability to call a non-throwing function as a throwing function means
>> we’d have to provide this extra explicit result on every single function with
>> the Swift convention, because the optimizer is definitely not going to
>> gracefully handle result-type mismatches; so even a function as simple as
>>  func foo() -> Int32
>> would have to be lowered into IR as
>>  define { i32, i8* } @foo(i8*)
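
For readers who find the IR signature above terse, here is a loose C analogue of what that lowering would force on every function. It is a sketch under assumptions: the struct name FooResult, the helper call_foo, and the value 42 are invented for illustration, and the i8* parameter is treated as an opaque pointer because the quoted text does not spell out its role.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* C analogue of the IR type { i32, i8* }: normal result plus error slot. */
struct FooResult { int32_t value; void *error; };

/* Analogue of: define { i32, i8* } @foo(i8*)
   This function never fails, yet it must still fill in the error slot. */
static struct FooResult foo(void *opaque) {
    (void)opaque; /* unused in this sketch */
    struct FooResult r = { 42, NULL };
    return r;
}

/* A caller that treats foo as throwing has to test the error slot. */
static int32_t call_foo(int32_t fallback) {
    struct FooResult r = foo(NULL);
    return r.error != NULL ? fallback : r.value;
}
```

The extra slot and the extra check appear even though foo can never produce an error, which is the overhead the paragraph above objects to.
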
> 
> Indeed, very messy.
> 
> I'm going on a tangent, here, may be all rubbish, but...
> 
> C++ handles exception handling with the exception being thrown
> allocated in library code, not the program. If, like C++, Swift can
> only handle one exception at a time, why can't the error variable be a
> global?
> 
> The ARM back-end accepts the -rreserve-r9 option, and others seem to
> have similar options, so you could use that to force your global
> variable to live on the platform register.
> 
> That way, all your error handling built-ins deal with that global
> variable, which the back-end knows is on registers. You will need a
> special DAG node, but I'm assuming you already have/want one. You also
> drop any problem with arguments and PCS, at least for the error part.
Swift does not run in an independent environment; it has to interact with
existing C code.  That existing code does not reserve any registers globally
for this use.  Even if that were feasible, we don’t actually want to steal a
register globally from all the C code on the system that probably never
interacts with Swift.

John.
