thr3ads.net - llvm dev - [llvm-dev] [cfe-dev] Clang/LLVM function ABI lowering (was: Re: [RFC] Refactor Clang: move frontend/driver/diagnostics code to LLVM) [Jun 2020]


Chris Lattner via llvm-dev

2020-Jun-03 23:26 UTC

[llvm-dev] [cfe-dev] [RFC] Refactor Clang: move frontend/driver/diagnostics code to LLVM

> On Jun 2, 2020, at 4:21 PM, comex via cfe-dev <cfe-dev at lists.llvm.org> wrote:
> 
> While this is a different area of the codebase, another thing that
> would benefit greatly from being moved out of Clang is function call
> ABI handling.  Currently, that handling is split awkwardly between
> Clang and LLVM proper, forcing frontends that implement C FFI to
> either recreate the Clang parts themselves (like Rust does), depend on
> Clang (like Swift does), or live with FFI just not working with some
> function signatures.  I'm not sure what Flang currently does, but my
> understanding is that Flang does support C FFI, so it would probably
> benefit from this as well.  Just something to consider. :)
For what it's worth, I think there is a pretty clear path on this, but it hinges
on Clang moving to MLIR as its code generation backend (an intermediary to
generating LLVM IR).

The approach is to factor the ABI lowering part of Clang out of Clang itself
into a specific dialect lowering pass that works on a generic C type system
(plus callouts to extended type systems).  MLIR has all the infra to support
this; it is just a massive job to refactor all the things to change Clang’s
architecture.

I also don’t think there is broad consensus on the direction for Clang here, but
given that Flang is already using MLIR for this, maybe it would make sense to
start work there.

If you’re curious, I co-delivered a talk about this recently; the slides are
available here:
<https://docs.google.com/presentation/d/11-VjSNNNJoRhPlLxFgvtb909it1WNdxTnQFipryfAPU/edit#slide=id.g7d334b12e5_0_4>

-Chris


James Y Knight via llvm-dev

2020-Jun-04 04:54 UTC


[llvm-dev] [cfe-dev] Clang/LLVM function ABI lowering (was: Re: [RFC] Refactor Clang: move frontend/driver/diagnostics code to LLVM)

While MLIR may be one part of the solution, I think it's also the case that
the function-ABI interface between Clang and LLVM is just wrong and should
be fixed -- independently of whether Clang might use MLIR in the future.

I've mentioned this idea before, I think, but never got around to writing
up a real proposal. And I still haven't. Maybe this email could inspire
someone else to work on that.

Essentially, I'd like to see the code in Clang responsible for function
parameter-type mangling as part of its ABI lowering deleted. Currently,
there is a secret "LLVM IR" ABI used between Clang and LLVM, which involves
expanding some arguments into multiple arguments, adding a smattering of
"inreg" or "byval" attributes, and converting some types into other types.
All in a completely target-dependent, complex, and undocumented manner.

So, while the IR function syntax appears at first glance to be generic and
target-independent, that's not at all true. Sadly, in some cases, clang
must even know how many registers different calling conventions use, and
count numbers of available registers left, in order to choose the right set
of those "generic" attributes to put on a parameter.
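
The register counting mentioned above can be sketched roughly as follows. This is only an illustration, not Clang's actual code. It assumes the System V x86-64 convention of six integer argument registers, and the rule that an argument whose eightbytes do not all fit in the remaining registers is passed entirely in memory. The names `arg_state` and `assign_integer_arg` are hypothetical.

```c
#include <assert.h>

/* Assumption: System V x86-64 passes integer-class arguments in six
   registers (RDI, RSI, RDX, RCX, R8, R9). */
enum { NUM_INT_ARG_REGS = 6 };

/* Hypothetical running state a frontend keeps while walking a signature. */
struct arg_state {
    int int_regs_used;
};

/* Try to place an argument that needs `eightbytes` integer registers.
   Returns 1 if it is passed in registers, 0 if the whole argument must go
   on the stack. A stack-passed argument consumes no registers, so later,
   smaller arguments may still be assigned to the ones that remain. */
static int assign_integer_arg(struct arg_state *s, int eightbytes)
{
    assert(eightbytes >= 1 && eightbytes <= 2);
    if (s->int_regs_used + eightbytes > NUM_INT_ARG_REGS)
        return 0;
    s->int_regs_used += eightbytes;
    return 1;
}
```

With five scalar integers already assigned, a 16-byte struct of two integers no longer fits and goes to memory, while a following scalar can still take the sixth register. A frontend has to track roughly this state to decide which attributes to put on each parameter.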

So: not only does a frontend need to understand the C ABI rules, it also
needs to understand that complex dance for how to convert that into LLVM IR
-- and that's both completely undocumented, and a huge mess.

Instead, I believe clang should always pass function parameters in a
"naive" fashion. E.g. if a parameter type is "struct X", the llvm function
should be lowered to LLVM IR with a function parameter of type %struct.X.
The decision on whether to then pass that in a register (or multiple
registers), on the stack, padded and then passed on the stack, etc, should
be the responsibility of LLVM. Only in the case of C++ types which *must* be
passed indirectly for correctness, independent of calling convention ABI,
should clang be explicitly making the decision to pass indirectly.

Of course, the tricky part is that LLVM doesn't -- and shouldn't -- have
the full C type system available to it, and the full C type system
typically is required to evaluate the ABI rules (e.g., distinguishing a
"_Complex float" from a struct containing two floats).

Therefore, in order to communicate the correct ABI information to LLVM, I'd
like clang to also emit *explicitly-ABI-specific* data (metadata?),
reflecting the extra information that the ABI rules require the backend to
know about the type. E.g., for X86_64, clang needs to inform LLVM of the
classification for each parameter's type into MEMORY, INTEGER, SSE, SSEUP,
X87, X87UP, COMPLEX_X87. Or, for PPC64 ELFv2, Clang needs to inform LLVM
when a structure should be treated as a "homogeneous aggregate" of
floating-point or vector type. (In both cases, that information cannot
correctly be extracted from the LLVM IR struct type, only from the C type
system.)
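
As a rough illustration of the kind of per-parameter classification meant here, the sketch below applies a heavily simplified version of the x86-64 System V rules to a flattened list of scalar fields. It is an assumption-laden toy: it covers only the INTEGER, SSE, and MEMORY classes, ignores X87, X87UP, SSEUP, COMPLEX_X87, vectors, bit-fields, and C++ rules, and every name in it (`field_desc`, `classify_aggregate`, and so on) is hypothetical, not an existing Clang or LLVM interface.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified subset of the x86-64 System V classes; X87, X87UP, SSEUP and
   COMPLEX_X87 are deliberately left out of this sketch. */
enum abi_class { ABI_NO_CLASS, ABI_INTEGER, ABI_SSE, ABI_MEMORY };

/* Hypothetical flattened description of one scalar field. */
struct field_desc {
    size_t offset;   /* byte offset within the aggregate */
    size_t size;     /* size in bytes: 1, 2, 4 or 8 */
    int is_float;    /* nonzero for float/double */
};

/* Merge the classes of two fields that share an eightbyte. */
static enum abi_class merge_class(enum abi_class a, enum abi_class b)
{
    if (a == b) return a;
    if (a == ABI_NO_CLASS) return b;
    if (b == ABI_NO_CLASS) return a;
    if (a == ABI_MEMORY || b == ABI_MEMORY) return ABI_MEMORY;
    if (a == ABI_INTEGER || b == ABI_INTEGER) return ABI_INTEGER;
    return ABI_SSE;
}

/* Classify an aggregate of `total_size` bytes into at most two eightbytes.
   Aggregates over 16 bytes, or with misaligned fields, become MEMORY. */
static void classify_aggregate(const struct field_desc *fields, size_t count,
                               size_t total_size, enum abi_class out[2])
{
    out[0] = out[1] = ABI_NO_CLASS;
    if (total_size > 16) {
        out[0] = out[1] = ABI_MEMORY;
        return;
    }
    for (size_t i = 0; i < count; i++) {
        assert(fields[i].size >= 1 && fields[i].size <= 8);
        assert(fields[i].offset + fields[i].size <= total_size);
        if (fields[i].offset % fields[i].size != 0) {
            out[0] = out[1] = ABI_MEMORY;
            return;
        }
        size_t eightbyte = fields[i].offset / 8;
        out[eightbyte] = merge_class(out[eightbyte],
            fields[i].is_float ? ABI_SSE : ABI_INTEGER);
    }
}
```

Under these assumptions a struct of two 64-bit integers classifies as INTEGER, INTEGER; a struct of two floats as a single SSE eightbyte; and a struct of three 64-bit integers as MEMORY. The result of such a classification is the sort of data that would be attached to each parameter for the backend to consume.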

We should document what data is needed, for each architecture/abi. This
required data should be as straightforward an application of the ABI
document's rules as possible -- and be only the minimum data necessary.

If this is done, frontends (either a new one, or Clang itself) that want to
use the C ABI have a significantly simpler task. It remains non-trivial --
you do still need to understand ABI-specific rules, and write ABI-specific
code to generate ABI-specific metadata. But, at least the interface
boundary has become something which is readily-understandable and
implementable based on the ABI documents.

All that said, an MLIR encoding of the C type system can still be useful --
it could contain the code which distills the C types into the ABI-specific
metadata. But, I see that as less important than getting the fundamentals
in LLVM-IR into a better shape. Even frontends without a C type system
representation should still be able to generate LLVM IR which conforms in
their own manner to the documented ABIs -- without it being super painful.
Also, the code in Clang now is really confusing, and nearly unmaintainable;
it would be a clear improvement to be able to eliminate the majority of it,
not just move it into an MLIR dialect.


comex via llvm-dev

2020-Jun-04 08:20 UTC


[llvm-dev] [cfe-dev] [RFC] Refactor Clang: move frontend/driver/diagnostics code to LLVM

On Wed, Jun 3, 2020 at 4:26 PM Chris Lattner <clattner at nondot.org> wrote:
> On Jun 2, 2020, at 4:21 PM, comex via cfe-dev <cfe-dev at lists.llvm.org> wrote:
>
> While this is a different area of the codebase, another thing that
> would benefit greatly from being moved out of Clang is function call
> ABI handling.  Currently, that handling is split awkwardly between
> Clang and LLVM proper, forcing frontends that implement C FFI to
> either recreate the Clang parts themselves (like Rust does), depend on
> Clang (like Swift does), or live with FFI just not working with some
> function signatures.  I'm not sure what Flang currently does, but my
> understanding is that Flang does support C FFI, so it would probably
> benefit from this as well.  Just something to consider. :)
>
>
> For what it's worth, I think there is a pretty clear path on this, but it
> hinges on Clang moving to MLIR as its code generation backend (an
> intermediary to generating LLVM IR).
I'd be interested in seeing a higher-level Clang IR for many different
reasons. :) On the other hand, when it comes to calling conventions,
at least some of the things currently handled by Clang seem like they
would fit well into the existing LLVM IR.

For example, this C code, compiled for x86-64 Unix:

#include <stdint.h>

struct foo { uint64_t a, b; };
struct foo get_foo(void) { return (struct foo){0, 1}; }

is translated straightforwardly to LLVM IR (trimmed for readability):

define { i64, i64 } @get_foo() {
 ret { i64, i64 } { i64 0, i64 1 }
}

and the generated assembly returns the values in RAX and RDX,
corresponding to the C ABI.

If you add a third field to the struct, the ABI demands the struct be
returned in memory with a hidden parameter.  Rather than leave this to
LLVM, Clang implements this itself, generating IR like:

define void @get_foo(%struct.foo* noalias nocapture sret align 8 %0) {
 ; ...
}

If, however, you instead modify the IR from the two-field case to add
a third field:

define { i64, i64, i64 } @get_foo() {
 ret { i64, i64, i64 } { i64 0, i64 1, i64 2 }
}

...LLVM accepts it, but the generated assembly returns the values in
RAX, RDX, and *RCX*, which is not part of the ABI at all!

If you proceed to add a fourth field, LLVM suddenly decides to handle
the out-parameter transformation itself, so all is well again.  Except
that the transformation seemingly happens too late in the pipeline, so
the generated code isn't vectorized.  (I'm not sure exactly how this
works.)

In these examples, I'd say LLVM IR is capable of expressing the
desired semantics ('follow the C ABI for returning a struct with these
fields'), and LLVM tries to implement those semantics, but it's
slightly off.  And then Clang papers over that by reimplementing parts
of those semantics itself, and only generating IR that LLVM does
handle correctly.  This seems inelegant to me; it would be better if
LLVM just 'did the right thing' here.
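
The hidden-parameter lowering described above can be pictured at the C level. The sketch below is a hand-written analogue, not compiler output: `get_foo3` is the source-level function returning a three-field struct by value, and `get_foo3_out` (a hypothetical name) shows the shape that results when the caller supplies the storage explicitly, mirroring the `sret` pointer in the IR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct foo3 { uint64_t a, b, c; };

/* Source-level view: the struct is returned by value. */
struct foo3 get_foo3(void)
{
    return (struct foo3){0, 1, 2};
}

/* Hand-written analogue of the memory-return lowering: the caller provides
   the storage and passes its address, like the `sret` parameter in the IR. */
void get_foo3_out(struct foo3 *out)
{
    assert(out != NULL);
    out->a = 0;
    out->b = 1;
    out->c = 2;
}
```

Both functions produce the same three values. The difference is only in who decides that the result travels through memory, which is the decision currently made by Clang and not by LLVM.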

On the other hand, LLVM IR struct returns aren't currently expressive
enough to handle *all* C/C++ struct returns; you run into problems
with things like C++ guaranteed copy elision (which effectively
exposes the out pointer directly to user code), and call ABIs
depending on arcane C++ concepts like 'trivial for the purposes of
calls'.  I suppose a higher-level IR might help here... but I think an
ideal design might put parts of this in LLVM IR as well.  I'm not
sure.

Alexey Bataev via llvm-dev

2020-Jun-04 12:51 UTC


[llvm-dev] [cfe-dev] [RFC] Refactor Clang: move frontend/driver/diagnostics code to LLVM

+1 for clang emitting MLIR.

Best regards,
Alexey Bataev


John McCall via llvm-dev

2020-Jun-04 21:45 UTC


[llvm-dev] [cfe-dev] Clang/LLVM function ABI lowering (was: Re: [RFC] Refactor Clang: move frontend/driver/diagnostics code to LLVM)

On 4 Jun 2020, at 0:54, James Y Knight via llvm-dev wrote:
> While MLIR may be one part of the solution, I think it's also the case that
> the function-ABI interface between Clang and LLVM is just wrong and should
> be fixed -- independently of whether Clang might use MLIR in the future.
>
> I've mentioned this idea before, I think, but never got around to writing
> up a real proposal. And I still haven't. Maybe this email could inspire
> someone else to work on that.
>
> Essentially, I'd like to see the code in Clang responsible for function
> parameter-type mangling as part of its ABI lowering deleted. Currently,
> there is a secret "LLVM IR" ABI used between Clang and LLVM, which involves
> expanding some arguments into multiple arguments, adding a smattering of
> "inreg" or "byval" attributes, and converting some types into other types.
> All in a completely target-dependent, complex, and undocumented manner.
>
> So, while the IR function syntax appears at first glance to be generic and
> target-independent, that's not at all true. Sadly, in some cases, clang
> must even know how many registers different calling conventions use, and
> count numbers of available registers left, in order to choose the right set
> of those "generic" attributes to put on a parameter.
>
> So: not only does a frontend need to understand the C ABI rules, they also
> need to understand that complex dance for how to convert that into LLVM IR
> -- and that's both completely undocumented, and a huge mess.
>
> Instead, I believe clang should always pass function parameters in a
> "naive" fashion. E.g. if a parameter type is "struct X", the llvm function
> should be lowered to LLVM IR with a function parameter of type %struct.X.
> The decision on whether to then pass that in a register (or multiple
> registers), on the stack, padded and then passed on the stack, etc, should
> be the responsibility of LLVM. Only in the case of C++ types which *must* be
> passed indirectly for correctness, independent of calling convention ABI,
> should clang be explicitly making the decision to pass indirectly.
>
> Of course, the tricky part is that LLVM doesn't -- and shouldn't -- have
> the full C type system available to it, and the full C type system
> typically is required to evaluate the ABI rules (e.g., distinguishing a
> "_Complex float" from a struct containing two floats).
>
> Therefore, in order to communicate the correct ABI information to LLVM, I'd
> like clang to also emit *explicitly-ABI-specific* data (metadata?),
> reflecting the extra information that the ABI rules require the backend to
> know about the type. E.g., for X86_64, clang needs to inform LLVM of the
> classification for each parameter's type into MEMORY, INTEGER, SSE, SSEUP,
> X87, X87UP, COMPLEX_X87. Or, for PPC64 elfv2, Clang needs to inform LLVM
> when a structure should be treated as a "homogeneous aggregate" of
> floating-point or vector type. (In both cases, that information cannot
> correctly be extracted from the LLVM IR struct type, only from the C type
> system.)
These attributes would have to spell out the exact expected treatment by
the backend in essentially every aggregate case, and the frontend would
have to carefully select that treatment, and for many ABIs that would
still require counting registers and so on.  I do actually like this
approach in many ways, because it provides a path to a world where the
backend stops permissively compiling everything the frontend throws at it
and instead emits an error if the frontend asks for something that
can’t be done, but it’s not going to make things more abstract.

Having worked in this space for years, I am convinced that there are two 
meaningful points for ABI lowering: (1) the high-level source-language 
information and (2) the low-level register and stack conventions.  (1), 
for C interop, is always going to be duplicative of Clang.  You can 
introduce an intermediate library and make Clang copy all relevant 
information out of its AST into that library’s type system, but 
fundamentally “all relevant information” is going to just keep 
expanding and expanding, and Clang is still going to have a ton of 
target-specific ABI lowering code to do that propagation.

John.

David Greene via llvm-dev

2020-Jun-09 20:31 UTC


[llvm-dev] [cfe-dev] Clang/LLVM function ABI lowering (was: Re: [RFC] Refactor Clang: move frontend/driver/diagnostics code to LLVM)

James Y Knight via llvm-dev <llvm-dev at lists.llvm.org> writes:
> While MLIR may be one part of the solution, I think it's also the case that
> the function-ABI interface between Clang and LLVM is just wrong and should
> be fixed -- independently of whether Clang might use MLIR in the future.
[...]
> All that said, an MLIR encoding of the C type system can still be useful --
> it could contain the code which distills the C types into the ABI-specific
> metadata. But, I see that as less important than getting the fundamentals
> in LLVM-IR into a better shape. Even frontends without a C type system
> representation should still be able to generate LLVM IR which conforms in
> their own manner to the documented ABIs -- without it being super painful.
> Also, the code in Clang now is really confusing, and nearly unmaintainable;
> it would be a clear improvement to be able to eliminate the majority of it,
> not just move it into an MLIR dialect.
+1.  Having had to implement an interface between a proprietary frontend
and LLVM, it's a royal pain.  I've had to do it for three different
targets and each target has its own quirks.  A better frontend->LLVM
interface for ABI concerns would be a huge improvement.  I see moving to
MLIR as pretty orthogonal.  Definitely useful, don't get me wrong, but I
don't want to see that very complex process hold up improvements in the
ABI area.

                -David
