thr3ads.net - llvm dev - [llvm-dev] [EXT] [RFC] arm64_32: upstreaming ILP32 support for AArch64 [Feb 2019]

Tim Northover via llvm-dev

2019-Jan-31 15:05 UTC

[llvm-dev] [RFC] arm64_32: upstreaming ILP32 support for AArch64

As you may have noticed, we released a 64b S4 chip that runs an ILP32
variant of the AArch64 ABI, and now we'd like to upstream that work.
I've pushed preliminary patches to
https://github.com/TNorthover/llvm-project/pull/1/commits (arm64_32
branch in that repo) to accompany this RFC. The changes divide fairly
neatly into three categories.

First, there's AArch64 ILP32 support, which should be fairly easy to
adapt to the ELF (or COFF) world and be generally useful. This
involved changing some generic code in ways I'll discuss below.

Then there's the specific ABI we chose, which isn't quite the same as
AAPCS since it was designed in conjunction with armv7k so that IR
could be compiled to be compatible with arm64_32. Since people do use
third-party compilers based on LLVM, having it upstream is expected to
be a good thing.

Finally we have a few passes that translate the necessarily
platform-specific parts of armv7k IR to arm64_32. Things like NEON
intrinsic calls and workarounds for certain assumptions the Swift
compiler made about C++ parameter passing. These aren't quite so
obviously useful to everyone, but could serve as examples in future
and (since they're self-contained IR passes) are likely to be low
maintenance. However, we'd understand if the community doesn't want
this burden anyway.

Most of the target-specific changes are fairly straightforward, but I
think I should explain the changes made to generic CodeGen.

AArch64 ILP32 Addressing Modes
==============================
There are two basic issues with how the current SDAG lowering
interacts with AArch64 addressing-modes in an ILP32 scenario, both
stemming from the fact that all AArch64 addressing modes do 64-bit
arithmetic (unlike amd64, which can be told to do 32-bit arithmetic).
For the non-experts, AArch64 allows calculations like these to appear
in loads and stores:

    [x0, x1] == (add x0, x1)
    [x0, w1, sxtw] == (add x0, (sext w1))
    [x0, w1, uxtw] == (add x0, (zext w1))
    [x0, w1, sxtw #3] == (add x0, (shl (sext w1), 3))
    Plus some more shift modes that are even less relevant here.

The second is particularly important for arm64_32 since it mirrors GEP
semantics.
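
For readers who prefer something executable, the modes listed above can be modelled in a few lines of Python. This is only an illustrative sketch: the helper names are invented here and nothing below comes from the patches. Registers are modelled as integers reduced modulo 2^64.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1

def sext32(w):
    """Sign-extend the low 32 bits of w to a 64-bit two's-complement value."""
    w &= MASK32
    if w & 0x80000000:
        w -= 1 << 32
    return w & MASK64

def zext32(w):
    """Zero-extend the low 32 bits of w to 64 bits."""
    return w & MASK32

def addr_reg(x0, x1):
    """[x0, x1]: plain 64-bit add."""
    return (x0 + x1) & MASK64

def addr_sxtw(x0, w1, shift=0):
    """[x0, w1, sxtw #shift]: sign-extend the offset, then shift and add."""
    return (x0 + (sext32(w1) << shift)) & MASK64

def addr_uxtw(x0, w1, shift=0):
    """[x0, w1, uxtw #shift]: zero-extend the offset, then shift and add."""
    return (x0 + (zext32(w1) << shift)) & MASK64
```

With a base of 0x1000 and a 32-bit offset of 0xfffffffc (that is, -4), the sxtw form yields 0xffc while the uxtw form yields 0x100000ffc, which is why the choice of extension matters.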

The first issue is that nothing except an inbounds GEP can really make
use of the extended addressing-modes in general. Most obviously a
2s-complement add has different wrapping behaviour:

    (load (add (Wn=0xffffffff, Wm=1))) != ldr ..., [Xn, Wm, sxtw]

Adding nuw would help here, allowing us to use the "uxtw"
addressing-mode. But nsw doesn't correspondingly allow "sxtw" because
the AArch64 misbehaving overflow is at the 0xffffffff boundary, which
isn't special for nsw -- the counter-example above still applies.
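
The counter-example can be checked mechanically. The sketch below is a hypothetical model (function names are made up for illustration) that compares a wrapping 32-bit add against the 64-bit arithmetic the addressing modes perform, and tests whether a given add would qualify for nuw or nsw.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1

def to_signed32(w):
    """Interpret the low 32 bits of w as a signed integer."""
    w &= MASK32
    return w - (1 << 32) if w & 0x80000000 else w

def add_i32(wn, wm):
    """Wrapping 32-bit add, as an i32 add computes it."""
    return (wn + wm) & MASK32

def is_nuw(wn, wm):
    """True if the 32-bit add does not wrap when treated as unsigned."""
    return (wn & MASK32) + (wm & MASK32) <= MASK32

def is_nsw(wn, wm):
    """True if the 32-bit add does not overflow when treated as signed."""
    return -(1 << 31) <= to_signed32(wn) + to_signed32(wm) < (1 << 31)

def ldr_sxtw(xn, wm):
    """Address formed by [Xn, Wm, sxtw]; Xn is used as a full 64-bit value."""
    return (xn + to_signed32(wm)) & MASK64

def ldr_uxtw(xn, wm):
    """Address formed by [Xn, Wm, uxtw]."""
    return (xn + (wm & MASK32)) & MASK64

# The counter-example from the text: nsw holds (-1 + 1 = 0), yet the
# 32-bit result is 0 while the sxtw address is 0x100000000.
wn, wm = 0xFFFFFFFF, 1
counter_example = (is_nsw(wn, wm), add_i32(wn, wm), ldr_sxtw(wn, wm))

# A nuw add with bit 31 set in the offset: uxtw agrees, sxtw does not.
a, b = 0x7FFFFFFF, 0x80000000
nuw_case = (is_nuw(a, b), add_i32(a, b), ldr_uxtw(a, b), ldr_sxtw(a, b))
```

Whenever `is_nuw` holds, the unsigned sum fits in 32 bits, so zero-extending both operands and adding in 64 bits gives the same value. No such guarantee follows from `is_nsw`.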

Moreover, the vast majority of pointer offsets come from GEPs and they
don't map cleanly to either nsw or nuw semantics provided by the DAG.
Pointers are fundamentally unsigned objects, but the offsets are
signed; so you only get nuw when you can prove the offset is positive
(see visitGetElementPtr in SelectionDAGBuilder.cpp).

That leaves inbounds GEPs, which theoretically map very cleanly to the
addressing modes: we know there's no wrapping at any precision so we
don't have to extend everything, and GEP defines offsets to be signed
constants, so we can use sxtw.
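
As a rough illustration of why the inbounds case works, the sketch below indexes an i32 array from a pointer into its middle, using both 32-bit wrapping arithmetic and the 64-bit `sxtw #2` form. The array address and size are arbitrary values chosen for the example, and the function names are hypothetical.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1

def to_signed32(w):
    """Interpret the low 32 bits of w as a signed integer."""
    w &= MASK32
    return w - (1 << 32) if w & 0x80000000 else w

def address_32bit(base, index):
    """Element address computed entirely in wrapping 32-bit arithmetic."""
    return (base + index * 4) & MASK32

def address_sxtw(base, index):
    """Element address as [Xbase, Windex, sxtw #2] would compute it.
    Assumes bits [32,64) of base are already zero."""
    return (base + (to_signed32(index) << 2)) & MASK64

def address_uxtw(base, index):
    """Same, but with a zero-extended index ([Xbase, Windex, uxtw #2])."""
    return (base + ((index & MASK32) << 2)) & MASK64

# Hypothetical 8-element i32 array at 0x2000; `mid` points at element 4.
mid = 0x2010
indices = [i & MASK32 for i in range(-4, 4)]   # -4..3, as 32-bit patterns

sxtw_matches = all(address_32bit(mid, i) == address_sxtw(mid, i) for i in indices)
uxtw_matches = all(address_32bit(mid, i) == address_uxtw(mid, i) for i in indices)
```

Every in-bounds index agrees between the 32-bit and sxtw computations, while the uxtw form goes wrong as soon as the index is negative.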

The second major issue with using AArch64 addressing modes is that an
i32 in SDAG has undef rather than 0 bits [32,64) when in a 64-bit
register -- a trunc operation maps to EXTRACT_SUBREG (i.e. ignore high
bits) rather than UXTW (zero them). AArch64 addressing-modes do not
extend the base pointer, so they would frequently have to be preceded
by a manual truncation of the base pointer. In our initial
implementation this contributed to a large code size penalty for
arm64_32.
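
A small sketch of the high-bits problem, with an invented register value: if a 32-bit pointer sits in a 64-bit register whose upper half is unspecified, using that register directly as a base gives the wrong address unless the upper half is cleared first.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1

def base_plus_offset(xn, offset):
    """Address formed from a 64-bit base register and a small offset.
    The base is NOT extended or truncated, mirroring the addressing modes."""
    return (xn + offset) & MASK64

# An i32 pointer value 0x2000 held in a 64-bit register. The upper 32 bits
# are arbitrary here to stand in for "undef".
reg_with_junk = 0xDEADBEEF_00002000

wrong = base_plus_offset(reg_with_junk, 4)

# Explicitly zeroing bits [32,64) first (on AArch64 any write to a W
# register does this) gives the intended address.
right = base_plus_offset(reg_with_junk & MASK32, 4)
```

The extra masking step is the manual truncation referred to above; emitting it before many loads and stores is where the code size cost comes from.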

This motivates two of the changes proposed to generic CodeGen.


CodeGenPrepare:
---------------

We teach CodeGenPrepare to sink GEPs as GEPs, and preserve the
inbounds marker. This is the only way they can possibly be exposed to
SDAG at the basic block level.

Pointers are still 64-bits, tricked ya!
---------------------------------------

The next question was how to expose these GEPs to the SDAG.

I first considered adding an ISD::GEP or an "inbounds" flag to
ISD::ADD. These would solve the first issue above, but not the second.

So the proposed solution is to allow pointers to have different
in-memory and in-DAG types, and specifically keep an i64 pointer in
the DAG on arm64_32. This immediately guarantees that (valid) pointers
will have their high bits zeroed, and just by creating the DAG we make
explicit the sign-extensions described by GEP semantics.

Addressing-modes can then be used with no change to the actual C++
code in AArch64 that selects them.
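
The split between in-memory and in-DAG pointer types can be pictured roughly as follows. This is a hypothetical Python model, not the SelectionDAG implementation: pointers occupy 4 bytes in memory, are zero-extended to 64 bits when loaded, and GEP-style indexing sign-extends the index explicitly.

```python
import struct

MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1

def load_pointer(mem, addr):
    """Read a 4-byte little-endian pointer; the result is a 64-bit value
    whose bits [32,64) are zero."""
    (value,) = struct.unpack_from("<I", mem, addr)
    return value

def store_pointer(mem, addr, ptr64):
    """Truncate a 64-bit in-register pointer and write only 4 bytes."""
    struct.pack_into("<I", mem, addr, ptr64 & MASK32)

def gep(ptr64, index32, element_size):
    """Pointer arithmetic on the 64-bit form with a sign-extended index."""
    index32 &= MASK32
    signed = index32 - (1 << 32) if index32 & 0x80000000 else index32
    return (ptr64 + signed * element_size) & MASK64

mem = bytearray(8)                 # room for two 4-byte pointers
store_pointer(mem, 0, 0x2010)
p = load_pointer(mem, 0)
q = gep(p, 0xFFFFFFFF, 4)          # index -1 over 4-byte elements
store_pointer(mem, 4, q)
```

Because `load_pointer` already produces a clean 64-bit value, the result of `gep` can feed an addressing mode directly with no extra masking.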

There are two possible disadvantages though. First, since pointers are
64-bits, they will consume 64-bit spill slots and potentially bloat
the stack. It's unclear how much of an issue that is in practice.

Second is the intrusiveness. On the plus side it's less intrusive than
ISD::GEP would be, but it still involves changes in some fairly
obscure bits of DAG -- often found when things broke rather than by
careful planning.

Details of the arm64_32 ABI
===========================
In outline the arm64_32 ABI is based on AAPCS, with the usual Darwin exceptions:

  * char is a signed type.
  * Anonymous varargs parameters go on the stack (occupying at least 4 bytes).
  * Small parameters are extended by the producer to 32-bits.

There are also a couple of arm64_32 specific changes.

Pointers
--------

Darwin has traditionally taken the view (at odds with AAPCS on
AArch64) that under-sized arguments should be extended by the caller
to the point at which they'll be useful (i.e. mostly i32).

We decided to apply this to pointers for arm64_32 on the grounds that
most uses of pointers will be as 64-bit quantities. I'm still
wondering if that was the best decision: it probably is slightly more
efficient, but it's also not pretty and didn't solve the issues I'd
naively hoped it would with memcpy and friends (turns out size_t still
exists!).

Thus, pointers behave differently from intptr_t, and call lowering
code needs to know when it's dealing with one.

Arrays
------

We're translating armv7k bitcode to arm64_32, and the result has to be
compatible with code that is compiled directly to arm64_32.

The biggest barrier here was small structs. They generally get passed
in registers, possibly with alignment requirements.

    struct { int arr[2]; } goes in [rN,rN+1] or in xN.
    struct { uint64_t val; } goes in [rN,rN+1] (starting even), or xN.

So we need a way to signal in IR that two values should be combined
into a single x-register when compiled for arm64_32. We chose LLVM
arrays for the job. So, unlike all other targets, the following two
functions will behave differently in arm64_32:

    void @foo([2 x i32] %x0)      ; Two i32s combined into 64-bit x0 register
    void @foo(i32 %w0, i32 %w1)   ; First i32 in w0, second in w1
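
To picture the difference, here is a hypothetical sketch of how two i32 values might share one 64-bit register. The element order is an assumption made for the example (element 0 in the low half, matching little-endian memory layout); the text above does not spell it out.

```python
MASK32 = (1 << 32) - 1

def pack_pair(elt0, elt1):
    """Combine two i32 values into one 64-bit register value.
    ASSUMPTION: element 0 occupies bits [0,32), element 1 bits [32,64)."""
    return (elt0 & MASK32) | ((elt1 & MASK32) << 32)

def unpack_pair(x):
    """Split a 64-bit register value back into its two i32 halves."""
    return x & MASK32, (x >> 32) & MASK32

x0 = pack_pair(1, 2)            # the [2 x i32] form: one register
w0, w1 = 1, 2                   # the (i32, i32) form: two registers
```

In the array form a callee has to split x0 to recover the two values, whereas in the two-argument form each value arrives in its own register.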

Details of patch sequence
=========================
Here's a brief outline of the patches in the link:

1. CodeGenPrep: sink GEPs as GEPs and preserve inbounds note. As
discussed above, this is a necessary generic change to get good
CodeGen.
2. AArch64: support binutils-like things on arm64_32. Basic Triple,
llvm-objdump support, and other low-level tools that need to
understand the binary format.
3-5. Preparatory changes to generic SelectionDAG to support arm64_32.
The biggest of these is splitting pointer representations in-memory
from those in-register.
6. Main patch adding CodeGen support for arm64_32 to lib/Target/AArch64.
7. The armv7k compatibility passes mentioned above. One of them
replaces ARM intrinsic calls with AArch64 ones (NEON in particular),
one works around an unwarranted assumption in Swift, and one fixes up
an ObjC marker at the module level.
8. Clang support for arm64_32. The usual mix of ABI definitions.
9-15. Various components of FastISel support for arm64_32, gradually
bringing it to parity with arm64.
16. compiler-rt support for arm64_32. This includes both builtins and
sanitizers.

Eli Friedman via llvm-dev

2019-Jan-31 19:48 UTC

[llvm-dev] [RFC] arm64_32: upstreaming ILP32 support for AArch64

Comments inline
> -----Original Message-----
> From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of Tim
> Northover via llvm-dev
> Sent: Thursday, January 31, 2019 7:06 AM
> To: LLVM Developers Mailing List <llvm-dev at lists.llvm.org>
> Subject: [EXT] [llvm-dev] [RFC] arm64_32: upstreaming ILP32 support for
> AArch64
> 
> CodeGenPrepare:
> ---------------
> 
> We teach CodeGenPrepare to sink GEPs as GEPs, and preserve the
> inbounds marker. This is the only way they can possibly be exposed to
> SDAG at the basic block level.
Isn't addr-sink-using-gep already a thing?
> 
> Pointers are still 64-bits, tricked ya!
> ---------------------------------------
> 
> The next question was how to expose these GEPs to the SDAG.
> 
> I first considered adding an ISD::GEP or an "inbounds" flag to
> ISD::ADD. These would solve the first issue above, but not the second.
> 
> So the proposed solution is to allow pointers to have different
> in-memory and in-DAG types, and specifically keep an i64 pointer in
> the DAG on arm64_32. This immediately guarantees that (valid) pointers
> will have their high bits zeroed, and just by creating the DAG we make
> explicit the sign-extensions described by GEP semantics.
> 
> Addressing-modes can then be used with no change to the actual C++
> code in AArch64 that selects them.
> 
> There are two possible disadvantages though. First, since pointers are
> 64-bits, they will consume 64-bit spill slots and potentially bloat
> the stack. It's unclear how much of an issue that is in practice.
> 
> Second is the intrusiveness. On the plus side it's less intrusive than
> ISD::GEP would be, but it still involves changes in some fairly
> obscure bits of DAG -- often found when things broke rather than by
> careful planning.
Did you consider modeling this with address spaces?  LLVM already has robust
support for address spaces with different pointer sizes, and you probably want
to expose support for 64-bit pointers anyway.
> Arrays
> ------
> 
> We're translating armv7k bitcode to arm64_32, and the result has to be
> compatible with code that is compiled directly to arm64_32.
> 
> The biggest barrier here was small structs. They generally get passed
> in registers, possibly with alignment requirements.
> 
>     struct { int arr[2] }; goes in [rN,rN+1] or in xN.
>     struct { uint64_t val; } goes in [rN,rN+1] (starting even), or xN
> 
> So we need a way to signal in IR that two values should be combined
> into a single x-register when compiled for arm64_32. We chose LLVM
> arrays for the job. So, unlike all other targets, the following two
> functions will behave differently in arm64_32:
> 
>     void @foo([2 x i32] %x0)      ; Two i32s combined into 64-bit x0 register
>     void @foo(i32 %w0, i32 %w1)   ; First i32 in w0, second in w1
I'm not sure I follow the difference between [2 x i32] and i64: if they both
go into a single register, why do you need both?  Or is this necessary to
support your automatic translation pass?

-Eli

Tim Northover via llvm-dev

2019-Feb-01 10:28 UTC

[llvm-dev] [EXT] [RFC] arm64_32: upstreaming ILP32 support for AArch64

Hi Eli,

Thanks for the comments.

On Thu, 31 Jan 2019 at 19:48, Eli Friedman <efriedma at quicinc.com> wrote:
> > We teach CodeGenPrepare to sink GEPs as GEPs, and preserve the
> > inbounds marker. This is the only way they can possibly be exposed to
> > SDAG at the basic block level.
>
> Isn't addr-sink-using-gep already a thing?
Yes, I'm not sure why I wrote that (maybe I saw the new
addrSinkUsingGEPs in a patch and misremembered). It looks like what I
actually did was attempt to decouple the logic. It's currently based
on useAA, which seems to be an orthogonal question to me, so I added a
new virtual function hook. I'm now suspicious of the logic there too,
though. I'll inspect it further before uploading anything for review.
> > Second is the intrusiveness. On the plus side it's less intrusive than
> > ISD::GEP would be, but it still involves changes in some fairly
> > obscure bits of DAG -- often found when things broke rather than by
> > careful planning.
>
> Did you consider modeling this with address spaces?  LLVM already has
> robust support for address spaces with different pointer sizes,
I have to say I didn't, but I don't think it would solve the problem.
Alternate address-spaces still have just one pointer size per space as
far as I'm aware. If that's 64-bits we get efficient CodeGen but
loading or storing a pointer clobbers more data than it should, if
that's 32-bits then we get poor CodeGen.
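
The clobbering half of that trade-off is easy to demonstrate with a toy memory model. The layout below (a 4-byte pointer followed by a 4-byte int) and the helper name are invented for illustration.

```python
import struct

def store_pointer(mem, addr, ptr, pointer_bytes):
    """Store a pointer using either a 4-byte or an 8-byte in-memory size."""
    fmt = "<I" if pointer_bytes == 4 else "<Q"
    struct.pack_into(fmt, mem, addr, ptr)

# Two adjacent 4-byte fields, as in an ILP32 struct { void *p; int n; }.
mem = bytearray(8)
struct.pack_into("<I", mem, 4, 0x12345678)            # n

store_pointer(mem, 0, 0x2000, 4)                      # 32-bit pointer store
n_after_32 = struct.unpack_from("<I", mem, 4)[0]      # n is untouched

store_pointer(mem, 0, 0x2000, 8)                      # 64-bit pointer store
n_after_64 = struct.unpack_from("<I", mem, 4)[0]      # n has been overwritten
```

With a single pointer size per address space, the only way to avoid the overwrite is the 4-byte size, which is the case that gives up the 64-bit addressing modes.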
> and you probably want to expose support for 64-bit pointers anyway.
It's a possibility, though no-one has asked for it yet. The biggest
request we've actually had is for signed 32-bit pointers so that both
TTBR0 and TTBR1 regions can be used. I could see a pretty strong
argument for exposing unsigned pointers via a different address-space
in that regime (for use in user_addr_t in kernel code), though you'd
have to be pretty disciplined to make it work I think.
> I'm not sure I follow the difference between [2 x i32] and i64: if they
> both go into a single register, why do you need both?  Or is this necessary to
> support your automatic translation pass?
Yep, it's entirely because we need to support code generated for
armv7k. On that platform [2 x i32] and i64 have different alignment
requirements on the stack; [2 x i32] would be used for struct {
int32_t val[2]; }, i64 would be used for struct { int64_t val; }. But
because AArch64 AAPCS puts more data in registers, some of these args
generated for the stack go in registers on arm64_32.
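
A sketch of how the alignment difference shows up when arguments are laid out on the stack. The concrete sizes and alignments are assumptions for the example (4-byte alignment for [2 x i32], 8-byte for i64), and the allocator is a simplified stand-in rather than the real calling-convention code.

```python
def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment (a power of two)."""
    return (offset + alignment - 1) & ~(alignment - 1)

def assign_stack_offsets(args):
    """Given (size, alignment) pairs, return the stack offset of each arg."""
    offsets = []
    offset = 0
    for size, alignment in args:
        offset = align_up(offset, alignment)
        offsets.append(offset)
        offset += size
    return offsets

I32 = (4, 4)
PAIR_OF_I32 = (8, 4)   # [2 x i32]: ASSUMED 4-byte alignment
I64 = (8, 8)           # i64: ASSUMED 8-byte alignment

after_i32_pair = assign_stack_offsets([I32, PAIR_OF_I32])
after_i32_i64 = assign_stack_offsets([I32, I64])
```

Under these assumptions the pair follows a preceding i32 immediately, while the i64 skips ahead to the next 8-byte boundary, so the two IR types cannot be treated as interchangeable.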

Cheers.

Tim.
