Tim Northover via llvm-dev
2019-Jan-31 15:05 UTC
[llvm-dev] [RFC] arm64_32: upstreaming ILP32 support for AArch64
As you may have noticed, we released a 64b S4 chip that runs an ILP32 variant of the AArch64 ABI, and now we'd like to upstream that work. I've pushed preliminary patches to https://github.com/TNorthover/llvm-project/pull/1/commits (arm64_32 branch in that repo) to accompany this RFC.

The changes divide fairly neatly into three categories.

First, there's AArch64 ILP32 support, which should be fairly easy to adapt to the ELF (or COFF) world and be generally useful. This involved changing some generic code in ways I'll discuss below.

Then there's the specific ABI we chose, which isn't quite the same as AAPCS since it was designed in conjunction with armv7k so that IR could be compiled to be compatible with arm64_32. Since people do use third-party compilers based on LLVM, having it upstream is expected to be a good thing.

Finally, we have a few passes that translate the necessarily platform-specific parts of armv7k IR to arm64_32: things like NEON intrinsic calls and workarounds for certain assumptions the Swift compiler made about C++ parameter passing. These aren't quite so obviously useful to everyone, but could serve as examples in future and (since they're self-contained IR passes) are likely to be low maintenance. However, we'd understand if the community doesn't want this burden anyway.

Most of the target-specific changes are fairly straightforward, but I think I should explain the changes made to generic CodeGen.

AArch64 ILP32 Addressing Modes
==============================

There are two basic issues with how the current SDAG lowering interacts with AArch64 addressing-modes in an ILP32 scenario, both stemming from the fact that all AArch64 addressing modes do 64-bit arithmetic (unlike amd64, which can be told to do 32-bit arithmetic).

For the non-experts, AArch64 allows calculations like these to appear in loads and stores:

    [x0, x1] == (add x0, x1)
    [x0, w1, sxtw] == (add x0, (sext w1))
    [x0, w1, uxtw] == (add x0, (zext w1))
    [x0, w1, sxtw #3] == (add x0, (shl (sext w1), 3))

Plus some more shift modes that are even less relevant here. The second is particularly important for arm64_32 since it mirrors GEP semantics.

The first issue is that nothing except an inbounds GEP can really make use of the extended addressing-modes in general. Most obviously a 2s-complement add has different wrapping behaviour:

    (load (add (Wn=0xffffffff, Wm=1))) != ldr ..., [Xn, Wm, sxtw]

Adding nuw would help here, allowing us to use the "uxtw" addressing-mode. But nsw doesn't correspondingly allow "sxtw" because the AArch64 misbehaving overflow is at the 0xffffffff boundary, which isn't special for nsw -- the counter-example above still applies.

Moreover, the vast majority of pointer offsets come from GEPs and they don't map cleanly to either nsw or nuw semantics provided by the DAG. Pointers are fundamentally unsigned objects, but the offsets are signed; so you only get nuw when you can prove the offset is positive (see visitGetElementPtr in SelectionDAGBuilder.cpp).

That leaves inbounds GEPs, which theoretically map very cleanly to the addressing modes: we know there's no wrapping at any precision so we don't have to extend everything, and GEP defines offsets to be signed constants, so we can use sxtw.

The second major issue with using AArch64 addressing modes is that an i32 in SDAG has undef rather than 0 bits [32,64) when in a 64-bit register -- a trunc operation maps to EXTRACT_SUBREG (i.e. ignore high bits) rather than UXTW (zero them). AArch64 addressing-modes do not extend the base pointer, so they would frequently have to be preceded by a manual truncation of the base pointer. In our initial implementation this contributed to a large code size penalty for arm64_32.

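To make the two issues above concrete, here is a minimal Python sketch that models the arithmetic only. The helper names (ilp32_add, addr_sxtw, addr_uxtw) are invented for this illustration and are not LLVM or AArch64 APIs; registers are modelled as plain integers masked to 32 or 64 bits.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def sext32(w):
    """Sign-extend the low 32 bits of w, as the sxtw extension does."""
    w &= MASK32
    return w - (1 << 32) if w & 0x8000_0000 else w


def zext32(w):
    """Zero-extend the low 32 bits of w, as the uxtw extension does."""
    return w & MASK32


def ilp32_add(base, off):
    """An i32 add of pointer and offset: the result wraps at 32 bits."""
    return (base + off) & MASK32


def addr_sxtw(xn, wm, shift=0):
    """[Xn, Wm, sxtw #shift]: 64-bit add; the base Xn is NOT extended."""
    return (xn + (sext32(wm) << shift)) & MASK64


def addr_uxtw(xn, wm, shift=0):
    """[Xn, Wm, uxtw #shift]: 64-bit add; the base Xn is NOT extended."""
    return (xn + (zext32(wm) << shift)) & MASK64


# Issue 1: the counter-example from the text. The i32 add wraps to 0,
# but the 64-bit addressing mode carries into bit 32. The add does not
# overflow as a signed operation (-1 + 1), so nsw would not rule it out.
wrapped = ilp32_add(0xFFFF_FFFF, 1)
carried = addr_sxtw(0xFFFF_FFFF, 1)

# A negative offset that stays in bounds: sxtw agrees with the i32 add,
# while uxtw treats the offset as a large positive number.
in_bounds = addr_sxtw(0x1000, 0xFFFF_FFF8)      # base - 8
mismatch = addr_uxtw(0x1000, 0xFFFF_FFF8)

# Issue 2: garbage in bits [32,64) of the base leaks into the address
# unless the base is zero-extended first.
dirty_base = 0xDEAD_BEEF_0000_1000
leaked = addr_sxtw(dirty_base, 8)
cleaned = addr_sxtw(zext32(dirty_base), 8)
```

Here wrapped is 0 while carried is 0x100000000, and leaked keeps the 0xdeadbeef high half while cleaned is 0x1008. Keeping pointers as zero-extended 64-bit values in the DAG, as proposed below, corresponds to always being in the "cleaned" situation.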
This motivates two of the changes proposed to generic CodeGen.

CodeGenPrepare:
---------------

We teach CodeGenPrepare to sink GEPs as GEPs, and preserve the inbounds marker. This is the only way they can possibly be exposed to SDAG at the basic block level.

Pointers are still 64-bits, tricked ya!
---------------------------------------

The next question was how to expose these GEPs to the SDAG.

I first considered adding an ISD::GEP or an "inbounds" flag to ISD::ADD. These would solve the first issue above, but not the second.

So the proposed solution is to allow pointers to have different in-memory and in-DAG types, and specifically keep an i64 pointer in the DAG on arm64_32. This immediately guarantees that (valid) pointers will have their high bits zeroed, and just by creating the DAG we make explicit the sign-extensions described by GEP semantics.

Addressing-modes can then be used with no change to the actual C++ code in AArch64 that selects them.

There are two possible disadvantages though. First, since pointers are 64-bits, they will consume 64-bit spill slots and potentially bloat the stack. It's unclear how much of an issue that is in practice.

Second is the intrusiveness. On the plus side it's less intrusive than ISD::GEP would be, but it still involves changes in some fairly obscure bits of DAG -- often found when things broke rather than by careful planning.

Details of the arm64_32 ABI
===========================

In outline the arm64_32 ABI is based on AAPCS, with the usual Darwin exceptions:

  * char is a signed type.
  * Anonymous varargs parameters go on the stack (occupying at least 4 bytes).
  * Small parameters are extended by the producer to 32-bits.

There are also a couple of arm64_32 specific changes.

Pointers
--------

Darwin has traditionally taken the view (at odds with AAPCS on AArch64) that under-sized arguments should be extended by the caller to the point at which they'll be useful (i.e. mostly i32).
We decided to apply this to pointers for arm64_32 on the grounds that most uses of pointers will be as 64-bit quantities.

I'm still wondering if that was the best decision: it probably is slightly more efficient, but it's also not pretty and didn't solve the issues I'd naively hoped it would with memcpy and friends (turns out size_t still exists!).

Thus, pointers behave differently from intptr_t, and call lowering code needs to know when it's dealing with one.

Arrays
------

We're translating armv7k bitcode to arm64_32, and the result has to be compatible with code that is compiled directly to arm64_32.

The biggest barrier here was small structs. They generally get passed in registers, possibly with alignment requirements.

    struct { int arr[2]; }; goes in [rN,rN+1] or in xN.
    struct { uint64_t val; }; goes in [rN,rN+1] (starting even), or xN

So we need a way to signal in IR that two values should be combined into a single x-register when compiled for arm64_32. We chose LLVM arrays for the job. So, unlike all other targets, the following two functions will behave differently in arm64_32:

    void @foo([2 x i32] %x0) ; Two i32s combined into 64-bit x0 register
    void @foo(i32 %w0, i32 %w1) ; First i32 in w0, second in w1

Details of patch sequence
=========================

Here's a brief outline of the patches in the link:

1. CodeGenPrep: sink GEPs as GEPs and preserve inbounds note. As discussed above, this is a necessary generic change to get good CodeGen.

2. AArch64: support binutils-like things on arm64_32. Basic Triple, llvm-objdump support, and other low-level tools that need to understand the binary format.

3-5. Preparatory changes to generic SelectionDAG to support arm64_32. The biggest of these is splitting pointer representations in-memory from those in-register.

6. Main patch adding CodeGen support for arm64_32 to lib/Target/AArch64.

7. The armv7k compatibility passes mentioned above.
   One of them replaces ARM intrinsic calls with AArch64 ones (NEON in particular), one works around an unwarranted assumption in Swift, and one fixes up an ObjC marker at the module level.

8. Clang support for arm64_32. The usual mix of ABI definitions.

9-15. Various components of FastISel support for arm64_32, gradually bringing it to parity with arm64.

16. compiler-rt support for arm64_32. This includes both builtins and sanitizers.
Eli Friedman via llvm-dev
2019-Jan-31 19:48 UTC
[llvm-dev] [RFC] arm64_32: upstreaming ILP32 support for AArch64
Comments inline

> -----Original Message-----
> From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of Tim Northover via llvm-dev
> Sent: Thursday, January 31, 2019 7:06 AM
> To: LLVM Developers Mailing List <llvm-dev at lists.llvm.org>
> Subject: [EXT] [llvm-dev] [RFC] arm64_32: upstreaming ILP32 support for AArch64
>
> CodeGenPrepare:
> ---------------
>
> We teach CodeGenPrepare to sink GEPs as GEPs, and preserve the inbounds marker. This is the only way they can possibly be exposed to SDAG at the basic block level.

Isn't addr-sink-using-gep already a thing?

> Pointers are still 64-bits, tricked ya!
> ---------------------------------------
>
> The next question was how to expose these GEPs to the SDAG.
>
> I first considered adding an ISD::GEP or an "inbounds" flag to ISD::ADD. These would solve the first issue above, but not the second.
>
> So the proposed solution is to allow pointers to have different in-memory and in-DAG types, and specifically keep an i64 pointer in the DAG on arm64_32. This immediately guarantees that (valid) pointers will have their high bits zeroed, and just by creating the DAG we make explicit the sign-extensions described by GEP semantics.
>
> Addressing-modes can then be used with no change to the actual C++ code in AArch64 that selects them.
>
> There are two possible disadvantages though. First, since pointers are 64-bits, they will consume 64-bit spill slots and potentially bloat the stack. It's unclear how much of an issue that is in practice.
>
> Second is the intrusiveness. On the plus side it's less intrusive than ISD::GEP would be, but it still involves changes in some fairly obscure bits of DAG -- often found when things broke rather than by careful planning.

Did you consider modeling this with address spaces? LLVM already has robust support for address spaces with different pointer sizes, and you probably want to expose support for 64-bit pointers anyway.

> Arrays
> ------
>
> We're translating armv7k bitcode to arm64_32, and the result has to be compatible with code that is compiled directly to arm64_32.
>
> The biggest barrier here was small structs. They generally get passed in registers, possibly with alignment requirements.
>
>     struct { int arr[2] }; goes in [rN,rN+1] or in xN.
>     struct { uint64_t val; } goes in [rN,rN+1] (starting even), or xN
>
> So we need a way to signal in IR that two values should be combined into a single x-register when compiled for arm64_32. We chose LLVM arrays for the job. So, unlike all other targets, the following two functions will behave differently in arm64_32:
>
>     void @foo([2 x i32] %x0) ; Two i32s combined into 64-bit x0 register
>     void @foo(i32 %w0, i32 %w1) ; First i32 in w0, second in w1

I'm not sure I follow the difference between [2 x i32] and i64: if they both go into a single register, why do you need both? Or is this necessary to support your automatic translation pass?

-Eli
Tim Northover via llvm-dev
2019-Feb-01 10:28 UTC
[llvm-dev] [EXT] [RFC] arm64_32: upstreaming ILP32 support for AArch64
Hi Eli,

Thanks for the comments.

On Thu, 31 Jan 2019 at 19:48, Eli Friedman <efriedma at quicinc.com> wrote:
> > We teach CodeGenPrepare to sink GEPs as GEPs, and preserve the inbounds marker. This is the only way they can possibly be exposed to SDAG at the basic block level.
>
> Isn't addr-sink-using-gep already a thing?

Yes, I'm not sure why I wrote that (maybe I saw the new addrSinkUsingGEPs in a patch and misremembered). It looks like what I actually did was attempt to decouple the logic. It's currently based on useAA, which seems to be an orthogonal question to me, so I added a new virtual function hook.

I'm now suspicious of the logic there too, though. I'll inspect it further before uploading anything for review.

> > Second is the intrusiveness. On the plus side it's less intrusive than ISD::GEP would be, but it still involves changes in some fairly obscure bits of DAG -- often found when things broke rather than by careful planning.
>
> Did you consider modeling this with address spaces? LLVM already has robust support for address spaces with different pointer sizes,

I have to say I didn't, but I don't think it would solve the problem. Alternate address-spaces still have just one pointer size per space as far as I'm aware. If that's 64-bits we get efficient CodeGen but loading or storing a pointer clobbers more data than it should, if that's 32-bits then we get poor CodeGen.

> and you probably want to expose support for 64-bit pointers anyway.

It's a possibility, though no-one has asked for it yet. The biggest request we've actually had is for signed 32-bit pointers so that both TTBR0 and TTBR1 regions can be used. I could see a pretty strong argument for exposing unsigned pointers via a different address-space in that regime (for use in user_addr_t in kernel code), though you'd have to be pretty disciplined to make it work I think.

> I'm not sure I follow the difference between [2 x i32] and i64: if they both go into a single register, why do you need both? Or is this necessary to support your automatic translation pass?

Yep, it's entirely because we need to support code generated for armv7k. On that platform [2 x i32] and i64 have different alignment requirements on the stack; [2 x i32] would be used for struct { int32_t val[2]; }, i64 would be used for struct { int64_t val; }. But because AArch64 AAPCS puts more data in registers, some of these args generated for the stack go in registers on arm64_32.

Cheers.

Tim.
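
The armv7k alignment difference behind the [2 x i32] versus i64 distinction can be illustrated with a toy model. The Python sketch below is a deliberate simplification, not the real calling-convention code: assign_armv7k_regs is a hypothetical helper, only r0-r3 are modelled, and an argument that doesn't fit entirely in the remaining registers is treated as going wholly to the stack. The "starting even" rule for 8-byte-aligned values is the one quoted in the RFC.

```python
def assign_armv7k_regs(args):
    """Assign arguments to r0-r3 in a simplified armv7k-style model.

    args is a list of (size_bytes, align_bytes) pairs. Returns one entry
    per argument: a tuple of register indices, or None if the argument
    is passed on the stack in this model.
    """
    next_reg = 0
    assignment = []
    for size, align in args:
        nregs = (size + 3) // 4          # 32-bit registers needed
        if align == 8 and next_reg % 2:
            next_reg += 1                # 8-byte alignment: start on an even register
        if next_reg + nregs <= 4:
            assignment.append(tuple(range(next_reg, next_reg + nregs)))
            next_reg += nregs
        else:
            assignment.append(None)      # spilled to the stack
            next_reg = 4                 # no further register arguments
    return assignment


INT = (4, 4)
INT_PAIR_STRUCT = (8, 4)   # struct { int32_t val[2]; } -> [2 x i32]
INT64_STRUCT = (8, 8)      # struct { int64_t val; }    -> i64

packed = assign_armv7k_regs([INT, INT_PAIR_STRUCT])   # struct follows directly in r1, r2
aligned = assign_armv7k_regs([INT, INT64_STRUCT])     # r1 is skipped, struct in r2, r3
```

In this model the two 8-byte structs land in different registers after a leading int, which is why the IR has to keep the two spellings distinct even though each ends up in a single x-register on arm64_32.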