Tim Northover via llvm-dev
2019-Feb-01 10:28 UTC
[llvm-dev] [EXT] [RFC] arm64_32: upstreaming ILP32 support for AArch64
Hi Eli,

Thanks for the comments.

On Thu, 31 Jan 2019 at 19:48, Eli Friedman <efriedma at quicinc.com> wrote:
> > We teach CodeGenPrepare to sink GEPs as GEPs, and preserve the
> > inbounds marker. This is the only way they can possibly be exposed to
> > SDAG at the basic block level.
>
> Isn't addr-sink-using-gep already a thing?

Yes, I'm not sure why I wrote that (maybe I saw the new addrSinkUsingGEPs in a patch and misremembered). It looks like what I actually did was attempt to decouple the logic. It's currently based on useAA, which seems to be an orthogonal question to me, so I added a new virtual function hook. I'm now suspicious of the logic there too, though. I'll inspect it further before uploading anything for review.

> > Second is the intrusiveness. On the plus side it's less intrusive than
> > ISD::GEP would be, but it still involves changes in some fairly
> > obscure bits of DAG -- often found when things broke rather than by
> > careful planning.
>
> Did you consider modeling this with address spaces? LLVM already has robust support for address spaces with different pointer sizes,

I have to say I didn't, but I don't think it would solve the problem. Alternate address-spaces still have just one pointer size per space as far as I'm aware. If that's 64-bits we get efficient CodeGen but loading or storing a pointer clobbers more data than it should, if that's 32-bits then we get poor CodeGen.

> and you probably want to expose support for 64-bit pointers anyway.

It's a possibility, though no-one has asked for it yet. The biggest request we've actually had is for signed 32-bit pointers so that both TTBR0 and TTBR1 regions can be used.
I could see a pretty strong argument for exposing unsigned pointers via a different address-space in that regime (for use in user_addr_t in kernel code), though you'd have to be pretty disciplined to make it work I think.

> I'm not sure I follow the difference between [2 x i32] and i64: if they both go into a single register, why do you need both? Or is this necessary to support your automatic translation pass?

Yep, it's entirely because we need to support code generated for armv7k. On that platform [2 x i32] and i64 have different alignment requirements on the stack; [2 x i32] would be used for struct { int32_t val[2]; }, i64 would be used for struct { int64_t val; }. But because AArch64 AAPCS puts more data in registers, some of these args generated for the stack go in registers on arm64_32.

Cheers.

Tim.
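The signed-versus-unsigned pointer request comes down to how a 32-bit pointer is widened into a 64-bit virtual address. The C++ sketch below is only an illustration, not code from arm64_32 or LLVM, and the function names are hypothetical. Assuming the usual AArch64 split of the address space, zero-extension keeps every pointer in the low region translated through TTBR0, while sign-extension sends values with bit 31 set to the top of the address space (upper bits all ones), the region translated through TTBR1.

```cpp
#include <cstdint>

// Hypothetical helper: unsigned 32-bit pointers are zero-extended, so every
// resulting address is below 2^32 (the low, TTBR0-translated region).
constexpr std::uint64_t widenUnsigned(std::uint32_t p) {
  return static_cast<std::uint64_t>(p);
}

// Hypothetical helper: signed 32-bit pointers are sign-extended, so values
// with bit 31 set land at the very top of the 64-bit address space
// (upper 32 bits all ones), which falls in the TTBR1-translated region.
constexpr std::uint64_t widenSigned(std::uint32_t p) {
  return static_cast<std::uint64_t>(
      static_cast<std::int64_t>(static_cast<std::int32_t>(p)));
}
```

For example, the 32-bit value 0x80001000 widens to 0x0000000080001000 under zero-extension but to 0xFFFFFFFF80001000 under sign-extension; values with bit 31 clear are identical either way.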
Eli Friedman via llvm-dev
2019-Feb-01 19:25 UTC
[llvm-dev] [RFC] arm64_32: upstreaming ILP32 support for AArch64
> -----Original Message-----
> From: Tim Northover <t.p.northover at gmail.com>
> Sent: Friday, February 1, 2019 2:28 AM
> To: Eli Friedman <efriedma at quicinc.com>
> Cc: llvm-dev at lists.llvm.org
> Subject: Re: [EXT] [llvm-dev] [RFC] arm64_32: upstreaming ILP32 support for AArch64
>
> > > Second is the intrusiveness. On the plus side it's less intrusive than
> > > ISD::GEP would be, but it still involves changes in some fairly
> > > obscure bits of DAG -- often found when things broke rather than by
> > > careful planning.
> >
> > Did you consider modeling this with address spaces? LLVM already has robust
> > support for address spaces with different pointer sizes,
>
> I have to say I didn't, but I don't think it would solve the problem.
> Alternate address-spaces still have just one pointer size per space as
> far as I'm aware. If that's 64-bits we get efficient CodeGen but
> loading or storing a pointer clobbers more data than it should, if
> that's 32-bits then we get poor CodeGen.

I was thinking of a model something like this: 32-bit pointers are addrspace 0, 64-bit pointers are addrspace 1. ISD::LOAD/STORE in addrspace 0 are not legal: they're custom-lowered to operations in addrspace 1. (An addrspacecast from 0 to 1 is just zero-extension.) At that point, since the cast from 32 bits to 64 bits is explicitly represented, we can optimize it in the DAG or IR. For example, we can transform a load of an inbounds gep in addrspace 0 into a load of an inbounds gep in addrspace 1.

I don't know that this ends up being easier to implement overall, but the model is closer to what the hardware actually supports, and it involves fewer changes to target-independent code.

-Eli
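Why the inbounds marker matters for this rewrite can be shown with plain integer arithmetic. In the C++ sketch below (an illustration of the model described above, with hypothetical function names, not LLVM code), a 32-bit pointer is zero-extended to 64 bits as the proposed addrspacecast would do. Doing the GEP in 32 bits and then extending only matches extending first and doing the GEP in 64 bits when the 32-bit addition does not wrap, which is, roughly, what an inbounds GEP guarantees. That equality is what would let the offset be folded into a single 64-bit addressing mode.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical: GEP in addrspace(0), then addrspacecast to addrspace(1).
// The addition is done in 32 bits (wrapping mod 2^32), then zero-extended.
constexpr std::uint64_t gepThenCast(std::uint32_t base, std::int32_t off) {
  return static_cast<std::uint64_t>(base + static_cast<std::uint32_t>(off));
}

// Hypothetical: addrspacecast first (zero-extend the base), then GEP in
// addrspace(1). The addition is done in 64 bits, as a 64-bit
// register+register addressing mode would compute it.
constexpr std::uint64_t castThenGep(std::uint32_t base, std::int32_t off) {
  return static_cast<std::uint64_t>(base) +
         static_cast<std::uint64_t>(static_cast<std::int64_t>(off));
}

// True when base + off leaves [0, 2^32), i.e. the 32-bit GEP would wrap.
// An inbounds GEP is assumed never to do this, so the two forms above agree.
constexpr bool wraps32(std::uint32_t base, std::int32_t off) {
  return static_cast<std::int64_t>(base) + off < 0 ||
         static_cast<std::int64_t>(base) + off >
             static_cast<std::int64_t>(
                 std::numeric_limits<std::uint32_t>::max());
}
```

With base 0x1000 and offset 16 both forms give 0x1010. With base 0xFFFFFFF0 and offset 0x20 the 32-bit form wraps to 0x10 while the 64-bit form yields 0x100000010; an inbounds GEP rules that case out.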
Matt Arsenault via llvm-dev
2019-Feb-01 19:35 UTC
[llvm-dev] [RFC] arm64_32: upstreaming ILP32 support for AArch64
> On Feb 1, 2019, at 2:25 PM, Eli Friedman via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> I was thinking of a model something like this: 32-bit pointers are addrspace 0, 64-bit pointers are addrspace 1. ISD::LOAD/STORE in addrspace 0 are not legal: they're custom-lowered to operations in addrspace 1. (An addrspacecast from 0 to 1 is just zero-extension.) At that point, since the cast from 32 bits to 64 bits is explicitly represented, we can optimize it in the DAG or IR. For example, we can transform a load of an inbounds gep in addrspace 0 into a load of an inbounds gep in addrspace 1.

+1

This is basically what we do for one address space on AMDGPU.

-Matt
Tim Northover via llvm-dev
2019-Feb-01 20:00 UTC
[llvm-dev] [RFC] arm64_32: upstreaming ILP32 support for AArch64
On Fri, 1 Feb 2019 at 19:25, Eli Friedman <efriedma at quicinc.com> wrote:
> > Alternate address-spaces still have just one pointer size per space as
> > far as I'm aware. If that's 64-bits we get efficient CodeGen but
> > loading or storing a pointer clobbers more data than it should, if
> > that's 32-bits then we get poor CodeGen.
>
> I was thinking of a model something like this: 32-bit pointers are addrspace 0, 64-bit pointers are addrspace 1. ISD::LOAD/STORE in addrspace 0 are not legal: they're custom-lowered to operations in addrspace 1. (An addrspacecast from 0 to 1 is just zero-extension.) At that point, since the cast from 32 bits to 64 bits is explicitly represented, we can optimize it in the DAG or IR. For example, we can transform a load of an inbounds gep in addrspace 0 into a load of an inbounds gep in addrspace 1.

That would have to be an IR-level pass I think; otherwise the default MVT for any J. Random Pointer Value is still i32, leading to the same efficiency issues when you eventually use that on a load/store.

With a pass, within a function you ought to be able to promote all uses of addrspace(0) to addrspace(1), leaving (as you say) addrspacecasts at opaque sources and sinks (loads, stores, args, return, ...). Structs containing pointers would be (very?) messy. And you'd probably want it earlyish to recombine things.

I do like LLVM passes as a solution for most problems, and it ought to give a big head start to GlobalISel implementation too. I'll definitely give it a go as an alternative next week.

Cheers.

Tim.
Tim Northover via llvm-dev
2019-Feb-06 15:35 UTC
[llvm-dev] [RFC] arm64_32: upstreaming ILP32 support for AArch64
Hi again,

On Fri, 1 Feb 2019 at 19:25, Eli Friedman <efriedma at quicinc.com> wrote:
> I don't know that this ends up being easier to implement overall, but the model is closer to what the hardware actually supports, and it involves fewer changes to target-independent code.

I've now got something largely working via an IR-level lowering pass (pushed to GitHub as https://github.com/TNorthover/llvm-project/tree/arm64_32-arch-pass, please excuse any artefacts of incompleteness). I feel like it's rapidly approaching an unpalatability horizon though. Most issues stem from the fact that not all pointers are visible or controllable in the IR:

+ FrameIndices: you can't change an alloca's address-space since it's fixed by the DataLayout. So they get through to the DAG as i32s, significantly complicating the addressing-mode logic.
+ ConstantPool accesses are automatically put into addrspace(0).
+ BlockAddress is similar.
+ Some intrinsics are not polymorphic on pointer type, and adapting those that are is messy.
+ Returns demoted to x8-indirect are always implemented by stores in addrspace(0).

I don't think any of these are truly insurmountable, but they do mean that the backend would have to cope with both i32 and i64 pointers in fairly ad-hoc ways, adding a lot of complexity to the approach. I think it's reached the point where the added complexity in AArch64 has outweighed the benefits to SelectionDAG, so I'm inclined to stick with the original approach for now.

Cheers.

Tim.