thr3ads.net - llvm dev - [llvm-dev] [RFC] arm64_32: upstreaming ILP32 support for AArch64 [Feb 2019]

Tim Northover via llvm-dev

2019-Feb-01 10:28 UTC

[llvm-dev] [EXT] [RFC] arm64_32: upstreaming ILP32 support for AArch64

Hi Eli,

Thanks for the comments.

On Thu, 31 Jan 2019 at 19:48, Eli Friedman <efriedma at quicinc.com> wrote:
> > We teach CodeGenPrepare to sink GEPs as GEPs, and preserve the
> > inbounds marker. This is the only way they can possibly be exposed to
> > SDAG at the basic block level.
>
> Isn't addr-sink-using-gep already a thing?

Yes, I'm not sure why I wrote that (maybe I saw the new
addrSinkUsingGEPs in a patch and misremembered). It looks like what I
actually did was attempt to decouple the logic. It's currently based
on useAA, which seems to be an orthogonal question to me, so I added a
new virtual function hook. I'm now suspicious of the logic there too,
though. I'll inspect it further before uploading anything for review.

> > Second is the intrusiveness. On the plus side it's less intrusive than
> > ISD::GEP would be, but it still involves changes in some fairly
> > obscure bits of DAG -- often found when things broke rather than by
> > careful planning.
>
> Did you consider modeling this with address spaces?  LLVM already has
> robust support for address spaces with different pointer sizes,

I have to say I didn't, but I don't think it would solve the problem.
Alternate address-spaces still have just one pointer size per space as
far as I'm aware. If that's 64-bits we get efficient CodeGen but
loading or storing a pointer clobbers more data than it should, if
that's 32-bits then we get poor CodeGen.
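
A minimal Python sketch of the clobbering case described above, assuming a hypothetical ILP32 object laid out as a 4-byte pointer followed by a 4-byte integer. The layout, the offsets and the helper names (`make_object`, `store_ptr`, `read_n`) are invented for illustration and are not taken from the patch.

```python
import struct

# Hypothetical ILP32 layout: struct { void *p; uint32_t n; }
# p occupies bytes 0..3 and n occupies bytes 4..7 (little-endian).
PTR_OFFSET = 0
N_OFFSET = 4


def make_object(n):
    """Return 8 bytes of memory holding a null pointer and the value n."""
    mem = bytearray(8)
    struct.pack_into("<I", mem, N_OFFSET, n)
    return mem


def store_ptr(mem, value, ptr_bits):
    """Store a pointer with either a 32-bit or a 64-bit wide memory access."""
    fmt = "<I" if ptr_bits == 32 else "<Q"
    struct.pack_into(fmt, mem, PTR_OFFSET, value)


def read_n(mem):
    """Read back the integer field that sits next to the pointer."""
    return struct.unpack_from("<I", mem, N_OFFSET)[0]


narrow = make_object(0xDEADBEEF)
store_ptr(narrow, 0x1000, 32)   # 4-byte store: the neighbouring field survives

wide = make_object(0xDEADBEEF)
store_ptr(wide, 0x1000, 64)     # 8-byte store: the upper half lands on n
```

With the 32-bit store `read_n(narrow)` still returns the original value; with the 64-bit store the zero upper half of the pointer overwrites `n`.
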
> and you probably want to expose support for 64-bit pointers anyway.
It's a possibility, though no-one has asked for it yet. The biggest
request we've actually had is for signed 32-bit pointers so that both
TTBR0 and TTBR1 regions can be used. I could see a pretty strong
argument for exposing unsigned pointers via a different address-space
in that regime (for use in user_addr_t in kernel code), though you'd
have to be pretty disciplined to make it work I think.

> I'm not sure I follow the difference between [2 x i32] and i64: if they
> both go into a single register, why do you need both?  Or is this necessary to
> support your automatic translation pass?

Yep, it's entirely because we need to support code generated for
armv7k. On that platform [2 x i32] and i64 have different alignment
requirements on the stack; [2 x i32] would be used for struct {
int32_t val[2]; }, i64 would be used for struct { int64_t val; }. But
because AArch64 AAPCS puts more data in registers, some of these args
generated for the stack go in registers on arm64_32.
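
To make the alignment difference concrete, here is a small Python sketch of a generic "next aligned slot" stack allocator. It is a simplification, not the actual armv7k or AAPCS rules; the sizes and alignments (8 bytes with 4-byte alignment for [2 x i32], 8 bytes with 8-byte alignment for i64) and the function names are assumptions made for the example.

```python
def align_up(offset, align):
    """Round offset up to the next multiple of align (a power of two)."""
    return (offset + align - 1) & ~(align - 1)


def assign_stack_offsets(args):
    """Give each (name, size, align) argument the next suitably aligned slot."""
    offsets = {}
    cursor = 0
    for name, size, align in args:
        cursor = align_up(cursor, align)
        offsets[name] = cursor
        cursor += size
    return offsets


# Assumed for illustration: both aggregates are 8 bytes wide, but
# [2 x i32] only needs 4-byte alignment while i64 needs 8-byte alignment.
leading_int = ("x", 4, 4)
as_array = assign_stack_offsets([leading_int, ("s", 8, 4)])  # [2 x i32]
as_i64 = assign_stack_offsets([leading_int, ("s", 8, 8)])    # i64
```

After a single 4-byte argument, the [2 x i32] form lands at offset 4 while the i64 form is pushed to offset 8, so a front end has to keep the two apart even though both are 64 bits wide.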

Cheers.

Tim.

Eli Friedman via llvm-dev

2019-Feb-01 19:25 UTC

[llvm-dev] [RFC] arm64_32: upstreaming ILP32 support for AArch64

> -----Original Message-----
> From: Tim Northover <t.p.northover at gmail.com>
> Sent: Friday, February 1, 2019 2:28 AM
> To: Eli Friedman <efriedma at quicinc.com>
> Cc: llvm-dev at lists.llvm.org
> Subject: Re: [EXT] [llvm-dev] [RFC] arm64_32: upstreaming ILP32 support for
> AArch64
> 
> > > Second is the intrusiveness. On the plus side it's less intrusive than
> > > ISD::GEP would be, but it still involves changes in some fairly
> > > obscure bits of DAG -- often found when things broke rather than by
> > > careful planning.
> >
> > Did you consider modeling this with address spaces?  LLVM already has robust
> > support for address spaces with different pointer sizes,
>
> I have to say I didn't, but I don't think it would solve the problem.
> Alternate address-spaces still have just one pointer size per space as
> far as I'm aware. If that's 64-bits we get efficient CodeGen but
> loading or storing a pointer clobbers more data than it should, if
> that's 32-bits then we get poor CodeGen.
> 
I was thinking of a model something like this: 32-bit pointers are addrspace 0,
64-bit pointers are addrspace 1.  ISD::LOAD/STORE in addrspace 0 are not legal:
they're custom-lowered to operations in addrspace 1.  (An addrspacecast from
0 to 1 is just zero-extension.)  At that point, since the cast from 32 bits to
64 bits is explicitly represented, we can optimize it in the DAG or IR. For
example, we can transform a load of an inbounds gep in addrspace 0 into a
load of an inbounds gep in addrspace 1.
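
The arithmetic behind that transform can be sketched in Python. This is only a numeric model, assuming addrspace 0 arithmetic wraps modulo 2**32 and that the cast is a plain zero-extension; the function names are hypothetical and nothing here is LLVM API.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def gep32(base, offset):
    """Pointer arithmetic in the 32-bit address space: wraps modulo 2**32."""
    return (base + offset) & MASK32


def gep64(base, offset):
    """Pointer arithmetic in the 64-bit address space."""
    return (base + offset) & MASK64


def cast_0_to_1(ptr32):
    """Model of addrspacecast 0 -> 1 as zero-extension to 64 bits."""
    return ptr32 & MASK32


def folds_safely(base, offset):
    """True if cast(gep32(base, off)) equals gep64(cast(base), off)."""
    return cast_0_to_1(gep32(base, offset)) == gep64(cast_0_to_1(base), offset)
```

For example, `folds_safely(0x1000, 16)` holds, while `folds_safely(0xFFFFFFF0, 0x20)` does not, because the 32-bit addition wraps and the 64-bit one does not. Moving the cast across the gep is therefore only sound when the 32-bit computation is known not to wrap, which is the guarantee the inbounds marker is being relied on for here.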

I don't know that this ends up being easier to implement overall, but the
model is closer to what the hardware actually supports, and it involves fewer
changes to target-independent code.

-Eli

Matt Arsenault via llvm-dev

2019-Feb-01 19:35 UTC

[llvm-dev] [RFC] arm64_32: upstreaming ILP32 support for AArch64

> On Feb 1, 2019, at 2:25 PM, Eli Friedman via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> I was thinking of a model something like this: 32-bit pointers are
> addrspace 0, 64-bit pointers are addrspace 1.  ISD::LOAD/STORE in addrspace 0
> are not legal: they're custom-lowered to operations in addrspace 1.  (An
> addrspacecast from 0 to 1 is just zero-extension.)  At that point, since the
> cast from 32 bits to 64 bits is explicitly represented, we can optimize it in
> the DAG or IR. For example, we can transform a load of an inbounds gep in
> addrspace 0 into a load of an inbounds gep in addrspace 1.

+1

This is basically what we do for one address space on AMDGPU

-Matt

Tim Northover via llvm-dev

2019-Feb-01 20:00 UTC

[llvm-dev] [RFC] arm64_32: upstreaming ILP32 support for AArch64

On Fri, 1 Feb 2019 at 19:25, Eli Friedman <efriedma at quicinc.com> wrote:
> > Alternate address-spaces still have just one pointer size per space as
> > far as I'm aware. If that's 64-bits we get efficient CodeGen but
> > loading or storing a pointer clobbers more data than it should, if
> > that's 32-bits then we get poor CodeGen.
>
> I was thinking of a model something like this: 32-bit pointers are
> addrspace 0, 64-bit pointers are addrspace 1.  ISD::LOAD/STORE in addrspace 0
> are not legal: they're custom-lowered to operations in addrspace 1.  (An
> addrspacecast from 0 to 1 is just zero-extension.)  At that point, since the
> cast from 32 bits to 64 bits is explicitly represented, we can optimize it in
> the DAG or IR. For example, we can transform a load of an inbounds gep in
> addrspace 0 into a load of an inbounds gep in addrspace 1.

That would have to be an IR-level pass I think; otherwise the default
MVT for any J. Random Pointer Value is still i32, leading to the same
efficiency issues when you eventually use that on a load/store.

With a pass, within a function you ought to be able to promote all
uses of addrspace(0) to addrspace(1), leaving (as you say)
addrspacecasts at opaque sources and sinks (loads, stores, args,
return, ...). Structs containing pointers would be (very?) messy. And
you'd probably want it earlyish to recombine things.
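
A toy Python sketch of the promotion idea, over an invented straight-line instruction list rather than real LLVM IR. The opcode names (`arg`, `load_ptr`, `gep`, `store_ptr`, `ret`), the `.as0`/`.as1` naming scheme and the `promote` function are all assumptions for illustration; it ignores control flow, structs containing pointers, and everything else that makes the real pass hard.

```python
def promote(instrs):
    """Rewrite a toy instruction list so address arithmetic uses addrspace(1).

    Opaque sources (pointer args, pointer loads) still yield addrspace(0)
    values and get a cast right after them; opaque sinks (pointer stores,
    returns) get a cast back right before them.
    """
    wide = {}      # addrspace(0) name -> name of its addrspace(1) twin
    narrowed = {}  # addrspace(0) name -> name of the value cast back down
    out = []

    def narrow(name):
        # Cast back to addrspace(0) once, for sinks that need a 32-bit pointer.
        if name not in narrowed:
            narrowed[name] = name + ".as0"
            out.append(("cast_1_to_0", narrowed[name], wide[name]))
        return narrowed[name]

    for ins in instrs:
        op = ins[0]
        if op == "arg":
            _, dst = ins
            out.append(ins)
            wide[dst] = dst + ".as1"
            out.append(("cast_0_to_1", wide[dst], dst))
        elif op == "load_ptr":
            _, dst, addr = ins
            out.append(("load_ptr", dst, wide[addr]))
            wide[dst] = dst + ".as1"
            out.append(("cast_0_to_1", wide[dst], dst))
        elif op == "gep":
            _, dst, base, off = ins
            wide[dst] = dst + ".as1"
            out.append(("gep", wide[dst], wide[base], off))
        elif op == "store_ptr":
            _, val, addr = ins
            out.append(("store_ptr", narrow(val), wide[addr]))
        elif op == "ret":
            _, val = ins
            out.append(("ret", narrow(val)))
        else:
            raise ValueError("unknown op: " + op)
    return out


example = [
    ("arg", "p"),
    ("gep", "q", "p", 8),
    ("load_ptr", "r", "q"),
    ("gep", "s", "r", 4),
    ("store_ptr", "s", "p"),
    ("ret", "s"),
]
promoted = promote(example)
```

In the result every gep and every memory address is an addrspace(1) value, and casts remain only next to the argument, the pointer load, the pointer store and the return. The sketch makes no attempt to fold redundant casts; a later combining step would be expected to clean those up.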

I do like LLVM passes as a solution for most problems, and it ought
to give a big head start to GlobalISel implementation too. I'll
definitely give it a go as an alternative next week.

Cheers.

Tim.

Tim Northover via llvm-dev

2019-Feb-06 15:35 UTC

[llvm-dev] [RFC] arm64_32: upstreaming ILP32 support for AArch64

Hi again,

On Fri, 1 Feb 2019 at 19:25, Eli Friedman <efriedma at quicinc.com> wrote:
> I don't know that this ends up being easier to implement overall, but
> the model is closer to what the hardware actually supports, and it involves
> fewer changes to target-independent code.

I've now got something largely working via an IR-level lowering
pass (pushed to GitHub as
https://github.com/TNorthover/llvm-project/tree/arm64_32-arch-pass,
please excuse any artefacts of incompleteness). I feel like it's
rapidly approaching an unpalatability horizon though. Most issues stem
from the fact that not all pointers are visible or controllable in the
IR:

  + FrameIndices: you can't change an alloca's address-space since
it's fixed by the DataLayout. So they get through to the DAG as i32s,
significantly complicating the Addressing-mode logic.
  + ConstantPool accesses are automatically put into addrspace(0)
  + BlockAddress is similar.
  + Some intrinsics are not polymorphic on pointer type, and adapting
those that are is messy.
  + Returns demoted to x8-indirect are always implemented by stores in
addrspace(0).

I don't think any of these are truly insurmountable, but they do mean
that the backend would have to cope with both i32 and i64 pointers in
fairly ad-hoc ways, and add a lot of complexity to the approach. I
think it's reached the point where the added complexity in AArch64 has
outweighed the benefits to SelectionDAG so I'm inclined to stick with
the original approach for now.

Cheers.

Tim.
