Meador Inge via llvm-dev
2015-Nov-11 19:57 UTC
[llvm-dev] [AArch64] Address computation folding
Hi,

I was looking at some AArch64 benchmarks and noticed some simple cases where addresses are being folded into the address mode computations and was curious as to why. In particular, consider the following simple example:

    void f2(unsigned long *x, unsigned long c)
    {
        x[c] *= 2;
    }

This generates:

    lsl x8, x1, #3
    ldr x9, [x0, x8]
    lsl x9, x9, #1
    str x9, [x0, x8]

Given the two uses of the address computation, I was expecting this:

    add x8, x0, x1, lsl #3
    ldr x9, [x8]
    lsl x9, x9, #1
    str x9, [x8]

From reading 'SelectAddrModeXRO', the computation gets folded only if the add node is used exclusively by memory-related operations. Why wouldn't it also consider the total number of uses? The "expected" code is easy to get by checking the number of uses, and it may be desirable on some micro-architectures depending on the cost of the various loads and stores.

-- Meador
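For illustration, the kind of use-count check being suggested might look roughly like the following LLVM-style C++ sketch. This is pseudocode, not the actual SelectAddrModeXRO implementation; the function name is hypothetical:

    // Hypothetical sketch: decline to fold the "reg + (reg << shift)"
    // computation into the addressing mode when the node has more than
    // one use, so a multiply-used address is materialized once with an
    // ADD and the register is shared between the load and the store.
    static bool shouldFoldIntoAddrMode(SDNode *N) {
      // With multiple uses, folding repeats the scaled-index addressing
      // at every access; emitting one ADD and reusing it can be cheaper
      // on cores where the shifted-register ADD is inexpensive.
      return N->hasOneUse();
    }

Whether this is a win is exactly the micro-architectural question raised below: it trades one extra ALU instruction for simpler addressing modes at each memory access.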
Tim Northover via llvm-dev
2015-Nov-11 21:08 UTC
[llvm-dev] [AArch64] Address computation folding
On 11 November 2015 at 11:57, Meador Inge <meadori at gmail.com> wrote:
> Why wouldn't it consider the number of uses in any operation? The
> "expected" code is easy to get by checking the number of uses. This
> may be desirable on some micro-architectures depending on the cost of
> the various loads and stores.

As you say, very microarchitecture-dependent. The code produced is probably optimal for Cyclone ("[x0, x8]" is no more expensive than "[x8]" and the "lsl" is slightly cheaper than the complicated "add"). If I'm reading the Cortex-A57 optimisation guide correctly, the same reasoning applies there too.

Cheers.

Tim.
James Molloy via llvm-dev
2015-Nov-11 21:15 UTC
[llvm-dev] [AArch64] Address computation folding
Hi,

Indeed, the complex add is more expensive on all Cortex cores I know of. However, there is an important point here: the code sequence we generate requires two registers live instead of one. In high register-pressure loops, we're probably losing performance.

James

On Wed, 11 Nov 2015 at 21:09, Tim Northover via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> On 11 November 2015 at 11:57, Meador Inge <meadori at gmail.com> wrote:
> > Why wouldn't it consider the number of uses in any operation? The
> > "expected" code is easy to get by checking the number of uses. This
> > may be desirable on some micro-architectures depending on the cost of
> > the various loads and stores.
>
> As you say, very microarchitecture-dependent. The code produced is
> probably optimal for Cyclone ("[x0, x8]" is no more expensive than
> "[x8]" and the "lsl" is slightly cheaper than the complicated "add").
> If I'm reading the Cortex-A57 optimisation guide correctly, the same
> reasoning applies there too.
>
> Cheers.
>
> Tim.
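To make the register-pressure point concrete, here are the two sequences from the original example again, annotated with which registers must stay live for the address (illustrative liveness comments only):

    // Folded form: both x0 (base) and x8 (scaled index) must stay
    // live across the load/store pair - two registers held.
    lsl x8, x1, #3
    ldr x9, [x0, x8]    // x0 and x8 both live here
    lsl x9, x9, #1
    str x9, [x0, x8]    // ... and still live here

    // Materialized form: only x8 (the full address) must stay live;
    // x0's value is no longer needed after the add, freeing a
    // register inside a high-pressure loop body.
    add x8, x0, x1, lsl #3
    ldr x9, [x8]        // only x8 live for the address
    lsl x9, x9, #1
    str x9, [x8]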
Meador Inge via llvm-dev
2015-Nov-11 22:44 UTC
[llvm-dev] [AArch64] Address computation folding
On Wed, Nov 11, 2015 at 3:08 PM, Tim Northover <t.p.northover at gmail.com> wrote:
> As you say, very microarchitecture-dependent. The code produced is
> probably optimal for Cyclone ("[x0, x8]" is no more expensive than
> "[x8]" and the "lsl" is slightly cheaper than the complicated "add").
> If I'm reading the Cortex-A57 optimisation guide correctly, the same
> reasoning applies there too.

Yeah, my reading is the same. For Cortex-A57 it looks like the same number of u-ops and latency either way (since LDR [x1, x2] is free).

-- Meador