thr3ads.net - llvm dev - [llvm-dev] [AArch64] Address computation folding [Nov 2015]

James Molloy via llvm-dev

2015-Nov-11 21:15 UTC

[llvm-dev] [AArch64] Address computation folding

Hi,

Indeed, the complex add is more expensive on all Cortex cores I know of.

However, there is an important point here: the code sequence we generate
requires two registers to be live instead of one. In loops with high register
pressure, we're probably losing performance.
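
To make the register-pressure point concrete, here is a minimal sketch. It is not taken from the thread: the function `bumpStrided` is hypothetical, and the assembly in the comments is hand-written for illustration rather than actual compiler output. A shift of 4 is used on purpose, because a 64-bit register-offset load can only scale its index by 0 or 3, so the shift has to be a separate instruction in one form or the other.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical example: one address, base + (idx << 4), feeds both a
// load and a store, so the address computation has two uses.
std::int64_t bumpStrided(std::int64_t *base, std::int64_t idx) {
    assert(base != nullptr);
    std::int64_t *p = &base[2 * idx];  // byte offset is idx << 4
    std::int64_t old = *p;             // first use of the address
    *p = old + 1;                      // second use of the address
    return old;
}

// Illustrative AArch64 sequences (hand-written, not compiler output).
//
// Shift kept separate, register-offset addressing on each access;
// x0 (base) and x8 (offset) both stay live across the accesses:
//     lsl  x8, x1, #4
//     ldr  x9, [x0, x8]
//     add  x10, x9, #1
//     str  x10, [x0, x8]
//
// Address formed once with the shifted ("complex") add;
// only x8 has to stay live across the accesses:
//     add  x8, x0, x1, lsl #4
//     ldr  x9, [x8]
//     add  x10, x9, #1
//     str  x10, [x8]
```

Both forms use the same number of instructions. The trade-off is the cost of the `lsl` versus the shifted `add` on a given core, against one extra live register in the first form.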

James

On Wed, 11 Nov 2015 at 21:09, Tim Northover via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> On 11 November 2015 at 11:57, Meador Inge <meadori at gmail.com> wrote:
> > Why wouldn't it consider the number of uses in any operation?  The
> > "expected" code is easy to get by checking the number of uses.  This
> > may be desirable on some micro-architectures depending on the cost of
> > the various loads and stores.
>
> As you say, very microarchitecture-dependent. The code produced is
> probably optimal for Cyclone ("[x0, x8]" is no more expensive than
> "[x8]" and the "lsl" is slightly cheaper than the complicated "add").
> If I'm reading the Cortex-A57 optimisation guide correctly, the same
> reasoning applies there too.
>
> Cheers.
>
> Tim.

Chad Rosier via llvm-dev

2015-Nov-11 22:23 UTC

Meador,
If you have a patch I would be interested in experimenting with it.

 Chad

Meador Inge via llvm-dev

2015-Nov-12 00:00 UTC

Hi Chad,

The attached patch is what I was experimenting with to produce the code
snippet in my original mail.  I have only tested it with the LLVM
test suite (with an obvious failure in arm64-addr-mode-folding.ll) and
some toy examples.
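
For readers without the attachment, the "checking the number of uses" idea quoted in this thread can be sketched as a toy decision function. This is only an illustration under stated assumptions: it is not the attached patch, it does not use LLVM's APIs, and the names `AddressCalc`, `Lowering`, `liveAddressRegs` and `chooseLowering` are all hypothetical.

```cpp
#include <cassert>

// Hypothetical description of one address computation,
// base + (index << shift).
struct AddressCalc {
    unsigned numUses;  // number of loads/stores that consume the address
};

// The two lowerings discussed in the thread.
enum class Lowering {
    FoldIntoAccesses,  // lsl once, then [base, offset] on every access
    ComputeOnce        // shifted add once, then [addr] on every access
};

// Registers that must stay live across the accesses for each lowering.
unsigned liveAddressRegs(Lowering l) {
    return l == Lowering::FoldIntoAccesses ? 2u : 1u;
}

// Toy heuristic: fold into the addressing mode only when a single access
// uses the address; with more uses, form the address once.
Lowering chooseLowering(const AddressCalc &calc) {
    assert(calc.numUses > 0);
    return calc.numUses == 1 ? Lowering::FoldIntoAccesses
                             : Lowering::ComputeOnce;
}
```

Whether this choice is a win depends on the microarchitecture, as discussed earlier in the thread, so a real heuristic would likely need per-core cost information as well as the use count.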

Thanks,

Meador

-- 
# Meador
-------------- next part --------------
A non-text attachment was scrubbed...
Name: aarch64-no-addr-fold-more-than-one-use.patch
Type: application/octet-stream
Size: 1581 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20151111/d35ceca3/attachment.obj>
