James Molloy via llvm-dev
2015-Nov-11 21:15 UTC
[llvm-dev] [AArch64] Address computation folding
Hi,

Indeed, the complex add is more expensive on all Cortex cores I know of.

However, there is an important point here: the code sequence we generate requires two registers live instead of one. In high register-pressure loops, we're probably losing performance.

James

On Wed, 11 Nov 2015 at 21:09, Tim Northover via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> On 11 November 2015 at 11:57, Meador Inge <meadori at gmail.com> wrote:
> > Why wouldn't it consider the number of uses in any operation? The
> > "expected" code is easy to get by checking the number of uses. This
> > may be desirable on some micro-architectures depending on the cost of
> > the various loads and stores.
>
> As you say, very microarchitecture-dependent. The code produced is
> probably optimal for Cyclone ("[x0, x8]" is no more expensive than
> "[x8]" and the "lsl" is slightly cheaper than the complicated "add").
> If I'm reading the Cortex-A57 optimisation guide correctly, the same
> reasoning applies there too.
>
> Cheers.
>
> Tim.
Chad Rosier via llvm-dev
2015-Nov-11 22:23 UTC
[llvm-dev] [AArch64] Address computation folding
Meador,

If you have a patch I would be interested in experimenting with it.

Chad
Meador Inge via llvm-dev
2015-Nov-12 00:00 UTC
[llvm-dev] [AArch64] Address computation folding
Hi Chad,

The attached is what I was experimenting with to produce the code snippet in my original mail. I really only tested it with the LLVM test suite (with an obvious failure in arm64-addr-mode-folding.ll) and some toy examples.

Thanks,

Meador

--
# Meador
-------------- next part --------------
A non-text attachment was scrubbed...
Name: aarch64-no-addr-fold-more-than-one-use.patch
Type: application/octet-stream
Size: 1581 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20151111/d35ceca3/attachment.obj>
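
The trade-off discussed in this thread can be sketched as a toy decision function. This is only an illustration under stated assumptions: the contents of the attached patch are not shown here, so the names `AddrStrategy`, `chooseStrategy`, and `liveRegisters` are hypothetical and are not LLVM APIs. The register counts follow James's remark (two registers live for the "[x0, x8]" form, one for the "[x8]" form), and the use-count rule follows Meador's suggestion of checking the number of uses.

```cpp
#include <cstddef>

// Hypothetical toy model of the heuristic discussed in the thread.
// None of these names come from LLVM; they exist only for illustration.
enum class AddrStrategy {
    FoldIntoEachAccess,  // keep base and index separate: "[x0, x8]" on every access
    ComputeOnce          // one "add" with a shifted operand, then "[x8]" on every access
};

// Registers that must stay live across the memory accesses,
// per James's observation in the thread (assumed, not measured).
constexpr int liveRegisters(AddrStrategy s) {
    return s == AddrStrategy::FoldIntoEachAccess ? 2 : 1;
}

// Use-count check: fold when the address has a single memory use,
// otherwise compute the address once and reuse it.
constexpr AddrStrategy chooseStrategy(std::size_t numMemoryUses) {
    return numMemoryUses <= 1 ? AddrStrategy::FoldIntoEachAccess
                              : AddrStrategy::ComputeOnce;
}
```

With a single use the sketch folds the computation into the access; with two or more uses it computes the address once, which keeps one register live instead of two. It deliberately ignores the per-core instruction costs Tim raises (for example, the "lsl" being cheaper than the complicated "add" on Cyclone), so a real heuristic would likely need a microarchitecture-specific cost input as well.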