David Green via llvm-dev
2021-Oct-04 07:40 UTC
[llvm-dev] [RFC][AArch64] Make -mcpu=generic schedule for an in-order core
Hello folks,

We would like to start pushing -mcpu=generic for AArch64 towards enabling a set of features that is believed to be beneficial in general: features that improve performance for some CPUs without hurting it on any others. A blend of the performance options, hopefully beneficial to all CPUs.

The largest part of that is enabling in-order scheduling using the Cortex-A55 schedule model. This is similar to the Arm backend change from eecb353d0e25ba, which made -mcpu=generic perform in-order scheduling using the Cortex-A8 scheduling model. The idea is that in-order CPUs require the most help from instruction scheduling, whereas out-of-order CPUs can, for the most part, schedule around differences in codegen themselves. Our benchmarking suggests that hypothesis holds, with in-order performance benefiting from the scheduling by between 1% and 4% geomean. Out-of-order performance was quite noisy and the results were within the noise margins, tending towards a slight improvement in general.

When specifying an Apple target, clang will set "-target-cpu apple-a7" on the command line, so Apple targets should not be affected by this change when compiling through clang. This also doesn't enable more runtime unrolling like -mcpu=cortex-a55 does; it only changes the schedule used.

There is a patch to make the change in https://reviews.llvm.org/D110830, with extra details about the performance changes and all the tests that are updated.

Let us know if you have comments.

Thanks
Dave
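The scheduling hypothesis in the message above (in-order cores depend on the compiler's instruction order, while out-of-order cores largely recover from a poor order in hardware) can be illustrated with a toy model. This is a minimal, hypothetical sketch and is not LLVM's scheduler or the Cortex-A55 model: it assumes a single-issue core, made-up latencies (4 cycles for a load-like instruction, 1 for a dependent add), and an idealised out-of-order core with an unlimited window. The names Instr, in_order_cycles and out_of_order_cycles are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    deps: tuple[str, ...]  # names of earlier instructions whose results are read
    latency: int           # cycles from issue until the result is available


def in_order_cycles(program):
    """Toy single-issue in-order core: an instruction cannot issue before
    the previous one has issued, nor before its operands are ready."""
    ready_at = {}
    issue = -1
    for ins in program:
        operands_ready = max((ready_at[d] for d in ins.deps), default=0)
        issue = max(issue + 1, operands_ready)
        ready_at[ins.name] = issue + ins.latency
    return max(ready_at.values())


def out_of_order_cycles(program):
    """Toy single-issue out-of-order core with an unlimited window: each
    cycle it issues the oldest pending instruction whose operands are ready.
    Assumes every dependency names an instruction in the program."""
    ready_at = {}
    pending = list(program)
    cycle = 0
    while pending:
        for ins in pending:
            if all(d in ready_at and ready_at[d] <= cycle for d in ins.deps):
                ready_at[ins.name] = cycle + ins.latency
                pending.remove(ins)  # safe: we stop iterating right away
                break
        cycle += 1
    return max(ready_at.values())


# Two independent load -> add chains (hypothetical latencies).
naive = [Instr("a", (), 4), Instr("b", ("a",), 1),
         Instr("c", (), 4), Instr("d", ("c",), 1)]
# Same instructions, interleaved so the second load overlaps the first.
scheduled = [naive[0], naive[2], naive[1], naive[3]]

print(in_order_cycles(naive), in_order_cycles(scheduled))          # 10 6
print(out_of_order_cycles(naive), out_of_order_cycles(scheduled))  # 6 6
```

In this toy model the in-order core needs 10 cycles for the naive order and 6 for the interleaved one, while the idealised out-of-order core reaches 6 cycles either way. Real out-of-order cores have finite windows and issue width, so instruction order can still matter to them somewhat.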
Renato Golin via llvm-dev
2021-Oct-04 09:08 UTC
On Mon, 4 Oct 2021 at 08:43, David Green via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> Hello folks,
>
> We would like to start pushing -mcpu=generic for AArch64 towards enabling
> a set of features that is believed to be beneficial in general - that
> improve performance for some CPUs without hurting it on any others. A
> blend of the performance options hopefully beneficial to all CPUs.

Hi David,

This is the usual LLVM definition of "generic", so working towards that goal is always good.

> The largest part of that is enabling in-order scheduling using the
> Cortex-A55 schedule model. This is similar to the Arm backend change from
> eecb353d0e25ba which made -mcpu=generic perform in-order scheduling using
> the Cortex-A8 scheduling model.

I think this makes sense, because the A55 scheduling model is more likely than the A8's to benefit the chips produced nowadays.

> When specifying an Apple target, clang will set "-target-cpu apple-a7" on
> the command line, so should not be affected by this change when running
> from clang. This also doesn't enable more runtime unrolling like
> -mcpu=cortex-a55 does, only changing the schedule used.

Thinking out loud, what do people think of creating an additional "ooo" target? So "generic" would be the same as "in-order", but "ooo" (or "unordered", whatever) would pick a base out-of-order target, like A57, A72, etc.

A few years ago, when I was benchmarking OpenBLAS changes on Arm, I realised that doing this was beneficial on most targets, often beaten only by specifying the exact target.

cheers,
--renato