Neil Henning via llvm-dev
2020-Apr-14 08:26 UTC
[llvm-dev] 7-8% compile time slowdowns in LLVM 10
Hey list,

TL;DR - LLVM 10 is around 7-8% slower than LLVM 9 when compiling the same inputs.

So here at Unity our Burst HPC# compiler uses LLVM to provide our users with some very optimal codegen. LLVM is used in two ways:

1. In the Unity editor we JIT compile user code.
2. We also have an AOT mode for when our users are building a full game.

Particularly for 1., compile time really matters for us. Anything we can do to improve compile time will improve our users' experience when editing their gameplay code and seeing what effect it has on the scene they are using. We keep metrics of compile time as a result, and after hearing the concerns from the Rust folks about LLVM 10's slowdowns I had a look at our numbers with an upgraded LLVM 10 toolchain - you can see a snap from the spreadsheet here that shows an overall 7-8% slowdown in the compiler: https://twitter.com/sheredom/status/1247128694554087426

Also, since we keep golden asm files for a huge range of important tests we need to preserve, I can definitively say that the produced asm does not contain any improvements significant enough to warrant the extra compile time being used (some changes in register selection and a slightly better placement of instructions).

I don't really have any answers as to if or how this can be fixed; I just thought this was a useful data point for the community as a whole, and that I'd raise the visibility.

Cheers,
-Neil.

--
Neil Henning
Senior Software Engineer Compiler
unity.com
Tobias Hieta via llvm-dev
2020-Apr-15 15:17 UTC
[llvm-dev] 7-8% compile time slowdowns in LLVM 10
Hi Neil,

That's unfortunate - I am in the process of updating to clang 10 here and this slowdown will impact our developers as well.

If I have some time this weekend I will try to profile clang and see if I can figure out whether there is a single regression or just many smaller things.

Thanks,
Tobias
Alexandre Ganea via llvm-dev
2020-Apr-15 17:32 UTC
[llvm-dev] 7-8% compile time slowdowns in LLVM 10
I suggested that Neil apply https://reviews.llvm.org/D71786 (rpmalloc) and he reported: “[..] its about 6.5% faster using the rpmalloc LLVM vs LLVM 9 without. It is a cool 14% faster than LLVM 10 too!” (https://twitter.com/sheredom/status/1250138086811602944?s=20)

I ran some tests locally, and as a first-order approximation, even without rpmalloc it seems Clang 10 is faster than Clang 9 for a whole build. That said, when iterating locally on a single file it could be slower, as Neil suggests.

The test consists of compiling LLVM, Clang and LLD on release/10.x at HEAD using the compilers below. I built each compiler with Clang 10, except MSVC, which is provided for reference.

Median timings for a clean build (`ninja all`):

  Compiler               Median time     Variation
  Clang 9.0.1            6 min 48 sec    (+/- 23 sec)
  Clang 10.0             5 min 55 sec    (+/- 5 sec)
  Clang 10.0 optimal     5 min 20 sec    (+/- 5 sec)
  Clang 11 (994543ab)    6 min           (+/- 5 sec)
  VS2019 16.5.4          7 min 20 sec    (+/- 15 sec)

Tested on Windows 10 build 1909, on a 36-core dual Xeon Gold 6140, with RAID-0 NVMe SSDs. I disabled Windows Defender for the test.

The greater variability in “Clang 9.0.1” is due to launching two processes for each clang-cl invocation, whereas 10+ launches only one (like MSVC); see -fintegrated-cc1.

“Clang 10.0 optimal” is a two-stage build using ThinLTO and -O3 on the second stage, tailored for the 6140 (-march=skylake-avx512). It also has D71786 (rpmalloc) applied.

I can possibly take some profile traces to see whether anything stands out between 9 and 10.

Alex.
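For readers who want the build timings above as relative numbers, here is a minimal Python sketch that converts each median to seconds and compares it against Clang 9.0.1. The values are transcribed from the message; the names `TIMINGS`, `BASELINE` and `relative_change` are illustrative assumptions and not part of any LLVM tooling. It compares medians only and does not account for the reported variation.

```python
# Median clean-build times and spread in seconds, transcribed from the
# timings listed in the message above. The names TIMINGS, BASELINE and
# relative_change are illustrative only, not part of any LLVM tooling.
TIMINGS = {
    "Clang 9.0.1": (6 * 60 + 48, 23),
    "Clang 10.0": (5 * 60 + 55, 5),
    "Clang 10.0 optimal": (5 * 60 + 20, 5),
    "Clang 11 (994543ab)": (6 * 60, 5),
    "VS2019 16.5.4": (7 * 60 + 20, 15),
}
BASELINE = "Clang 9.0.1"


def relative_change(baseline_s: float, candidate_s: float) -> float:
    """Percent change of candidate vs. baseline; negative means faster."""
    return (candidate_s - baseline_s) / baseline_s * 100.0


def main() -> None:
    base_s, _ = TIMINGS[BASELINE]
    for name, (median_s, spread_s) in TIMINGS.items():
        delta = relative_change(base_s, median_s)
        print(f"{name:<20} {median_s:>4d} s (+/- {spread_s:>2d} s)  "
              f"{delta:+6.1f}% vs {BASELINE}")


if __name__ == "__main__":
    main()
```

Run as a script, it prints one line per compiler with the median in seconds and the signed percentage difference from the Clang 9.0.1 median.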
Josh Stone via llvm-dev
2020-May-08 23:40 UTC
[llvm-dev] 7-8% compile time slowdowns in LLVM 10
On 4/14/20 1:26 AM, Neil Henning via llvm-dev wrote:
> Hey list,
>
> TL;DR - LLVM 10 is around 7-8% slower than LLVM 9 when compiling the
> same inputs.
>
> [...] concerns from the Rust folks about LLVM 10's slowdowns [...]

I'm one of those Rust folks, and in the last few days I've been trying to dig into this. I focused on one particular benchmark input -- I've attached that IR, generated from Rust 1.43.0 for LLVM 9.0.1. I started running a git bisection to see if anything huge jumped out, but the slowdown seems to be gradual. Maybe others can take my input and find some pattern to the decline.

My results below are with LLVM compiled from git on Fedora 32 with its Clang 10. My processor is a Ryzen 7 3800X. The command is just "perf stat -r5 bin/llc syn-*.ll". Let me know if I can provide more relevant details, but I hope this is reproducible regardless, so others can investigate.

When we're tracking rustc performance, we usually focus on the instruction count, as it tends to be more stable for comparison. Here I'm seeing 5.3% more instructions from 9.x to 10.x, and a further 4.8% from 10.x to master -- net 10.3% increase.
release/9.x (c1a0a213378a458fbea1a5c77b315c7dce08fd05)

 Performance counter stats for 'bin/llc syn-e39d7fb4724c7e07.ll' (5 runs):

          2,563.85 msec task-clock:u              #    0.998 CPUs utilized            ( +-  0.17% )
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
             8,739      page-faults:u             #    0.003 M/sec                    ( +-  0.07% )
    10,952,672,891      cycles:u                  #    4.272 GHz                      ( +-  0.11% )  (83.32%)
       539,233,453      stalled-cycles-frontend:u #    4.92% frontend cycles idle     ( +-  0.34% )  (83.32%)
     1,066,121,274      stalled-cycles-backend:u  #    9.73% backend cycles idle      ( +-  0.53% )  (83.33%)
    17,760,500,419      instructions:u            #    1.62  insn per cycle
                                                  #    0.06  stalled cycles per insn  ( +-  0.03% )  (83.34%)
     3,792,956,022      branches:u                # 1479.398 M/sec                    ( +-  0.05% )  (83.35%)
        98,635,111      branch-misses:u           #    2.60% of all branches          ( +-  0.19% )  (83.34%)

           2.56860 +- 0.00447 seconds time elapsed  ( +-  0.17% )


release/10.x (eaae6dfc545000e335e6f89abb9c78818383d7ad)

 Performance counter stats for 'bin/llc syn-e39d7fb4724c7e07.ll' (5 runs):

          2,678.23 msec task-clock:u              #    0.998 CPUs utilized            ( +-  0.36% )
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
             8,761      page-faults:u             #    0.003 M/sec                    ( +-  0.07% )
    11,427,954,987      cycles:u                  #    4.267 GHz                      ( +-  0.15% )  (83.32%)
       538,556,528      stalled-cycles-frontend:u #    4.71% frontend cycles idle     ( +-  0.20% )  (83.33%)
     1,149,190,672      stalled-cycles-backend:u  #   10.06% backend cycles idle      ( +-  0.55% )  (83.33%)
    18,702,827,148      instructions:u            #    1.64  insn per cycle
                                                  #    0.06  stalled cycles per insn  ( +-  0.02% )  (83.34%)
     3,988,324,508      branches:u                # 1489.164 M/sec                    ( +-  0.02% )  (83.35%)
       103,988,578      branch-misses:u           #    2.61% of all branches          ( +-  0.11% )  (83.33%)

           2.68326 +- 0.00968 seconds time elapsed  ( +-  0.36% )


master (a1ae9566ea9ce46bf7f2af9ab1253eed05b5b622)

 Performance counter stats for 'bin/llc syn-e39d7fb4724c7e07.ll' (5 runs):

          2,774.12 msec task-clock:u              #    0.998 CPUs utilized            ( +-  0.25% )
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
             8,957      page-faults:u             #    0.003 M/sec                    ( +-  0.07% )
    11,864,439,510      cycles:u                  #    4.277 GHz                      ( +-  0.17% )  (83.33%)
       546,052,536      stalled-cycles-frontend:u #    4.60% frontend cycles idle     ( +-  0.26% )  (83.33%)
     1,157,735,744      stalled-cycles-backend:u  #    9.76% backend cycles idle      ( +-  0.54% )  (83.33%)
    19,594,536,570      instructions:u            #    1.65  insn per cycle
                                                  #    0.06  stalled cycles per insn  ( +-  0.03% )  (83.34%)
     4,187,308,178      branches:u                # 1509.418 M/sec                    ( +-  0.06% )  (83.35%)
       105,573,875      branch-misses:u           #    2.52% of all branches          ( +-  0.27% )  (83.34%)

           2.77928 +- 0.00693 seconds time elapsed  ( +-  0.25% )

-------------- next part --------------
A non-text attachment was scrubbed...
Name: syn-e39d7fb4724c7e07.ll.xz
Type: application/x-xz
Size: 386252 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200508/94acab95/attachment-0001.bin>
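As a cross-check on the instruction-count comparison, the following Python sketch recomputes the relative increases from the `instructions:u` values in the three perf stat runs above. The counts are copied from the output; the names `INSTRUCTIONS` and `percent_increase` are illustrative assumptions, not part of perf or LLVM.

```python
# instructions:u counts copied from the three perf stat runs above.
# INSTRUCTIONS and percent_increase are illustrative names only.
INSTRUCTIONS = {
    "release/9.x": 17_760_500_419,
    "release/10.x": 18_702_827_148,
    "master": 19_594_536_570,
}


def percent_increase(old: int, new: int) -> float:
    """Return the relative increase from old to new, in percent."""
    return (new - old) / old * 100.0


def main() -> None:
    nine = INSTRUCTIONS["release/9.x"]
    ten = INSTRUCTIONS["release/10.x"]
    master = INSTRUCTIONS["master"]
    # Step-by-step increases, then the net change across both steps.
    print(f"9.x  -> 10.x  : {percent_increase(nine, ten):.1f}%")
    print(f"10.x -> master: {percent_increase(ten, master):.1f}%")
    print(f"9.x  -> master: {percent_increase(nine, master):.1f}%")


if __name__ == "__main__":
    main()
```

Note that the net figure is the ratio of the last count to the first, which is the product of the two step ratios rather than the sum of the two step percentages.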
Chris Lattner via llvm-dev
2020-May-12 05:53 UTC
[llvm-dev] 7-8% compile time slowdowns in LLVM 10
> On May 8, 2020, at 4:40 PM, Josh Stone via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> I'm one of those Rust folks, and in the last few days I've been trying to dig into this. I focused on one particular benchmark input -- I've attached that IR generated from Rust 1.43.0 for LLVM 9.0.1. I started running a git bisection to see if anything huge jumped out, but it seems to be gradual. Maybe others can take my input and find some pattern to the decline.
>
> [...]

Hi Josh,

On behalf of the “LLVM people”, I’d like to publicly thank you for this work to make sure that LLVM compile time performance stays acceptable over time. I think that the Rust community’s diligence in terms of tracking, improving, and publicizing compile time performance has been incredibly valuable.

Thank you,

-Chris