Quentin Colombet via llvm-dev
2017-May-24  17:31 UTC
[llvm-dev] [GlobalISel][AArch64] Toward flipping the switch for O0: Please give it a try!
Hi Kristof,

Thanks for the measurements.

> On May 24, 2017, at 6:00 AM, Kristof Beyls <kristof.beyls at arm.com> wrote:
>
>> On 23 May 2017, at 21:48, Quentin Colombet <qcolombet at apple.com> wrote:
>>
>> Great!
>> I thought I had to look at our pipeline at O0 to make sure optimized regalloc was supported (https://bugs.llvm.org/show_bug.cgi?id=33022 in mind). Glad I was wrong; it saves me some time.
>>
>>> On May 22, 2017, at 12:51 AM, Kristof Beyls <kristof.beyls at arm.com> wrote:
>>>
>>>> On 22 May 2017, at 09:09, Diana Picus <diana.picus at linaro.org> wrote:
>>>>
>>>> Hi Quentin,
>>>>
>>>> I actually did a run with -mllvm -optimize-regalloc -mllvm -regalloc=greedy over the weekend and the test does pass with that.
>>>> Haven't measured the compile time though.
>>>>
>>>> Cheers,
>>>> Diana
>>>
>>> I also did my usual benchmarking run with the same options as Diana did above:
>>> - Comparing against -O0 without globalisel: 2.5% performance drop, 0.8% code size improvement.
>>
>> That's compared to 9.5% performance drop and 2.8% code size regression, without that regalloc scheme, right?
>
> Indeed.
>
>>> - Comparing against -O0 without globalisel but with the above regalloc options: 5.6% performance drop, 1% code size drop.
>>>
>>> In summary, the measurements indicate some good improvements.
>>> I also haven't measured the impact on compile time.
>>
>> Do you have a means to make this measurement?
>> Ahmed did a bunch of compile time measurements on our side and I wanted to see if I need to put him on the hook again :).
>
> I did a quick setup with CTMark (part of the test-suite). I ran each of
> * '-O0 -g',
> * '-O0 -g -mllvm -global-isel=true -mllvm -global-isel-abort=0', and
> * '-O0 -g -mllvm -global-isel=true -mllvm -global-isel-abort=0 -mllvm -optimize-regalloc -mllvm -regalloc=greedy'
> 5 times, cross-compiling from X86 to AArch64, and took the median measured compile times.
> In summary, I see GlobalISel having a compile time that's 3.5% higher than the current -O0 default.
> With the greedy register allocator enabled, this increases to 28%.
> 28% is probably too high?

I think it is, yes.
I have attached a quick hack to the greedy allocator to feature a fast mode.
Could you give it a try?

To enable the fast mode, please use (-mllvm) -regalloc-greedy-fast=true (default is false).

> At the moment I can't think of an alternative to having a "constant materialization localizer" pass at -O0 to hit all the metrics we thought of as necessary before enabling GISel by default.
>
> It would be good if someone else could also do a compilation time experiment, just to make sure I didn't make any silly mistakes in my experiment.
>
> Here are the details I see:
>
> Benchmark                                   gisel  gisel+greedy
> CTMark/7zip/7zip-benchmark                 102.8%        106.5%
> CTMark/Bullet/bullet                       100.5%        105.1%
> CTMark/ClamAV/clamscan                     101.6%        130.8%
> CTMark/SPASS/SPASS                         101.2%        120.0%
> CTMark/consumer-typeset/consumer-typeset   105.7%        138.2%
> CTMark/kimwitu++/kc                        103.1%        122.6%
> CTMark/lencod/lencod                       106.2%        143.4%
> CTMark/mafft/pairlocalalign                 96.2%        135.4%
> CTMark/sqlite3/sqlite3                     109.1%        155.1%
> CTMark/tramp3d-v4/tramp3d-v4               109.1%        132.0%
> GEOMEAN                                    103.5%        128.0%
>
> Thanks,
>
> Kristof

Thanks,
-Quentin

-------------- next part --------------
A non-text attachment was scrubbed...
Name: regalloc-fastmode.diff
Type: application/octet-stream
Size: 2893 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170524/77fd12b2/attachment.obj>
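Kristof's procedure (compile each configuration 5 times, keep the median time, then express it relative to the plain '-O0 -g' baseline) can be sketched as a small Python harness. This is a hypothetical, stand-alone sketch, not how the LLVM test-suite's CTMark actually collects timings: `command_runner`, `median_time` and `relative_times` are illustrative names, and the runner simply measures wall-clock time of a subprocess.

```python
import statistics
import subprocess
import time
from typing import Callable, Dict, Sequence


def command_runner(argv: Sequence[str]) -> Callable[[], float]:
    """Hypothetical helper: return a callable that runs one compile command
    and reports its wall-clock time in seconds."""
    def run() -> float:
        start = time.perf_counter()
        subprocess.run(list(argv), check=True, stdout=subprocess.DEVNULL)
        return time.perf_counter() - start
    return run


def median_time(run: Callable[[], float], repeats: int = 5) -> float:
    """Time one configuration `repeats` times and keep the median,
    which is robust against a single slow outlier run."""
    return statistics.median([run() for _ in range(repeats)])


def relative_times(runners: Dict[str, Callable[[], float]],
                   baseline: str, repeats: int = 5) -> Dict[str, float]:
    """Median time of every configuration as a percentage of the baseline's median."""
    medians = {name: median_time(run, repeats) for name, run in runners.items()}
    base = medians[baseline]
    return {name: 100.0 * t / base for name, t in medians.items()}


# Demo with fake timings standing in for real compiles, so it runs anywhere.
def fake(samples):
    it = iter(samples)
    return lambda: next(it)


demo = relative_times(
    {"O0": fake([2.0, 2.1, 1.9, 2.0, 2.2]),
     "gisel": fake([2.1, 2.0, 2.07, 2.3, 2.07])},
    baseline="O0",
)
for name, pct in demo.items():
    print(f"{name}: {pct:.1f}%")
```

For a real run, each fake runner would be replaced by `command_runner([...])` with a full clang command line (target triple, input file and output path are left out here because the thread does not give them).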
Kristof Beyls via llvm-dev
2017-May-24  19:57 UTC
[llvm-dev] [GlobalISel][AArch64] Toward flipping the switch for O0: Please give it a try!
> On 24 May 2017, at 19:31, Quentin Colombet <qcolombet at apple.com> wrote:
>
> Hi Kristof,
> Thanks for the measurements.
>
>> On May 24, 2017, at 6:00 AM, Kristof Beyls <kristof.beyls at arm.com> wrote:
>>
>>>> - Comparing against -O0 without globalisel but with the above regalloc options:
>>>> 5.6% performance drop, 1% code size drop.
>>>> In summary, the measurements indicate some good improvements.
>>>> I also haven't measured the impact on compile time.
>>>
>>> Do you have a means to make this measurement?
>>> Ahmed did a bunch of compile time measurements on our side and I wanted to see
>>> if I need to put him on the hook again :).
>>
>> I did a quick setup with CTMark (part of the test-suite). I ran each of
>> * '-O0 -g',
>> * '-O0 -g -mllvm -global-isel=true -mllvm -global-isel-abort=0', and
>> * '-O0 -g -mllvm -global-isel=true -mllvm -global-isel-abort=0 -mllvm
>>   -optimize-regalloc -mllvm -regalloc=greedy'
>> 5 times, cross-compiling from X86 to AArch64, and took the median measured
>> compile times.
>> In summary, I see GlobalISel having a compile time that's 3.5% higher than
>> the current -O0 default.
>> With the greedy register allocator enabled, this increases to 28%.
>> 28% is probably too high?
>
> I think it is, yes.
> I have attached a quick hack to the greedy allocator to feature a fast mode.
> Could you give it a try?
> To enable the fast mode, please use (-mllvm) -regalloc-greedy-fast=true (default
> is false).

I'm afraid it doesn't seem to save much compile time. On geomean, I see
about a 26% compile time increase against the current -O0 default (compared to a 28%
increase for regalloc greedy without your patch).

>> At the moment I can't think of an alternative to having a "constant
>> materialization localizer" pass at -O0 to hit all the metrics we thought of
>> as necessary before enabling GISel by default.
>> It would be good if someone else could also do a compilation time experiment,
>> just to make sure I didn't make any silly mistakes in my experiment.
>> Here are the details I see:
>>
>> Benchmark                                   gisel  gisel+greedy
>> CTMark/7zip/7zip-benchmark                 102.8%        106.5%
>> CTMark/Bullet/bullet                       100.5%        105.1%
>> CTMark/ClamAV/clamscan                     101.6%        130.8%
>> CTMark/SPASS/SPASS                         101.2%        120.0%
>> CTMark/consumer-typeset/consumer-typeset   105.7%        138.2%
>> CTMark/kimwitu++/kc                        103.1%        122.6%
>> CTMark/lencod/lencod                       106.2%        143.4%
>> CTMark/mafft/pairlocalalign                 96.2%        135.4%
>> CTMark/sqlite3/sqlite3                     109.1%        155.1%
>> CTMark/tramp3d-v4/tramp3d-v4               109.1%        132.0%
>> GEOMEAN                                    103.5%        128.0%
>>
>> Thanks,
>> Kristof
>
> Thanks,
> -Quentin
> <regalloc-fastmode.diff>
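As a quick sanity check on the table, the GEOMEAN row can be recomputed from the per-benchmark rows: each entry is treated as a ratio to the '-O0 -g' baseline, and the ratios are averaged with a geometric mean, the usual way to summarize normalized ratios. A minimal Python sketch using the standard library's `statistics.geometric_mean` (Python 3.8+); the numbers are copied from the table and the variable names are illustrative.

```python
from statistics import geometric_mean

# Compile time of each CTMark benchmark relative to '-O0 -g' (percent),
# copied row by row from the table above.
GISEL = [102.8, 100.5, 101.6, 101.2, 105.7, 103.1, 106.2, 96.2, 109.1, 109.1]
GISEL_GREEDY = [106.5, 105.1, 130.8, 120.0, 138.2, 122.6, 143.4, 135.4, 155.1, 132.0]


def geomean_percent(percentages):
    """Geometric mean of percentages, computed on the underlying ratios."""
    return 100.0 * geometric_mean([p / 100.0 for p in percentages])


for label, column in (("gisel", GISEL), ("gisel+greedy", GISEL_GREEDY)):
    print(f"{label}: {geomean_percent(column):.1f}%")
```

Both recomputed values agree with the reported GEOMEAN row (103.5% and 128.0%), which is consistent with reading each column as median compile time normalized to the current -O0 default.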
Quentin Colombet via llvm-dev
2017-May-24  20:01 UTC
[llvm-dev] [GlobalISel][AArch64] Toward flipping the switch for O0: Please give it a try!
Hi Kristof,

Thanks for getting back to me so fast!

> On May 24, 2017, at 12:57 PM, Kristof Beyls <kristof.beyls at arm.com> wrote:
>
>> On 24 May 2017, at 19:31, Quentin Colombet <qcolombet at apple.com> wrote:
>>
>> Hi Kristof,
>>
>> Thanks for the measurements.
>>
>>> On May 24, 2017, at 6:00 AM, Kristof Beyls <kristof.beyls at arm.com> wrote:
>>>
>>>>> - Comparing against -O0 without globalisel but with the above regalloc options: 5.6% performance drop, 1% code size drop.
>>>>>
>>>>> In summary, the measurements indicate some good improvements.
>>>>> I also haven't measured the impact on compile time.
>>>>
>>>> Do you have a means to make this measurement?
>>>> Ahmed did a bunch of compile time measurements on our side and I wanted to see if I need to put him on the hook again :).
>>>
>>> I did a quick setup with CTMark (part of the test-suite). I ran each of
>>> * '-O0 -g',
>>> * '-O0 -g -mllvm -global-isel=true -mllvm -global-isel-abort=0', and
>>> * '-O0 -g -mllvm -global-isel=true -mllvm -global-isel-abort=0 -mllvm -optimize-regalloc -mllvm -regalloc=greedy'
>>> 5 times, cross-compiling from X86 to AArch64, and took the median measured compile times.
>>> In summary, I see GlobalISel having a compile time that's 3.5% higher than the current -O0 default.
>>> With the greedy register allocator enabled, this increases to 28%.
>>> 28% is probably too high?
>>
>> I think it is, yes.
>> I have attached a quick hack to the greedy allocator to feature a fast mode.
>> Could you give it a try?
>>
>> To enable the fast mode, please use (-mllvm) -regalloc-greedy-fast=true (default is false).
>
> I'm afraid it doesn't seem to save much compile time. On geomean, I see about a 26% compile time increase against the current -O0 default (compared to a 28% increase for regalloc greedy without your patch).

Interesting, I guess a lot of time is spent in the coalescer.
Could you give it a try with -join-liveintervals=false?
Do you know where the time is spent (-time-passes)?

Anyhow, although fixing all of those is, I think, the right approach, it will take time, so we can go with the localizer.

Cheers,
-Quentin
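The configurations in this thread differ only in which LLVM backend options are forwarded through clang's `-mllvm`, which carries one option per occurrence (hence the repeated `-mllvm` in the quoted command lines). A small, hypothetical Python helper that assembles them, including the follow-up experiments suggested here, might look like the sketch below. Assumptions: `-regalloc-greedy-fast` is only recognized by an LLVM built with the attached `regalloc-fastmode.diff`; `-join-liveintervals=false` and `-time-passes` are assumed to be passed through `-mllvm` as well; target triple and input files are omitted.

```python
from typing import Dict, List, Sequence

# Options taken from the command lines quoted in this thread.
O0_DEBUG = ["-O0", "-g"]
GISEL = ["-global-isel=true", "-global-isel-abort=0"]
GREEDY = ["-optimize-regalloc", "-regalloc=greedy"]


def with_backend_opts(driver_flags: Sequence[str], backend_opts: Sequence[str]) -> List[str]:
    """Append each LLVM backend option behind its own -mllvm."""
    argv = list(driver_flags)
    for opt in backend_opts:
        argv += ["-mllvm", opt]
    return argv


CONFIGS: Dict[str, List[str]] = {
    "O0": with_backend_opts(O0_DEBUG, []),
    "gisel": with_backend_opts(O0_DEBUG, GISEL),
    "gisel+greedy": with_backend_opts(O0_DEBUG, GISEL + GREEDY),
    # Assumption: only works with the unmerged regalloc-fastmode.diff applied.
    "gisel+greedy-fast": with_backend_opts(
        O0_DEBUG, GISEL + GREEDY + ["-regalloc-greedy-fast=true"]),
    # Turns off copy coalescing, to check whether the coalescer dominates the cost.
    "gisel+greedy-nojoin": with_backend_opts(
        O0_DEBUG, GISEL + GREEDY + ["-join-liveintervals=false"]),
}


def with_pass_timings(argv: Sequence[str]) -> List[str]:
    """Request LLVM's per-pass timing report for a single compile."""
    return with_backend_opts(argv, ["-time-passes"])


for name, argv in CONFIGS.items():
    print(name, " ".join(argv))
```

Running the "gisel+greedy" configuration once through `with_pass_timings` would show how much of the 28% goes to the register coalescer versus the allocator itself.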