thr3ads.net - llvm dev - [llvm-dev] [GlobalISel][AArch64] Toward flipping the switch for O0: Please give it a try! [May 2017]

Quentin Colombet via llvm-dev

2017-May-24 20:01 UTC

[llvm-dev] [GlobalISel][AArch64] Toward flipping the switch for O0: Please give it a try!

Hi Kristof,

Thanks for getting back so fast!
> On May 24, 2017, at 12:57 PM, Kristof Beyls <kristof.beyls at arm.com> wrote:
> 
>> 
>> On 24 May 2017, at 19:31, Quentin Colombet <qcolombet at apple.com> wrote:
>> 
>> Hi Kristof,
>> 
>> Thanks for the measurements.
>> 
>>> On May 24, 2017, at 6:00 AM, Kristof Beyls <kristof.beyls at arm.com> wrote:
>>> 
>>>> 
>>>>> - Comparing against -O0 without globalisel but with the above regalloc options: 5.6% performance drop, 1% code size drop.
>>>>> 
>>>>> In summary, the measurements indicate some good improvements.
>>>>> I also haven't measured the impact on compile time.
>>>> 
>>>> Do you have a means to make this measurement?
>>>> Ahmed did a bunch of compile time measurements on our side and I wanted to see if I need to put him on the hook again :).
>>> 
>>> I did a quick setup with CTMark (part of the test-suite). I ran each of
>>> * '-O0 -g',
>>> * '-O0 -g -mllvm -global-isel=true -mllvm -global-isel-abort=0', and
>>> * '-O0 -g -mllvm -global-isel=true -mllvm -global-isel-abort=0 -mllvm -optimize-regalloc -mllvm -regalloc=greedy'
>>> 5 times, cross-compiling from X86 to AArch64, and took the median measured compile times.
>>> In summary, I see GlobalISel having a compile time that's 3.5% higher than the current -O0 default.
>>> With the greedy register allocator enabled, this increases to 28%.
>>> 28% is probably too high?
>> 
>> I think it is, yes.
>> I have attached a quick hack to the greedy allocator to feature a fast mode.
>> Could you give it a try?
>> 
>> To enable the fast mode, please use (-mllvm) -regalloc-greedy-fast=true (default is false).
> 
> I'm afraid it doesn't seem to save much compile time. On the geomean, I see about a 26% compile time increase against the current -O0 default (compared to a 28% increase for regalloc greedy without your patch).
Interesting, I guess a lot of time is spent in the coalescer. Could you give it a try with -join-liveintervals=false?

Do you know where the time is spent (-time-passes)?

Anyhow, fixing all of those, although this is I think the right approach, will
take time, so we can go with the localizer.

Cheers,
-Quentin 
> 
>> 
>>> At the moment I can't think of an alternative to having a "constant materialization localizer" pass at -O0 to hit all the metrics we thought of as necessary before enabling GISel by default.
>>> 
>>> It would be good if someone else could also do a compilation time experiment - just to make sure I didn't make any silly mistakes in my experiment.
>>> 
>>> Here are the details I see:
>>> 
>>> Compile time relative to '-O0 -g' (median of 5 runs; 100% = baseline):
>>> 
>>> Benchmark                                 gisel    gisel+greedy
>>> CTMark/7zip/7zip-benchmark                102.8%   106.5%
>>> CTMark/Bullet/bullet                      100.5%   105.1%
>>> CTMark/ClamAV/clamscan                    101.6%   130.8%
>>> CTMark/SPASS/SPASS                        101.2%   120.0%
>>> CTMark/consumer-typeset/consumer-typeset  105.7%   138.2%
>>> CTMark/kimwitu++/kc                       103.1%   122.6%
>>> CTMark/lencod/lencod                      106.2%   143.4%
>>> CTMark/mafft/pairlocalalign               96.2%    135.4%
>>> CTMark/sqlite3/sqlite3                    109.1%   155.1%
>>> CTMark/tramp3d-v4/tramp3d-v4              109.1%   132.0%
>>> GEOMEAN                                   103.5%   128.0%
>>> 
>>> 
>>> Thanks,
>>> 
>>> Kristof
>> 
>> Thanks,
>> -Quentin
>> <regalloc-fastmode.diff>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170524/b703c506/attachment.html>
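As a sanity check on the CTMark table in Kristof's quoted message, the GEOMEAN row can be reproduced from the per-benchmark percentages. The sketch below is a minimal Python illustration, assuming each percentage is a ratio against the '-O0 -g' baseline; the helper name `geomean_percent` is hypothetical and not part of CTMark, LNT or the test-suite scripts.

```python
import math

# Compile time per CTMark benchmark relative to '-O0 -g' (100.0 = baseline),
# copied from the table in this thread: (gisel, gisel+greedy).
RESULTS = {
    "7zip-benchmark": (102.8, 106.5),
    "bullet": (100.5, 105.1),
    "clamscan": (101.6, 130.8),
    "SPASS": (101.2, 120.0),
    "consumer-typeset": (105.7, 138.2),
    "kc": (103.1, 122.6),
    "lencod": (106.2, 143.4),
    "pairlocalalign": (96.2, 135.4),
    "sqlite3": (109.1, 155.1),
    "tramp3d-v4": (109.1, 132.0),
}


def geomean_percent(percents):
    """Geometric mean of ratios given as percentages (hypothetical helper).

    Averaging logarithms makes a 2x slowdown and a 2x speedup cancel out,
    which is why ratios are summarized this way rather than with an
    arithmetic mean.
    """
    logs = [math.log(p / 100.0) for p in percents]
    return 100.0 * math.exp(sum(logs) / len(logs))


if __name__ == "__main__":
    gisel = geomean_percent(g for g, _ in RESULTS.values())
    greedy = geomean_percent(r for _, r in RESULTS.values())
    # prints: gisel: 103.5%  gisel+greedy: 128.0%
    print(f"gisel: {gisel:.1f}%  gisel+greedy: {greedy:.1f}%")
```

On Python 3.8 and later, `statistics.geometric_mean` computes the same quantity on the raw ratios.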

Kristof Beyls via llvm-dev

2017-May-25 09:09 UTC

[llvm-dev] [GlobalISel][AArch64] Toward flipping the switch for O0: Please give it a try!

> On 24 May 2017, at 22:01, Quentin Colombet <qcolombet at apple.com> wrote:
> 
> Hi Kristof,
> 
> Thanks for getting back so fast!
> 
>> On May 24, 2017, at 12:57 PM, Kristof Beyls <kristof.beyls at arm.com> wrote:
>> 
>>> 
>>> On 24 May 2017, at 19:31, Quentin Colombet <qcolombet at apple.com> wrote:
>>> 
>>> Hi Kristof,
>>> 
>>> Thanks for the measurements.
>>> 
>>>> On May 24, 2017, at 6:00 AM, Kristof Beyls <kristof.beyls at arm.com> wrote:
>>>> 
>>>>> 
>>>>>> - Comparing against -O0 without globalisel but with the above regalloc options: 5.6% performance drop, 1% code size drop.
>>>>>> 
>>>>>> In summary, the measurements indicate some good improvements.
>>>>>> I also haven't measured the impact on compile time.
>>>>> 
>>>>> Do you have a means to make this measurement?
>>>>> Ahmed did a bunch of compile time measurements on our side and I wanted to see if I need to put him on the hook again :).
>>>> 
>>>> I did a quick setup with CTMark (part of the test-suite). I ran each of
>>>> * '-O0 -g',
>>>> * '-O0 -g -mllvm -global-isel=true -mllvm -global-isel-abort=0', and
>>>> * '-O0 -g -mllvm -global-isel=true -mllvm -global-isel-abort=0 -mllvm -optimize-regalloc -mllvm -regalloc=greedy'
>>>> 5 times, cross-compiling from X86 to AArch64, and took the median measured compile times.
>>>> In summary, I see GlobalISel having a compile time that's 3.5% higher than the current -O0 default.
>>>> With the greedy register allocator enabled, this increases to 28%.
>>>> 28% is probably too high?
>>> 
>>> I think it is, yes.
>>> I have attached a quick hack to the greedy allocator to feature a fast mode.
>>> Could you give it a try?
>>> 
>>> To enable the fast mode, please use (-mllvm) -regalloc-greedy-fast=true (default is false).
>> 
>> I'm afraid it doesn't seem to save much compile time. On the geomean, I see about a 26% compile time increase against the current -O0 default (compared to a 28% increase for regalloc greedy without your patch).
> 
> Interesting, I guess a lot of time is spent in the coalescer. Could you give it a try with -join-liveintervals=false?

After adding -join-liveintervals=false, I see the compile time increase go back up to 28%.

> 
> Do you know where the time is spent (-time-passes)?

I'm afraid I won't have time to have a closer look in the next couple of days - I don't know where the time is spent at the moment.

> 
> Anyhow, fixing all of those, although this is I think the right approach, will take time, so we can go with the localizer.

Right, I don't understand the register allocator well enough to know if that compile time overhead can be fixed while still getting the necessary codegen benefits the greedy allocator gives.
Is there any specific help you're looking for with getting the localizer to work well enough for production use?

Thanks,

Kristof

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170525/241f3528/attachment.html>
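The measurement procedure described in this thread (five compiles per configuration, take the median, report each configuration relative to plain '-O0 -g') can be sketched as follows. This is a minimal Python illustration under stated assumptions: the timings are hypothetical placeholders for a single benchmark, not Kristof's data, and `relative_to_baseline` is a hypothetical helper rather than anything in the test-suite or LNT.

```python
import statistics

# HYPOTHETICAL wall-clock compile times (seconds) for ONE benchmark,
# five runs per configuration. Placeholder numbers, not real measurements.
RUNS = {
    "-O0 -g": [10.2, 10.0, 10.1, 10.4, 9.9],
    "gisel": [10.5, 10.4, 10.6, 10.3, 10.9],
    "gisel+greedy": [12.9, 13.1, 12.8, 13.4, 13.0],
}


def relative_to_baseline(runs, baseline="-O0 -g"):
    """Median of each configuration's runs, as a percentage of the baseline median.

    Using the median rather than the mean keeps a single noisy run
    (e.g. one disturbed by other load on the machine) from skewing the result.
    """
    base = statistics.median(runs[baseline])
    return {cfg: 100.0 * statistics.median(times) / base for cfg, times in runs.items()}


if __name__ == "__main__":
    for cfg, pct in relative_to_baseline(RUNS).items():
        print(f"{cfg:>14}: {pct:.1f}%")
```

Per-benchmark percentages computed this way are what a GEOMEAN row then summarizes across the CTMark programs.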

Quentin Colombet via llvm-dev

2017-May-25 20:53 UTC

[llvm-dev] [GlobalISel][AArch64] Toward flipping the switch for O0: Please give it a try!

Hi Kristof,
> On May 25, 2017, at 2:09 AM, Kristof Beyls <kristof.beyls at arm.com> wrote:
> 
>> 
>> On 24 May 2017, at 22:01, Quentin Colombet <qcolombet at apple.com> wrote:
>> 
>> Hi Kristof,
>> 
>> Thanks for getting back so fast!
>> 
>>> On May 24, 2017, at 12:57 PM, Kristof Beyls <kristof.beyls at arm.com> wrote:
>>> 
>>>> 
>>>> On 24 May 2017, at 19:31, Quentin Colombet <qcolombet at apple.com> wrote:
>>>> 
>>>> Hi Kristof,
>>>> 
>>>> Thanks for the measurements.
>>>> 
>>>>> On May 24, 2017, at 6:00 AM, Kristof Beyls <kristof.beyls at arm.com> wrote:
>>>>> 
>>>>>> 
>>>>>>> - Comparing against -O0 without globalisel but with the above regalloc options: 5.6% performance drop, 1% code size drop.
>>>>>>> 
>>>>>>> In summary, the measurements indicate some good improvements.
>>>>>>> I also haven't measured the impact on compile time.
>>>>>> 
>>>>>> Do you have a means to make this measurement?
>>>>>> Ahmed did a bunch of compile time measurements on our side and I wanted to see if I need to put him on the hook again :).
>>>>> 
>>>>> I did a quick setup with CTMark (part of the test-suite). I ran each of
>>>>> * '-O0 -g',
>>>>> * '-O0 -g -mllvm -global-isel=true -mllvm -global-isel-abort=0', and
>>>>> * '-O0 -g -mllvm -global-isel=true -mllvm -global-isel-abort=0 -mllvm -optimize-regalloc -mllvm -regalloc=greedy'
>>>>> 5 times, cross-compiling from X86 to AArch64, and took the median measured compile times.
>>>>> In summary, I see GlobalISel having a compile time that's 3.5% higher than the current -O0 default.
>>>>> With the greedy register allocator enabled, this increases to 28%.
>>>>> 28% is probably too high?
>>>> 
>>>> I think it is, yes.
>>>> I have attached a quick hack to the greedy allocator to feature a fast mode.
>>>> Could you give it a try?
>>>> 
>>>> To enable the fast mode, please use (-mllvm) -regalloc-greedy-fast=true (default is false).
>>> 
>>> I'm afraid it doesn't seem to save much compile time. On the geomean, I see about a 26% compile time increase against the current -O0 default (compared to a 28% increase for regalloc greedy without your patch).
>> 
>> Interesting, I guess a lot of time is spent in the coalescer. Could you give it a try with -join-liveintervals=false?
> 
> After adding -join-liveintervals=false, I see the compile time increase go back up to 28%.
Heh, I am mildly surprised we hand many more live ranges to the allocator when we do that.
> 
>> 
>> Do you know where the time is spent (-time-passes)?
> 
> I'm afraid I won't have time to have a closer look in the next couple of days - I don't know where the time is spent at the moment.
Fair enough, will investigate later.
> 
>> 
>> Anyhow, fixing all of those, although this is I think the right approach, will take time, so we can go with the localizer.
> 
> Right, I don't understand the register allocator well enough to know if that compile time overhead can be fixed while still getting the necessary codegen benefits the greedy allocator gives.
> Is there any specific help you're looking for with getting the localizer to work well enough for production use?
I’ll clean up the WIP patch for the localizer, then you guys can fix the bug that you found.

I’ll do that tomorrow.

Cheers,
-Quentin
> 
> Thanks,
> 
> Kristof
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170525/d3645aff/attachment.html>
