thr3ads.net - llvm dev - [llvm-dev] [GlobalISel][AArch64] Toward flipping the switch for O0: Please give it a try! [May 2017]

Diana Picus via llvm-dev

2017-May-29 08:06 UTC

[llvm-dev] [GlobalISel][AArch64] Toward flipping the switch for O0: Please give it a try!

Thanks Quentin, it's in progress now, I'll let you know how it goes.

Cheers,
Diana

On 27 May 2017 at 03:36, Quentin Colombet <qcolombet at apple.com> wrote:
> Hi Kristof,
>
> I’ve pushed the localizer in r304051 and added it in the AArch64 O0 pipeline
> in r304052.
>
> I let Diana investigate the seg fault she was seeing.
>
> @Diana, let me know if you need help.
>
> Cheers,
> -Quentin
>
> On May 25, 2017, at 1:53 PM, Quentin Colombet via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
> Hi Kristof,
>
> On May 25, 2017, at 2:09 AM, Kristof Beyls <kristof.beyls at arm.com> wrote:
>
>
> On 24 May 2017, at 22:01, Quentin Colombet <qcolombet at apple.com> wrote:
>
> Hi Kristof,
>
> Thanks for going back so fast!
>
> On May 24, 2017, at 12:57 PM, Kristof Beyls <kristof.beyls at arm.com> wrote:
>
>
> On 24 May 2017, at 19:31, Quentin Colombet <qcolombet at apple.com> wrote:
>
> Hi Kristof,
>
> Thanks for the measurements.
>
> On May 24, 2017, at 6:00 AM, Kristof Beyls <kristof.beyls at arm.com> wrote:
>
>
> - Comparing against -O0 without globalisel but with the above regalloc
> options: 5.6% performance drop, 1% code size drop.
>
> In summary, the measurements indicate some good improvements.
> I also haven't measured the impact on compile time.
>
>
> Do you have a means to make this measurement?
> Ahmed did a bunch of compile time measurements on our side and I wanted to
> see if I need to put him on the hook again :).
>
>
> I did a quick setup with CTMark (part of the test-suite). I ran each of
> * '-O0 -g',
> * '-O0 -g -mllvm -global-isel=true -mllvm -global-isel-abort=0', and
> * '-O0 -g -mllvm -global-isel=true -mllvm -global-isel-abort=0 -mllvm
> -optimize-regalloc -mllvm -regalloc=greedy'
> 5 times, cross-compiling from X86 to AArch64, and took the median measured
> compile times.
> In summary, I see GlobalISel having a compile time that's 3.5% higher than
> the current -O0 default.
> With enabling the greedy register allocator, this increases to 28%.
> 28% is probably too high?
>
>
> I think it is yes.
> I have attached a quick hack to the greedy allocator to feature a fast mode.
> Could you give it a try?
>
> To enable the fast mode, please use (-mllvm) -regalloc-greedy-fast=true
> (default is false).
>
>
> I'm afraid it doesn't seem to save much compile time. On geomean, I see
> about 26% compile time increase against the current -O0 default (compared to
> 28% increase for regalloc greedy without your patch).
>
>
> Interesting, I guess a lot of time is spent in the coalescer. Could you give
> a try with -join-liveintervals=false?
>
>
> With adding -join-liveintervals=false, I see the compile time increase going
> up to 28% again.
>
>
> Heh, I am mildly surprised we hand much more live-ranges to the allocator
> when we do that.
>
>
>
> Do you know where the time is spent (-time-passes)?
>
>
> I'm afraid I won't have time to have a closer look in the next couple of
> days - I don't know where the time is spent at the moment.
>
>
> Fair enough, will investigate later.
>
>
>
> Anyhow, fixing all of those, although this is I think the right approach,
> will take time, so we can go with the localizer.
>
>
> Right, I don't understand the register allocator well enough to know if that
> compile time overhead can be fixed, while still getting the necessary
> codegen benefits the greedy allocator gives.
> Is there any specific help you're looking for with getting the localizer
> to work well enough for production use?
>
>
> I’ll clean up the WIP patch for the localizer, then you guys can fix the bug
> that you found.
>
> I’ll do that tomorrow.
>
> Cheers,
> -Quentin
>
>
> Thanks,
>
> Kristof
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
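
The compile-time comparison quoted above (5 runs per configuration, median per benchmark, overhead reported "on geomean") can be sketched roughly as follows. This is only an illustration of the arithmetic, not the script that was actually used: the function name `overhead_percent`, the benchmark names, and all timings are made up, and combining the per-benchmark ratios with a geometric mean is an assumption based on the "On geomean" remark.

```python
from statistics import geometric_mean, median


def overhead_percent(baseline_runs, candidate_runs):
    """Geomean compile-time increase of candidate over baseline, in percent.

    Both arguments map a benchmark name to the list of compile times
    (in seconds) measured for that benchmark, one entry per run.
    """
    ratios = []
    for name, base_times in baseline_runs.items():
        # Take the median of the repeated runs for each configuration,
        # then compare the two medians for this benchmark.
        ratios.append(median(candidate_runs[name]) / median(base_times))
    # Combine the per-benchmark ratios with a geometric mean.
    return (geometric_mean(ratios) - 1.0) * 100.0


# Hypothetical timings (seconds), 5 runs each; not data from the thread.
baseline = {
    "bench_a": [10.0, 10.2, 9.9, 10.1, 10.0],
    "bench_b": [20.0, 20.4, 19.8, 20.1, 20.0],
}
candidate = {
    "bench_a": [10.5, 10.6, 10.4, 10.5, 10.7],
    "bench_b": [20.6, 20.8, 20.5, 20.6, 20.7],
}

print(f"compile-time increase: {overhead_percent(baseline, candidate):.1f}%")
```

Using the median per benchmark keeps a single noisy run from skewing the result, and the geometric mean keeps one long-running benchmark from dominating the summary figure.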

Diana Picus via llvm-dev

2017-May-30 13:56 UTC

Hi Quentin,

I've attached a reproducer for the problem.

I've described what I think the problem is in the file, but the short
version is that the localizer shouldn't assume that the iteration
order for the uses corresponds to the logical order of instructions in
a basic block (we're now localizing before the first use that we find,
but that may be later in the basic block, so we'd end up with uses
before the def).

I'm not sure it's possible to test this without running a couple of
passes. You might be able to trigger it only with reg bank select +
localize, but I haven't tried. Using only the localizer would mean
that the iteration order for the uses would be the order in which
they're read in, so you wouldn't have this problem.

Hope that helps,
Diana
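
The ordering problem described in this message can be illustrated with a small toy model. This is not LLVM code and does not use any LLVM API: the `Block` class, the helper names, and the instruction names are all hypothetical, and the "earliest use" variant is just one possible remedy, not necessarily what the tentative patch in the follow-up does.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    # Instruction names in program order within one basic block.
    instrs: list = field(default_factory=list)


def insert_before(block, new_instr, anchor):
    """Insert new_instr immediately before anchor in the block."""
    block.instrs.insert(block.instrs.index(anchor), new_instr)


def localize_first_found(block, def_name, use_list):
    """Problematic variant: trust the use-list iteration order."""
    insert_before(block, def_name, use_list[0])


def localize_earliest(block, def_name, use_list):
    """Alternative: place the def before the use that comes first in the block."""
    position = {name: i for i, name in enumerate(block.instrs)}
    earliest = min(use_list, key=position.__getitem__)
    insert_before(block, def_name, earliest)


def def_precedes_all_uses(block, def_name, use_list):
    """True if the def appears before every one of its uses in the block."""
    d = block.instrs.index(def_name)
    return all(d < block.instrs.index(u) for u in use_list)


# The use list iterates in a different order than the block's program order.
use_list = ["use_b", "use_a"]

buggy = Block(["use_a", "use_b"])
localize_first_found(buggy, "def", use_list)
print(buggy.instrs, def_precedes_all_uses(buggy, "def", use_list))

fixed = Block(["use_a", "use_b"])
localize_earliest(fixed, "def", use_list)
print(fixed.instrs, def_precedes_all_uses(fixed, "def", use_list))
```

In the first case the def lands between the two uses, so `use_a` is reached before its def. When the use list happens to match program order, both variants give the same result, which is consistent with the remark that running the localizer alone would not expose the problem.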


-------------- next part --------------
A non-text attachment was scrubbed...
Name: localizer-mo-order.mir
Type: application/octet-stream
Size: 2043 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170530/0b82b260/attachment.obj>

Quentin Colombet via llvm-dev

2017-May-30 14:42 UTC

Thanks Diana.

That is indeed the assumption in the code and this is obviously wrong.

Could you try the attached patch?

(I haven’t even tried to compile it though)

Cheers,
-Quentin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: localizer_tentative_fix.diff
Type: application/octet-stream
Size: 774 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170530/455d16da/attachment.obj>
