thr3ads.net - llvm dev - [llvm-dev] [GlobalISel][AArch64] Toward flipping the switch for O0: Please give it a try! [May 2017]

If this information is useful, please help other people find it:
Share via:

Quentin Colombet via llvm-dev

2017-May-30 14:42 UTC

[llvm-dev] [GlobalISel][AArch64] Toward flipping the switch for O0: Please give it a try!

Thanks Diana.

That is indeed the assumption in the code and this is obviously wrong.

Could you try the attached patch?

(I haven’t even tried to compile it though)

Cheers,
-Quentin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: localizer_tentative_fix.diff
Type: application/octet-stream
Size: 774 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170530/455d16da/attachment.obj>
-------------- next part --------------
> On May 30, 2017, at 6:56 AM, Diana Picus <diana.picus at linaro.org>
wrote:
> 
> Hi Quentin,
> 
> I've attached a reproducer for the problem.
> 
> I've described what I think the problem is in the file, but the short
> version is that the localizer shouldn't assume that the iteration
> order for the uses corresponds to the logical order of instructions in
> a basic block (we're now localizing before the first use that we find,
> but that may be later in the basic block, so we'd end up with uses
> before the def).
> 
> I'm not sure it's possible to test this without running a couple of
> passes. You might be able to trigger it only with reg bank select +
> localize, but I haven't tried. Using only the localizer would mean
> that the iteration order for the uses would be the order in which
> they're read in, so you wouldn't have this problem.
> 
> Hope that helps,
> Diana
> 
> 
> On 29 May 2017 at 10:06, Diana Picus <diana.picus at linaro.org>
wrote:
>> Thanks Quentin, it's in progress now, I'll let you know how it
goes.
>> 
>> Cheers,
>> Diana
>> 
>> On 27 May 2017 at 03:36, Quentin Colombet <qcolombet at
apple.com> wrote:
>>> Hi Kristof,
>>> 
>>> I’ve pushed the localizer in r304051 and added it in the AArch64 O0
pipeline
>>> in r304052.
>>> 
>>> I let Diana investigate the seg fault she was seeing.
>>> 
>>> @Diana, let me know if you need help.
>>> 
>>> Cheers,
>>> -Quentin
>>> 
>>> On May 25, 2017, at 1:53 PM, Quentin Colombet via llvm-dev
>>> <llvm-dev at lists.llvm.org> wrote:
>>> 
>>> Hi Kristof,
>>> 
>>> On May 25, 2017, at 2:09 AM, Kristof Beyls <kristof.beyls at
arm.com> wrote:
>>> 
>>> 
>>> On 24 May 2017, at 22:01, Quentin Colombet <qcolombet at
apple.com> wrote:
>>> 
>>> Hi Kristof,
>>> 
>>> Thanks for going back so fast!
>>> 
>>> On May 24, 2017, at 12:57 PM, Kristof Beyls <kristof.beyls at
arm.com> wrote:
>>> 
>>> 
>>> On 24 May 2017, at 19:31, Quentin Colombet <qcolombet at
apple.com> wrote:
>>> 
>>> Hi Kristof,
>>> 
>>> Thanks for the measurements.
>>> 
>>> On May 24, 2017, at 6:00 AM, Kristof Beyls <kristof.beyls at
arm.com> wrote:
>>> 
>>> 
>>> - Comparing against -O0 without globalisel but with the above
regalloc
>>> options: 5.6% performance drop, 1% code size drop.
>>> 
>>> In summary, the measurements indicate some good improvements.
>>> I also haven't measure the impact on compile time.
>>> 
>>> 
>>> Do you have a mean to make this measurement?
>>> Ahmed did a bunch of compile time measurements on our side and I
wanted to
>>> see if I need to put him on the hook again :).
>>> 
>>> 
>>> I did a quick setup with CTMark (part of the test-suite). I ran
each of
>>> * '-O0 -g',
>>> * '-O0 -g -mllvm -global-isel=true -mllvm
-global-isel-abort=0', and
>>> * '-O0 -g -mllvm -global-isel=true -mllvm -global-isel-abort=0
-mllvm
>>> -optimize-regalloc -mllvm -regalloc=greedy'
>>> 5 times, cross-compiling from X86 to AArch64, and took the median
measured
>>> compile times.
>>> In summary, I see GlobalISel having a compile time that's 3.5%
higher than
>>> the current -O0 default.
>>> With enabling the greedy register allocator, this increases to 28%.
>>> 28% is probably too high?
>>> 
>>> 
>>> I think it is yes.
>>> I have attached a quick hack to the greedy allocator to feature a
fast mode.
>>> Could you give it a try?
>>> 
>>> To enable the fast mode, please use (-mllvm)
-regalloc-greedy-fast=true
>>> (default is false).
>>> 
>>> 
>>> I'm afraid it doesn't seem to save much compile time. On
geomean, I see
>>> about 26% compile time increase against the current -O0 default
(compared to
>>> 28% increase for regalloc greedy without your patch).
>>> 
>>> 
>>> Interesting, I guess a lot of time is spent in the coalescer. Could
you give
>>> a try with -join-liveintervals=false?
>>> 
>>> 
>>> With adding -join-liveintervals=false, I see the compile time
increase going
>>> up to 28% again.
>>> 
>>> 
>>> Heh, I am mildly surprised we hand much more live-ranges to the
allocator
>>> when we do that.
>>> 
>>> 
>>> 
>>> Do you know where the time is spent (-time-passes)?
>>> 
>>> 
>>> I'm afraid I won't have time to have a closer look in the
next couple of
>>> days - I don't know where the time is spent at the moment.
>>> 
>>> 
>>> Fair enough, will investigate later.
>>> 
>>> 
>>> 
>>> Anyhow, fixing all of those, although this is I think the right
approach,
>>> will take time, so we can go with the localizer.
>>> 
>>> 
>>> Right, I don't understand the register allocator well enough to
know if that
>>> compile time overhead can be fixed, while still getting the
necessary
>>> codegen benefits the greedy allocator gives.
>>> Is there any specific help you're looking for with getting the
localizer
>>> work well enough for production use?
>>> 
>>> 
>>> I’ll clean-up the WIP patch for the localizer, then you guys can
fix the bug
>>> that you found.
>>> 
>>> I’ll do that tomorrow.
>>> 
>>> Cheers,
>>> -Quentin
>>> 
>>> 
>>> Thanks,
>>> 
>>> Kristof
>>> 
>>> 
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>> 
>>> 
> <localizer-mo-order.mir>

Quentin Colombet via llvm-dev

2017-May-30 20:57 UTC

head link

[llvm-dev] [GlobalISel][AArch64] Toward flipping the switch for O0: Please give it a try!

Hi Diana,

I’ve actually gone ahead and pushed the fix as I was able to produce a small
reproducer.

This is r304244

Let me know if you encounter any other problem.

Cheers,
-Quentin> On May 30, 2017, at 7:42 AM, Quentin Colombet via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
> 
> Thanks Diana.
> 
> That is indeed the assumption in the code and this is obviously wrong.
> 
> Could you try the attached patch?
> 
> (I haven’t even tried to compile it though)
> 
> Cheers,
> -Quentin
> <localizer_tentative_fix.diff>
>> On May 30, 2017, at 6:56 AM, Diana Picus <diana.picus at
linaro.org> wrote:
>> 
>> Hi Quentin,
>> 
>> I've attached a reproducer for the problem.
>> 
>> I've described what I think the problem is in the file, but the
short
>> version is that the localizer shouldn't assume that the iteration
>> order for the uses corresponds to the logical order of instructions in
>> a basic block (we're now localizing before the first use that we
find,
>> but that may be later in the basic block, so we'd end up with uses
>> before the def).
>> 
>> I'm not sure it's possible to test this without running a
couple of
>> passes. You might be able to trigger it only with reg bank select +
>> localize, but I haven't tried. Using only the localizer would mean
>> that the iteration order for the uses would be the order in which
>> they're read in, so you wouldn't have this problem.
>> 
>> Hope that helps,
>> Diana
>> 
>> 
>> On 29 May 2017 at 10:06, Diana Picus <diana.picus at linaro.org>
wrote:
>>> Thanks Quentin, it's in progress now, I'll let you know how
it goes.
>>> 
>>> Cheers,
>>> Diana
>>> 
>>> On 27 May 2017 at 03:36, Quentin Colombet <qcolombet at
apple.com> wrote:
>>>> Hi Kristof,
>>>> 
>>>> I’ve pushed the localizer in r304051 and added it in the
AArch64 O0 pipeline
>>>> in r304052.
>>>> 
>>>> I let Diana investigate the seg fault she was seeing.
>>>> 
>>>> @Diana, let me know if you need help.
>>>> 
>>>> Cheers,
>>>> -Quentin
>>>> 
>>>> On May 25, 2017, at 1:53 PM, Quentin Colombet via llvm-dev
>>>> <llvm-dev at lists.llvm.org> wrote:
>>>> 
>>>> Hi Kristof,
>>>> 
>>>> On May 25, 2017, at 2:09 AM, Kristof Beyls <kristof.beyls at
arm.com> wrote:
>>>> 
>>>> 
>>>> On 24 May 2017, at 22:01, Quentin Colombet <qcolombet at
apple.com> wrote:
>>>> 
>>>> Hi Kristof,
>>>> 
>>>> Thanks for going back so fast!
>>>> 
>>>> On May 24, 2017, at 12:57 PM, Kristof Beyls <kristof.beyls
at arm.com> wrote:
>>>> 
>>>> 
>>>> On 24 May 2017, at 19:31, Quentin Colombet <qcolombet at
apple.com> wrote:
>>>> 
>>>> Hi Kristof,
>>>> 
>>>> Thanks for the measurements.
>>>> 
>>>> On May 24, 2017, at 6:00 AM, Kristof Beyls <kristof.beyls at
arm.com> wrote:
>>>> 
>>>> 
>>>> - Comparing against -O0 without globalisel but with the above
regalloc
>>>> options: 5.6% performance drop, 1% code size drop.
>>>> 
>>>> In summary, the measurements indicate some good improvements.
>>>> I also haven't measure the impact on compile time.
>>>> 
>>>> 
>>>> Do you have a mean to make this measurement?
>>>> Ahmed did a bunch of compile time measurements on our side and
I wanted to
>>>> see if I need to put him on the hook again :).
>>>> 
>>>> 
>>>> I did a quick setup with CTMark (part of the test-suite). I ran
each of
>>>> * '-O0 -g',
>>>> * '-O0 -g -mllvm -global-isel=true -mllvm
-global-isel-abort=0', and
>>>> * '-O0 -g -mllvm -global-isel=true -mllvm
-global-isel-abort=0 -mllvm
>>>> -optimize-regalloc -mllvm -regalloc=greedy'
>>>> 5 times, cross-compiling from X86 to AArch64, and took the
median measured
>>>> compile times.
>>>> In summary, I see GlobalISel having a compile time that's
3.5% higher than
>>>> the current -O0 default.
>>>> With enabling the greedy register allocator, this increases to
28%.
>>>> 28% is probably too high?
>>>> 
>>>> 
>>>> I think it is yes.
>>>> I have attached a quick hack to the greedy allocator to feature
a fast mode.
>>>> Could you give it a try?
>>>> 
>>>> To enable the fast mode, please use (-mllvm)
-regalloc-greedy-fast=true
>>>> (default is false).
>>>> 
>>>> 
>>>> I'm afraid it doesn't seem to save much compile time.
On geomean, I see
>>>> about 26% compile time increase against the current -O0 default
(compared to
>>>> 28% increase for regalloc greedy without your patch).
>>>> 
>>>> 
>>>> Interesting, I guess a lot of time is spent in the coalescer.
Could you give
>>>> a try with -join-liveintervals=false?
>>>> 
>>>> 
>>>> With adding -join-liveintervals=false, I see the compile time
increase going
>>>> up to 28% again.
>>>> 
>>>> 
>>>> Heh, I am mildly surprised we hand much more live-ranges to the
allocator
>>>> when we do that.
>>>> 
>>>> 
>>>> 
>>>> Do you know where the time is spent (-time-passes)?
>>>> 
>>>> 
>>>> I'm afraid I won't have time to have a closer look in
the next couple of
>>>> days - I don't know where the time is spent at the moment.
>>>> 
>>>> 
>>>> Fair enough, will investigate later.
>>>> 
>>>> 
>>>> 
>>>> Anyhow, fixing all of those, although this is I think the right
approach,
>>>> will take time, so we can go with the localizer.
>>>> 
>>>> 
>>>> Right, I don't understand the register allocator well
enough to know if that
>>>> compile time overhead can be fixed, while still getting the
necessary
>>>> codegen benefits the greedy allocator gives.
>>>> Is there any specific help you're looking for with getting
the localizer
>>>> work well enough for production use?
>>>> 
>>>> 
>>>> I’ll clean-up the WIP patch for the localizer, then you guys
can fix the bug
>>>> that you found.
>>>> 
>>>> I’ll do that tomorrow.
>>>> 
>>>> Cheers,
>>>> -Quentin
>>>> 
>>>> 
>>>> Thanks,
>>>> 
>>>> Kristof
>>>> 
>>>> 
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> llvm-dev at lists.llvm.org
>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>> 
>>>> 
>> <localizer-mo-order.mir>
> 
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170530/1623ba2b/attachment.html>

Diana Picus via llvm-dev

2017-May-31 13:33 UTC

head link

[llvm-dev] [GlobalISel][AArch64] Toward flipping the switch for O0: Please give it a try!

Cool test :)
It seems to work fine now, I don't see any new failures. IIUC, Kristof is
also giving it another run.

Cheers,
Diana

On 30 May 2017 at 22:57, Quentin Colombet <qcolombet at apple.com> wrote:
> Hi Diana,
>
> I’ve actually gone ahead and pushed the fix as I was able to produce a
> small reproducer.
>
> This is r304244
>
> Let me know if you encounter any other problem.
>
> Cheers,
> -Quentin
>
> On May 30, 2017, at 7:42 AM, Quentin Colombet via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> Thanks Diana.
>
> That is indeed the assumption in the code and this is obviously wrong.
>
> Could you try the attached patch?
>
> (I haven’t even tried to compile it though)
>
> Cheers,
> -Quentin
> <localizer_tentative_fix.diff>
>
> On May 30, 2017, at 6:56 AM, Diana Picus <diana.picus at linaro.org>
wrote:
>
> Hi Quentin,
>
> I've attached a reproducer for the problem.
>
> I've described what I think the problem is in the file, but the short
> version is that the localizer shouldn't assume that the iteration
> order for the uses corresponds to the logical order of instructions in
> a basic block (we're now localizing before the first use that we find,
> but that may be later in the basic block, so we'd end up with uses
> before the def).
>
> I'm not sure it's possible to test this without running a couple of
> passes. You might be able to trigger it only with reg bank select +
> localize, but I haven't tried. Using only the localizer would mean
> that the iteration order for the uses would be the order in which
> they're read in, so you wouldn't have this problem.
>
> Hope that helps,
> Diana
>
>
> On 29 May 2017 at 10:06, Diana Picus <diana.picus at linaro.org>
wrote:
>
> Thanks Quentin, it's in progress now, I'll let you know how it
goes.
>
> Cheers,
> Diana
>
> On 27 May 2017 at 03:36, Quentin Colombet <qcolombet at apple.com>
wrote:
>
> Hi Kristof,
>
> I’ve pushed the localizer in r304051 and added it in the AArch64 O0
> pipeline
> in r304052.
>
> I let Diana investigate the seg fault she was seeing.
>
> @Diana, let me know if you need help.
>
> Cheers,
> -Quentin
>
> On May 25, 2017, at 1:53 PM, Quentin Colombet via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
> Hi Kristof,
>
> On May 25, 2017, at 2:09 AM, Kristof Beyls <kristof.beyls at arm.com>
wrote:
>
>
> On 24 May 2017, at 22:01, Quentin Colombet <qcolombet at apple.com>
wrote:
>
> Hi Kristof,
>
> Thanks for going back so fast!
>
> On May 24, 2017, at 12:57 PM, Kristof Beyls <kristof.beyls at
arm.com> wrote:
>
>
> On 24 May 2017, at 19:31, Quentin Colombet <qcolombet at apple.com>
wrote:
>
> Hi Kristof,
>
> Thanks for the measurements.
>
> On May 24, 2017, at 6:00 AM, Kristof Beyls <kristof.beyls at arm.com>
wrote:
>
>
> - Comparing against -O0 without globalisel but with the above regalloc
> options: 5.6% performance drop, 1% code size drop.
>
> In summary, the measurements indicate some good improvements.
> I also haven't measure the impact on compile time.
>
>
> Do you have a mean to make this measurement?
> Ahmed did a bunch of compile time measurements on our side and I wanted to
> see if I need to put him on the hook again :).
>
>
> I did a quick setup with CTMark (part of the test-suite). I ran each of
> * '-O0 -g',
> * '-O0 -g -mllvm -global-isel=true -mllvm -global-isel-abort=0',
and
> * '-O0 -g -mllvm -global-isel=true -mllvm -global-isel-abort=0 -mllvm
> -optimize-regalloc -mllvm -regalloc=greedy'
> 5 times, cross-compiling from X86 to AArch64, and took the median measured
> compile times.
> In summary, I see GlobalISel having a compile time that's 3.5% higher
than
> the current -O0 default.
> With enabling the greedy register allocator, this increases to 28%.
> 28% is probably too high?
>
>
> I think it is yes.
> I have attached a quick hack to the greedy allocator to feature a fast
> mode.
> Could you give it a try?
>
> To enable the fast mode, please use (-mllvm) -regalloc-greedy-fast=true
> (default is false).
>
>
> I'm afraid it doesn't seem to save much compile time. On geomean, I
see
> about 26% compile time increase against the current -O0 default (compared
> to
> 28% increase for regalloc greedy without your patch).
>
>
> Interesting, I guess a lot of time is spent in the coalescer. Could you
> give
> a try with -join-liveintervals=false?
>
>
> With adding -join-liveintervals=false, I see the compile time increase
> going
> up to 28% again.
>
>
> Heh, I am mildly surprised we hand much more live-ranges to the allocator
> when we do that.
>
>
>
> Do you know where the time is spent (-time-passes)?
>
>
> I'm afraid I won't have time to have a closer look in the next
couple of
> days - I don't know where the time is spent at the moment.
>
>
> Fair enough, will investigate later.
>
>
>
> Anyhow, fixing all of those, although this is I think the right approach,
> will take time, so we can go with the localizer.
>
>
> Right, I don't understand the register allocator well enough to know if
> that
> compile time overhead can be fixed, while still getting the necessary
> codegen benefits the greedy allocator gives.
> Is there any specific help you're looking for with getting the
localizer
> work well enough for production use?
>
>
> I’ll clean-up the WIP patch for the localizer, then you guys can fix the
> bug
> that you found.
>
> I’ll do that tomorrow.
>
> Cheers,
> -Quentin
>
>
> Thanks,
>
> Kristof
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
> <localizer-mo-order.mir>
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170531/4046eb2d/attachment-0001.html>

Reasonably Related Threads

Search for more possibly parallel threads

llvm dev - May 2017 - [GlobalISel][AArch64] Toward flipping the switch for O0: Please give it a try!

[llvm-dev] [GlobalISel][AArch64] Toward flipping the switch for O0: Please give it a try!

[llvm-dev] [GlobalISel][AArch64] Toward flipping the switch for O0: Please give it a try!

[llvm-dev] [GlobalISel][AArch64] Toward flipping the switch for O0: Please give it a try!

Reasonably Related Threads