Quentin Colombet via llvm-dev
2017-May-25 20:53 UTC
[llvm-dev] [GlobalISel][AArch64] Toward flipping the switch for O0: Please give it a try!
Hi Kristof,

> On May 25, 2017, at 2:09 AM, Kristof Beyls <kristof.beyls at arm.com> wrote:
>
>> On 24 May 2017, at 22:01, Quentin Colombet <qcolombet at apple.com> wrote:
>>
>> Hi Kristof,
>>
>> Thanks for going back so fast!
>>
>>> On May 24, 2017, at 12:57 PM, Kristof Beyls <kristof.beyls at arm.com> wrote:
>>>
>>>> On 24 May 2017, at 19:31, Quentin Colombet <qcolombet at apple.com> wrote:
>>>>
>>>> Hi Kristof,
>>>>
>>>> Thanks for the measurements.
>>>>
>>>>> On May 24, 2017, at 6:00 AM, Kristof Beyls <kristof.beyls at arm.com> wrote:
>>>>>
>>>>>>> - Comparing against -O0 without globalisel but with the above regalloc options: 5.6% performance drop, 1% code size drop.
>>>>>>>
>>>>>>> In summary, the measurements indicate some good improvements.
>>>>>>> I also haven't measure the impact on compile time.
>>>>>>
>>>>>> Do you have a mean to make this measurement?
>>>>>> Ahmed did a bunch of compile time measurements on our side and I wanted to see if I need to put him on the hook again :).
>>>>>
>>>>> I did a quick setup with CTMark (part of the test-suite). I ran each of
>>>>> * '-O0 -g',
>>>>> * '-O0 -g -mllvm -global-isel=true -mllvm -global-isel-abort=0', and
>>>>> * '-O0 -g -mllvm -global-isel=true -mllvm -global-isel-abort=0 -mllvm -optimize-regalloc -mllvm -regalloc=greedy'
>>>>> 5 times, cross-compiling from X86 to AArch64, and took the median measured compile times.
>>>>> In summary, I see GlobalISel having a compile time that's 3.5% higher than the current -O0 default.
>>>>> With enabling the greedy register allocator, this increases to 28%.
>>>>> 28% is probably too high?
>>>>
>>>> I think it is yes.
>>>> I have attached a quick hack to the greedy allocator to feature a fast mode.
>>>> Could you give it a try?
>>>>
>>>> To enable the fast mode, please use (-mllvm) -regalloc-greedy-fast=true (default is false).
>>>
>>> I'm afraid it doesn't seem to save much compile time. On geomean, I see about 26% compile time increase against the current -O0 default (compared to 28% increase for regalloc greedy without your patch).
>>
>> Interesting, I guess a lot of time is spent in the coalescer. Could you give a try with -join-liveintervals=false?
>
> With adding -join-liveintervals=false, I see the compile time increase going up to 28% again.

Heh, I am mildly surprised we hand much more live-ranges to the allocator when we do that.

>> Do you know where the time is spent (-time-passes)?
>
> I'm afraid I won't have time to have a closer look in the next couple of days - I don't know where the time is spent at the moment.

Fair enough, will investigate later.

>> Anyhow, fixing all of those, although this is I think the right approach, will take time, so we can go with the localizer.
>
> Right, I don't understand the register allocator well enough to know if that compile time overhead can be fixed, while still getting the necessary codegen benefits the greedy allocator gives.
> Is there any specific help you're looking for with getting the localizer work well enough for production use?

I’ll clean-up the WIP patch for the localizer, then you guys can fix the bug that you found.

I’ll do that tomorrow.

Cheers,
-Quentin

> Thanks,
>
> Kristof
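For readers who want to reproduce the kind of summary quoted above (each configuration run 5 times, the median time taken, and the increase reported "on geomean"), here is a minimal Python sketch. It is not the script used in this thread. The benchmark names and timings are hypothetical placeholders, and applying the geomean to per-benchmark ratios of medians is an assumption about how the numbers were aggregated.

```python
from statistics import geometric_mean, median


def median_times(runs):
    """Map each benchmark to the median of its repeated timings."""
    return {name: median(times) for name, times in runs.items()}


def geomean_increase_pct(baseline, candidate):
    """Geomean of per-benchmark candidate/baseline ratios, as a percent increase."""
    ratios = [candidate[name] / baseline[name] for name in baseline]
    return (geometric_mean(ratios) - 1.0) * 100.0


# Hypothetical compile times in seconds (5 runs each).
# These are placeholders, NOT measurements from this thread.
baseline_runs = {
    "bench_a": [10.0, 10.2, 9.9, 10.1, 10.0],
    "bench_b": [20.0, 19.8, 20.3, 20.1, 20.0],
}
candidate_runs = {
    "bench_a": [11.0, 11.1, 10.9, 11.0, 11.2],
    "bench_b": [22.0, 21.9, 22.1, 22.0, 22.3],
}

baseline = median_times(baseline_runs)
candidate = median_times(candidate_runs)
increase = geomean_increase_pct(baseline, candidate)
print(f"geomean compile-time increase: {increase:.1f}%")
```

Taking the median per benchmark damps outlier runs, and the geomean keeps one long-running benchmark from dominating the summary figure.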
Quentin Colombet via llvm-dev
2017-May-27 01:36 UTC
[llvm-dev] [GlobalISel][AArch64] Toward flipping the switch for O0: Please give it a try!
Hi Kristof,

I’ve pushed the localizer in r304051 and added it in the AArch64 O0 pipeline in r304052.

I let Diana investigate the seg fault she was seeing.

@Diana, let me know if you need help.

Cheers,
-Quentin
Diana Picus via llvm-dev
2017-May-29 08:06 UTC
[llvm-dev] [GlobalISel][AArch64] Toward flipping the switch for O0: Please give it a try!
Thanks Quentin, it's in progress now, I'll let you know how it goes.

Cheers,
Diana