Diana Picus via llvm-dev
2017-May-22 07:09 UTC
[llvm-dev] [GlobalISel][AArch64] Toward flipping the switch for O0: Please give it a try!
Hi Quentin, I actually did a run with -mllvm -optimize-regalloc -mllvm -regalloc=greedy over the weekend and the test does pass with that. Haven't measured the compile time though. Cheers, Diana On 19 May 2017 at 19:06, Quentin Colombet <qcolombet at apple.com> wrote:> Hi Diana, > > On May 19, 2017, at 1:33 AM, Diana Picus <diana.picus at linaro.org> wrote: > > On 18 May 2017 at 19:09, Quentin Colombet <qcolombet at apple.com> wrote: > > Hi Diana, > > On May 18, 2017, at 1:15 AM, Diana Picus <diana.picus at linaro.org> wrote: > > On 18 May 2017 at 09:06, Kristof Beyls <Kristof.Beyls at arm.com> wrote: > > I think Diana found that when enabling r299283, the bootstrap failed with > llvm-tblgen segfaulting. > So there clearly is some work required there. > > > Indeed. > > @Quentin, what is the status of that patch? > > > Initially it was meant as a prototype to investigate how we could address > those issues at O0. I didn’t mean to publish it. > > Have you been working on > it since then? > > > No, I haven’t touched it and I honestly didn’t plan to do that. > > Should I investigate the segfault more and send you a > reproducer? > > > Depends, do you guys want to pursue in that direction? > > > Not necessarily, if you think a better reg alloc scheme will do the > trick then I see no point in complicating the pass pipeline for now. > > My though was that it is probably best to rely on a better reg allocator > (basic or greedy) scheme for O0. I believe the only concern may be compile > time but it may just fly (+ I have ideas how to make it better pretty > easy). > > Is this patch the way forward, or do you have other ideas > for reducing the stack usage? > > > Better reg alloc scheme for O0 :). > > > Ok, let's go with that then :) > > > Sounds good, let me push a patch that would allow to use the greedy > allocator at O0 and see if that’s sufficient. Then, we will look at the > compile time that change will imply. > > Cheers, > -Quentin > > > > > Thanks, > Diana > > > Thanks, > > Kristof > > > On 16 May 2017, at 14:06, Diana Picus via llvm-dev <llvm-dev at lists.llvm.org> > wrote: > > Turns out it really is a GlobalISel issue - we eat up too much stack > space because all the constants used in the switches are stored on the > stack. We need to fix this somehow before changing the default. I'll > try to give it a run with Quentin's r299283 on top to see if it helps. > > Cheers, > Diana > > On 15 May 2017 at 09:38, Diana Picus <diana.picus at linaro.org> wrote: > > Got another one: https://bugs.llvm.org/show_bug.cgi?id=33036 > > I'm still investigating whether this is an actual GlobalISel issue or > something else (I'll also start a build on a more recent revision to > see how that behaves). > > On 12 May 2017 at 20:06, Eric Christopher <echristo at gmail.com> wrote: > > Agreed. That was probably just luck before :) > > -eric > > On Fri, May 12, 2017 at 5:22 AM Diana Picus via llvm-dev > <llvm-dev at lists.llvm.org> wrote: > > > It turns out that can be fixed by adding -lm to the link line, so I > will probably convert it into a test-suite bug. > > I don't suppose it's crucial to handle the fabs intrinsic nicely at -O0. > > On 12 May 2017 at 13:44, Diana Picus <diana.picus at linaro.org> wrote: > > Hi, > > I ran into a little snag on the test-suite: > https://bugs.llvm.org/show_bug.cgi?id=33021 > It boils down to GlobalISel generating calls to fabs instead of using > FABSDr (so we get undefined references). > > Cheers, > Diana > > On 11 May 2017 at 18:40, Quentin Colombet <qcolombet at apple.com> wrote: > > Hi Diana, > > Thanks for the summary. > > On May 11, 2017, at 4:01 AM, Diana Picus <diana.picus at linaro.org> > wrote: > > Hi all, > > I'm still running some validation on this, I'll send an email when > it's done. If that goes well I don't have anything against making the > switch. > > For the record, here's a summary of issues that were deferred for > later on (some of which are optimization-ish and we might decide to > never do at -O0 at all): > * Crash in RegBankSelect for half fp types: > https://bugs.llvm.org/show_bug.cgi?id=32560 > > > I’ll have a look. > > * Improving constant placement: > http://bugs.llvm.org/show_bug.cgi?id=32561 > > > I’ve commented in the PR to mention the localizer technic I was playing > with, if someone wants to give it a try. > > * Fancy switch lowering > * Transforming division-by-constant-power-of-2 into right shift > > > AFAICT all the other issues that were brought up were fixed (yay!). > > Cheers, > Diana > > > Cheers, > -Quentin > > > > On 11 May 2017 at 08:44, Kristof Beyls via llvm-dev > <llvm-dev at lists.llvm.org> wrote: > > > On 10 May 2017, at 17:36, Quentin Colombet <qcolombet at apple.com> wrote: > > > MultiSource/Benchmarks/mediabench/mpeg2/mpeg2dec/mpeg2decode (46%): > Function > Reference_IDCT: Probably due to creating all constants in the entry BB > + > spilling floating point data through an X register: > > FastISel: > fadd d0, d1, d0 > str d0, [sp,#528] > GlobalISel: > fadd d0, d1, d0 > fmov x9, d0 > stur x9, [x29,#-48] > > > Good finding, I forgot to do stores in my previous fix. I’ll do them > shortly. > > > Should be fixed by r302679 > > > Thanks Quentin, > > That reduces the slow-down when enabling globalisel at -O0 from 13% (on > r302453) to 9.5% (on r302679) in my experiments. > The code size increase also reduces from just over 3% to 2.8%. > > Kristof > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > >
Kristof Beyls via llvm-dev
2017-May-22 07:51 UTC
[llvm-dev] [GlobalISel][AArch64] Toward flipping the switch for O0: Please give it a try!
> On 22 May 2017, at 09:09, Diana Picus <diana.picus at linaro.org> wrote: > > Hi Quentin, > > I actually did a run with -mllvm -optimize-regalloc -mllvm > -regalloc=greedy over the weekend and the test does pass with that. > Haven't measured the compile time though. > > Cheers, > DianaI also did my usual benchmarking run with the same options as Diana did above: - Comparing against -O0 without globalisel: 2.5% performance drop, 0.8% code size improvement. - Comparing against -O0 without globalisel but with the above regalloc options: 5.6% performance drop, 1% code size drop. In summary, the measurements indicate some good improvements. I also haven't measure the impact on compile time. Doing a few quick spot checks on the generated code, I still see some constants being created in the entry BB and stored on the stack. I haven't tried to investigate if the entry BB seems like a good place to materialize those remaining constants. Thanks, Kristof> > On 19 May 2017 at 19:06, Quentin Colombet <qcolombet at apple.com> wrote: >> Hi Diana, >> >> On May 19, 2017, at 1:33 AM, Diana Picus <diana.picus at linaro.org> wrote: >> >> On 18 May 2017 at 19:09, Quentin Colombet <qcolombet at apple.com> wrote: >> >> Hi Diana, >> >> On May 18, 2017, at 1:15 AM, Diana Picus <diana.picus at linaro.org> wrote: >> >> On 18 May 2017 at 09:06, Kristof Beyls <Kristof.Beyls at arm.com> wrote: >> >> I think Diana found that when enabling r299283, the bootstrap failed with >> llvm-tblgen segfaulting. >> So there clearly is some work required there. >> >> >> Indeed. >> >> @Quentin, what is the status of that patch? >> >> >> Initially it was meant as a prototype to investigate how we could address >> those issues at O0. I didn’t mean to publish it. >> >> Have you been working on >> it since then? >> >> >> No, I haven’t touched it and I honestly didn’t plan to do that. >> >> Should I investigate the segfault more and send you a >> reproducer? >> >> >> Depends, do you guys want to pursue in that direction? >> >> >> Not necessarily, if you think a better reg alloc scheme will do the >> trick then I see no point in complicating the pass pipeline for now. >> >> My though was that it is probably best to rely on a better reg allocator >> (basic or greedy) scheme for O0. I believe the only concern may be compile >> time but it may just fly (+ I have ideas how to make it better pretty >> easy). >> >> Is this patch the way forward, or do you have other ideas >> for reducing the stack usage? >> >> >> Better reg alloc scheme for O0 :). >> >> >> Ok, let's go with that then :) >> >> >> Sounds good, let me push a patch that would allow to use the greedy >> allocator at O0 and see if that’s sufficient. Then, we will look at the >> compile time that change will imply. >> >> Cheers, >> -Quentin >> >> >> >> >> Thanks, >> Diana >> >> >> Thanks, >> >> Kristof >> >> >> On 16 May 2017, at 14:06, Diana Picus via llvm-dev <llvm-dev at lists.llvm.org> >> wrote: >> >> Turns out it really is a GlobalISel issue - we eat up too much stack >> space because all the constants used in the switches are stored on the >> stack. We need to fix this somehow before changing the default. I'll >> try to give it a run with Quentin's r299283 on top to see if it helps. >> >> Cheers, >> Diana >> >> On 15 May 2017 at 09:38, Diana Picus <diana.picus at linaro.org> wrote: >> >> Got another one: https://bugs.llvm.org/show_bug.cgi?id=33036 >> >> I'm still investigating whether this is an actual GlobalISel issue or >> something else (I'll also start a build on a more recent revision to >> see how that behaves). >> >> On 12 May 2017 at 20:06, Eric Christopher <echristo at gmail.com> wrote: >> >> Agreed. That was probably just luck before :) >> >> -eric >> >> On Fri, May 12, 2017 at 5:22 AM Diana Picus via llvm-dev >> <llvm-dev at lists.llvm.org> wrote: >> >> >> It turns out that can be fixed by adding -lm to the link line, so I >> will probably convert it into a test-suite bug. >> >> I don't suppose it's crucial to handle the fabs intrinsic nicely at -O0. >> >> On 12 May 2017 at 13:44, Diana Picus <diana.picus at linaro.org> wrote: >> >> Hi, >> >> I ran into a little snag on the test-suite: >> https://bugs.llvm.org/show_bug.cgi?id=33021 >> It boils down to GlobalISel generating calls to fabs instead of using >> FABSDr (so we get undefined references). >> >> Cheers, >> Diana >> >> On 11 May 2017 at 18:40, Quentin Colombet <qcolombet at apple.com> wrote: >> >> Hi Diana, >> >> Thanks for the summary. >> >> On May 11, 2017, at 4:01 AM, Diana Picus <diana.picus at linaro.org> >> wrote: >> >> Hi all, >> >> I'm still running some validation on this, I'll send an email when >> it's done. If that goes well I don't have anything against making the >> switch. >> >> For the record, here's a summary of issues that were deferred for >> later on (some of which are optimization-ish and we might decide to >> never do at -O0 at all): >> * Crash in RegBankSelect for half fp types: >> https://bugs.llvm.org/show_bug.cgi?id=32560 >> >> >> I’ll have a look. >> >> * Improving constant placement: >> http://bugs.llvm.org/show_bug.cgi?id=32561 >> >> >> I’ve commented in the PR to mention the localizer technic I was playing >> with, if someone wants to give it a try. >> >> * Fancy switch lowering >> * Transforming division-by-constant-power-of-2 into right shift >> >> >> AFAICT all the other issues that were brought up were fixed (yay!). >> >> Cheers, >> Diana >> >> >> Cheers, >> -Quentin >> >> >> >> On 11 May 2017 at 08:44, Kristof Beyls via llvm-dev >> <llvm-dev at lists.llvm.org> wrote: >> >> >> On 10 May 2017, at 17:36, Quentin Colombet <qcolombet at apple.com> wrote: >> >> >> MultiSource/Benchmarks/mediabench/mpeg2/mpeg2dec/mpeg2decode (46%): >> Function >> Reference_IDCT: Probably due to creating all constants in the entry BB >> + >> spilling floating point data through an X register: >> >> FastISel: >> fadd d0, d1, d0 >> str d0, [sp,#528] >> GlobalISel: >> fadd d0, d1, d0 >> fmov x9, d0 >> stur x9, [x29,#-48] >> >> >> Good finding, I forgot to do stores in my previous fix. I’ll do them >> shortly. >> >> >> Should be fixed by r302679 >> >> >> Thanks Quentin, >> >> That reduces the slow-down when enabling globalisel at -O0 from 13% (on >> r302453) to 9.5% (on r302679) in my experiments. >> The code size increase also reduces from just over 3% to 2.8%. >> >> Kristof >> >> >> _______________________________________________ >> LLVM Developers mailing list >> llvm-dev at lists.llvm.org >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> >> >> _______________________________________________ >> LLVM Developers mailing list >> llvm-dev at lists.llvm.org >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> >> _______________________________________________ >> LLVM Developers mailing list >> llvm-dev at lists.llvm.org >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> >>
Quentin Colombet via llvm-dev
2017-May-23 19:48 UTC
[llvm-dev] [GlobalISel][AArch64] Toward flipping the switch for O0: Please give it a try!
Great! I thought I had to look at our pipeline at O0 to make sure optimized regalloc was supported (https://bugs.llvm.org/show_bug.cgi?id=33022 <https://bugs.llvm.org/show_bug.cgi?id=33022> in mind). Glad I was wrong, it saves me some time.> On May 22, 2017, at 12:51 AM, Kristof Beyls <kristof.beyls at arm.com> wrote: > > >> On 22 May 2017, at 09:09, Diana Picus <diana.picus at linaro.org> wrote: >> >> Hi Quentin, >> >> I actually did a run with -mllvm -optimize-regalloc -mllvm >> -regalloc=greedy over the weekend and the test does pass with that. >> Haven't measured the compile time though. >> >> Cheers, >> Diana > > I also did my usual benchmarking run with the same options as Diana did above: > - Comparing against -O0 without globalisel: 2.5% performance drop, 0.8% code size improvement.That’s compared to 9.5% performance drop and 2.8% code size regression, without that regalloc scheme, right?> - Comparing against -O0 without globalisel but with the above regalloc options: 5.6% performance drop, 1% code size drop. > > In summary, the measurements indicate some good improvements. > I also haven't measure the impact on compile time.Do you have a mean to make this measurement? Ahmed did a bunch of compile time measurements on our side and I wanted to see if I need to put him on the hook again :).> > Doing a few quick spot checks on the generated code, I still see some constants being created in the entry BB and stored on the stack.Feel free to file PRs if you think that shouldn’t happen. @Eric, how does it look on your side? Cheers, -Quentin> I haven't tried to investigate if the entry BB seems like a good place to materialize those remaining constants. > > Thanks, > > Kristof-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170523/712751b4/attachment.html>