Diana Picus via llvm-dev
2017-May-19 08:33 UTC
[llvm-dev] [GlobalISel][AArch64] Toward flipping the switch for O0: Please give it a try!
On 18 May 2017 at 19:09, Quentin Colombet <qcolombet at apple.com> wrote:> Hi Diana, > >> On May 18, 2017, at 1:15 AM, Diana Picus <diana.picus at linaro.org> wrote: >> >> On 18 May 2017 at 09:06, Kristof Beyls <Kristof.Beyls at arm.com> wrote: >>> I think Diana found that when enabling r299283, the bootstrap failed with >>> llvm-tblgen segfaulting. >>> So there clearly is some work required there. >> >> Indeed. >> >> @Quentin, what is the status of that patch? > > Initially it was meant as a prototype to investigate how we could address those issues at O0. I didn’t mean to publish it. > >> Have you been working on >> it since then? > > No, I haven’t touched it and I honestly didn’t plan to do that. > >> Should I investigate the segfault more and send you a >> reproducer? > > Depends, do you guys want to pursue in that direction?Not necessarily, if you think a better reg alloc scheme will do the trick then I see no point in complicating the pass pipeline for now.> My though was that it is probably best to rely on a better reg allocator (basic or greedy) scheme for O0. I believe the only concern may be compile time but it may just fly (+ I have ideas how to make it better pretty easy). > >> Is this patch the way forward, or do you have other ideas >> for reducing the stack usage? > > Better reg alloc scheme for O0 :).Ok, let's go with that then :)> >> >> Thanks, >> Diana >> >>> >>> Thanks, >>> >>> Kristof >>> >>> >>> On 16 May 2017, at 14:06, Diana Picus via llvm-dev <llvm-dev at lists.llvm.org> >>> wrote: >>> >>> Turns out it really is a GlobalISel issue - we eat up too much stack >>> space because all the constants used in the switches are stored on the >>> stack. We need to fix this somehow before changing the default. I'll >>> try to give it a run with Quentin's r299283 on top to see if it helps. >>> >>> Cheers, >>> Diana >>> >>> On 15 May 2017 at 09:38, Diana Picus <diana.picus at linaro.org> wrote: >>> >>> Got another one: https://bugs.llvm.org/show_bug.cgi?id=33036 >>> >>> I'm still investigating whether this is an actual GlobalISel issue or >>> something else (I'll also start a build on a more recent revision to >>> see how that behaves). >>> >>> On 12 May 2017 at 20:06, Eric Christopher <echristo at gmail.com> wrote: >>> >>> Agreed. That was probably just luck before :) >>> >>> -eric >>> >>> On Fri, May 12, 2017 at 5:22 AM Diana Picus via llvm-dev >>> <llvm-dev at lists.llvm.org> wrote: >>> >>> >>> It turns out that can be fixed by adding -lm to the link line, so I >>> will probably convert it into a test-suite bug. >>> >>> I don't suppose it's crucial to handle the fabs intrinsic nicely at -O0. >>> >>> On 12 May 2017 at 13:44, Diana Picus <diana.picus at linaro.org> wrote: >>> >>> Hi, >>> >>> I ran into a little snag on the test-suite: >>> https://bugs.llvm.org/show_bug.cgi?id=33021 >>> It boils down to GlobalISel generating calls to fabs instead of using >>> FABSDr (so we get undefined references). >>> >>> Cheers, >>> Diana >>> >>> On 11 May 2017 at 18:40, Quentin Colombet <qcolombet at apple.com> wrote: >>> >>> Hi Diana, >>> >>> Thanks for the summary. >>> >>> On May 11, 2017, at 4:01 AM, Diana Picus <diana.picus at linaro.org> >>> wrote: >>> >>> Hi all, >>> >>> I'm still running some validation on this, I'll send an email when >>> it's done. If that goes well I don't have anything against making the >>> switch. >>> >>> For the record, here's a summary of issues that were deferred for >>> later on (some of which are optimization-ish and we might decide to >>> never do at -O0 at all): >>> * Crash in RegBankSelect for half fp types: >>> https://bugs.llvm.org/show_bug.cgi?id=32560 >>> >>> >>> I’ll have a look. >>> >>> * Improving constant placement: >>> http://bugs.llvm.org/show_bug.cgi?id=32561 >>> >>> >>> I’ve commented in the PR to mention the localizer technic I was playing >>> with, if someone wants to give it a try. >>> >>> * Fancy switch lowering >>> * Transforming division-by-constant-power-of-2 into right shift >>> >>> >>> AFAICT all the other issues that were brought up were fixed (yay!). >>> >>> Cheers, >>> Diana >>> >>> >>> Cheers, >>> -Quentin >>> >>> >>> >>> On 11 May 2017 at 08:44, Kristof Beyls via llvm-dev >>> <llvm-dev at lists.llvm.org> wrote: >>> >>> >>> On 10 May 2017, at 17:36, Quentin Colombet <qcolombet at apple.com> wrote: >>> >>> >>> MultiSource/Benchmarks/mediabench/mpeg2/mpeg2dec/mpeg2decode (46%): >>> Function >>> Reference_IDCT: Probably due to creating all constants in the entry BB >>> + >>> spilling floating point data through an X register: >>> >>> FastISel: >>> fadd d0, d1, d0 >>> str d0, [sp,#528] >>> GlobalISel: >>> fadd d0, d1, d0 >>> fmov x9, d0 >>> stur x9, [x29,#-48] >>> >>> >>> Good finding, I forgot to do stores in my previous fix. I’ll do them >>> shortly. >>> >>> >>> Should be fixed by r302679 >>> >>> >>> Thanks Quentin, >>> >>> That reduces the slow-down when enabling globalisel at -O0 from 13% (on >>> r302453) to 9.5% (on r302679) in my experiments. >>> The code size increase also reduces from just over 3% to 2.8%. >>> >>> Kristof >>> >>> >>> _______________________________________________ >>> LLVM Developers mailing list >>> llvm-dev at lists.llvm.org >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>> >>> >>> _______________________________________________ >>> LLVM Developers mailing list >>> llvm-dev at lists.llvm.org >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>> >>> _______________________________________________ >>> LLVM Developers mailing list >>> llvm-dev at lists.llvm.org >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>> >>> >
Quentin Colombet via llvm-dev
2017-May-19 17:06 UTC
[llvm-dev] [GlobalISel][AArch64] Toward flipping the switch for O0: Please give it a try!
Hi Diana,> On May 19, 2017, at 1:33 AM, Diana Picus <diana.picus at linaro.org> wrote: > > On 18 May 2017 at 19:09, Quentin Colombet <qcolombet at apple.com <mailto:qcolombet at apple.com>> wrote: >> Hi Diana, >> >>> On May 18, 2017, at 1:15 AM, Diana Picus <diana.picus at linaro.org> wrote: >>> >>> On 18 May 2017 at 09:06, Kristof Beyls <Kristof.Beyls at arm.com> wrote: >>>> I think Diana found that when enabling r299283, the bootstrap failed with >>>> llvm-tblgen segfaulting. >>>> So there clearly is some work required there. >>> >>> Indeed. >>> >>> @Quentin, what is the status of that patch? >> >> Initially it was meant as a prototype to investigate how we could address those issues at O0. I didn’t mean to publish it. >> >>> Have you been working on >>> it since then? >> >> No, I haven’t touched it and I honestly didn’t plan to do that. >> >>> Should I investigate the segfault more and send you a >>> reproducer? >> >> Depends, do you guys want to pursue in that direction? > > Not necessarily, if you think a better reg alloc scheme will do the > trick then I see no point in complicating the pass pipeline for now. > >> My though was that it is probably best to rely on a better reg allocator (basic or greedy) scheme for O0. I believe the only concern may be compile time but it may just fly (+ I have ideas how to make it better pretty easy). >> >>> Is this patch the way forward, or do you have other ideas >>> for reducing the stack usage? >> >> Better reg alloc scheme for O0 :). > > Ok, let's go with that then :)Sounds good, let me push a patch that would allow to use the greedy allocator at O0 and see if that’s sufficient. Then, we will look at the compile time that change will imply. Cheers, -Quentin> >> >>> >>> Thanks, >>> Diana >>> >>>> >>>> Thanks, >>>> >>>> Kristof >>>> >>>> >>>> On 16 May 2017, at 14:06, Diana Picus via llvm-dev <llvm-dev at lists.llvm.org> >>>> wrote: >>>> >>>> Turns out it really is a GlobalISel issue - we eat up too much stack >>>> space because all the constants used in the switches are stored on the >>>> stack. We need to fix this somehow before changing the default. I'll >>>> try to give it a run with Quentin's r299283 on top to see if it helps. >>>> >>>> Cheers, >>>> Diana >>>> >>>> On 15 May 2017 at 09:38, Diana Picus <diana.picus at linaro.org> wrote: >>>> >>>> Got another one: https://bugs.llvm.org/show_bug.cgi?id=33036 >>>> >>>> I'm still investigating whether this is an actual GlobalISel issue or >>>> something else (I'll also start a build on a more recent revision to >>>> see how that behaves). >>>> >>>> On 12 May 2017 at 20:06, Eric Christopher <echristo at gmail.com> wrote: >>>> >>>> Agreed. That was probably just luck before :) >>>> >>>> -eric >>>> >>>> On Fri, May 12, 2017 at 5:22 AM Diana Picus via llvm-dev >>>> <llvm-dev at lists.llvm.org> wrote: >>>> >>>> >>>> It turns out that can be fixed by adding -lm to the link line, so I >>>> will probably convert it into a test-suite bug. >>>> >>>> I don't suppose it's crucial to handle the fabs intrinsic nicely at -O0. >>>> >>>> On 12 May 2017 at 13:44, Diana Picus <diana.picus at linaro.org> wrote: >>>> >>>> Hi, >>>> >>>> I ran into a little snag on the test-suite: >>>> https://bugs.llvm.org/show_bug.cgi?id=33021 >>>> It boils down to GlobalISel generating calls to fabs instead of using >>>> FABSDr (so we get undefined references). >>>> >>>> Cheers, >>>> Diana >>>> >>>> On 11 May 2017 at 18:40, Quentin Colombet <qcolombet at apple.com> wrote: >>>> >>>> Hi Diana, >>>> >>>> Thanks for the summary. >>>> >>>> On May 11, 2017, at 4:01 AM, Diana Picus <diana.picus at linaro.org> >>>> wrote: >>>> >>>> Hi all, >>>> >>>> I'm still running some validation on this, I'll send an email when >>>> it's done. If that goes well I don't have anything against making the >>>> switch. >>>> >>>> For the record, here's a summary of issues that were deferred for >>>> later on (some of which are optimization-ish and we might decide to >>>> never do at -O0 at all): >>>> * Crash in RegBankSelect for half fp types: >>>> https://bugs.llvm.org/show_bug.cgi?id=32560 >>>> >>>> >>>> I’ll have a look. >>>> >>>> * Improving constant placement: >>>> http://bugs.llvm.org/show_bug.cgi?id=32561 >>>> >>>> >>>> I’ve commented in the PR to mention the localizer technic I was playing >>>> with, if someone wants to give it a try. >>>> >>>> * Fancy switch lowering >>>> * Transforming division-by-constant-power-of-2 into right shift >>>> >>>> >>>> AFAICT all the other issues that were brought up were fixed (yay!). >>>> >>>> Cheers, >>>> Diana >>>> >>>> >>>> Cheers, >>>> -Quentin >>>> >>>> >>>> >>>> On 11 May 2017 at 08:44, Kristof Beyls via llvm-dev >>>> <llvm-dev at lists.llvm.org> wrote: >>>> >>>> >>>> On 10 May 2017, at 17:36, Quentin Colombet <qcolombet at apple.com> wrote: >>>> >>>> >>>> MultiSource/Benchmarks/mediabench/mpeg2/mpeg2dec/mpeg2decode (46%): >>>> Function >>>> Reference_IDCT: Probably due to creating all constants in the entry BB >>>> + >>>> spilling floating point data through an X register: >>>> >>>> FastISel: >>>> fadd d0, d1, d0 >>>> str d0, [sp,#528] >>>> GlobalISel: >>>> fadd d0, d1, d0 >>>> fmov x9, d0 >>>> stur x9, [x29,#-48] >>>> >>>> >>>> Good finding, I forgot to do stores in my previous fix. I’ll do them >>>> shortly. >>>> >>>> >>>> Should be fixed by r302679 >>>> >>>> >>>> Thanks Quentin, >>>> >>>> That reduces the slow-down when enabling globalisel at -O0 from 13% (on >>>> r302453) to 9.5% (on r302679) in my experiments. >>>> The code size increase also reduces from just over 3% to 2.8%. >>>> >>>> Kristof >>>> >>>> >>>> _______________________________________________ >>>> LLVM Developers mailing list >>>> llvm-dev at lists.llvm.org >>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>>> >>>> >>>> _______________________________________________ >>>> LLVM Developers mailing list >>>> llvm-dev at lists.llvm.org >>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>>> >>>> _______________________________________________ >>>> LLVM Developers mailing list >>>> llvm-dev at lists.llvm.org >>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170519/0edbfa3f/attachment-0001.html>
Diana Picus via llvm-dev
2017-May-22 07:09 UTC
[llvm-dev] [GlobalISel][AArch64] Toward flipping the switch for O0: Please give it a try!
Hi Quentin, I actually did a run with -mllvm -optimize-regalloc -mllvm -regalloc=greedy over the weekend and the test does pass with that. Haven't measured the compile time though. Cheers, Diana On 19 May 2017 at 19:06, Quentin Colombet <qcolombet at apple.com> wrote:> Hi Diana, > > On May 19, 2017, at 1:33 AM, Diana Picus <diana.picus at linaro.org> wrote: > > On 18 May 2017 at 19:09, Quentin Colombet <qcolombet at apple.com> wrote: > > Hi Diana, > > On May 18, 2017, at 1:15 AM, Diana Picus <diana.picus at linaro.org> wrote: > > On 18 May 2017 at 09:06, Kristof Beyls <Kristof.Beyls at arm.com> wrote: > > I think Diana found that when enabling r299283, the bootstrap failed with > llvm-tblgen segfaulting. > So there clearly is some work required there. > > > Indeed. > > @Quentin, what is the status of that patch? > > > Initially it was meant as a prototype to investigate how we could address > those issues at O0. I didn’t mean to publish it. > > Have you been working on > it since then? > > > No, I haven’t touched it and I honestly didn’t plan to do that. > > Should I investigate the segfault more and send you a > reproducer? > > > Depends, do you guys want to pursue in that direction? > > > Not necessarily, if you think a better reg alloc scheme will do the > trick then I see no point in complicating the pass pipeline for now. > > My though was that it is probably best to rely on a better reg allocator > (basic or greedy) scheme for O0. I believe the only concern may be compile > time but it may just fly (+ I have ideas how to make it better pretty > easy). > > Is this patch the way forward, or do you have other ideas > for reducing the stack usage? > > > Better reg alloc scheme for O0 :). > > > Ok, let's go with that then :) > > > Sounds good, let me push a patch that would allow to use the greedy > allocator at O0 and see if that’s sufficient. Then, we will look at the > compile time that change will imply. > > Cheers, > -Quentin > > > > > Thanks, > Diana > > > Thanks, > > Kristof > > > On 16 May 2017, at 14:06, Diana Picus via llvm-dev <llvm-dev at lists.llvm.org> > wrote: > > Turns out it really is a GlobalISel issue - we eat up too much stack > space because all the constants used in the switches are stored on the > stack. We need to fix this somehow before changing the default. I'll > try to give it a run with Quentin's r299283 on top to see if it helps. > > Cheers, > Diana > > On 15 May 2017 at 09:38, Diana Picus <diana.picus at linaro.org> wrote: > > Got another one: https://bugs.llvm.org/show_bug.cgi?id=33036 > > I'm still investigating whether this is an actual GlobalISel issue or > something else (I'll also start a build on a more recent revision to > see how that behaves). > > On 12 May 2017 at 20:06, Eric Christopher <echristo at gmail.com> wrote: > > Agreed. That was probably just luck before :) > > -eric > > On Fri, May 12, 2017 at 5:22 AM Diana Picus via llvm-dev > <llvm-dev at lists.llvm.org> wrote: > > > It turns out that can be fixed by adding -lm to the link line, so I > will probably convert it into a test-suite bug. > > I don't suppose it's crucial to handle the fabs intrinsic nicely at -O0. > > On 12 May 2017 at 13:44, Diana Picus <diana.picus at linaro.org> wrote: > > Hi, > > I ran into a little snag on the test-suite: > https://bugs.llvm.org/show_bug.cgi?id=33021 > It boils down to GlobalISel generating calls to fabs instead of using > FABSDr (so we get undefined references). > > Cheers, > Diana > > On 11 May 2017 at 18:40, Quentin Colombet <qcolombet at apple.com> wrote: > > Hi Diana, > > Thanks for the summary. > > On May 11, 2017, at 4:01 AM, Diana Picus <diana.picus at linaro.org> > wrote: > > Hi all, > > I'm still running some validation on this, I'll send an email when > it's done. If that goes well I don't have anything against making the > switch. > > For the record, here's a summary of issues that were deferred for > later on (some of which are optimization-ish and we might decide to > never do at -O0 at all): > * Crash in RegBankSelect for half fp types: > https://bugs.llvm.org/show_bug.cgi?id=32560 > > > I’ll have a look. > > * Improving constant placement: > http://bugs.llvm.org/show_bug.cgi?id=32561 > > > I’ve commented in the PR to mention the localizer technic I was playing > with, if someone wants to give it a try. > > * Fancy switch lowering > * Transforming division-by-constant-power-of-2 into right shift > > > AFAICT all the other issues that were brought up were fixed (yay!). > > Cheers, > Diana > > > Cheers, > -Quentin > > > > On 11 May 2017 at 08:44, Kristof Beyls via llvm-dev > <llvm-dev at lists.llvm.org> wrote: > > > On 10 May 2017, at 17:36, Quentin Colombet <qcolombet at apple.com> wrote: > > > MultiSource/Benchmarks/mediabench/mpeg2/mpeg2dec/mpeg2decode (46%): > Function > Reference_IDCT: Probably due to creating all constants in the entry BB > + > spilling floating point data through an X register: > > FastISel: > fadd d0, d1, d0 > str d0, [sp,#528] > GlobalISel: > fadd d0, d1, d0 > fmov x9, d0 > stur x9, [x29,#-48] > > > Good finding, I forgot to do stores in my previous fix. I’ll do them > shortly. > > > Should be fixed by r302679 > > > Thanks Quentin, > > That reduces the slow-down when enabling globalisel at -O0 from 13% (on > r302453) to 9.5% (on r302679) in my experiments. > The code size increase also reduces from just over 3% to 2.8%. > > Kristof > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > >