Diana Picus via llvm-dev
2017-May-16 12:06 UTC
[llvm-dev] [GlobalISel][AArch64] Toward flipping the switch for O0: Please give it a try!
Turns out it really is a GlobalISel issue - we eat up too much stack space because all the constants used in the switches are stored on the stack. We need to fix this somehow before changing the default. I'll try to give it a run with Quentin's r299283 on top to see if it helps. Cheers, Diana On 15 May 2017 at 09:38, Diana Picus <diana.picus at linaro.org> wrote:> Got another one: https://bugs.llvm.org/show_bug.cgi?id=33036 > > I'm still investigating whether this is an actual GlobalISel issue or > something else (I'll also start a build on a more recent revision to > see how that behaves). > > On 12 May 2017 at 20:06, Eric Christopher <echristo at gmail.com> wrote: >> Agreed. That was probably just luck before :) >> >> -eric >> >> On Fri, May 12, 2017 at 5:22 AM Diana Picus via llvm-dev >> <llvm-dev at lists.llvm.org> wrote: >>> >>> It turns out that can be fixed by adding -lm to the link line, so I >>> will probably convert it into a test-suite bug. >>> >>> I don't suppose it's crucial to handle the fabs intrinsic nicely at -O0. >>> >>> On 12 May 2017 at 13:44, Diana Picus <diana.picus at linaro.org> wrote: >>> > Hi, >>> > >>> > I ran into a little snag on the test-suite: >>> > https://bugs.llvm.org/show_bug.cgi?id=33021 >>> > It boils down to GlobalISel generating calls to fabs instead of using >>> > FABSDr (so we get undefined references). >>> > >>> > Cheers, >>> > Diana >>> > >>> > On 11 May 2017 at 18:40, Quentin Colombet <qcolombet at apple.com> wrote: >>> >> Hi Diana, >>> >> >>> >> Thanks for the summary. >>> >> >>> >> On May 11, 2017, at 4:01 AM, Diana Picus <diana.picus at linaro.org> >>> >> wrote: >>> >> >>> >> Hi all, >>> >> >>> >> I'm still running some validation on this, I'll send an email when >>> >> it's done. If that goes well I don't have anything against making the >>> >> switch. >>> >> >>> >> For the record, here's a summary of issues that were deferred for >>> >> later on (some of which are optimization-ish and we might decide to >>> >> never do at -O0 at all): >>> >> * Crash in RegBankSelect for half fp types: >>> >> https://bugs.llvm.org/show_bug.cgi?id=32560 >>> >> >>> >> >>> >> I’ll have a look. >>> >> >>> >> * Improving constant placement: >>> >> http://bugs.llvm.org/show_bug.cgi?id=32561 >>> >> >>> >> >>> >> I’ve commented in the PR to mention the localizer technic I was playing >>> >> with, if someone wants to give it a try. >>> >> >>> >> * Fancy switch lowering >>> >> * Transforming division-by-constant-power-of-2 into right shift >>> >> >>> >> >>> >> AFAICT all the other issues that were brought up were fixed (yay!). >>> >> >>> >> Cheers, >>> >> Diana >>> >> >>> >> >>> >> Cheers, >>> >> -Quentin >>> >> >>> >> >>> >> >>> >> On 11 May 2017 at 08:44, Kristof Beyls via llvm-dev >>> >> <llvm-dev at lists.llvm.org> wrote: >>> >> >>> >> >>> >> On 10 May 2017, at 17:36, Quentin Colombet <qcolombet at apple.com> wrote: >>> >> >>> >> >>> >> MultiSource/Benchmarks/mediabench/mpeg2/mpeg2dec/mpeg2decode (46%): >>> >> Function >>> >> Reference_IDCT: Probably due to creating all constants in the entry BB >>> >> + >>> >> spilling floating point data through an X register: >>> >> >>> >> FastISel: >>> >> fadd d0, d1, d0 >>> >> str d0, [sp,#528] >>> >> GlobalISel: >>> >> fadd d0, d1, d0 >>> >> fmov x9, d0 >>> >> stur x9, [x29,#-48] >>> >> >>> >> >>> >> Good finding, I forgot to do stores in my previous fix. I’ll do them >>> >> shortly. >>> >> >>> >> >>> >> Should be fixed by r302679 >>> >> >>> >> >>> >> Thanks Quentin, >>> >> >>> >> That reduces the slow-down when enabling globalisel at -O0 from 13% (on >>> >> r302453) to 9.5% (on r302679) in my experiments. >>> >> The code size increase also reduces from just over 3% to 2.8%. >>> >> >>> >> Kristof >>> >> >>> >> >>> >> _______________________________________________ >>> >> LLVM Developers mailing list >>> >> llvm-dev at lists.llvm.org >>> >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>> >> >>> >> >>> _______________________________________________ >>> LLVM Developers mailing list >>> llvm-dev at lists.llvm.org >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Kristof Beyls via llvm-dev
2017-May-18 07:06 UTC
[llvm-dev] [GlobalISel][AArch64] Toward flipping the switch for O0: Please give it a try!
FWIW, I ran Quentin's r299283 Localizer patch (see also http://bugs.llvm.org/show_bug.cgi?id=32561) on my benchmark set. This is taking the commit in r299283 + adding the pass to the pipeline right after RegBankSelect. It reduces the slow-down when enabling globalisel at -O0 from 9.5% (on r302453) to 6.3% (on r302679) in my experiments. The code size increase also reduces from just over 2.8% to 1.8%. I haven't measured the impact on compile-time. I think those are nice improvements; but I wouldn't hold up enabling-by-default for those improvements. However, the increase in stack size usage being so huge that a bootstrap fails seems like something that should be addressed before enabling-by-default. I think Diana found that when enabling r299283, the bootstrap failed with llvm-tblgen segfaulting. So there clearly is some work required there. Thanks, Kristof On 16 May 2017, at 14:06, Diana Picus via llvm-dev <llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>> wrote: Turns out it really is a GlobalISel issue - we eat up too much stack space because all the constants used in the switches are stored on the stack. We need to fix this somehow before changing the default. I'll try to give it a run with Quentin's r299283 on top to see if it helps. Cheers, Diana On 15 May 2017 at 09:38, Diana Picus <diana.picus at linaro.org<mailto:diana.picus at linaro.org>> wrote: Got another one: https://bugs.llvm.org/show_bug.cgi?id=33036 I'm still investigating whether this is an actual GlobalISel issue or something else (I'll also start a build on a more recent revision to see how that behaves). On 12 May 2017 at 20:06, Eric Christopher <echristo at gmail.com<mailto:echristo at gmail.com>> wrote: Agreed. That was probably just luck before :) -eric On Fri, May 12, 2017 at 5:22 AM Diana Picus via llvm-dev <llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>> wrote: It turns out that can be fixed by adding -lm to the link line, so I will probably convert it into a test-suite bug. I don't suppose it's crucial to handle the fabs intrinsic nicely at -O0. On 12 May 2017 at 13:44, Diana Picus <diana.picus at linaro.org<mailto:diana.picus at linaro.org>> wrote: Hi, I ran into a little snag on the test-suite: https://bugs.llvm.org/show_bug.cgi?id=33021 It boils down to GlobalISel generating calls to fabs instead of using FABSDr (so we get undefined references). Cheers, Diana On 11 May 2017 at 18:40, Quentin Colombet <qcolombet at apple.com> wrote: Hi Diana, Thanks for the summary. On May 11, 2017, at 4:01 AM, Diana Picus <diana.picus at linaro.org> wrote: Hi all, I'm still running some validation on this, I'll send an email when it's done. If that goes well I don't have anything against making the switch. For the record, here's a summary of issues that were deferred for later on (some of which are optimization-ish and we might decide to never do at -O0 at all): * Crash in RegBankSelect for half fp types: https://bugs.llvm.org/show_bug.cgi?id=32560 I’ll have a look. * Improving constant placement: http://bugs.llvm.org/show_bug.cgi?id=32561 I’ve commented in the PR to mention the localizer technic I was playing with, if someone wants to give it a try. * Fancy switch lowering * Transforming division-by-constant-power-of-2 into right shift AFAICT all the other issues that were brought up were fixed (yay!). Cheers, Diana Cheers, -Quentin On 11 May 2017 at 08:44, Kristof Beyls via llvm-dev <llvm-dev at lists.llvm.org> wrote: On 10 May 2017, at 17:36, Quentin Colombet <qcolombet at apple.com> wrote: MultiSource/Benchmarks/mediabench/mpeg2/mpeg2dec/mpeg2decode (46%): Function Reference_IDCT: Probably due to creating all constants in the entry BB + spilling floating point data through an X register: FastISel: fadd d0, d1, d0 str d0, [sp,#528] GlobalISel: fadd d0, d1, d0 fmov x9, d0 stur x9, [x29,#-48] Good finding, I forgot to do stores in my previous fix. I’ll do them shortly. Should be fixed by r302679 Thanks Quentin, That reduces the slow-down when enabling globalisel at -O0 from 13% (on r302453) to 9.5% (on r302679) in my experiments. The code size increase also reduces from just over 3% to 2.8%. Kristof _______________________________________________ LLVM Developers mailing list llvm-dev at lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev _______________________________________________ LLVM Developers mailing list llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev _______________________________________________ LLVM Developers mailing list llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170518/94903f85/attachment.html>
Diana Picus via llvm-dev
2017-May-18 08:15 UTC
[llvm-dev] [GlobalISel][AArch64] Toward flipping the switch for O0: Please give it a try!
On 18 May 2017 at 09:06, Kristof Beyls <Kristof.Beyls at arm.com> wrote:> I think Diana found that when enabling r299283, the bootstrap failed with > llvm-tblgen segfaulting. > So there clearly is some work required there.Indeed. @Quentin, what is the status of that patch? Have you been working on it since then? Should I investigate the segfault more and send you a reproducer? Is this patch the way forward, or do you have other ideas for reducing the stack usage? Thanks, Diana> > Thanks, > > Kristof > > > On 16 May 2017, at 14:06, Diana Picus via llvm-dev <llvm-dev at lists.llvm.org> > wrote: > > Turns out it really is a GlobalISel issue - we eat up too much stack > space because all the constants used in the switches are stored on the > stack. We need to fix this somehow before changing the default. I'll > try to give it a run with Quentin's r299283 on top to see if it helps. > > Cheers, > Diana > > On 15 May 2017 at 09:38, Diana Picus <diana.picus at linaro.org> wrote: > > Got another one: https://bugs.llvm.org/show_bug.cgi?id=33036 > > I'm still investigating whether this is an actual GlobalISel issue or > something else (I'll also start a build on a more recent revision to > see how that behaves). > > On 12 May 2017 at 20:06, Eric Christopher <echristo at gmail.com> wrote: > > Agreed. That was probably just luck before :) > > -eric > > On Fri, May 12, 2017 at 5:22 AM Diana Picus via llvm-dev > <llvm-dev at lists.llvm.org> wrote: > > > It turns out that can be fixed by adding -lm to the link line, so I > will probably convert it into a test-suite bug. > > I don't suppose it's crucial to handle the fabs intrinsic nicely at -O0. > > On 12 May 2017 at 13:44, Diana Picus <diana.picus at linaro.org> wrote: > > Hi, > > I ran into a little snag on the test-suite: > https://bugs.llvm.org/show_bug.cgi?id=33021 > It boils down to GlobalISel generating calls to fabs instead of using > FABSDr (so we get undefined references). > > Cheers, > Diana > > On 11 May 2017 at 18:40, Quentin Colombet <qcolombet at apple.com> wrote: > > Hi Diana, > > Thanks for the summary. > > On May 11, 2017, at 4:01 AM, Diana Picus <diana.picus at linaro.org> > wrote: > > Hi all, > > I'm still running some validation on this, I'll send an email when > it's done. If that goes well I don't have anything against making the > switch. > > For the record, here's a summary of issues that were deferred for > later on (some of which are optimization-ish and we might decide to > never do at -O0 at all): > * Crash in RegBankSelect for half fp types: > https://bugs.llvm.org/show_bug.cgi?id=32560 > > > I’ll have a look. > > * Improving constant placement: > http://bugs.llvm.org/show_bug.cgi?id=32561 > > > I’ve commented in the PR to mention the localizer technic I was playing > with, if someone wants to give it a try. > > * Fancy switch lowering > * Transforming division-by-constant-power-of-2 into right shift > > > AFAICT all the other issues that were brought up were fixed (yay!). > > Cheers, > Diana > > > Cheers, > -Quentin > > > > On 11 May 2017 at 08:44, Kristof Beyls via llvm-dev > <llvm-dev at lists.llvm.org> wrote: > > > On 10 May 2017, at 17:36, Quentin Colombet <qcolombet at apple.com> wrote: > > > MultiSource/Benchmarks/mediabench/mpeg2/mpeg2dec/mpeg2decode (46%): > Function > Reference_IDCT: Probably due to creating all constants in the entry BB > + > spilling floating point data through an X register: > > FastISel: > fadd d0, d1, d0 > str d0, [sp,#528] > GlobalISel: > fadd d0, d1, d0 > fmov x9, d0 > stur x9, [x29,#-48] > > > Good finding, I forgot to do stores in my previous fix. I’ll do them > shortly. > > > Should be fixed by r302679 > > > Thanks Quentin, > > That reduces the slow-down when enabling globalisel at -O0 from 13% (on > r302453) to 9.5% (on r302679) in my experiments. > The code size increase also reduces from just over 3% to 2.8%. > > Kristof > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > >