Diana Picus via llvm-dev
2017-May-18 08:15 UTC
[llvm-dev] [GlobalISel][AArch64] Toward flipping the switch for O0: Please give it a try!
On 18 May 2017 at 09:06, Kristof Beyls <Kristof.Beyls at arm.com> wrote:> I think Diana found that when enabling r299283, the bootstrap failed with > llvm-tblgen segfaulting. > So there clearly is some work required there.Indeed. @Quentin, what is the status of that patch? Have you been working on it since then? Should I investigate the segfault more and send you a reproducer? Is this patch the way forward, or do you have other ideas for reducing the stack usage? Thanks, Diana> > Thanks, > > Kristof > > > On 16 May 2017, at 14:06, Diana Picus via llvm-dev <llvm-dev at lists.llvm.org> > wrote: > > Turns out it really is a GlobalISel issue - we eat up too much stack > space because all the constants used in the switches are stored on the > stack. We need to fix this somehow before changing the default. I'll > try to give it a run with Quentin's r299283 on top to see if it helps. > > Cheers, > Diana > > On 15 May 2017 at 09:38, Diana Picus <diana.picus at linaro.org> wrote: > > Got another one: https://bugs.llvm.org/show_bug.cgi?id=33036 > > I'm still investigating whether this is an actual GlobalISel issue or > something else (I'll also start a build on a more recent revision to > see how that behaves). > > On 12 May 2017 at 20:06, Eric Christopher <echristo at gmail.com> wrote: > > Agreed. That was probably just luck before :) > > -eric > > On Fri, May 12, 2017 at 5:22 AM Diana Picus via llvm-dev > <llvm-dev at lists.llvm.org> wrote: > > > It turns out that can be fixed by adding -lm to the link line, so I > will probably convert it into a test-suite bug. > > I don't suppose it's crucial to handle the fabs intrinsic nicely at -O0. > > On 12 May 2017 at 13:44, Diana Picus <diana.picus at linaro.org> wrote: > > Hi, > > I ran into a little snag on the test-suite: > https://bugs.llvm.org/show_bug.cgi?id=33021 > It boils down to GlobalISel generating calls to fabs instead of using > FABSDr (so we get undefined references). > > Cheers, > Diana > > On 11 May 2017 at 18:40, Quentin Colombet <qcolombet at apple.com> wrote: > > Hi Diana, > > Thanks for the summary. > > On May 11, 2017, at 4:01 AM, Diana Picus <diana.picus at linaro.org> > wrote: > > Hi all, > > I'm still running some validation on this, I'll send an email when > it's done. If that goes well I don't have anything against making the > switch. > > For the record, here's a summary of issues that were deferred for > later on (some of which are optimization-ish and we might decide to > never do at -O0 at all): > * Crash in RegBankSelect for half fp types: > https://bugs.llvm.org/show_bug.cgi?id=32560 > > > I’ll have a look. > > * Improving constant placement: > http://bugs.llvm.org/show_bug.cgi?id=32561 > > > I’ve commented in the PR to mention the localizer technic I was playing > with, if someone wants to give it a try. > > * Fancy switch lowering > * Transforming division-by-constant-power-of-2 into right shift > > > AFAICT all the other issues that were brought up were fixed (yay!). > > Cheers, > Diana > > > Cheers, > -Quentin > > > > On 11 May 2017 at 08:44, Kristof Beyls via llvm-dev > <llvm-dev at lists.llvm.org> wrote: > > > On 10 May 2017, at 17:36, Quentin Colombet <qcolombet at apple.com> wrote: > > > MultiSource/Benchmarks/mediabench/mpeg2/mpeg2dec/mpeg2decode (46%): > Function > Reference_IDCT: Probably due to creating all constants in the entry BB > + > spilling floating point data through an X register: > > FastISel: > fadd d0, d1, d0 > str d0, [sp,#528] > GlobalISel: > fadd d0, d1, d0 > fmov x9, d0 > stur x9, [x29,#-48] > > > Good finding, I forgot to do stores in my previous fix. I’ll do them > shortly. > > > Should be fixed by r302679 > > > Thanks Quentin, > > That reduces the slow-down when enabling globalisel at -O0 from 13% (on > r302453) to 9.5% (on r302679) in my experiments. > The code size increase also reduces from just over 3% to 2.8%. > > Kristof > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > >
Quentin Colombet via llvm-dev
2017-May-18 17:09 UTC
[llvm-dev] [GlobalISel][AArch64] Toward flipping the switch for O0: Please give it a try!
Hi Diana,> On May 18, 2017, at 1:15 AM, Diana Picus <diana.picus at linaro.org> wrote: > > On 18 May 2017 at 09:06, Kristof Beyls <Kristof.Beyls at arm.com> wrote: >> I think Diana found that when enabling r299283, the bootstrap failed with >> llvm-tblgen segfaulting. >> So there clearly is some work required there. > > Indeed. > > @Quentin, what is the status of that patch?Initially it was meant as a prototype to investigate how we could address those issues at O0. I didn’t mean to publish it.> Have you been working on > it since then?No, I haven’t touched it and I honestly didn’t plan to do that.> Should I investigate the segfault more and send you a > reproducer?Depends, do you guys want to pursue in that direction? My though was that it is probably best to rely on a better reg allocator (basic or greedy) scheme for O0. I believe the only concern may be compile time but it may just fly (+ I have ideas how to make it better pretty easy).> Is this patch the way forward, or do you have other ideas > for reducing the stack usage?Better reg alloc scheme for O0 :).> > Thanks, > Diana > >> >> Thanks, >> >> Kristof >> >> >> On 16 May 2017, at 14:06, Diana Picus via llvm-dev <llvm-dev at lists.llvm.org> >> wrote: >> >> Turns out it really is a GlobalISel issue - we eat up too much stack >> space because all the constants used in the switches are stored on the >> stack. We need to fix this somehow before changing the default. I'll >> try to give it a run with Quentin's r299283 on top to see if it helps. >> >> Cheers, >> Diana >> >> On 15 May 2017 at 09:38, Diana Picus <diana.picus at linaro.org> wrote: >> >> Got another one: https://bugs.llvm.org/show_bug.cgi?id=33036 >> >> I'm still investigating whether this is an actual GlobalISel issue or >> something else (I'll also start a build on a more recent revision to >> see how that behaves). >> >> On 12 May 2017 at 20:06, Eric Christopher <echristo at gmail.com> wrote: >> >> Agreed. That was probably just luck before :) >> >> -eric >> >> On Fri, May 12, 2017 at 5:22 AM Diana Picus via llvm-dev >> <llvm-dev at lists.llvm.org> wrote: >> >> >> It turns out that can be fixed by adding -lm to the link line, so I >> will probably convert it into a test-suite bug. >> >> I don't suppose it's crucial to handle the fabs intrinsic nicely at -O0. >> >> On 12 May 2017 at 13:44, Diana Picus <diana.picus at linaro.org> wrote: >> >> Hi, >> >> I ran into a little snag on the test-suite: >> https://bugs.llvm.org/show_bug.cgi?id=33021 >> It boils down to GlobalISel generating calls to fabs instead of using >> FABSDr (so we get undefined references). >> >> Cheers, >> Diana >> >> On 11 May 2017 at 18:40, Quentin Colombet <qcolombet at apple.com> wrote: >> >> Hi Diana, >> >> Thanks for the summary. >> >> On May 11, 2017, at 4:01 AM, Diana Picus <diana.picus at linaro.org> >> wrote: >> >> Hi all, >> >> I'm still running some validation on this, I'll send an email when >> it's done. If that goes well I don't have anything against making the >> switch. >> >> For the record, here's a summary of issues that were deferred for >> later on (some of which are optimization-ish and we might decide to >> never do at -O0 at all): >> * Crash in RegBankSelect for half fp types: >> https://bugs.llvm.org/show_bug.cgi?id=32560 >> >> >> I’ll have a look. >> >> * Improving constant placement: >> http://bugs.llvm.org/show_bug.cgi?id=32561 >> >> >> I’ve commented in the PR to mention the localizer technic I was playing >> with, if someone wants to give it a try. >> >> * Fancy switch lowering >> * Transforming division-by-constant-power-of-2 into right shift >> >> >> AFAICT all the other issues that were brought up were fixed (yay!). >> >> Cheers, >> Diana >> >> >> Cheers, >> -Quentin >> >> >> >> On 11 May 2017 at 08:44, Kristof Beyls via llvm-dev >> <llvm-dev at lists.llvm.org> wrote: >> >> >> On 10 May 2017, at 17:36, Quentin Colombet <qcolombet at apple.com> wrote: >> >> >> MultiSource/Benchmarks/mediabench/mpeg2/mpeg2dec/mpeg2decode (46%): >> Function >> Reference_IDCT: Probably due to creating all constants in the entry BB >> + >> spilling floating point data through an X register: >> >> FastISel: >> fadd d0, d1, d0 >> str d0, [sp,#528] >> GlobalISel: >> fadd d0, d1, d0 >> fmov x9, d0 >> stur x9, [x29,#-48] >> >> >> Good finding, I forgot to do stores in my previous fix. I’ll do them >> shortly. >> >> >> Should be fixed by r302679 >> >> >> Thanks Quentin, >> >> That reduces the slow-down when enabling globalisel at -O0 from 13% (on >> r302453) to 9.5% (on r302679) in my experiments. >> The code size increase also reduces from just over 3% to 2.8%. >> >> Kristof >> >> >> _______________________________________________ >> LLVM Developers mailing list >> llvm-dev at lists.llvm.org >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> >> >> _______________________________________________ >> LLVM Developers mailing list >> llvm-dev at lists.llvm.org >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> >> _______________________________________________ >> LLVM Developers mailing list >> llvm-dev at lists.llvm.org >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> >>
Diana Picus via llvm-dev
2017-May-19 08:33 UTC
[llvm-dev] [GlobalISel][AArch64] Toward flipping the switch for O0: Please give it a try!
On 18 May 2017 at 19:09, Quentin Colombet <qcolombet at apple.com> wrote:> Hi Diana, > >> On May 18, 2017, at 1:15 AM, Diana Picus <diana.picus at linaro.org> wrote: >> >> On 18 May 2017 at 09:06, Kristof Beyls <Kristof.Beyls at arm.com> wrote: >>> I think Diana found that when enabling r299283, the bootstrap failed with >>> llvm-tblgen segfaulting. >>> So there clearly is some work required there. >> >> Indeed. >> >> @Quentin, what is the status of that patch? > > Initially it was meant as a prototype to investigate how we could address those issues at O0. I didn’t mean to publish it. > >> Have you been working on >> it since then? > > No, I haven’t touched it and I honestly didn’t plan to do that. > >> Should I investigate the segfault more and send you a >> reproducer? > > Depends, do you guys want to pursue in that direction?Not necessarily, if you think a better reg alloc scheme will do the trick then I see no point in complicating the pass pipeline for now.> My though was that it is probably best to rely on a better reg allocator (basic or greedy) scheme for O0. I believe the only concern may be compile time but it may just fly (+ I have ideas how to make it better pretty easy). > >> Is this patch the way forward, or do you have other ideas >> for reducing the stack usage? > > Better reg alloc scheme for O0 :).Ok, let's go with that then :)> >> >> Thanks, >> Diana >> >>> >>> Thanks, >>> >>> Kristof >>> >>> >>> On 16 May 2017, at 14:06, Diana Picus via llvm-dev <llvm-dev at lists.llvm.org> >>> wrote: >>> >>> Turns out it really is a GlobalISel issue - we eat up too much stack >>> space because all the constants used in the switches are stored on the >>> stack. We need to fix this somehow before changing the default. I'll >>> try to give it a run with Quentin's r299283 on top to see if it helps. >>> >>> Cheers, >>> Diana >>> >>> On 15 May 2017 at 09:38, Diana Picus <diana.picus at linaro.org> wrote: >>> >>> Got another one: https://bugs.llvm.org/show_bug.cgi?id=33036 >>> >>> I'm still investigating whether this is an actual GlobalISel issue or >>> something else (I'll also start a build on a more recent revision to >>> see how that behaves). >>> >>> On 12 May 2017 at 20:06, Eric Christopher <echristo at gmail.com> wrote: >>> >>> Agreed. That was probably just luck before :) >>> >>> -eric >>> >>> On Fri, May 12, 2017 at 5:22 AM Diana Picus via llvm-dev >>> <llvm-dev at lists.llvm.org> wrote: >>> >>> >>> It turns out that can be fixed by adding -lm to the link line, so I >>> will probably convert it into a test-suite bug. >>> >>> I don't suppose it's crucial to handle the fabs intrinsic nicely at -O0. >>> >>> On 12 May 2017 at 13:44, Diana Picus <diana.picus at linaro.org> wrote: >>> >>> Hi, >>> >>> I ran into a little snag on the test-suite: >>> https://bugs.llvm.org/show_bug.cgi?id=33021 >>> It boils down to GlobalISel generating calls to fabs instead of using >>> FABSDr (so we get undefined references). >>> >>> Cheers, >>> Diana >>> >>> On 11 May 2017 at 18:40, Quentin Colombet <qcolombet at apple.com> wrote: >>> >>> Hi Diana, >>> >>> Thanks for the summary. >>> >>> On May 11, 2017, at 4:01 AM, Diana Picus <diana.picus at linaro.org> >>> wrote: >>> >>> Hi all, >>> >>> I'm still running some validation on this, I'll send an email when >>> it's done. If that goes well I don't have anything against making the >>> switch. >>> >>> For the record, here's a summary of issues that were deferred for >>> later on (some of which are optimization-ish and we might decide to >>> never do at -O0 at all): >>> * Crash in RegBankSelect for half fp types: >>> https://bugs.llvm.org/show_bug.cgi?id=32560 >>> >>> >>> I’ll have a look. >>> >>> * Improving constant placement: >>> http://bugs.llvm.org/show_bug.cgi?id=32561 >>> >>> >>> I’ve commented in the PR to mention the localizer technic I was playing >>> with, if someone wants to give it a try. >>> >>> * Fancy switch lowering >>> * Transforming division-by-constant-power-of-2 into right shift >>> >>> >>> AFAICT all the other issues that were brought up were fixed (yay!). >>> >>> Cheers, >>> Diana >>> >>> >>> Cheers, >>> -Quentin >>> >>> >>> >>> On 11 May 2017 at 08:44, Kristof Beyls via llvm-dev >>> <llvm-dev at lists.llvm.org> wrote: >>> >>> >>> On 10 May 2017, at 17:36, Quentin Colombet <qcolombet at apple.com> wrote: >>> >>> >>> MultiSource/Benchmarks/mediabench/mpeg2/mpeg2dec/mpeg2decode (46%): >>> Function >>> Reference_IDCT: Probably due to creating all constants in the entry BB >>> + >>> spilling floating point data through an X register: >>> >>> FastISel: >>> fadd d0, d1, d0 >>> str d0, [sp,#528] >>> GlobalISel: >>> fadd d0, d1, d0 >>> fmov x9, d0 >>> stur x9, [x29,#-48] >>> >>> >>> Good finding, I forgot to do stores in my previous fix. I’ll do them >>> shortly. >>> >>> >>> Should be fixed by r302679 >>> >>> >>> Thanks Quentin, >>> >>> That reduces the slow-down when enabling globalisel at -O0 from 13% (on >>> r302453) to 9.5% (on r302679) in my experiments. >>> The code size increase also reduces from just over 3% to 2.8%. >>> >>> Kristof >>> >>> >>> _______________________________________________ >>> LLVM Developers mailing list >>> llvm-dev at lists.llvm.org >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>> >>> >>> _______________________________________________ >>> LLVM Developers mailing list >>> llvm-dev at lists.llvm.org >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>> >>> _______________________________________________ >>> LLVM Developers mailing list >>> llvm-dev at lists.llvm.org >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>> >>> >