Quentin Colombet via llvm-dev
2017-May-30 14:42 UTC
[llvm-dev] [GlobalISel][AArch64] Toward flipping the switch for O0: Please give it a try!
Thanks Diana. That is indeed the assumption in the code and this is obviously wrong. Could you try the attached patch? (I haven’t even tried to compile it though) Cheers, -Quentin -------------- next part -------------- A non-text attachment was scrubbed... Name: localizer_tentative_fix.diff Type: application/octet-stream Size: 774 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170530/455d16da/attachment.obj> -------------- next part --------------> On May 30, 2017, at 6:56 AM, Diana Picus <diana.picus at linaro.org> wrote: > > Hi Quentin, > > I've attached a reproducer for the problem. > > I've described what I think the problem is in the file, but the short > version is that the localizer shouldn't assume that the iteration > order for the uses corresponds to the logical order of instructions in > a basic block (we're now localizing before the first use that we find, > but that may be later in the basic block, so we'd end up with uses > before the def). > > I'm not sure it's possible to test this without running a couple of > passes. You might be able to trigger it only with reg bank select + > localize, but I haven't tried. Using only the localizer would mean > that the iteration order for the uses would be the order in which > they're read in, so you wouldn't have this problem. > > Hope that helps, > Diana > > > On 29 May 2017 at 10:06, Diana Picus <diana.picus at linaro.org> wrote: >> Thanks Quentin, it's in progress now, I'll let you know how it goes. >> >> Cheers, >> Diana >> >> On 27 May 2017 at 03:36, Quentin Colombet <qcolombet at apple.com> wrote: >>> Hi Kristof, >>> >>> I’ve pushed the localizer in r304051 and added it in the AArch64 O0 pipeline >>> in r304052. >>> >>> I let Diana investigate the seg fault she was seeing. >>> >>> @Diana, let me know if you need help. >>> >>> Cheers, >>> -Quentin >>> >>> On May 25, 2017, at 1:53 PM, Quentin Colombet via llvm-dev >>> <llvm-dev at lists.llvm.org> wrote: >>> >>> Hi Kristof, >>> >>> On May 25, 2017, at 2:09 AM, Kristof Beyls <kristof.beyls at arm.com> wrote: >>> >>> >>> On 24 May 2017, at 22:01, Quentin Colombet <qcolombet at apple.com> wrote: >>> >>> Hi Kristof, >>> >>> Thanks for going back so fast! >>> >>> On May 24, 2017, at 12:57 PM, Kristof Beyls <kristof.beyls at arm.com> wrote: >>> >>> >>> On 24 May 2017, at 19:31, Quentin Colombet <qcolombet at apple.com> wrote: >>> >>> Hi Kristof, >>> >>> Thanks for the measurements. >>> >>> On May 24, 2017, at 6:00 AM, Kristof Beyls <kristof.beyls at arm.com> wrote: >>> >>> >>> - Comparing against -O0 without globalisel but with the above regalloc >>> options: 5.6% performance drop, 1% code size drop. >>> >>> In summary, the measurements indicate some good improvements. >>> I also haven't measure the impact on compile time. >>> >>> >>> Do you have a mean to make this measurement? >>> Ahmed did a bunch of compile time measurements on our side and I wanted to >>> see if I need to put him on the hook again :). >>> >>> >>> I did a quick setup with CTMark (part of the test-suite). I ran each of >>> * '-O0 -g', >>> * '-O0 -g -mllvm -global-isel=true -mllvm -global-isel-abort=0', and >>> * '-O0 -g -mllvm -global-isel=true -mllvm -global-isel-abort=0 -mllvm >>> -optimize-regalloc -mllvm -regalloc=greedy' >>> 5 times, cross-compiling from X86 to AArch64, and took the median measured >>> compile times. >>> In summary, I see GlobalISel having a compile time that's 3.5% higher than >>> the current -O0 default. >>> With enabling the greedy register allocator, this increases to 28%. >>> 28% is probably too high? >>> >>> >>> I think it is yes. >>> I have attached a quick hack to the greedy allocator to feature a fast mode. >>> Could you give it a try? >>> >>> To enable the fast mode, please use (-mllvm) -regalloc-greedy-fast=true >>> (default is false). >>> >>> >>> I'm afraid it doesn't seem to save much compile time. On geomean, I see >>> about 26% compile time increase against the current -O0 default (compared to >>> 28% increase for regalloc greedy without your patch). >>> >>> >>> Interesting, I guess a lot of time is spent in the coalescer. Could you give >>> a try with -join-liveintervals=false? >>> >>> >>> With adding -join-liveintervals=false, I see the compile time increase going >>> up to 28% again. >>> >>> >>> Heh, I am mildly surprised we hand much more live-ranges to the allocator >>> when we do that. >>> >>> >>> >>> Do you know where the time is spent (-time-passes)? >>> >>> >>> I'm afraid I won't have time to have a closer look in the next couple of >>> days - I don't know where the time is spent at the moment. >>> >>> >>> Fair enough, will investigate later. >>> >>> >>> >>> Anyhow, fixing all of those, although this is I think the right approach, >>> will take time, so we can go with the localizer. >>> >>> >>> Right, I don't understand the register allocator well enough to know if that >>> compile time overhead can be fixed, while still getting the necessary >>> codegen benefits the greedy allocator gives. >>> Is there any specific help you're looking for with getting the localizer >>> work well enough for production use? >>> >>> >>> I’ll clean-up the WIP patch for the localizer, then you guys can fix the bug >>> that you found. >>> >>> I’ll do that tomorrow. >>> >>> Cheers, >>> -Quentin >>> >>> >>> Thanks, >>> >>> Kristof >>> >>> >>> _______________________________________________ >>> LLVM Developers mailing list >>> llvm-dev at lists.llvm.org >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>> >>> > <localizer-mo-order.mir>
Quentin Colombet via llvm-dev
2017-May-30 20:57 UTC
[llvm-dev] [GlobalISel][AArch64] Toward flipping the switch for O0: Please give it a try!
Hi Diana, I’ve actually gone ahead and pushed the fix as I was able to produce a small reproducer. This is r304244 Let me know if you encounter any other problem. Cheers, -Quentin> On May 30, 2017, at 7:42 AM, Quentin Colombet via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > Thanks Diana. > > That is indeed the assumption in the code and this is obviously wrong. > > Could you try the attached patch? > > (I haven’t even tried to compile it though) > > Cheers, > -Quentin > <localizer_tentative_fix.diff> >> On May 30, 2017, at 6:56 AM, Diana Picus <diana.picus at linaro.org> wrote: >> >> Hi Quentin, >> >> I've attached a reproducer for the problem. >> >> I've described what I think the problem is in the file, but the short >> version is that the localizer shouldn't assume that the iteration >> order for the uses corresponds to the logical order of instructions in >> a basic block (we're now localizing before the first use that we find, >> but that may be later in the basic block, so we'd end up with uses >> before the def). >> >> I'm not sure it's possible to test this without running a couple of >> passes. You might be able to trigger it only with reg bank select + >> localize, but I haven't tried. Using only the localizer would mean >> that the iteration order for the uses would be the order in which >> they're read in, so you wouldn't have this problem. >> >> Hope that helps, >> Diana >> >> >> On 29 May 2017 at 10:06, Diana Picus <diana.picus at linaro.org> wrote: >>> Thanks Quentin, it's in progress now, I'll let you know how it goes. >>> >>> Cheers, >>> Diana >>> >>> On 27 May 2017 at 03:36, Quentin Colombet <qcolombet at apple.com> wrote: >>>> Hi Kristof, >>>> >>>> I’ve pushed the localizer in r304051 and added it in the AArch64 O0 pipeline >>>> in r304052. >>>> >>>> I let Diana investigate the seg fault she was seeing. >>>> >>>> @Diana, let me know if you need help. >>>> >>>> Cheers, >>>> -Quentin >>>> >>>> On May 25, 2017, at 1:53 PM, Quentin Colombet via llvm-dev >>>> <llvm-dev at lists.llvm.org> wrote: >>>> >>>> Hi Kristof, >>>> >>>> On May 25, 2017, at 2:09 AM, Kristof Beyls <kristof.beyls at arm.com> wrote: >>>> >>>> >>>> On 24 May 2017, at 22:01, Quentin Colombet <qcolombet at apple.com> wrote: >>>> >>>> Hi Kristof, >>>> >>>> Thanks for going back so fast! >>>> >>>> On May 24, 2017, at 12:57 PM, Kristof Beyls <kristof.beyls at arm.com> wrote: >>>> >>>> >>>> On 24 May 2017, at 19:31, Quentin Colombet <qcolombet at apple.com> wrote: >>>> >>>> Hi Kristof, >>>> >>>> Thanks for the measurements. >>>> >>>> On May 24, 2017, at 6:00 AM, Kristof Beyls <kristof.beyls at arm.com> wrote: >>>> >>>> >>>> - Comparing against -O0 without globalisel but with the above regalloc >>>> options: 5.6% performance drop, 1% code size drop. >>>> >>>> In summary, the measurements indicate some good improvements. >>>> I also haven't measure the impact on compile time. >>>> >>>> >>>> Do you have a mean to make this measurement? >>>> Ahmed did a bunch of compile time measurements on our side and I wanted to >>>> see if I need to put him on the hook again :). >>>> >>>> >>>> I did a quick setup with CTMark (part of the test-suite). I ran each of >>>> * '-O0 -g', >>>> * '-O0 -g -mllvm -global-isel=true -mllvm -global-isel-abort=0', and >>>> * '-O0 -g -mllvm -global-isel=true -mllvm -global-isel-abort=0 -mllvm >>>> -optimize-regalloc -mllvm -regalloc=greedy' >>>> 5 times, cross-compiling from X86 to AArch64, and took the median measured >>>> compile times. >>>> In summary, I see GlobalISel having a compile time that's 3.5% higher than >>>> the current -O0 default. >>>> With enabling the greedy register allocator, this increases to 28%. >>>> 28% is probably too high? >>>> >>>> >>>> I think it is yes. >>>> I have attached a quick hack to the greedy allocator to feature a fast mode. >>>> Could you give it a try? >>>> >>>> To enable the fast mode, please use (-mllvm) -regalloc-greedy-fast=true >>>> (default is false). >>>> >>>> >>>> I'm afraid it doesn't seem to save much compile time. On geomean, I see >>>> about 26% compile time increase against the current -O0 default (compared to >>>> 28% increase for regalloc greedy without your patch). >>>> >>>> >>>> Interesting, I guess a lot of time is spent in the coalescer. Could you give >>>> a try with -join-liveintervals=false? >>>> >>>> >>>> With adding -join-liveintervals=false, I see the compile time increase going >>>> up to 28% again. >>>> >>>> >>>> Heh, I am mildly surprised we hand much more live-ranges to the allocator >>>> when we do that. >>>> >>>> >>>> >>>> Do you know where the time is spent (-time-passes)? >>>> >>>> >>>> I'm afraid I won't have time to have a closer look in the next couple of >>>> days - I don't know where the time is spent at the moment. >>>> >>>> >>>> Fair enough, will investigate later. >>>> >>>> >>>> >>>> Anyhow, fixing all of those, although this is I think the right approach, >>>> will take time, so we can go with the localizer. >>>> >>>> >>>> Right, I don't understand the register allocator well enough to know if that >>>> compile time overhead can be fixed, while still getting the necessary >>>> codegen benefits the greedy allocator gives. >>>> Is there any specific help you're looking for with getting the localizer >>>> work well enough for production use? >>>> >>>> >>>> I’ll clean-up the WIP patch for the localizer, then you guys can fix the bug >>>> that you found. >>>> >>>> I’ll do that tomorrow. >>>> >>>> Cheers, >>>> -Quentin >>>> >>>> >>>> Thanks, >>>> >>>> Kristof >>>> >>>> >>>> _______________________________________________ >>>> LLVM Developers mailing list >>>> llvm-dev at lists.llvm.org >>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>>> >>>> >> <localizer-mo-order.mir> > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170530/1623ba2b/attachment.html>
Diana Picus via llvm-dev
2017-May-31 13:33 UTC
[llvm-dev] [GlobalISel][AArch64] Toward flipping the switch for O0: Please give it a try!
Cool test :) It seems to work fine now, I don't see any new failures. IIUC, Kristof is also giving it another run. Cheers, Diana On 30 May 2017 at 22:57, Quentin Colombet <qcolombet at apple.com> wrote:> Hi Diana, > > I’ve actually gone ahead and pushed the fix as I was able to produce a > small reproducer. > > This is r304244 > > Let me know if you encounter any other problem. > > Cheers, > -Quentin > > On May 30, 2017, at 7:42 AM, Quentin Colombet via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > > Thanks Diana. > > That is indeed the assumption in the code and this is obviously wrong. > > Could you try the attached patch? > > (I haven’t even tried to compile it though) > > Cheers, > -Quentin > <localizer_tentative_fix.diff> > > On May 30, 2017, at 6:56 AM, Diana Picus <diana.picus at linaro.org> wrote: > > Hi Quentin, > > I've attached a reproducer for the problem. > > I've described what I think the problem is in the file, but the short > version is that the localizer shouldn't assume that the iteration > order for the uses corresponds to the logical order of instructions in > a basic block (we're now localizing before the first use that we find, > but that may be later in the basic block, so we'd end up with uses > before the def). > > I'm not sure it's possible to test this without running a couple of > passes. You might be able to trigger it only with reg bank select + > localize, but I haven't tried. Using only the localizer would mean > that the iteration order for the uses would be the order in which > they're read in, so you wouldn't have this problem. > > Hope that helps, > Diana > > > On 29 May 2017 at 10:06, Diana Picus <diana.picus at linaro.org> wrote: > > Thanks Quentin, it's in progress now, I'll let you know how it goes. > > Cheers, > Diana > > On 27 May 2017 at 03:36, Quentin Colombet <qcolombet at apple.com> wrote: > > Hi Kristof, > > I’ve pushed the localizer in r304051 and added it in the AArch64 O0 > pipeline > in r304052. > > I let Diana investigate the seg fault she was seeing. > > @Diana, let me know if you need help. > > Cheers, > -Quentin > > On May 25, 2017, at 1:53 PM, Quentin Colombet via llvm-dev > <llvm-dev at lists.llvm.org> wrote: > > Hi Kristof, > > On May 25, 2017, at 2:09 AM, Kristof Beyls <kristof.beyls at arm.com> wrote: > > > On 24 May 2017, at 22:01, Quentin Colombet <qcolombet at apple.com> wrote: > > Hi Kristof, > > Thanks for going back so fast! > > On May 24, 2017, at 12:57 PM, Kristof Beyls <kristof.beyls at arm.com> wrote: > > > On 24 May 2017, at 19:31, Quentin Colombet <qcolombet at apple.com> wrote: > > Hi Kristof, > > Thanks for the measurements. > > On May 24, 2017, at 6:00 AM, Kristof Beyls <kristof.beyls at arm.com> wrote: > > > - Comparing against -O0 without globalisel but with the above regalloc > options: 5.6% performance drop, 1% code size drop. > > In summary, the measurements indicate some good improvements. > I also haven't measure the impact on compile time. > > > Do you have a mean to make this measurement? > Ahmed did a bunch of compile time measurements on our side and I wanted to > see if I need to put him on the hook again :). > > > I did a quick setup with CTMark (part of the test-suite). I ran each of > * '-O0 -g', > * '-O0 -g -mllvm -global-isel=true -mllvm -global-isel-abort=0', and > * '-O0 -g -mllvm -global-isel=true -mllvm -global-isel-abort=0 -mllvm > -optimize-regalloc -mllvm -regalloc=greedy' > 5 times, cross-compiling from X86 to AArch64, and took the median measured > compile times. > In summary, I see GlobalISel having a compile time that's 3.5% higher than > the current -O0 default. > With enabling the greedy register allocator, this increases to 28%. > 28% is probably too high? > > > I think it is yes. > I have attached a quick hack to the greedy allocator to feature a fast > mode. > Could you give it a try? > > To enable the fast mode, please use (-mllvm) -regalloc-greedy-fast=true > (default is false). > > > I'm afraid it doesn't seem to save much compile time. On geomean, I see > about 26% compile time increase against the current -O0 default (compared > to > 28% increase for regalloc greedy without your patch). > > > Interesting, I guess a lot of time is spent in the coalescer. Could you > give > a try with -join-liveintervals=false? > > > With adding -join-liveintervals=false, I see the compile time increase > going > up to 28% again. > > > Heh, I am mildly surprised we hand much more live-ranges to the allocator > when we do that. > > > > Do you know where the time is spent (-time-passes)? > > > I'm afraid I won't have time to have a closer look in the next couple of > days - I don't know where the time is spent at the moment. > > > Fair enough, will investigate later. > > > > Anyhow, fixing all of those, although this is I think the right approach, > will take time, so we can go with the localizer. > > > Right, I don't understand the register allocator well enough to know if > that > compile time overhead can be fixed, while still getting the necessary > codegen benefits the greedy allocator gives. > Is there any specific help you're looking for with getting the localizer > work well enough for production use? > > > I’ll clean-up the WIP patch for the localizer, then you guys can fix the > bug > that you found. > > I’ll do that tomorrow. > > Cheers, > -Quentin > > > Thanks, > > Kristof > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > > > <localizer-mo-order.mir> > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170531/4046eb2d/attachment-0001.html>