Daniel Sanders via llvm-dev
2017-May-25  15:39 UTC
[llvm-dev] Buildbots timing out on full builds
Thanks for trying that patch. I agree that 34 mins still isn't good enough but we're heading in the right direction. Changing the partitioning predicate to the instruction opcode rather than the number of operands in the top-level instruction will hopefully cut it down further. I also have a patch that shaves a small amount off of the compile-time by replacing the various LLT::scalar()/LLT::vector() calls with references to LLT objects that were created in advance. I tried something similar with the getRegBankForRegClass() but I haven't written that as a patch yet since that one requires some refactors to get access to a mapping that RegisterBankEmitter.cpp knows. In my experiment I edited this information into AArchGenGlobalISel.inc by hand. I think the real solution is to convert the generated C++ to the state-machine that we intended to end up with. I don't think we'll be able to put it off much longer given that we're hitting compile-time problems when we can only import 25% of the rules. That said, I have a couple more nearly-finished patches I'd like to get in before we introduce the state machine. Hopefully, the above tricks will be enough to save me a re-write.> On 25 May 2017, at 16:11, Diana Picus <diana.picus at linaro.org> wrote: > > Hi Daniel, > > I built r303542, then applied your patch and built again and it still takes > real 34m30.279s > user 84m36.553s > sys 0m58.372s > > This is better than the 50m I saw before, but I think we should try to > make it a bit faster. Do you have any other ideas to make it work? > > Thanks, > Diana > > > On 22 May 2017 at 11:22, Diana Picus <diana.picus at linaro.org> wrote: >> Hi Daniel, >> >> I did your experiment on a TK1 machine (same as the bots) and for r303258 I get: >> real 18m28.882s >> user 35m37.091s >> sys 0m44.726s >> >> and for r303259: >> real 50m52.048s >> user 88m25.473s >> sys 0m46.548s >> >> If I can help investigate, please let me know, otherwise we can just >> try your fixes and see how they affect compilation time. >> >> Thanks, >> Diana >> >> On 22 May 2017 at 10:49, Daniel Sanders <daniel_l_sanders at apple.com> wrote: >>> r303341 is the re-commit of the r303259 which tripled the number of rules >>> that can be imported into GlobalISel from SelectionDAG. A compile time >>> regression is to be expected but when I looked into it I found it was ~25s >>> on my machine for the whole incremental build rather than the ~12mins you >>> are seeing. I'll take another look. >>> >>> I'm aware of a couple easy improvements we could make to the way the >>> importer works. I was leaving them until we change it over to a state >>> machine but the most obvious is to group rules by their top-level gMIR >>> instruction. This would reduce the cost of the std::sort that handles the >>> rule priorities in generating the source file and will also make it simpler >>> for the compiler to compile it. >>> >>> >>> On 21 May 2017, at 11:16, Vitaly Buka <vitalybuka at google.com> wrote: >>> >>> It must be r303341, I commented on corresponding llvm-commits thread. >>> >>> On Fri, May 19, 2017 at 7:34 AM, Diana Picus via llvm-dev >>> <llvm-dev at lists.llvm.org> wrote: >>>> >>>> Ok, thanks. I'll try to do a bisect next week to see if I can find it. >>>> >>>> Cheers, >>>> Diana >>>> >>>> On 19 May 2017 at 16:29, Daniel Sanders <daniel_l_sanders at apple.com> >>>> wrote: >>>>> >>>>>> On 19 May 2017, at 14:54, Daniel Sanders via llvm-dev >>>>>> <llvm-dev at lists.llvm.org> wrote: >>>>>> >>>>>> r303259 will have increased compile-time since it tripled the number of >>>>>> importable >>>>>> SelectionDAG rules but a quick measurement building the affected file: >>>>>> ninja >>>>>> lib/Target/<Target>/CMakeFiles/LLVM<Target>CodeGen.dir/<Target>InstructionSelector.cpp.o >>>>>> for both ARM and AArch64 didn't show a significant increase. I'll check >>>>>> whether >>>>>> it made a different to linking. >>>>> >>>>> I don't think it's r303259. Starting with a fully built r303259, then >>>>> updating to r303258 and running 'ninja' gives me: >>>>> real 2m28.273s >>>>> user 13m23.171s >>>>> sys 0m47.725s >>>>> then updating to r303259 and running 'ninja' again gives me: >>>>> real 2m19.052s >>>>> user 13m38.802s >>>>> sys 0m44.551s >>>>> >>>>>> sanitizer-x86_64-linux-fast also timed out after one of my commits this >>>>>> morning. >>>>>> >>>>>>> On 19 May 2017, at 14:14, Diana Picus <diana.picus at linaro.org> wrote: >>>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> We've noticed that recently some of our bots (mostly >>>>>>> clang-cmake-armv7-a15 and clang-cmake-thumbv7-a15) started timing out >>>>>>> whenever someone commits a change to TableGen: >>>>>>> >>>>>>> r303418: >>>>>>> http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15/builds/7268 >>>>>>> r303346: >>>>>>> http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15/builds/7242 >>>>>>> r303341: >>>>>>> http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15/builds/7239 >>>>>>> r303259: >>>>>>> http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15/builds/7198 >>>>>>> >>>>>>> TableGen changes before that (I checked about 3-4 of them) don't have >>>>>>> this problem: >>>>>>> r303253: >>>>>>> http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15/builds/7197 >>>>>>> >>>>>>> That one in particular actually finishes the whole build in 635s, >>>>>>> which is only a bit over 50% of the timeout limit (1200s). So, between >>>>>>> r303253 and now, something happened that made full builds >>>>>>> significantly slower. Does anyone have any idea what that might have >>>>>>> been? Also, has anyone noticed this on other bots? >>>>>>> >>>>>>> Thanks, >>>>>>> Diana >>>>>> >>>>>> _______________________________________________ >>>>>> LLVM Developers mailing list >>>>>> llvm-dev at lists.llvm.org >>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>>>> >>>> _______________________________________________ >>>> LLVM Developers mailing list >>>> llvm-dev at lists.llvm.org >>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>> >>> >>>
Diana Picus via llvm-dev
2017-May-26  08:10 UTC
[llvm-dev] Buildbots timing out on full builds
Ok, that sounds reasonable. I'm happy to test more patches for you when they're ready. On 25 May 2017 at 17:39, Daniel Sanders <daniel_l_sanders at apple.com> wrote:> Thanks for trying that patch. I agree that 34 mins still isn't good enough but we're heading in the right direction. > > Changing the partitioning predicate to the instruction opcode rather than the number of operands in the top-level instruction will hopefully cut it down further. I also have a patch that shaves a small amount off of the compile-time by replacing the various LLT::scalar()/LLT::vector() calls with references to LLT objects that were created in advance. I tried something similar with the getRegBankForRegClass() but I haven't written that as a patch yet since that one requires some refactors to get access to a mapping that RegisterBankEmitter.cpp knows. In my experiment I edited this information into AArchGenGlobalISel.inc by hand. > > I think the real solution is to convert the generated C++ to the state-machine that we intended to end up with. I don't think we'll be able to put it off much longer given that we're hitting compile-time problems when we can only import 25% of the rules. That said, I have a couple more nearly-finished patches I'd like to get in before we introduce the state machine. Hopefully, the above tricks will be enough to save me a re-write. > >> On 25 May 2017, at 16:11, Diana Picus <diana.picus at linaro.org> wrote: >> >> Hi Daniel, >> >> I built r303542, then applied your patch and built again and it still takes >> real 34m30.279s >> user 84m36.553s >> sys 0m58.372s >> >> This is better than the 50m I saw before, but I think we should try to >> make it a bit faster. Do you have any other ideas to make it work? >> >> Thanks, >> Diana >> >> >> On 22 May 2017 at 11:22, Diana Picus <diana.picus at linaro.org> wrote: >>> Hi Daniel, >>> >>> I did your experiment on a TK1 machine (same as the bots) and for r303258 I get: >>> real 18m28.882s >>> user 35m37.091s >>> sys 0m44.726s >>> >>> and for r303259: >>> real 50m52.048s >>> user 88m25.473s >>> sys 0m46.548s >>> >>> If I can help investigate, please let me know, otherwise we can just >>> try your fixes and see how they affect compilation time. >>> >>> Thanks, >>> Diana >>> >>> On 22 May 2017 at 10:49, Daniel Sanders <daniel_l_sanders at apple.com> wrote: >>>> r303341 is the re-commit of the r303259 which tripled the number of rules >>>> that can be imported into GlobalISel from SelectionDAG. A compile time >>>> regression is to be expected but when I looked into it I found it was ~25s >>>> on my machine for the whole incremental build rather than the ~12mins you >>>> are seeing. I'll take another look. >>>> >>>> I'm aware of a couple easy improvements we could make to the way the >>>> importer works. I was leaving them until we change it over to a state >>>> machine but the most obvious is to group rules by their top-level gMIR >>>> instruction. This would reduce the cost of the std::sort that handles the >>>> rule priorities in generating the source file and will also make it simpler >>>> for the compiler to compile it. >>>> >>>> >>>> On 21 May 2017, at 11:16, Vitaly Buka <vitalybuka at google.com> wrote: >>>> >>>> It must be r303341, I commented on corresponding llvm-commits thread. >>>> >>>> On Fri, May 19, 2017 at 7:34 AM, Diana Picus via llvm-dev >>>> <llvm-dev at lists.llvm.org> wrote: >>>>> >>>>> Ok, thanks. I'll try to do a bisect next week to see if I can find it. >>>>> >>>>> Cheers, >>>>> Diana >>>>> >>>>> On 19 May 2017 at 16:29, Daniel Sanders <daniel_l_sanders at apple.com> >>>>> wrote: >>>>>> >>>>>>> On 19 May 2017, at 14:54, Daniel Sanders via llvm-dev >>>>>>> <llvm-dev at lists.llvm.org> wrote: >>>>>>> >>>>>>> r303259 will have increased compile-time since it tripled the number of >>>>>>> importable >>>>>>> SelectionDAG rules but a quick measurement building the affected file: >>>>>>> ninja >>>>>>> lib/Target/<Target>/CMakeFiles/LLVM<Target>CodeGen.dir/<Target>InstructionSelector.cpp.o >>>>>>> for both ARM and AArch64 didn't show a significant increase. I'll check >>>>>>> whether >>>>>>> it made a different to linking. >>>>>> >>>>>> I don't think it's r303259. Starting with a fully built r303259, then >>>>>> updating to r303258 and running 'ninja' gives me: >>>>>> real 2m28.273s >>>>>> user 13m23.171s >>>>>> sys 0m47.725s >>>>>> then updating to r303259 and running 'ninja' again gives me: >>>>>> real 2m19.052s >>>>>> user 13m38.802s >>>>>> sys 0m44.551s >>>>>> >>>>>>> sanitizer-x86_64-linux-fast also timed out after one of my commits this >>>>>>> morning. >>>>>>> >>>>>>>> On 19 May 2017, at 14:14, Diana Picus <diana.picus at linaro.org> wrote: >>>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> We've noticed that recently some of our bots (mostly >>>>>>>> clang-cmake-armv7-a15 and clang-cmake-thumbv7-a15) started timing out >>>>>>>> whenever someone commits a change to TableGen: >>>>>>>> >>>>>>>> r303418: >>>>>>>> http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15/builds/7268 >>>>>>>> r303346: >>>>>>>> http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15/builds/7242 >>>>>>>> r303341: >>>>>>>> http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15/builds/7239 >>>>>>>> r303259: >>>>>>>> http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15/builds/7198 >>>>>>>> >>>>>>>> TableGen changes before that (I checked about 3-4 of them) don't have >>>>>>>> this problem: >>>>>>>> r303253: >>>>>>>> http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15/builds/7197 >>>>>>>> >>>>>>>> That one in particular actually finishes the whole build in 635s, >>>>>>>> which is only a bit over 50% of the timeout limit (1200s). So, between >>>>>>>> r303253 and now, something happened that made full builds >>>>>>>> significantly slower. Does anyone have any idea what that might have >>>>>>>> been? Also, has anyone noticed this on other bots? >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Diana >>>>>>> >>>>>>> _______________________________________________ >>>>>>> LLVM Developers mailing list >>>>>>> llvm-dev at lists.llvm.org >>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>>>>> >>>>> _______________________________________________ >>>>> LLVM Developers mailing list >>>>> llvm-dev at lists.llvm.org >>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>>> >>>> >>>> >
Daniel Sanders via llvm-dev
2017-May-31  08:21 UTC
[llvm-dev] Buildbots timing out on full builds
Hi Diana and Vitaly, Could you give https://reviews.llvm.org/differential/diff/100829/ <https://reviews.llvm.org/differential/diff/100829/> a try? When measuring the compile of AArch64InstructionSelector.cpp.o with asan enabled and running under instruments's Allocation profiler, my machine reports that the cumulative memory allocations is down to ~3.5GB (was ~10GB), the number of allocations down to ~4 million (was ~23 million), and the compile time down to ~15s (was ~60s). The patch is based on r303542 and the main change is that most of the generated C++ has been replaced with a state-machine based implementation. It's not fully converted to a state-machine yet since it generates lots of smaller machines (one matcher and one emitter per rule) instead of a single machine but it's hopefully sufficient to unblock my patch series.> On 26 May 2017, at 09:10, Diana Picus <diana.picus at linaro.org> wrote: > > Ok, that sounds reasonable. I'm happy to test more patches for you > when they're ready. > > On 25 May 2017 at 17:39, Daniel Sanders <daniel_l_sanders at apple.com> wrote: >> Thanks for trying that patch. I agree that 34 mins still isn't good enough but we're heading in the right direction. >> >> Changing the partitioning predicate to the instruction opcode rather than the number of operands in the top-level instruction will hopefully cut it down further. I also have a patch that shaves a small amount off of the compile-time by replacing the various LLT::scalar()/LLT::vector() calls with references to LLT objects that were created in advance. I tried something similar with the getRegBankForRegClass() but I haven't written that as a patch yet since that one requires some refactors to get access to a mapping that RegisterBankEmitter.cpp knows. In my experiment I edited this information into AArchGenGlobalISel.inc by hand. >> >> I think the real solution is to convert the generated C++ to the state-machine that we intended to end up with. I don't think we'll be able to put it off much longer given that we're hitting compile-time problems when we can only import 25% of the rules. That said, I have a couple more nearly-finished patches I'd like to get in before we introduce the state machine. Hopefully, the above tricks will be enough to save me a re-write. >> >>> On 25 May 2017, at 16:11, Diana Picus <diana.picus at linaro.org> wrote: >>> >>> Hi Daniel, >>> >>> I built r303542, then applied your patch and built again and it still takes >>> real 34m30.279s >>> user 84m36.553s >>> sys 0m58.372s >>> >>> This is better than the 50m I saw before, but I think we should try to >>> make it a bit faster. Do you have any other ideas to make it work? >>> >>> Thanks, >>> Diana >>> >>> >>> On 22 May 2017 at 11:22, Diana Picus <diana.picus at linaro.org> wrote: >>>> Hi Daniel, >>>> >>>> I did your experiment on a TK1 machine (same as the bots) and for r303258 I get: >>>> real 18m28.882s >>>> user 35m37.091s >>>> sys 0m44.726s >>>> >>>> and for r303259: >>>> real 50m52.048s >>>> user 88m25.473s >>>> sys 0m46.548s >>>> >>>> If I can help investigate, please let me know, otherwise we can just >>>> try your fixes and see how they affect compilation time. >>>> >>>> Thanks, >>>> Diana >>>> >>>> On 22 May 2017 at 10:49, Daniel Sanders <daniel_l_sanders at apple.com> wrote: >>>>> r303341 is the re-commit of the r303259 which tripled the number of rules >>>>> that can be imported into GlobalISel from SelectionDAG. A compile time >>>>> regression is to be expected but when I looked into it I found it was ~25s >>>>> on my machine for the whole incremental build rather than the ~12mins you >>>>> are seeing. I'll take another look. >>>>> >>>>> I'm aware of a couple easy improvements we could make to the way the >>>>> importer works. I was leaving them until we change it over to a state >>>>> machine but the most obvious is to group rules by their top-level gMIR >>>>> instruction. This would reduce the cost of the std::sort that handles the >>>>> rule priorities in generating the source file and will also make it simpler >>>>> for the compiler to compile it. >>>>> >>>>> >>>>> On 21 May 2017, at 11:16, Vitaly Buka <vitalybuka at google.com> wrote: >>>>> >>>>> It must be r303341, I commented on corresponding llvm-commits thread. >>>>> >>>>> On Fri, May 19, 2017 at 7:34 AM, Diana Picus via llvm-dev >>>>> <llvm-dev at lists.llvm.org> wrote: >>>>>> >>>>>> Ok, thanks. I'll try to do a bisect next week to see if I can find it. >>>>>> >>>>>> Cheers, >>>>>> Diana >>>>>> >>>>>> On 19 May 2017 at 16:29, Daniel Sanders <daniel_l_sanders at apple.com> >>>>>> wrote: >>>>>>> >>>>>>>> On 19 May 2017, at 14:54, Daniel Sanders via llvm-dev >>>>>>>> <llvm-dev at lists.llvm.org> wrote: >>>>>>>> >>>>>>>> r303259 will have increased compile-time since it tripled the number of >>>>>>>> importable >>>>>>>> SelectionDAG rules but a quick measurement building the affected file: >>>>>>>> ninja >>>>>>>> lib/Target/<Target>/CMakeFiles/LLVM<Target>CodeGen.dir/<Target>InstructionSelector.cpp.o >>>>>>>> for both ARM and AArch64 didn't show a significant increase. I'll check >>>>>>>> whether >>>>>>>> it made a different to linking. >>>>>>> >>>>>>> I don't think it's r303259. Starting with a fully built r303259, then >>>>>>> updating to r303258 and running 'ninja' gives me: >>>>>>> real 2m28.273s >>>>>>> user 13m23.171s >>>>>>> sys 0m47.725s >>>>>>> then updating to r303259 and running 'ninja' again gives me: >>>>>>> real 2m19.052s >>>>>>> user 13m38.802s >>>>>>> sys 0m44.551s >>>>>>> >>>>>>>> sanitizer-x86_64-linux-fast also timed out after one of my commits this >>>>>>>> morning. >>>>>>>> >>>>>>>>> On 19 May 2017, at 14:14, Diana Picus <diana.picus at linaro.org> wrote: >>>>>>>>> >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> We've noticed that recently some of our bots (mostly >>>>>>>>> clang-cmake-armv7-a15 and clang-cmake-thumbv7-a15) started timing out >>>>>>>>> whenever someone commits a change to TableGen: >>>>>>>>> >>>>>>>>> r303418: >>>>>>>>> http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15/builds/7268 >>>>>>>>> r303346: >>>>>>>>> http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15/builds/7242 >>>>>>>>> r303341: >>>>>>>>> http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15/builds/7239 >>>>>>>>> r303259: >>>>>>>>> http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15/builds/7198 >>>>>>>>> >>>>>>>>> TableGen changes before that (I checked about 3-4 of them) don't have >>>>>>>>> this problem: >>>>>>>>> r303253: >>>>>>>>> http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15/builds/7197 >>>>>>>>> >>>>>>>>> That one in particular actually finishes the whole build in 635s, >>>>>>>>> which is only a bit over 50% of the timeout limit (1200s). So, between >>>>>>>>> r303253 and now, something happened that made full builds >>>>>>>>> significantly slower. Does anyone have any idea what that might have >>>>>>>>> been? Also, has anyone noticed this on other bots? >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Diana >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> LLVM Developers mailing list >>>>>>>> llvm-dev at lists.llvm.org >>>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>>>>>> >>>>>> _______________________________________________ >>>>>> LLVM Developers mailing list >>>>>> llvm-dev at lists.llvm.org >>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>>>> >>>>> >>>>> >>-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170531/6776b217/attachment.html>