Tobias Hieta via llvm-dev
2020-Sep-09 10:25 UTC
[llvm-dev] [RFC] New Feature Proposal: De-Optimizing Cold Functions using PGO Info
Hello,

We use PGO to optimize Clang itself. I can see if I have time to give this
patch some testing. Is there anything special to look out for besides compile
benchmarks and the time to build Clang? Do you expect any changes in code size?

On Wed, Sep 9, 2020, 10:03 Renato Golin via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> On Wed, 9 Sep 2020 at 01:21, Min-Yih Hsu via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
>> From the above experiments we observed that compilation / link time
>> improvement scaled linearly with the percentage of cold functions we
>> skipped. Even if we only skipped functions that never got executed (i.e.
>> had counter values equal to zero, which is effectively "0%"), we already
>> had 5~10% of "free ride" on compilation / linking speed improvement and
>> barely had any target performance penalty.
>
> Hi Min (Paul, Edd),
>
> This is great work! Small, clear patch, substantial impact, virtually no
> downsides.
>
> Just looking at your test-suite numbers, not optimising functions "never
> used" during the profile run sounds like an obvious "default PGO behaviour"
> to me. The flag defining the percentage range is a good option for
> development builds.
>
> I imagine you guys have run this on internal programs and found it
> beneficial, too, not just the LLVM test-suite (which is very small and
> non-representative). It would be nice if other groups that already use PGO
> could try that locally and spot any issues.
>
> cheers,
> --renato
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Nemanja Ivanovic via llvm-dev
2020-Sep-09 13:27 UTC
[llvm-dev] [RFC] New Feature Proposal: De-Optimizing Cold Functions using PGO Info
This sounds very interesting, and the compile time gains in the conservative
range (say, under 25%) seem quite promising.

One concern that comes to mind is whether it is possible for performance to
degrade severely in the situation where a function has a hot call site (where
it gets inlined) and some non-zero number of cold call sites (where it does
not get inlined). When we decorate the function with `optnone, noinline`, it
will presumably no longer be inlined into the hot call site and will
furthermore be unoptimized. Have you considered such a case? If so, is it
something that cannot happen (i.e. inlining has already happened, etc.), or
something that we can mitigate in the future?

A more aesthetic comment: personally, I would prefer a single option with a
default percentage (say 0%) rather than having to specify two options.

Also, it might be useful to add an option to dump the names of functions that
are decorated, so the user can track an execution count of such functions when
running their code. But of course, the debug messages may be adequate for
this purpose.

Nemanja

On Wed, Sep 9, 2020 at 6:26 AM Tobias Hieta via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> Hello,
>
> We use PGO to optimize clang itself. I can see if I have time to give this
> patch some testing. Anything special to look out for except compile
> benchmark and time to build clang, do you expect any changes in code size?
Renato Golin via llvm-dev
2020-Sep-09 14:10 UTC
[llvm-dev] [RFC] New Feature Proposal: De-Optimizing Cold Functions using PGO Info
On Wed, 9 Sep 2020 at 14:27, Nemanja Ivanovic <nemanja.i.ibm at gmail.com> wrote:

> A more aesthetic comment I have is that personally, I would prefer a
> single option with a default percentage (say 0%) rather than having to
> specify two options.

0% doesn't mean "don't do it", it just means "only do that to functions I
didn't see running at all", which could be misrepresented in the profiling
run. If we agree this should be *always* enabled, then only one option is
needed. Otherwise, we'd need negative percentages to mean "don't do that",
and that would be weird. :)

> Also, it might be useful to add an option to dump the names of functions
> that are decorated so the user can track an execution count of such
> functions when running their code. But of course, the debug messages may
> be adequate for this purpose.

Remark options should be enough for that.

--renato
Min-Yih Hsu via llvm-dev
2020-Sep-09 17:15 UTC
[llvm-dev] [RFC] New Feature Proposal: De-Optimizing Cold Functions using PGO Info
Hi Tobias and Dominique,

I didn't evaluate the impact on code size in the first place since it was not
my primary goal. But thanks to the design of the LLVM test-suite benchmarking
infrastructure, I can pull out those numbers right away.

(Non-LTO)
Experiment Name          Code Size Increase
DeOpt Cold Zero Count    5.2%
DeOpt Cold 25%           6.8%
DeOpt Cold 50%           7.0%
DeOpt Cold 75%           7.0%

(FullLTO)
Experiment Name          Code Size Increase
DeOpt Cold Zero Count    4.8%
DeOpt Cold 25%           6.4%
DeOpt Cold 50%           6.2%
DeOpt Cold 75%           5.3%

For non-LTO, the increase caps out at around 7%. For FullLTO things got a
little more interesting: code size actually decreased as we increased the
cold threshold, but I'd say it's around 6%. Diving a little deeper, the
majority of the increased code size came (not surprisingly) from the .text
section. The PLT section contributed a little bit, and the rest of the
sections barely changed.

Though the overhead on code size is higher than the target performance
overhead, I think it's still acceptable in normal cases. In addition, David
mentioned in D87337 that LLVM has used similar techniques for code size (I'm
not sure what he was referencing; my guess is something related to hot/cold
code splitting). So I think the feature we're proposing here can be a
complement to that one.

Finally: Tobias, thanks for evaluating the impact on Clang. I'm really
interested to see the result.

Best,
Min

On Wed, Sep 9, 2020 at 3:26 AM Tobias Hieta <tobias at plexapp.com> wrote:

> Hello,
>
> We use PGO to optimize clang itself. I can see if I have time to give this
> patch some testing. Anything special to look out for except compile
> benchmark and time to build clang, do you expect any changes in code size?

-- 
Min-Yih Hsu
Ph.D. Student in ICS Department,
University of California, Irvine (UCI).
Renato Golin via llvm-dev
2020-Sep-09 17:28 UTC
[llvm-dev] [RFC] New Feature Proposal: De-Optimizing Cold Functions using PGO Info
On Wed, 9 Sep 2020 at 18:15, Min-Yih Hsu via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> David mentioned in D87337 that LLVM has used similar techniques on code
> size (not sure what he was referencing, my guess will be something related
> to hot-cold code splitting).

IIUC, it's just using optsize instead of optnone. The idea is that, if the
code really doesn't run often (or at all), then the performance impact of
reducing its size is negligible, but the size impact is considerable.

I'd wager that optsize could even be faster than optnone, as it would delete
a lot of useless code... though not noticeably, as it wouldn't run much.

This is an idea that we (Verona Language) are interested in, too.