Tobias Hieta via llvm-dev
2020-Sep-09 10:25 UTC
[llvm-dev] [RFC] New Feature Proposal: De-Optimizing Cold Functions using PGO Info
Hello,

We use PGO to optimize Clang itself. I can see if I have time to give this
patch some testing. Is there anything special to look out for besides compile
benchmarks and the time to build Clang? Do you expect any changes in code size?

On Wed, Sep 9, 2020, 10:03 Renato Golin via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> On Wed, 9 Sep 2020 at 01:21, Min-Yih Hsu via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
>> From the above experiments we observed that compilation / link time
>> improvement scaled linearly with the percentage of cold functions we
>> skipped. Even if we only skipped functions that never got executed (i.e.
>> had counter values equal to zero, which is effectively "0%"), we already
>> had 5~10% of "free ride" on compilation / linking speed improvement and
>> barely had any target performance penalty.
>
> Hi Min (Paul, Edd),
>
> This is great work! Small, clear patch, substantial impact, virtually no
> downsides.
>
> Just looking at your test-suite numbers, not optimising functions "never
> used" during the profile run sounds like an obvious "default PGO behaviour"
> to me. The flag defining the percentage range is a good option for
> development builds.
>
> I imagine you guys have run this on internal programs and found it
> beneficial, too, not just the LLVM test-suite (which is very small and
> non-representative). It would be nice if other groups that already use PGO
> could try that locally and spot any issues.
>
> cheers,
> --renato
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Nemanja Ivanovic via llvm-dev
2020-Sep-09 13:27 UTC
[llvm-dev] [RFC] New Feature Proposal: De-Optimizing Cold Functions using PGO Info
This sounds very interesting, and the compile time gains in the conservative
range (say, under 25%) seem quite promising.

One concern that comes to mind is whether it is possible for performance to
degrade severely in the situation where a function has a hot call site (where
it gets inlined) and some non-zero number of cold call sites (where it does
not get inlined). When we decorate the function with `optnone, noinline`, it
will presumably no longer be inlined into the hot call site and will
furthermore be unoptimized. Have you considered such a case? If so, is it
something that cannot happen (i.e. inlining has already happened, etc.), or
something that we can mitigate in the future?

A more aesthetic comment: personally, I would prefer a single option with a
default percentage (say 0%) rather than having to specify two options.

Also, it might be useful to add an option to dump the names of functions that
are decorated, so the user can track an execution count of such functions when
running their code. But of course, the debug messages may be adequate for
this purpose.

Nemanja

On Wed, Sep 9, 2020 at 6:26 AM Tobias Hieta via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> Hello,
>
> We use PGO to optimize clang itself. I can see if I have time to give this
> patch some testing. Anything special to look out for except compile
> benchmark and time to build clang, do you expect any changes in code size?
Renato Golin via llvm-dev
2020-Sep-09 14:10 UTC
[llvm-dev] [RFC] New Feature Proposal: De-Optimizing Cold Functions using PGO Info
On Wed, 9 Sep 2020 at 14:27, Nemanja Ivanovic <nemanja.i.ibm at gmail.com> wrote:

> A more aesthetic comment I have is that personally, I would prefer a
> single option with a default percentage (say 0%) rather than having to
> specify two options.

0% doesn't mean "don't do it", it just means "only do that to functions I
didn't see running at all", which could be misrepresented in the profiling
run. If we agree this should be *always* enabled, then only one option is
needed. Otherwise, we'd need negative percentages to mean "don't do that",
and that would be weird. :)

> Also, it might be useful to add an option to dump the names of functions
> that are decorated so the user can track an execution count of such
> functions when running their code. But of course, the debug messages may
> be adequate for this purpose.

Remark options should be enough for that.

--renato
Min-Yih Hsu via llvm-dev
2020-Sep-09 17:15 UTC
[llvm-dev] [RFC] New Feature Proposal: De-Optimizing Cold Functions using PGO Info
Hi Tobias and Dominique,

I didn't evaluate the impact on code size in the first place since it was not
my primary goal. But thanks to the design of the LLVM test-suite benchmarking
infrastructure, I can pull out those numbers right away.

(Non-LTO)
Experiment Name          Code Size Increase
DeOpt Cold Zero Count    5.2%
DeOpt Cold 25%           6.8%
DeOpt Cold 50%           7.0%
DeOpt Cold 75%           7.0%

(FullLTO)
Experiment Name          Code Size Increase
DeOpt Cold Zero Count    4.8%
DeOpt Cold 25%           6.4%
DeOpt Cold 50%           6.2%
DeOpt Cold 75%           5.3%

For non-LTO, the increase caps out at around 7%. For FullLTO things got a
little more interesting: code size actually decreased as we increased the
cold threshold, but I'd say it's around 6%. Diving a little deeper, the
majority of the increased code size came (not surprisingly) from the .text
section. The PLT section contributed a little bit, and the rest of the
sections barely changed.

Though the overhead on code size is higher than the target performance
overhead, I think it's still acceptable in normal cases. In addition, David
mentioned in D87337 that LLVM has used similar techniques for code size (I'm
not sure what he was referencing; my guess is something related to hot/cold
code splitting). So I think the feature we're proposing here can be a
complement to that one.

Finally: Tobias, thanks for evaluating the impact on Clang. I'm really
interested to see the result.

Best,
Min

On Wed, Sep 9, 2020 at 3:26 AM Tobias Hieta <tobias at plexapp.com> wrote:

> Hello,
>
> We use PGO to optimize clang itself. I can see if I have time to give this
> patch some testing. Anything special to look out for except compile
> benchmark and time to build clang, do you expect any changes in code size?

-- 
Min-Yih Hsu
Ph.D. Student in ICS Department,
University of California, Irvine (UCI).
Renato Golin via llvm-dev
2020-Sep-09 17:28 UTC
[llvm-dev] [RFC] New Feature Proposal: De-Optimizing Cold Functions using PGO Info
On Wed, 9 Sep 2020 at 18:15, Min-Yih Hsu via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> David mentioned in D87337 that LLVM has used similar techniques on code
> size (not sure what he was referencing, my guess will be something related
> to hot-cold code splitting).

IIUC, it's just using optsize instead of optnone. The idea is that, if the
code really doesn't run often (or at all), then the performance impact of
reducing its size is negligible, but the size impact is considerable.

I'd wager that optsize could even be faster than optnone, as it would delete
a lot of useless code... though not noticeably, as it wouldn't run much.

This is an idea that we (Verona Language) are interested in, too.