thr3ads.net - llvm dev - [llvm-dev] [RFC] New Feature Proposal: De-Optimizing Cold Functions using PGO Info [Sep 2020]

Min-Yih Hsu via llvm-dev

2020-Sep-09 00:20 UTC

[llvm-dev] [RFC] New Feature Proposal: De-Optimizing Cold Functions using PGO Info

We would like to propose a new feature that disables optimizations on IR
Functions that PGO profiles consider “cold”. The primary goal of
this work is to improve code optimization speed (which also improves
compilation and LTO speed) without much impact on target code
performance.

The mechanism is pretty simple: in the second phase (i.e. the optimization
phase) of PGO, we add the `optnone` attribute to functions that are
considered “cold”, that is, functions with low profiling counts. A similar
approach can be applied to loops. The rationale behind this idea is pretty
simple as well: if a given IR Function will not be executed frequently, we
shouldn’t waste time optimizing it. Similar approaches can be found in
modern JIT compilers for dynamic languages (e.g. JavaScript and Python)
that adopt a multi-tier compilation model: only “hot” functions or
execution traces are brought to higher-tier compilers for aggressive
optimizations.

In addition to de-optimizing functions whose profiling counts are
exactly zero (`-fprofile-deopt-cold`), we also provide a knob
(`-fprofile-deopt-cold-percent=<X percent>`) to adjust the “cold
threshold”. That is, after sorting the profiling counts of all functions,
this knob de-optimizes functions whose count values sit in the lower
X percent.
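
The selection rule described above can be sketched outside of LLVM. The
Python below is only an illustration, not the code from the patches: the
function names, the tie-breaking by name, the rounding of the cutoff, and the
choice to always include zero-count functions are assumptions made for this
sketch. The one LLVM fact it relies on is that `optnone` must be paired with
`noinline`.

```python
from typing import Dict, List, Set


def select_cold_functions(counts: Dict[str, int], percent: float = 0.0) -> Set[str]:
    """Return the names of functions this sketch treats as cold.

    With percent == 0 only functions whose count is exactly zero are
    selected. Otherwise the lowest `percent` percent of functions, after an
    ascending sort by count, are selected as well.
    """
    if not 0.0 <= percent <= 100.0:
        raise ValueError("percent must be between 0 and 100")

    # Zero-count functions are always cold (assumed to mirror the plain
    # -fprofile-deopt-cold behaviour described in the text).
    cold = {name for name, count in counts.items() if count == 0}

    # Assumption: "lower X percent" means the first X% of the functions in
    # ascending count order; ties are broken by name to stay deterministic.
    ranked = sorted(counts, key=lambda name: (counts[name], name))
    cutoff = int(len(ranked) * percent / 100.0)
    cold.update(ranked[:cutoff])
    return cold


def annotate(counts: Dict[str, int], percent: float = 0.0) -> Dict[str, List[str]]:
    """Map each function name to the attributes the sketch would attach."""
    cold = select_cold_functions(counts, percent)
    # LLVM's verifier requires `noinline` whenever `optnone` is present.
    return {name: (["optnone", "noinline"] if name in cold else []) for name in counts}
```

With hypothetical counts such as `{"main": 1000, "parse": 400, "log_error": 0,
"dump": 3}`, calling `select_cold_functions(counts)` picks only `log_error`,
while `select_cold_functions(counts, 50)` also picks `dump`.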

We evaluated this feature on the LLVM Test Suite (the Bitcode, SingleSource,
and MultiSource sub-folders were selected). Both compilation speed and
target program performance are measured by the number of instructions
reported by Linux perf. The table below shows the percentage of compilation
speed improvement and target performance overhead relative to a baseline
that only uses (instrumentation-based) PGO.

Experiment Name          Compile Speedup   Target Overhead
DeOpt Cold Zero Count          5.13%            0.02%
DeOpt Cold 25%                 8.06%            0.12%
DeOpt Cold 50%                13.32%            2.38%
DeOpt Cold 75%                17.53%            7.07%

(The “DeOpt Cold Zero Count” experiment only disables optimizations on
functions whose profiling counts are exactly zero. The rest of the
experiments disable optimizations on functions whose profiling counts are in
the lower X%.)
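
For readers who want to reproduce this kind of measurement, the percentages
can be derived from two instruction counts, one for the PGO-only baseline and
one for the experiment. The helpers below are a sketch under the assumption
that both columns are plain relative differences against the baseline; the
post does not spell out the exact formula, and the function names are
hypothetical.

```python
def compile_speedup_percent(baseline_insns: int, experiment_insns: int) -> float:
    """Percentage reduction in compiler instructions relative to the baseline."""
    if baseline_insns <= 0:
        raise ValueError("baseline instruction count must be positive")
    return (baseline_insns - experiment_insns) / baseline_insns * 100.0


def target_overhead_percent(baseline_insns: int, experiment_insns: int) -> float:
    """Percentage increase in target-program instructions relative to the baseline."""
    if baseline_insns <= 0:
        raise ValueError("baseline instruction count must be positive")
    return (experiment_insns - baseline_insns) / baseline_insns * 100.0
```

Both helpers take raw instruction counts, for example as reported by
`perf stat -e instructions`, so the same pair works for the compiler run and
for the compiled benchmark.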

We also evaluated FullLTO; here are the numbers:

Experiment Name          Link Time Speedup   Target Overhead
DeOpt Cold Zero Count         10.87%              1.29%
DeOpt Cold 25%                18.76%              1.50%
DeOpt Cold 50%                30.16%              3.94%
DeOpt Cold 75%                38.71%              8.97%

(The link time presented here includes the LTO and code generation time. We
omitted the compile time numbers since they are not really interesting in
an LTO setup.)

From the above experiments we observed that the compilation / link time
improvement scaled linearly with the percentage of cold functions we
skipped. Even if we only skipped functions that never got executed (i.e.
had counter values equal to zero, which is effectively “0%”), we already
had a 5~10% “free ride” on compilation / linking speed improvement and
barely had any target performance penalty.
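
One way to check the "scaled linearly" observation is to fit a straight line
through the reported numbers. The sketch below uses ordinary least squares
and, following the text, treats the zero-count run as 0%; the helper name is
hypothetical and the fit is an illustration, not part of the original
evaluation.

```python
from typing import Sequence, Tuple


def least_squares_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Percentage of functions skipped; the zero-count run is treated as 0%.
cold_percent = [0, 25, 50, 75]
compile_speedup = [5.13, 8.06, 13.32, 17.53]      # compile-time table
lto_link_speedup = [10.87, 18.76, 30.16, 38.71]   # FullLTO table

compile_fit = least_squares_line(cold_percent, compile_speedup)
lto_fit = least_squares_line(cold_percent, lto_link_speedup)
print("compile: slope=%.3f intercept=%.2f" % compile_fit)
print("FullLTO: slope=%.3f intercept=%.2f" % lto_fit)
```

The slope is the extra speedup, in percentage points, gained per additional
percent of functions skipped; the intercept approximates the zero-count "free
ride".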

We believe the above numbers justify this patch as useful for
improving build time with little overhead.

Here are the patches for review:
* Modifications on LLVM instrumentation-based PGO: https://reviews.llvm.org/D87337
* Modifications on Clang driver: https://reviews.llvm.org/D87338

Credit: This project was originally started by Paul Robinson
<paul.robinson at sony.com> and Edward Dawson <Edd.Dawson at sony.com>
from the Sony PlayStation compiler team. I picked it up when I was interning
there this summer.

Thank you for reading.
-Min
--
Min-Yih Hsu
Ph.D Student in ICS Department, University of California, Irvine (UCI).

Renato Golin via llvm-dev

2020-Sep-09 08:03 UTC

On Wed, 9 Sep 2020 at 01:21, Min-Yih Hsu via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> From the above experiments we observed that compilation / link time
> improvement scaled linearly with the percentage of cold functions we
> skipped. Even if we only skipped functions that never got executed (i.e.
> had counter values equal to zero, which is effectively “0%”), we already
> had 5~10% of “free ride” on compilation / linking speed improvement and
> barely had any target performance penalty.
>
Hi Min (Paul, Edd),

This is great work! Small, clear patch, substantial impact, virtually no
downsides.

Just looking at your test-suite numbers, not optimising functions "never
used" during the profile run sounds like an obvious "default PGO behaviour"
to me. The flag defining the percentage range is a good option for
development builds.

I imagine you guys have run this on internal programs and found it beneficial,
too, not just the LLVM test-suite (which is very small and
non-representative). It would be nice if other groups that already use PGO
could try that locally and spot any issues.

cheers,
--renato

Tobias Hieta via llvm-dev

2020-Sep-09 10:25 UTC

Hello,

We use PGO to optimize clang itself. I can see if I have time to give this
patch some testing. Is there anything special to look out for besides compile
benchmarks and the time to build clang? Do you expect any changes in code size?


Min-Yih Hsu via llvm-dev

2020-Sep-09 16:42 UTC

Hi Renato,

On Wed, Sep 9, 2020 at 1:03 AM Renato Golin <rengolin at gmail.com> wrote:
> On Wed, 9 Sep 2020 at 01:21, Min-Yih Hsu via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> From the above experiments we observed that compilation / link time
>> improvement scaled linearly with the percentage of cold functions we
>> skipped. Even if we only skipped functions that never got executed (i.e.
>> had counter values equal to zero, which is effectively “0%”), we already
>> had 5~10% of “free ride” on compilation / linking speed improvement and
>> barely had any target performance penalty.
>>
>
> Hi Min (Paul, Edd),
>
> This is great work! Small, clear patch, substantial impact, virtually no
> downsides.
>
Thank you :-)
>
> Just looking at your test-suite numbers, not optimising functions "never
> used" during the profile run sounds like an obvious "default PGO behaviour"
> to me. The flag defining the percentage range is a good option for
> development builds.
>
> I imagine you guys have run this on internal programs and found
> beneficial, too, not just the LLVM test-suite (which is very small and
> non-representative). It would be nice if other groups that already use PGO
> could try that locally and spot any issues.
>

Good point! We are aware that the LLVM Test Suite is too "SPEC-alike" and
leans toward scientific computation rather than real-world use cases. So we
actually did experiments on the V8 JavaScript engine, which is a huge code
base and a good real-world example. It showed a 10~13% speed improvement on
optimization + codegen time with up to 4% target performance overhead. (Note
that, due to some hacky reasons, for many of the V8 source files over 80% or
even 95% of compilation time was spent in the frontend, so measuring by total
compilation time would be heavily skewed and unable to reflect the impact of
this feature.)

Best
-Min

>
> cheers,
> --renato
>

-- 
Min-Yih Hsu
Ph.D Student in ICS Department, University of California, Irvine (UCI).

Modi Mo via llvm-dev

2020-Sep-10 01:18 UTC

The 1.29% is pretty considerable for functions that should never be hit
according to profile information. This can indicate that there might be
something amiss with the profile quality and that certain hot functions are
not getting caught. Alternatively, given the ~5% code size increase you
mention in the other thread, the cold code may not be getting moved out to a
cold page, so i-cache pollution ends up being a factor. I think it would be
worthwhile to dig deeper into why there’s any performance degradation on
functions that should never be called.

Also, if you’re curious about how to build clang itself with PGO, the
documentation is here: https://llvm.org/docs/HowToBuildWithPGO.html

On 9/8/20, 5:21 PM, Min-Yih Hsu via llvm-dev <llvm-dev at lists.llvm.org>
wrote:

We also did evaluations on FullLTO, here are the numbers:

Experiment Name          Link Time Speedup   Target Overhead
DeOpt Cold Zero Count         10.87%              1.29%
DeOpt Cold 25%                18.76%              1.50%
DeOpt Cold 50%                30.16%              3.94%
DeOpt Cold 75%                38.71%              8.97%



Wenlei He via llvm-dev

2020-Sep-10 04:50 UTC

1%+ overhead is indeed interesting. If you use lld as the linker (together
with the new pass manager), you should be able to get a good profile-guided
function-level layout, so dead functions are moved out of the hot pages.

This may also be related to a subtle pass ordering issue. Pre-inline counts
may not be super accurate, but we can’t use post-inline counts either, given
that the CGSCC inliner runs halfway through the opt pipeline. Looking at the
patch, it seems the decision is made at PGO annotation time, which is between
the pre-instrumentation inline and the CGSCC inline.

