thr3ads.net - llvm dev - [llvm-dev] RFC: PGO Late instrumentation for LLVM [Sep 2015]

Ivan Baev via llvm-dev

2015-Sep-02 02:21 UTC

[llvm-dev] RFC: PGO Late instrumentation for LLVM

> Date: Tue, 1 Sep 2015 14:21:16 -0700
> From: Rong Xu via llvm-dev <llvm-dev at lists.llvm.org>
> Cc: llvm-dev <llvm-dev at lists.llvm.org>, David Li <davidxl at google.com>
> Subject: Re: [llvm-dev] RFC: PGO Late instrumentation for LLVM
>
>>>> *(2) Performance impact of context sensitivity*
>>>> LLVM does not use the profile information fully in the back-end
>>>> optimizations, for instance, inlining does not fully use the profile
>>>> counts -- it only marks hot/cold function attribute based on function
>>>> entry counts. To evaluate the impact of profile context sensitivity,
>>>> GCC is used in the experiment. Note that GCC PGO improves clang
>>>> performance a lot more than clang PGO.
>>>> First we summarize the methodology used in the experiment:
>>>> 0) build clang with GCC O2 without early inlining and measure clang's
>>>> performance. GCC early inlining (einline) is similar to pre-inline
>>>> used by late instrumentation.
>>>> 1) build clang with GCC O2 with early inlining and measure
>>>> performance.
>>>> The performance difference of 1) and 0) is denoted as E which measures
>>>> the contribution of early inlining.
>>>> 2) build clang with GCC O2 + PGO without early inlining.
>>>> 3) build clang with GCC O2 + PGO with early inlining.
>>>> The performance difference of 3) and 2) is denoted as EC. It
>>>> constitutes roughly two parts a) early inlining contribution b)
>>>> context sensitive profiling enabled with early inlining.
>>>> The contribution of context sensitive profiling can be estimated by
>>>> EC - E above.
>>>>
>>>> -------------------------------------------------------------------------------
>>>> Config                        wall_time_for_use  speedup_vs_(0)  speedup_vs_(1)
>>>> (0) base w/o einline          84.946             1.000           0.934
>>>> (1) base O2                   79.310             1.071           1.000
>>>> (2) profile-arcs w/o einline  63.518             1.337           1.249
>>>> (3) profile-arcs              48.364             1.756           1.640
>>>>
>>>> We see the following:
>>>> 1) GCC PGO with early inlining improves clang performance by 64.0%
>>>> (vs. base O2 w/ early inline).
>>>> 2) GCC PGO w/o early inlining improves clang performance by 33.7%
>>>> (vs. base O2 w/o early inline).
>>>> 3) Early inlining performance contribution is about 7.1%.
>>>> 4) Profile context sensitivity contribution is estimated to be 22.2%
>>>> (i.e. 64.0% - 33.7% - 7.1%), which is pretty significant.
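
The percentages quoted above can be re-derived from the wall times in the table. The following is a minimal sketch in Python, not part of the original experiment; the dictionary keys and the helper name gain_pct are made up for illustration, and the unit of the wall times is not stated in the source. It reproduces the 64.0%, 33.7% and 7.1% figures and also evaluates EC as defined in the methodology.

```python
# Wall times copied from the quoted table (unit not stated in the source).
wall_time = {
    "base_no_einline": 84.946,  # config (0)
    "base_o2": 79.310,          # config (1)
    "pgo_no_einline": 63.518,   # config (2)
    "pgo": 48.364,              # config (3)
}


def gain_pct(baseline, candidate):
    """Speedup of `candidate` over `baseline` as a percentage."""
    return (wall_time[baseline] / wall_time[candidate] - 1.0) * 100.0


pgo_with_einline = gain_pct("base_o2", "pgo")                        # item 1)
pgo_without_einline = gain_pct("base_no_einline", "pgo_no_einline")  # item 2)
einline_only = gain_pct("base_no_einline", "base_o2")                # item 3), i.e. E
ec = gain_pct("pgo_no_einline", "pgo")                               # EC: 3) vs. 2)

# Two ways of estimating the context-sensitivity contribution.
as_written = pgo_with_einline - pgo_without_einline - einline_only   # item 4) formula
ec_minus_e = ec - einline_only                                       # methodology formula

for name, value in [
    ("PGO with einline vs (1)", pgo_with_einline),
    ("PGO w/o einline vs (0)", pgo_without_einline),
    ("einline only (E)", einline_only),
    ("EC", ec),
    ("item 4) subtraction", as_written),
    ("EC - E", ec_minus_e),
]:
    print(f"{name:26s} {value:5.1f}%")
```

Recomputed this way, the subtraction in item 4) comes out at roughly 23%, and EC - E at roughly 24%, rather than the quoted 22.2%, so the estimate is best read as approximate.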

Rong,
Sorry for the late response. Just wanted to clarify my understanding of
data in (2) Performance impact of context sensitivity.

On clang as an application:
3) Early inlining contribution is about 7.1%,
2) PGO w/o early inlining contribution is about 33.7%,

4) so the additional combined effect of 2 and 3 is about 22.2%, correct?
In other words, just avoiding inlining small/simple callees and updating
their profile counts in the call graph by the main inliner - all through
the use of early inlining - improves clang performance by 22.2%.

Thanks,
Ivan

Xinliang David Li via llvm-dev

2015-Sep-02 17:01 UTC

[llvm-dev] RFC: PGO Late instrumentation for LLVM

On Tue, Sep 1, 2015 at 7:21 PM, Ivan Baev via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> > Date: Tue, 1 Sep 2015 14:21:16 -0700
> > From: Rong Xu via llvm-dev <llvm-dev at lists.llvm.org>
> > Cc: llvm-dev <llvm-dev at lists.llvm.org>, David Li <davidxl at google.com>
> > Subject: Re: [llvm-dev] RFC: PGO Late instrumentation for LLVM
> >
> >>>> *(2) Performance impact of context sensitivity*
> >>>> LLVM does not use the profile information fully in the back-end
> >>>> optimizations, for instance, inlining does not fully use the profile
> >>>> counts -- it only marks hot/cold function attribute based on function
> >>>> entry counts. To evaluate the impact of profile context sensitivity,
> >>>> GCC is used in the experiment. Note that GCC PGO improves clang
> >>>> performance a lot more than clang PGO.
> >>>> First we summarize the methodology used in the experiment:
> >>>> 0) build clang with GCC O2 without early inlining and measure clang's
> >>>> performance. GCC early inlining (einline) is similar to pre-inline
> >>>> used by late instrumentation.
> >>>> 1) build clang with GCC O2 with early inlining and measure
> >>>> performance.
> >>>> The performance difference of 1) and 0) is denoted as E which measures
> >>>> the contribution of early inlining.
> >>>> 2) build clang with GCC O2 + PGO without early inlining.
> >>>> 3) build clang with GCC O2 + PGO with early inlining.
> >>>> The performance difference of 3) and 2) is denoted as EC. It
> >>>> constitutes roughly two parts a) early inlining contribution b)
> >>>> context sensitive profiling enabled with early inlining.
> >>>> The contribution of context sensitive profiling can be estimated by
> >>>> EC - E above.
> >>>>
> >>>> -------------------------------------------------------------------------------
> >>>> Config                        wall_time_for_use  speedup_vs_(0)  speedup_vs_(1)
> >>>> (0) base w/o einline          84.946             1.000           0.934
> >>>> (1) base O2                   79.310             1.071           1.000
> >>>> (2) profile-arcs w/o einline  63.518             1.337           1.249
> >>>> (3) profile-arcs              48.364             1.756           1.640
> >>>>
> >>>> We see the following:
> >>>> 1) GCC PGO with early inlining improves clang performance by 64.0%
> >>>> (vs. base O2 w/ early inline).
> >>>> 2) GCC PGO w/o early inlining improves clang performance by 33.7%
> >>>> (vs. base O2 w/o early inline).
> >>>> 3) Early inlining performance contribution is about 7.1%.
> >>>> 4) Profile context sensitivity contribution is estimated to be 22.2%
> >>>> (i.e. 64.0% - 33.7% - 7.1%), which is pretty significant.
>
> Rong,
> Sorry for the late response. Just wanted to clarify my understanding of
> data in (2) Performance impact of context sensitivity.
>
> On clang as an application:
> 3) Early inlining contribution is about 7.1%,
>
This is the effect of pre-inlining without profile guidance.

> 2) PGO w/o early inlining contribution is about 33.7%,
>
> 4) so the additional combined effect of 2 and 3 is about 22.2%, correct?
>
Not the combined effect, but the remaining effect (after excluding 2 and 3).

> In other words, just avoiding inlining small/simple callees and updating
> their profile counts in the call graph by the main inliner - all through
> the use of early inlining - improves clang performance by 22.2%.
>
Not sure what you mean here. The 22% is the estimate of the effect of the
context-sensitive (CS) profile that comes from cloning profile counters
during instrumentation (through pre-inlining). Profile updating with
inlining always happens, including in 2).

David
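
The point about cloned counters can be illustrated with a toy simulation. This is only a sketch of the idea, not LLVM or GCC code: the function run_workload, the call-site names and the counter keys are all hypothetical. A small callee with one branch is invoked from two call sites that behave differently. When the callee is instrumented once, both callers feed the same counters; when it is pre-inlined before instrumentation, each inlined copy gets its own counters, so the profile keeps the per-caller context.

```python
from collections import Counter


def run_workload(pre_inline):
    """Simulate branch counters for one small callee used by two call sites.

    pre_inline=False: the callee is instrumented once, so all callers share
    a single set of counters.
    pre_inline=True: the callee body is cloned into each caller before
    instrumentation, so each clone owns a separate set of counters.
    """
    counters = Counter()

    def callee(flag, site):
        # The counter is attributed to the clone (per call site) or to the
        # single out-of-line callee, depending on the mode.
        owner = site if pre_inline else "callee"
        counters[(owner, "taken" if flag else "not_taken")] += 1

    for _ in range(1000):
        callee(True, "caller_a")   # caller_a always takes the branch
        callee(False, "caller_b")  # caller_b never takes the branch
    return counters


shared = run_workload(pre_inline=False)
cloned = run_workload(pre_inline=True)

print("shared counters:", dict(shared))
print("cloned counters:", dict(cloned))
```

With shared counters the branch looks evenly split, although neither caller behaves that way; with cloned counters each copy shows a fully biased branch, which is the extra information a context-sensitive profile hands to the optimizer.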

>
> Thanks,
> Ivan
>
>
