Ivan Baev via llvm-dev
2015-Sep-02 02:21 UTC
[llvm-dev] RFC: PGO Late instrumentation for LLVM
> Date: Tue, 1 Sep 2015 14:21:16 -0700
> From: Rong Xu via llvm-dev <llvm-dev at lists.llvm.org>
> Cc: llvm-dev <llvm-dev at lists.llvm.org>, David Li <davidxl at google.com>
> Subject: Re: [llvm-dev] RFC: PGO Late instrumentation for LLVM
>
>>>> *(2) Performance impact of context sensitivity*
>>>> LLVM does not use the profile information fully in the back-end
>>>> optimizations, for instance, inlining does not fully use the profile
>>>> counts -- it only marks hot/cold function attribute based on function
>>>> entry counts. To evaluate the impact of profile context sensitivity,
>>>> GCC is used in the experiment. Note that GCC PGO improves clang
>>>> performance a lot more than clang PGO.
>>>>
>>>> First we summarize the methodology used in the experiment:
>>>> 0) build clang with GCC O2 without early inlining and measure clang's
>>>>    performance. GCC early inlining (einline) is similar to pre-inline
>>>>    used by late instrumentation.
>>>> 1) build clang with GCC O2 with early inlining and measure performance.
>>>>    The performance difference of 1) and 0) is denoted as E which
>>>>    measures the contribution of early inlining.
>>>> 2) build clang with GCC O2 + PGO without early inlining.
>>>> 3) build clang with GCC O2 + PGO with early inlining.
>>>>    The performance difference of 3) and 2) is denoted as EC. It
>>>>    constitutes roughly two parts: a) early inlining contribution
>>>>    b) context sensitive profiling enabled with early inlining.
>>>> The contribution of context sensitive profiling can be estimated by
>>>> EC - E above.
>>>>
>>>> ---------------------------------------------------------------------------------
>>>> Config                        wall_time_for_use  speedup_vs_(0)  speedup_vs_(1)
>>>> (0) base w/o einline          84.946             1.000           0.934
>>>> (1) base O2                   79.310             1.071           1.000
>>>> (2) profile-arcs w/o einline  63.518             1.337           1.249
>>>> (3) profile-arcs              48.364             1.756           1.640
>>>>
>>>> We see the following:
>>>> 1) GCC PGO with early inlining improves clang performance by 64.0%
>>>>    (vs. base O2 w/ early inline).
>>>> 2) GCC PGO w/o early inlining improves clang performance by 33.7%
>>>>    (vs. base O2 w/o early inline).
>>>> 3) Early inlining performance contribution is about 7.1%.
>>>> 4) Profile context sensitivity contribution is estimated to be 22.2%
>>>>    (i.e. 64.0% - 33.7% - 7.1%), which is pretty significant.

Rong,
Sorry for the late response. Just wanted to clarify my understanding of
data in (2) Performance impact of context sensitivity.

On clang as an application:
3) Early inlining contribution is about 7.1%,
2) PGO w/o early inlining contribution is about 33.7%,
4) so the additional combined effect of 2 and 3 is about 22.2%, correct?

In other words, just avoiding inlining small/simple callees and updating
their profile counts in the call graph by the main inliner - all through
the use of early inlining - improves clang performance by 22.2%.

Thanks,
Ivan
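
For readers who want to check the figures, the speedup columns and percentages quoted above can be re-derived from the four wall times. The following is a minimal sketch, not part of the original experiment: the dictionary keys reuse the config labels from the table, while the helper `speedup` and the variable names are made up for illustration. Note that the literal subtraction in item 4) appears to come out closer to 23% than to the 22.2% quoted, so the quoted figure may contain a small arithmetic slip.

```python
# Wall-clock times copied from the quoted table (the thread gives no unit).
wall_time = {
    "(0) base w/o einline": 84.946,
    "(1) base O2": 79.310,
    "(2) profile-arcs w/o einline": 63.518,
    "(3) profile-arcs": 48.364,
}


def speedup(baseline: str, config: str) -> float:
    """Speedup of `config` over `baseline`: baseline time / config time."""
    return wall_time[baseline] / wall_time[config]


base0 = "(0) base w/o einline"
base1 = "(1) base O2"

# Reprint the two speedup columns of the table.
for name in wall_time:
    print(f"{name:30s} {speedup(base0, name):.3f}  {speedup(base1, name):.3f}")

# Percentage gains as described in items 1) to 3) of the quoted summary.
pgo_with_einline = (speedup(base1, "(3) profile-arcs") - 1) * 100
pgo_without_einline = (speedup(base0, "(2) profile-arcs w/o einline") - 1) * 100
einline_only = (speedup(base0, base1) - 1) * 100

# Item 4): what is left after subtracting the other two contributions.
residual = pgo_with_einline - pgo_without_einline - einline_only
print(f"residual estimate: {residual:.1f}%")
```

The subtraction treats the percentage gains as additive, which is only an approximation; the thread itself presents the result as an estimate.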
Xinliang David Li via llvm-dev
2015-Sep-02 17:01 UTC
[llvm-dev] RFC: PGO Late instrumentation for LLVM
On Tue, Sep 1, 2015 at 7:21 PM, Ivan Baev via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> Rong,
> Sorry for the late response. Just wanted to clarify my understanding of
> data in (2) Performance impact of context sensitivity.
>
> On clang as an application:
> 3) Early inlining contribution is about 7.1%,

This is the effect of pre-inlining without profile guidance.

> 2) PGO w/o early inlining contribution is about 33.7%,
>
> 4) so the additional combined effect of 2 and 3 is about 22.2%, correct?

Not the combined effect, but the remaining effect (after excluding 2 and 3).

> In other words, just avoiding inlining small/simple callees and updating
> their profile counts in the call graph by the main inliner - all through
> the use of early inlining - improves clang performance by 22.2%.

Not sure what you mean here. 22% is the estimate of the effect of the CS
(context-sensitive) profile due to clones of profile counters during
instrumentation (through pre-inlining). Profile updates with inlining always
exist, including in 2).

David

> Thanks,
> Ivan
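
To illustrate what "clones of profile counters" means in the reply above, here is a toy model, not LLVM or GCC instrumentation code. It assumes a small callee with one branch that is called from two hypothetical call sites, `caller_a` and `caller_b`, which always pass opposite flags. When the callee is pre-inlined before instrumentation, each call site gets its own copy of the branch counters; otherwise all callers share one set and the per-context behavior is averaged away.

```python
from collections import Counter


def collect_profile(workload, context_sensitive):
    """Count branch outcomes of a small callee.

    workload: iterable of (caller_name, flag) pairs, one per call.
    context_sensitive: True models instrumentation after pre-inlining,
    where every call site owns a private clone of the callee's counters.
    """
    counters = Counter()
    for caller, flag in workload:
        site = caller if context_sensitive else "shared"
        outcome = "taken" if flag else "not_taken"
        counters[(site, outcome)] += 1
    return counters


# Hypothetical workload: caller_a always takes the branch, caller_b never does.
workload = [("caller_a", True)] * 1000 + [("caller_b", False)] * 1000

merged = collect_profile(workload, context_sensitive=False)
per_site = collect_profile(workload, context_sensitive=True)

print(dict(merged))    # one shared counter set: the branch looks 50/50
print(dict(per_site))  # one counter set per call site: each is fully biased
```

In the merged profile the branch looks unpredictable, while the per-site profile shows it is fully biased in each context, which is the kind of information a profile-guided optimizer can exploit after inlining.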