Ivan Baev via llvm-dev
2015-Sep-02 19:10 UTC
[llvm-dev] RFC: PGO Late instrumentation for LLVM
> On Tue, Sep 1, 2015 at 7:21 PM, Ivan Baev via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>> > Date: Tue, 1 Sep 2015 14:21:16 -0700
>> > From: Rong Xu via llvm-dev <llvm-dev at lists.llvm.org>
>> > Cc: llvm-dev <llvm-dev at lists.llvm.org>, David Li <davidxl at google.com>
>> > Subject: Re: [llvm-dev] RFC: PGO Late instrumentation for LLVM
>> >
>> >>>> *(2) Performance impact of context sensitivity*
>> >>>>
>> >>>> LLVM does not use the profile information fully in the back-end
>> >>>> optimizations, for instance, inlining does not fully use the profile
>> >>>> counts -- it only marks hot/cold function attribute based on function
>> >>>> entry counts. To evaluate the impact of profile context sensitivity,
>> >>>> GCC is used in the experiment. Note that GCC PGO improves clang
>> >>>> performance a lot more than clang PGO.
>> >>>>
>> >>>> First we summarize the methodology used in the experiment:
>> >>>> 0) build clang with GCC O2 without early inlining and measure clang's
>> >>>>    performance. GCC early inlining (einline) is similar to pre-inline
>> >>>>    used by late instrumentation.
>> >>>> 1) build clang with GCC O2 with early inlining and measure performance.
>> >>>>    The performance difference of 1) and 0) is denoted as E which
>> >>>>    measures the contribution of early inlining.
>> >>>> 2) build clang with GCC O2 + PGO without early inlining.
>> >>>> 3) build clang with GCC O2 + PGO with early inlining.
>> >>>>    The performance difference of 3) and 2) is denoted as EC. It
>> >>>>    constitutes roughly two parts a) early inlining contribution
>> >>>>    b) context sensitive profiling enabled with early inlining.
>> >>>>
>> >>>> The contribution of context sensitive profiling can be estimated by
>> >>>> EC - E above.
>> >>>>
>> >>>> ----------------------------------------------------------------------------
>> >>>> Config                        wall_time_for_use  speedup_vs_(0)  speedup_vs_(1)
>> >>>> (0) base w/o einline          84.946             1.000           0.934
>> >>>> (1) base O2                   79.310             1.071           1.000
>> >>>> (2) profile-arcs w/o einline  63.518             1.337           1.249
>> >>>> (3) profile-arcs              48.364             1.756           1.640
>> >>>>
>> >>>> We see the following:
>> >>>> 1) GCC PGO with early inlining improves clang performance by 64.0%
>> >>>>    (v.s. base O2 w/ early inline).
>> >>>> 2) GCC PGO w/o early inlining improves clang performance by 33.7%
>> >>>>    (v.s. base O2 w/o early inline).
>> >>>> 3) Early inlining performance contribution is about 7.1%.
>> >>>> 4) Profile context sensitivity contribution is estimated to be 22.2%
>> >>>>    (i.e. 64.0% - 33.7% - 7.1%), which is pretty significant.
>>
>> Rong,
>> Sorry for the late response. Just wanted to clarify my understanding of
>> data in (2) Performance impact of context sensitivity.
>> On clang as an application:
>> 3) Early inlining contribution is about 7.1%,
>
> This is the effect of pre-inlining without profile guidance.
>
>> 2) PGO w/o early inlining contribution is about 33.7%,
>> 4) so the additional combined effect of 2 and 3 is about 22.2%, correct?
>
> Not combined effect -- but remaining effect (by excluding 2 and 3)
>
>> In other words, just avoiding inlining small/simple callees and updating
>> their profile counts in the call graph by the main inliner - all through
>> the use of early inlining - improves clang performance by 22.2%.
>
> Not sure what you mean here. 22% is the estimate of the effect of CS
> profile due to clones of profile counters during instrumentation (through
> pre-inlining). Profile update with inlining always exist including in 2).

If we compare times for:
(2) profile-arcs w/o einline - 63.518 secs, v.s.
(3) profile-arcs - 48.364 secs,
we get about 31.3% improvement due to early inline with PGO.

If we compare times for:
(0) base w/o einline - 84.946, v.s.
(1) base O2 - 79.310.
we get about 7.1% improvement due to early inline without PGO.

What can we attribute the difference of 24.2% (31.3 - 7.1) to?
31.3% is the total contribution of early inline with PGO.
Is 24.2% the context-sensitivity part of it, meaning that the profile
counts in the call graph are more precise during the inlining process,
inlining decisions are better, etc.?

Ivan
Xinliang David Li via llvm-dev
2015-Sep-02 19:26 UTC
[llvm-dev] RFC: PGO Late instrumentation for LLVM
On Wed, Sep 2, 2015 at 12:10 PM, Ivan Baev <ibaev at codeaurora.org> wrote:
> If we compare times for:
> (2) profile-arcs w/o einline - 63.518 secs, v.s.
> (3) profile-arcs - 48.364 secs,
> we get about 31.3% improvement due to early inline with PGO.
>
> If we compare times for:
> (0) base w/o einline - 84.946, v.s.
> (1) base O2 - 79.310.
> we get about 7.1% improvement due to early inline without PGO.
>
> What can we attribute the difference of 24.2% (31.3 - 7.1) to?
> 31.3% is the total contribution of early inline with PGO.
> Is 24.2% the context-sensitivity part of it, meaning that the profile
> counts in the call graph are more precise during the inlining process,
> inlining decisions are better, etc.?

Yes -- that is it.

David
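
The percentages discussed in this exchange can be reproduced from the four wall times in the quoted table. The following is a minimal sketch, not part of the original thread: the dictionary keys and the helper `improvement_pct` are hypothetical names chosen for illustration, and "improvement" is assumed to mean the ratio of the slower wall time to the faster one, minus one, which is how the table's speedup columns appear to be computed.

```python
# Wall times in seconds, copied from the table quoted in the thread.
# The dictionary keys are illustrative names, not identifiers from GCC or LLVM.
WALL_TIMES = {
    "base_no_einline": 84.946,  # (0) base w/o einline
    "base_o2": 79.310,          # (1) base O2
    "pgo_no_einline": 63.518,   # (2) profile-arcs w/o einline
    "pgo": 48.364,              # (3) profile-arcs
}


def improvement_pct(slow: float, fast: float) -> float:
    """Percent improvement of `fast` over `slow`, computed as a time ratio."""
    return (slow / fast - 1.0) * 100.0


# E: effect of early inlining without PGO, config (0) vs. config (1).
E = improvement_pct(WALL_TIMES["base_no_einline"], WALL_TIMES["base_o2"])

# EC: effect of early inlining with PGO, config (2) vs. config (3).
EC = improvement_pct(WALL_TIMES["pgo_no_einline"], WALL_TIMES["pgo"])

# Estimated context-sensitivity share, following the EC - E subtraction.
CS = EC - E

print(f"E  = {E:.1f}%")
print(f"EC = {EC:.1f}%")
print(f"CS = {CS:.1f}%")
```

Rounded to one decimal place, this gives E = 7.1%, EC = 31.3%, and CS = 24.2%, matching the figures in Ivan's question. The RFC's 22.2% figure comes from a different subtraction (64.0% - 33.7% - 7.1%), which combines a speedup measured against config (1) with one measured against config (0), so the two estimates are not expected to agree exactly.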