On Wed, Feb 25, 2015 at 10:52 AM, Philip Reames
<listmail at philipreames.com> wrote:
> On 02/24/2015 03:31 PM, Diego Novillo wrote:
>
>
> We (Google) have started to look more closely at the profiling
> infrastructure in LLVM. Internally, we have a large dependency on PGO to get
> peak performance in generated code.
>
> Some of the dependencies we have on profiling are still not present in LLVM
> (e.g., the inliner) but we will still need to incorporate changes to support
> our work on these optimizations. Some of the changes may be addressed as
> individual bug fixes on the existing profiling infrastructure. Other changes
> may be better implemented as either new extensions or as replacements of
> existing code.
>
> I think we will try to minimize infrastructure replacement at least in the
> short/medium term. After all, it doesn't make too much sense to replace
> infrastructure that is broken for code that doesn't exist yet.
>
> David Li and I are preparing a document where we describe the major issues
> that we'd like to address. The document is a bit on the lengthy side, so it
> may be easier to start with an email discussion.
>
> I would personally be interested in seeing a copy of that document, but it
> might be more appropriate for a blog post than a discussion on llvm-dev. I
> worry that we'd end up with a very unfocused discussion. It might be better
> to frame this as your plan of attack and reserve discussion on llvm-dev for
> things that are being proposed semi near term. Just my 2 cents.
>
> This is a summary of the main changes we are looking at:
>
> Need to faithfully represent the execution count taken from dynamic
> profiles. Currently, MD_prof does not really represent an execution count.
> This makes things like comparing hotness across functions hard or
> impossible. We need a concept of global hotness.
>
> What does MD_prof actually represent when used from Clang? I know I've been
> using it for execution counters in my frontend. Am I approaching that
> wrong?
>
> As a side comment: I'm a bit leery of a consistent notion of
> hotness based on counters across functions. These counters are almost
> always approximate in practice and counting problems run rampant. I'd
> almost rather see a consistent count inferred from data that's assumed to be
> questionable than make the frontend try to generate consistent profiling
> metadata. I think either approach could be made to work; we just need to
> think about it carefully.
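
To make the MD_prof question concrete: here is a minimal sketch (not code
from the tree; the helper name and the 90/10 weights are made up) of how
branch weights get attached via MDBuilder. The operands are relative
weights, so on their own they say nothing about absolute execution counts
or cross-function hotness, which is the gap Diego is describing above.

  #include "llvm/IR/Instructions.h"
  #include "llvm/IR/LLVMContext.h"
  #include "llvm/IR/MDBuilder.h"

  // Minimal sketch: attach !prof branch weights to a conditional branch.
  // The 90/10 values are made-up relative weights ("the true edge is taken
  // roughly 9x more often"), not absolute execution counts.
  void attachExampleWeights(llvm::BranchInst *BI) {
    llvm::MDBuilder MDB(BI->getContext());
    BI->setMetadata(llvm::LLVMContext::MD_prof,
                    MDB.createBranchWeights(/*TrueWeight=*/90,
                                            /*FalseWeight=*/10));
  }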
>
> When the CFG or callgraph changes, there needs to exist an API for
> incrementally updating/scaling counts. For instance, when a function is
> inlined or partially inlined, when the CFG is modified, etc. These counts
> need to be updated incrementally (or perhaps re-computed as a first step
> in that direction).
>
> Agreed. Do you have a sense of how much of an issue this is in practice? I
> haven't seen it kick in much, but it's also not something I've been looking
> for.
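
To sketch what such an update API might look like: assuming execution
counts (rather than just ratios) end up encoded in MD_prof, the inlining
case reduces to rescaling the inlined copy's weights by the fraction of
the callee's total count attributable to that call site. A hypothetical
helper along those lines (nothing like this exists in the tree today; the
names are made up):

  #include <cstdint>
  #include <vector>
  #include "llvm/IR/Constants.h"
  #include "llvm/IR/Instruction.h"
  #include "llvm/IR/LLVMContext.h"
  #include "llvm/IR/MDBuilder.h"
  #include "llvm/IR/Metadata.h"

  // Hypothetical helper: rescale an instruction's branch_weights by
  // Num/Denom, e.g. when the enclosing code is cloned into a call site
  // that accounts for only part of the callee's total count.
  void scaleBranchWeights(llvm::Instruction *I, uint64_t Num, uint64_t Denom) {
    llvm::MDNode *Prof = I->getMetadata(llvm::LLVMContext::MD_prof);
    if (!Prof || Prof->getNumOperands() < 2 || Denom == 0)
      return;
    std::vector<uint32_t> Scaled;
    // Operand 0 is the !"branch_weights" string; the rest are the weights.
    for (unsigned Idx = 1, E = Prof->getNumOperands(); Idx != E; ++Idx) {
      auto *CI =
          llvm::mdconst::dyn_extract<llvm::ConstantInt>(Prof->getOperand(Idx));
      if (!CI)
        return; // not branch_weights-shaped metadata; leave it alone
      Scaled.push_back(static_cast<uint32_t>(CI->getZExtValue() * Num / Denom));
    }
    llvm::MDBuilder MDB(I->getContext());
    I->setMetadata(llvm::LLVMContext::MD_prof, MDB.createBranchWeights(Scaled));
  }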
>
> The inliner (and other optimizations) needs to use profile information and
> update it accordingly. This is predicated on Chandler's work on the pass
> manager, of course.
>
> It's worth noting that the inliner work can be done independently of the pass
> manager work. We can always explicitly recompute relevant analysis in the
> inliner if needed. This will cost compile time, so we might need to make
> this an off-by-default option. (Maybe -O3 only?) Being able to work on the
> inliner independently of the pass management structure is valuable enough
> that we should probably consider doing this.
>
> PGO inlining is an area I'm very interested in. I'd really encourage you to
> work incrementally in tree. I'm likely to start putting non-trivial amounts
> of time into this topic in the next few weeks. I just need to clear a few
> things off my plate first.
>
> Other than the inliner, can you list the passes you think are profitable to
> teach about profiling data? My list so far is: PRE (particularly of
> loads!), the vectorizer (i.e. duplicate work down both a hot and cold path
> when it can be vectorized on the hot path), LoopUnswitch, IRCE, & LoopUnroll
> (avoiding code size explosion in cold code). I'm much more interested in
> sources of improved performance than I am simply code size reduction.
> (Reducing code size can improve performance of course.)
Also, code layout (basic block layout, function layout, function splitting).
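Most of those passes would start from the same kind of query we can already
make via BlockFrequencyInfo, which only gives relative, per-function
hotness; the missing piece is making the threshold meaningful across
functions. A rough sketch of the kind of gate I mean (the 4x ratio is an
arbitrary placeholder, not a proposal):

  #include <cstdint>
  #include "llvm/Analysis/BlockFrequencyInfo.h"
  #include "llvm/IR/BasicBlock.h"
  #include "llvm/IR/Function.h"

  // Sketch only: gate a size-increasing transform on a block's frequency
  // relative to the function entry. This is purely per-function, relative
  // hotness; it cannot say whether the function itself is hot in the whole
  // program, which is the gap being discussed in this thread.
  static bool isHotRelativeToEntry(const llvm::BasicBlock *BB,
                                   const llvm::BlockFrequencyInfo &BFI) {
    uint64_t BlockFreq = BFI.getBlockFreq(BB).getFrequency();
    uint64_t EntryFreq =
        BFI.getBlockFreq(&BB->getParent()->getEntryBlock()).getFrequency();
    return BlockFreq >= 4 * EntryFreq; // arbitrary placeholder threshold
  }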
>
> Need to represent global profile summary data. For example, for global
> hotness determination, it is useful to compute additional global summary
> info, such as a histogram of counts that can be used to determine hotness
> and working set size estimates for a large percentage of the profiled
> execution.
>
> Er, not clear what you're trying to say here?
The idea is to get a sense of a good global profile count threshold for a
given application's profile, i.e. for deciding whether a particular profile
count should be considered hot. For example: what is the minimum profile
count contributing to the hottest 99% of the application's profiled
execution?
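
As a rough standalone sketch of that computation (made-up code, not tied to
any existing LLVM interface; in practice it would be derived from a
histogram kept in the summary rather than from the raw counts):

  #include <algorithm>
  #include <cstdint>
  #include <functional>
  #include <vector>

  // Given the profiled counts, find the smallest count that still falls
  // within the hottest 99% of the total execution. Counts at or above the
  // returned value would be treated as "hot" for this profile.
  uint64_t hotCountThreshold(std::vector<uint64_t> Counts,
                             double Percentile = 0.99) {
    std::sort(Counts.begin(), Counts.end(), std::greater<uint64_t>());
    uint64_t Total = 0;
    for (uint64_t C : Counts)
      Total += C;
    uint64_t Running = 0;
    for (uint64_t C : Counts) {
      Running += C;
      if (Running >= Percentile * Total)
        return C; // the minimum count contributing to the hottest 99%
    }
    return 0;
  }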
Teresa
>
> There are other changes that we will need to incorporate. David, Teresa,
> Chandler, please add anything large that I missed.
>
> My main question at the moment is what would be the best way of addressing
> them. Some seem to require new concepts to be implemented (e.g., execution
> counts). Others could be addressed as simple bugs to be fixed in the current
> framework.
>
> Would it make sense to present everything in a unified document and discuss
> that? I've got some reservations about that approach because we will end up
> discussing everything at once and it may not lead to concrete progress.
> Another approach would be to present each issue individually either as
> patches or RFCs or bugs.
>
> See above.
>
>
> I will be taking on the implementation of several of these issues. Some of
> them involve the SamplePGO harness that I added last year. I would also like
> to know what other bugs or problems people have in mind that I could also
> roll into this work.
>
>
> Thanks. Diego.
--
Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413