This is a proposal for improving LLVM's support for profile information.
Many optimizations can benefit from using that information, especially things
like inlining and code placement, but LLVM's current profiling support is
not very usable. We should fix that. I've been collecting ideas from other
LLVM developers over the last few weeks, and I've put them together here to
solicit feedback. Most of the good ideas here are thanks to Andy Trick.
Background:
There are at least 3 different sources of profile information:
* __builtin_expect: The programmer provides branch prediction hints using this
builtin function.
* Static estimates: The compiler analyzes the code and uses simple heuristics or
more sophisticated analyses to estimate the profile information.
* Profile feedback: Profile information is collected during one or more
executions of the program and then fed back to the compiler.
Regardless of the source, these are all providing essentially the same
information, perhaps with different levels of detail or reliability.
LLVM currently has minimal support for collecting several different kinds of
profile feedback (basic block counts, edge counts, etc.) but none of the
standard passes in LLVM make any use of it. Some passes also use heuristics to
guess branch probabilities (e.g., if-conversion). There is no support at all
for __builtin_expect.
Proposals:
* LLVM should provide a single API to access and update profile information,
regardless of its source. Different passes may want to ask different questions
(e.g., "what is the probability this branch will be taken?", "how
many times is this block expected to execute?", etc.), and we can support
whatever makes sense. I'm not proposing anything specific yet -- the
important point is that passes should be able to use one API for this.
* An LLVM analysis pass can be created to use static analysis and various
heuristics to estimate the profile information. The API for profiling info
would be provided by this pass.
* In the usual LLVM way, passes that modify the IR could either update the
profile analysis info directly or indicate that they do not preserve it so that
subsequent uses would rerun the analysis.
* __builtin_expect info can be attached to the LLVM IR as metadata.
Specifically, conditional branches can have metadata that records the branch
probabilities (e.g., taken vs. not-taken) as fixed-point fractions. When the
profile analysis pass sees the metadata, it will use it in place of any other
heuristics or estimates for that branch.
* Profile feedback can use the same metadata as __builtin_expect to record the
branch probabilities for conditional branches. In the same way, multi-way
branches can have metadata with the probability for each outgoing edge. If all
of the branch probabilities are known, they are sufficient to calculate the
block execution counts and edge counts per invocation of a function. If each
function also has metadata recording an invocation count, the profile analysis
pass will be able to calculate the absolute number of block and edge execution
counts.
Discussion:
Recording branch probabilities (as opposed to edge counts) in the metadata makes
it easier to keep the information accurate as various transformations are
applied. In many cases, e.g., when duplicating a block, the branch probability
can simply be copied along with the rest of the code, and there are no problems
of adjusting all the counts to keep them consistent with each other.
Using metadata is a natural fit for this. Among other things, it allows the
profile feedback support to be independent from the rest of the pieces. The raw
profile information can be read in, correlated with the IR, and used to produce
the metadata. From that point on, the rest of the system doesn't need to
know about it.
The branch probability metadata can be specified for only a subset of the
branches, with the rest of the control flow being estimated by the profile
analysis pass. This is typically the case when __builtin_expect is used, but it
will also enable profile feedback to fail gracefully when the program has been
modified since the profile was collected. It might be a while before we get to
the point of caring about that, but it's a good thing to have a system that
supports it.
Many of the uses for profile info are in the code generator, which has its own
IR. It is unfortunate that we can't use the same interfaces throughout the
compiler, and I'm not entirely certain of the best thing to do for codegen.
My current thought is to record the branch probabilities for each machine basic
block and require codegen passes to keep those values updated. If necessary, we
could define a version of the profile analysis pass that works on the codegen
IR, but I'm not sure yet if that will be necessary.
Plans:
I don't have a lot of time to devote to this right now, but I'm hoping
that if we get a high-level plan (like this one) in place, we can start working
toward it incrementally. As an initial step over the next few months, I would
like to set up __builtin_expect to provide metadata for conditional branches,
and then create a very basic profile info analysis pass to combine that metadata
with simple static estimates. Contributions from others would be welcome.
Comments or suggestions?