On Jun 1, 2011, at 5:03 PM, Jakub Staszak wrote:
> I just found a small bug. Fixed version attached.
>
> <kuba_bp3.patch>
Committed as r132613 (and r132616). Thanks Jakub!
To help reviewers understand the patch and where we're headed with it,
I've prepared the following design documentation. For now I'm sending it
to llvm-dev until we have an official design doc for the profiling work. Please
note that Jakub's first patch only pertains to the "Branch
Probability" section below. More patches will follow.
One of the goals of the branch profile framework is to keep the final output of
compilation independent of floating point imprecision. In other words, we want
LLVM to be a deterministic cross-compiler. A sure way to do this is to represent
the original profile in fixed point and use only fixed point arithmetic to
compute derived forms of the profile. Representing branch profile and block
frequency as fixed point values, however, leads to confusing terminology. This
document defines the important terms, which should be understood before
reviewing the LLVM profiling framework.
** Branch Profile
Profile information derived from the compiler front-end or an external source. At
the IR level, this conceptually represents probabilities on CFG edges such that
block successor probabilities sum to 1.0.
Representation: 32-bit unsigned int stored in IR meta-data.
A single integer per profiled conditional branch would be sufficient. But for
more convenient incremental updates, and for consistency with Branch Probability
Analysis, we may want one integer per branch target, such that
profiledEdgeProbability(src, dest) = profileWeight(src, dest)
/ (sum profileWeight(src, succ) for succ in src.successors())
Of course, that takes care of switches too.
Note that profiled edge weights are *not* raw sample counts provided by a
profiler. The weights are adjusted to express branch probability when compared
with the other weights in a predecessor block's successor list. The same
probability can be expressed in multiple ways by scaling the values of all
successor edges. Successor edge weights need not sum to any particular value,
and edge weights have no meaning when compared across different branches.
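
To make that concrete, here is a tiny standalone illustration (the helper
edgeProbabilityPercent is invented for this example, not part of the patch)
showing that scaling every weight in a successor list leaves the implied
probabilities unchanged:

  #include <assert.h>
  #include <stdint.h>

  // Made-up helper: probability of successor Idx, in percent, using
  // integer arithmetic only.
  static unsigned edgeProbabilityPercent(const uint32_t *Weights, unsigned N,
                                         unsigned Idx) {
    uint64_t Sum = 0;
    for (unsigned i = 0; i != N; ++i)
      Sum += Weights[i];
    return unsigned(uint64_t(Weights[Idx]) * 100 / Sum);
  }

  int main() {
    // Two weight lists for the same two-way branch; the second is the
    // first scaled by 100. Both express a 75%/25% split.
    uint32_t A[] = { 3, 1 };
    uint32_t B[] = { 300, 100 };
    assert(edgeProbabilityPercent(A, 2, 0) == edgeProbabilityPercent(B, 2, 0));
    assert(edgeProbabilityPercent(A, 2, 1) == edgeProbabilityPercent(B, 2, 1));
    return 0;
  }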
The details of this representation aren't important because it is not the
primary interface used by the optimizer. In fact, we could easily support new
meta-data formats without changing anything except the branch probability
analysis pass.
Note: Runtime profiling implementers may choose to embed additional information,
such as profile counts, in meta-data for the purpose of validating the profiler.
But such data is opaque to the optimizer and code generator, so it need not be
addressed by the framework design.
** Branch Probability
The result of Branch Probability Analysis. This is the mid-level IR pass that
interprets meta-data and performs static prediction when needed.
Representation: Map of 32-bit unsigned int "edge weight" per CFG edge
Conceptually this is similar to Branch Profile:
edgeProbability(src, dest) = edgeWeight(src, dest)
/ (sum edgeWeight(src, succ) for succ in src.successors())
As with branch profile, these edge weights express branch probability as the
ratio between one successor edge's weight and the sum of the weights of all
successor edges originating from the same branch. They do not need to sum to a
fixed value, so all of a branch's successor edge weights can be scaled without
changing the probabilities. Edge weights associated with different branches
cannot be meaningfully compared.
It's easy and inexpensive to recompute branch probability when needed, but it's
worth noting that incremental updates are also not hard. Removing a CFG edge
requires no update, unless we want to clean up stale entries in the map. Adding
an edge requires adding a map entry, which may require scaling down the weights
of the sibling successors: to avoid integer overflow, we ensure that the sum of
all successor edge weights from a single branch stays below UINT_MAX.
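
For illustration, a sketch of what adding an edge's weight might look like
(names invented for this example; not the proposed interface):

  #include <limits.h>
  #include <stdint.h>
  #include <vector>

  // Append a new successor weight, halving all of the branch's weights
  // as needed so their sum stays below UINT_MAX. Halving preserves the
  // relative probabilities (approximately); weights are kept nonzero so
  // no edge becomes impossible.
  void addSuccessorWeight(std::vector<uint32_t> &Weights, uint32_t NewWeight) {
    Weights.push_back(NewWeight);
    for (;;) {
      uint64_t Sum = 0;
      for (unsigned i = 0, e = Weights.size(); i != e; ++i)
        Sum += Weights[i];
      if (Sum < UINT_MAX)
        return;
      for (unsigned i = 0, e = Weights.size(); i != e; ++i)
        Weights[i] = Weights[i] / 2 ? Weights[i] / 2 : 1;
    }
  }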
This is redundant with the branch profile meta-data when the meta-data exists.
If that's a concern, we have the option of omitting the branch probability
entries for branches with meta-data and looking them up on the fly.
Complications arise when clients want to make use of branch probability. Since
we don't want to expose floating point values, except for generating
diagnostic output, and we don't want to expose edge weights, because they
have no independent meaning, we need a richer API.
Example:
BasicBlock *getHotSucc(BasicBlock *BB);
bool isEdgeHot(BasicBlock *pred, BasicBlock *succ);
The API will be implemented entirely using fixed point arithmetic.
Example:
class BranchProbability {
  friend class BranchProbabilityInfo;
  unsigned numerator;
  unsigned denominator;
  BranchProbability(unsigned n, unsigned d); // no public ctor
  ...
};

class BranchProbabilityInfo {
  static BranchProbability HotProb;
  static void initProb();
  ...
};
void BranchProbabilityInfo::initProb() {
  HotProb = BranchProbability(4, 5);
}
bool BranchProbabilityInfo::isEdgeHot(BasicBlock *src, BasicBlock *dest) {
  // Cross-multiply so no division or floating point is needed.
  return uint64_t(HotProb.denominator) * edgeWeight(src, dest)
       > uint64_t(HotProb.numerator)
           * (sum edgeWeight(src, succ) for succ in src.successors());
}
We could make the API richer as follows:
edgeProbExceeds(BasicBlock *pred, BasicBlock *succ, BranchProb prob)
Where BranchProb is an opaque type with various predefined levels of
"hotness" (e.g. Unexpected, Unlikely, Likely, Expected...).
A low-level interface that exposes edge weights will be used only by passes
that need to incrementally update the result or perform computations using
branch probability.
** Block Frequency
The result produced by BlockFrequencyAnalysis for mid-level IR and by
MBBFrequencyAnalysis for MachineInstrs, with a mostly shared implementation
across IR and MI.
Conceptually, this is a floating point value that satisfies the closed form:
frequency(block) == (sum edgeFrequency(pred, block) for pred in block.preds())
Where "edge frequency" is defined as:
edgeFrequency(src, dest) == frequency(src) * edgeProbability(src, dest)
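For example, suppose the entry block E (frequency 1.0) branches to A with
probability 0.75 and to B with probability 0.25, and A and B both jump to X.
The definitions above give:

frequency(E) == 1.0
frequency(A) == edgeFrequency(E, A) == 1.0 * 0.75 == 0.75
frequency(B) == edgeFrequency(E, B) == 1.0 * 0.25 == 0.25
frequency(X) == edgeFrequency(A, X) + edgeFrequency(B, X) == 0.75 + 0.25 == 1.0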
Representation: 32-bit unsigned int "scaled frequency" per block
For the purpose of diagnostics, the entry block always has a frequency of 1.0.
Consequently, we define scaled frequency as a fixed-point value with the lower
bits corresponding to the fractional component of frequency. For example, if we
want 8 bits of fractional frequency, a frequency of 1.0 has a scaled frequency
of 2^8.
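
A minimal sketch of that representation, plus rendering it for diagnostics
(8 fractional bits as in the example above; printFreq is invented):

  #include <stdint.h>
  #include <stdio.h>

  static const unsigned FractionalBits = 8;
  static const uint32_t UnityFreq = 1u << FractionalBits; // frequency 1.0

  // Diagnostics only: print a scaled frequency as "<integer>.<hundredths>".
  static void printFreq(uint32_t Scaled) {
    unsigned Whole = unsigned(Scaled >> FractionalBits);
    unsigned Hundredths =
        unsigned((uint64_t(Scaled & (UnityFreq - 1)) * 100) >> FractionalBits);
    printf("%u.%02u\n", Whole, Hundredths);
  }

  int main() {
    printFreq(UnityFreq);         // 1.00 -- the entry block
    printFreq(3 * UnityFreq / 4); // 0.75
    printFreq(UnityFreq / 2);     // 0.50
    return 0;
  }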
To avoid floating-point imprecision, multiplication by branch probability will
be implemented as fixed point arithmetic directly using the edge weights
provided by BranchProbabilityInfo.
Example:
edgeFrequency(src, dest) = uint64_t(frequency(src)) * edgeWeight(src, dest)
/ (sum edgeWeight(src, succ) for succ in src.successors())
...with some logic for saturating on overflow and underflow.
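
A sketch of that logic (helper names are invented; the real code would live
inside the analysis):

  #include <stdint.h>

  static const uint32_t MaxFreq = 0xFFFFFFFFu;

  // frequency(src) * edgeWeight / sumWeights with a 64-bit intermediate.
  // Because edgeWeight <= sumWeights, the result cannot exceed SrcFreq,
  // but it can round down to zero; clamp so a reachable edge keeps a
  // nonzero frequency (saturate on underflow).
  uint32_t edgeFrequency(uint32_t SrcFreq, uint32_t EdgeWeight,
                         uint64_t SumWeights) {
    uint64_t F = uint64_t(SrcFreq) * EdgeWeight / SumWeights;
    if (F == 0 && SrcFreq != 0 && EdgeWeight != 0)
      return 1;
    return uint32_t(F);
  }

  // Summing predecessor edge frequencies is where overflow shows up;
  // saturate instead of wrapping (saturate on overflow).
  uint32_t addFrequency(uint32_t A, uint32_t B) {
    uint64_t Sum = uint64_t(A) + B;
    return Sum > MaxFreq ? MaxFreq : uint32_t(Sum);
  }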
As with BranchProbabilityInfo, BlockFrequencyAnalysis would have a rich API for
use by optimization passes to interpret the scaled frequencies as meaningful
values. Typically, a pass will compare the frequencies of two blocks to
determine their relative hotness.
Scaled frequencies can be wrapped in a simple class that supports only
comparison and addition to ensure that clients don't misuse them.
Subtraction is not supported.
class ScaledFreq {
public:
  static ScaledFreq unity();
  friend bool operator<(ScaledFreq, ScaledFreq); // and friends
  friend ScaledFreq operator+(ScaledFreq, ScaledFreq);
  ScaledFreq scaled(unsigned num, unsigned denom) const;
};
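
Filled out just enough to compile, that sketch might read as follows (8
fractional bits assumed; every detail is illustrative):

  #include <stdint.h>

  class ScaledFreq {
    uint32_t Val;
    explicit ScaledFreq(uint32_t V) : Val(V) {}
  public:
    static ScaledFreq unity() { return ScaledFreq(1u << 8); }
    friend bool operator<(ScaledFreq A, ScaledFreq B) {
      return A.Val < B.Val;
    }
    // Addition saturates rather than wrapping; subtraction is omitted.
    friend ScaledFreq operator+(ScaledFreq A, ScaledFreq B) {
      uint64_t Sum = uint64_t(A.Val) + B.Val;
      return ScaledFreq(Sum > 0xFFFFFFFFull ? 0xFFFFFFFFu : uint32_t(Sum));
    }
    // Scale by num/denom, e.g. to apply an edge probability.
    ScaledFreq scaled(unsigned Num, unsigned Denom) const {
      return ScaledFreq(uint32_t(uint64_t(Val) * Num / Denom));
    }
  };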
A high-level API for comparing frequencies may look something like this:
// Return true if the probability of B1 relative to B2 is at least Prob.
// e.g. compareFreq(B1, B2, Likely) is true if B1 is likely executed
// relative to B2.
//
// f1 / (f1 + f2) >= numer/denom rearranges to
// (denom - numer) * f1 >= numer * f2,
// so only integer cross-multiplication is needed.
bool compareFreq(uint32_t B1, uint32_t B2, BranchProb Prob) {
  return uint64_t(B1) * (Prob.denom - Prob.numer)
      >= uint64_t(B2) * Prob.numer;
}
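
A pass deciding how to treat a block relative to another might then write
something like this (hypothetical client code, assuming freq() returns the
scaled frequency and Likely is one of the predefined levels above):

  if (compareFreq(freq(B1), freq(B2), Likely))
    optimizeForSpeed(B1);
  else
    optimizeForSize(B1);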