thr3ads.net - llvm dev - [LLVMdev] [RFC] BlockFrequency is the wrong metric; we need a new one [Feb 2014]

If this information is useful, please help other people find it:
Share via:

Chandler Carruth

2014-Feb-02 10:13 UTC

[LLVMdev] [RFC] BlockFrequency is the wrong metric; we need a new one

Right now, all profile information is funneled through two analysis passes
prior to any part of the optimizer using it.

First, we have BranchProbabilityInfo, which provides a simple interface to
the simplest form of profile information: local and relative branch
probabilities. These merely express the likelihood of taking one of a
mutually exclusive set of exit paths from a basic block. They are very
simple, and the foundation of the profile information. Even the other
analysis is merely built on top of this one.

Second we have BlockFrequencyInfo which attempts to provide a more
"global"
(function-wide, not actually program wide) view of the statistical
frequency with which any particular basic block is executed. This is nicely
principled analysis that just computes the probabilistic flow of control
through the various branches according to their probabilities established
in the first analysis.


However, I think that BlockFrequencyInfo provides the wrong set of
information. There is one critical reason why. Let's take a totally
uninteresting CFG:

A -> B1, B2
B1 -> C1, C2
B2 -> C3, C4
C1 -> D1, D2
C2 -> ret
C3 -> D3, D4
C4 -> ret
D1 -> E1, E2
D2 -> ret
D3 -> E3, E4
D4 -> ret

You can imagine this repeating on for as many levels as you like. This
isn't an uncommon situation with real code. BlockFrequencyInfo computes for
this a very logical answer:

---- Block Freqs ----
 a = 1.0
  a -> b1 = 0.5
  a -> b2 = 0.5
 b1 = 0.5
  b1 -> c1 = 0.25
  b1 -> c2 = 0.25
 b2 = 0.5
  b2 -> c3 = 0.25
  b2 -> c4 = 0.25
 c1 = 0.25
  c1 -> d1 = 0.125
  c1 -> d2 = 0.125
 c2 = 0.25
 c3 = 0.25
  c3 -> d3 = 0.125
  c3 -> d4 = 0.125
 c4 = 0.25
 d1 = 0.125
  d1 -> e1 = 0.0625
  d1 -> e2 = 0.0625
 d2 = 0.125
 d3 = 0.125
  d3 -> e3 = 0.0625
  d3 -> e4 = 0.0625
 d4 = 0.125

One way of thinking about this is that for any basic block X which is
predicated by N branches of unbiased probability (50/50), the frequency
computed for that block is 2^(-N).

The problem is that this doesn't represent anything close to reality.
Processors' branch prediction works precisely because very, *very* few
branches in programs are 50/50. Most programs do not systematically explore
breadth first the full diversity of paths through the program. And yet, in
the absense of better information our heuristics would lead us to believe
(and act!) as though this were true.

Now, I'm not saying that the computation of block frequencies is wrong.
Merely that it cannot possibly be used for at least one of it purposes --
it's relative frequency (to the entry block "basis" frequency) is
completely useless for detecting hot or cold regions of a function -- it
will simply claim that all regions of the function are cold. What it *is*
useful for is establishing a total ordering over the basic blocks of a
function. So it works well for some things like code layout, but is grossly
misleading for others.


There are several possible solutions here. I'll outline my proposal as well
as some other ideas.


BlockWeights instead of BlockFrequencies. My idea is that we don't really
care about the depth of the control dependence for a particular basic
block. We care about the accumulated *bias* toward or away from a basic
block. This is predicated on the idea that branches are overwhelmingly
predictable. As a consequence, evenly distributed probabilities are really
just *uncertainties*. My proposed way of implementing this would take the
exact algorithm already used in BlockFrequency, but instead of computing a
frequency for an edge based on the probability of the branch times the
frequency of the predecessor, instead compute it as the frequency of the
predecessor times how "biased" the branch is relative to other
branches in
the predecessor. Essentially, this would make a branch probability set of
{0.5, 0.5} produce edge frequencies equal to the predecessor's edge
frequency.

The result of such a system would produce weights for every block in the
above CFG as '1.0', or equivalent to the entry block weight. This to me
is
a really useful metric -- it indicates that no block in the CFG is really
more or less likely than any other. Only *biases* in a specific direction
would cause the block weights to go up or down significantly. This would
immediately make the analysis useful to consumers such as the vectorizer,
unroller, or when we have the capability, the inliner and an outliner which
respect cold regions of functions.


One alternative proposed by Nick was using the average of block frequencies
across a domtree as the denominator to which a particular block's frequency
is compared. I haven't really thought as much about this because it seems
quite expensive to compute and is less intuitive to me, but I wanted to
mention it. Nick might be able to provide an explanation of it that would
help folks.

Related to the idea of using the domtree, I can think of many layers on top
of the existing block frequencies that could be used to accurately do "cold
region" detection, but they would all be significantly more analysis on top
of the existing system and so are somewhat less appealing to me.

Perhaps others have still more ideas? Also comments or suggestions on mine
are very welcome.

I have a really simple patch that implements my suggestion using the
existing frequency infrastructure (as it is almost exactly the same). I
would also want to rename everything to avoid confusion as these would no
longer be "frequencies" in any strict sense. I can provide test data
as
needed on the result of this change if people like the direction.

-Chandler
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140202/3e874fcc/attachment.html>

Andrew Trick

2014-Feb-03 02:18 UTC

head link

[LLVMdev] [RFC] BlockFrequency is the wrong metric; we need a new one

On Feb 2, 2014, at 2:13 AM, Chandler Carruth <chandlerc at gmail.com>
wrote:
> Right now, all profile information is funneled through two analysis passes
prior to any part of the optimizer using it.
> 
> First, we have BranchProbabilityInfo, which provides a simple interface to
the simplest form of profile information: local and relative branch
probabilities. These merely express the likelihood of taking one of a mutually
exclusive set of exit paths from a basic block. They are very simple, and the
foundation of the profile information. Even the other analysis is merely built
on top of this one.
> 
> Second we have BlockFrequencyInfo which attempts to provide a more
"global" (function-wide, not actually program wide) view of the
statistical frequency with which any particular basic block is executed. This is
nicely principled analysis that just computes the probabilistic flow of control
through the various branches according to their probabilities established in the
first analysis.
> 
> 
> However, I think that BlockFrequencyInfo provides the wrong set of
information. There is one critical reason why. Let's take a totally
uninteresting CFG:
> 
> A -> B1, B2
> B1 -> C1, C2
> B2 -> C3, C4
> C1 -> D1, D2
> C2 -> ret
> C3 -> D3, D4
> C4 -> ret
> D1 -> E1, E2
> D2 -> ret
> D3 -> E3, E4
> D4 -> ret
> 
> You can imagine this repeating on for as many levels as you like. This
isn't an uncommon situation with real code. BlockFrequencyInfo computes for
this a very logical answer:
> 
> ---- Block Freqs ----
>  a = 1.0
>   a -> b1 = 0.5
>   a -> b2 = 0.5
>  b1 = 0.5
>   b1 -> c1 = 0.25
>   b1 -> c2 = 0.25
>  b2 = 0.5
>   b2 -> c3 = 0.25
>   b2 -> c4 = 0.25
>  c1 = 0.25
>   c1 -> d1 = 0.125
>   c1 -> d2 = 0.125
>  c2 = 0.25
>  c3 = 0.25
>   c3 -> d3 = 0.125
>   c3 -> d4 = 0.125
>  c4 = 0.25
>  d1 = 0.125
>   d1 -> e1 = 0.0625
>   d1 -> e2 = 0.0625
>  d2 = 0.125
>  d3 = 0.125
>   d3 -> e3 = 0.0625
>   d3 -> e4 = 0.0625
>  d4 = 0.125
> 
> One way of thinking about this is that for any basic block X which is
predicated by N branches of unbiased probability (50/50), the frequency computed
for that block is 2^(-N).
> 
> The problem is that this doesn't represent anything close to reality.
Processors' branch prediction works precisely because very, *very* few
branches in programs are 50/50. Most programs do not systematically explore
breadth first the full diversity of paths through the program. And yet, in the
absense of better information our heuristics would lead us to believe (and act!)
as though this were true.
> 
> Now, I'm not saying that the computation of block frequencies is wrong.
Merely that it cannot possibly be used for at least one of it purposes --
it's relative frequency (to the entry block "basis" frequency) is
completely useless for detecting hot or cold regions of a function -- it will
simply claim that all regions of the function are cold. What it *is* useful for
is establishing a total ordering over the basic blocks of a function. So it
works well for some things like code layout, but is grossly misleading for
others.
> 
> 
> There are several possible solutions here. I'll outline my proposal as
well as some other ideas.
> 
> 
> BlockWeights instead of BlockFrequencies. My idea is that we don't
really care about the depth of the control dependence for a particular basic
block. We care about the accumulated *bias* toward or away from a basic block.
This is predicated on the idea that branches are overwhelmingly predictable. As
a consequence, evenly distributed probabilities are really just *uncertainties*.
This sounds like an interesting way to merge unkown branch probabilities, static
heuristics, statistical profiles, and partial profiles.
> My proposed way of implementing this would take the exact algorithm already
used in BlockFrequency, but instead of computing a frequency for an edge based
on the probability of the branch times the frequency of the predecessor, instead
compute it as the frequency of the predecessor times how "biased" the
branch is relative to other branches in the predecessor. Essentially, this would
make a branch probability set of {0.5, 0.5} produce edge frequencies equal to
the predecessor's edge frequency.
> 
> The result of such a system would produce weights for every block in the
above CFG as '1.0', or equivalent to the entry block weight. This to me
is a really useful metric -- it indicates that no block in the CFG is really
more or less likely than any other. Only *biases* in a specific direction would
cause the block weights to go up or down significantly.
I don't like this statement, or don't understand it. It is useful to
know a branch is unbiased. Currently we assume branches are unbiased then
optimize conservatively in those cases (do no harm). But if we had greater
confidence in unbiased branches (because the branch was actually profiled), we
could if-convert much more aggressively.
> This would immediately make the analysis useful to consumers such as the
vectorizer, unroller, or when we have the capability, the inliner and an
outliner which respect cold regions of functions.
I'm a little concerned that you're adding complexity that may not be
necessary. Algorithms like if-conversion make use of the fact that the block
weights are additive. I'm sure they could be adapted to something more
sophisticated, but these heuristics are notoriously hard to tune. There is value
in a simple model. You say that your approach is simple, but I would have to
think hard about handling block frequencies that are inconsistent with their
immediate branch probabilities.

So the question is: why do you really need more sophistication?

Part of you problem is that you're looking at frequency relative to the
entry block when you should be looking at frequency relative to the region being
optimized. When deciding to unroll, compare the loop header with the preheader.

I think you're trying to avoid unrolling even high trip count loops that are
seldom reached, and you're using a non-negligible frequency for
"cold" code (e.g. > 2^-10). In that case it sounds like we need a
static heuristic to handle the pattern of chains of branches with no CFG merge.
The fall-thru path should be > 0.5.

So, I think this is a cool idea. If I better understood how it worked I might be
less concerned about the complexity. I'm not very convinced that it is
necessary yet.

-Andy
> One alternative proposed by Nick was using the average of block frequencies
across a domtree as the denominator to which a particular block's frequency
is compared. I haven't really thought as much about this because it seems
quite expensive to compute and is less intuitive to me, but I wanted to mention
it. Nick might be able to provide an explanation of it that would help folks.
> 
> Related to the idea of using the domtree, I can think of many layers on top
of the existing block frequencies that could be used to accurately do "cold
region" detection, but they would all be significantly more analysis on top
of the existing system and so are somewhat less appealing to me.
> 
> Perhaps others have still more ideas? Also comments or suggestions on mine
are very welcome.
> 
> I have a really simple patch that implements my suggestion using the
existing frequency infrastructure (as it is almost exactly the same). I would
also want to rename everything to avoid confusion as these would no longer be
"frequencies" in any strict sense. I can provide test data as needed
on the result of this change if people like the direction.
> 
> -Chandler

Chandler Carruth

2014-Feb-03 02:55 UTC

head link

[LLVMdev] [RFC] BlockFrequency is the wrong metric; we need a new one

On Sun, Feb 2, 2014 at 6:18 PM, Andrew Trick <atrick at apple.com> wrote:
>
> On Feb 2, 2014, at 2:13 AM, Chandler Carruth <chandlerc at gmail.com>
wrote:
>
> > Right now, all profile information is funneled through two analysis
> passes prior to any part of the optimizer using it.
> >
> > First, we have BranchProbabilityInfo, which provides a simple
interface
> to the simplest form of profile information: local and relative branch
> probabilities. These merely express the likelihood of taking one of a
> mutually exclusive set of exit paths from a basic block. They are very
> simple, and the foundation of the profile information. Even the other
> analysis is merely built on top of this one.
> >
> > Second we have BlockFrequencyInfo which attempts to provide a more
> "global" (function-wide, not actually program wide) view of the
statistical
> frequency with which any particular basic block is executed. This is nicely
> principled analysis that just computes the probabilistic flow of control
> through the various branches according to their probabilities established
> in the first analysis.
> >
> >
> > However, I think that BlockFrequencyInfo provides the wrong set of
> information. There is one critical reason why. Let's take a totally
> uninteresting CFG:
> >
> > A -> B1, B2
> > B1 -> C1, C2
> > B2 -> C3, C4
> > C1 -> D1, D2
> > C2 -> ret
> > C3 -> D3, D4
> > C4 -> ret
> > D1 -> E1, E2
> > D2 -> ret
> > D3 -> E3, E4
> > D4 -> ret
> >
> > You can imagine this repeating on for as many levels as you like. This
> isn't an uncommon situation with real code. BlockFrequencyInfo computes
for
> this a very logical answer:
> >
> > ---- Block Freqs ----
> >  a = 1.0
> >   a -> b1 = 0.5
> >   a -> b2 = 0.5
> >  b1 = 0.5
> >   b1 -> c1 = 0.25
> >   b1 -> c2 = 0.25
> >  b2 = 0.5
> >   b2 -> c3 = 0.25
> >   b2 -> c4 = 0.25
> >  c1 = 0.25
> >   c1 -> d1 = 0.125
> >   c1 -> d2 = 0.125
> >  c2 = 0.25
> >  c3 = 0.25
> >   c3 -> d3 = 0.125
> >   c3 -> d4 = 0.125
> >  c4 = 0.25
> >  d1 = 0.125
> >   d1 -> e1 = 0.0625
> >   d1 -> e2 = 0.0625
> >  d2 = 0.125
> >  d3 = 0.125
> >   d3 -> e3 = 0.0625
> >   d3 -> e4 = 0.0625
> >  d4 = 0.125
> >
> > One way of thinking about this is that for any basic block X which is
> predicated by N branches of unbiased probability (50/50), the frequency
> computed for that block is 2^(-N).
> >
> > The problem is that this doesn't represent anything close to
reality.
> Processors' branch prediction works precisely because very, *very* few
> branches in programs are 50/50. Most programs do not systematically explore
> breadth first the full diversity of paths through the program. And yet, in
> the absense of better information our heuristics would lead us to believe
> (and act!) as though this were true.
> >
> > Now, I'm not saying that the computation of block frequencies is
wrong.
> Merely that it cannot possibly be used for at least one of it purposes --
> it's relative frequency (to the entry block "basis"
frequency) is
> completely useless for detecting hot or cold regions of a function -- it
> will simply claim that all regions of the function are cold. What it *is*
> useful for is establishing a total ordering over the basic blocks of a
> function. So it works well for some things like code layout, but is grossly
> misleading for others.
> >
> >
> > There are several possible solutions here. I'll outline my
proposal as
> well as some other ideas.
> >
> >
> > BlockWeights instead of BlockFrequencies. My idea is that we don't
> really care about the depth of the control dependence for a particular
> basic block. We care about the accumulated *bias* toward or away from a
> basic block. This is predicated on the idea that branches are
> overwhelmingly predictable. As a consequence, evenly distributed
> probabilities are really just *uncertainties*.
>
> This sounds like an interesting way to merge unkown branch probabilities,
> static heuristics, statistical profiles, and partial profiles.
>
> > My proposed way of implementing this would take the exact algorithm
> already used in BlockFrequency, but instead of computing a frequency for an
> edge based on the probability of the branch times the frequency of the
> predecessor, instead compute it as the frequency of the predecessor times
> how "biased" the branch is relative to other branches in the
predecessor.
> Essentially, this would make a branch probability set of {0.5, 0.5} produce
> edge frequencies equal to the predecessor's edge frequency.
> >
> > The result of such a system would produce weights for every block in
the
> above CFG as '1.0', or equivalent to the entry block weight. This
to me is
> a really useful metric -- it indicates that no block in the CFG is really
> more or less likely than any other. Only *biases* in a specific direction
> would cause the block weights to go up or down significantly.
>
> I don't like this statement, or don't understand it. It is useful
to know
> a branch is unbiased. Currently we assume branches are unbiased then
> optimize conservatively in those cases (do no harm). But if we had greater
> confidence in unbiased branches (because the branch was actually profiled),
> we could if-convert much more aggressively.
>
I think the key is that I'm not talking about changing how we model
branches at all. I agree that it would be useful to model both our
confidence in the relative probabilities and the probabilities themselves
for things like if-conversion. We can still do that.

But the current users of 'block frequency' are not trying to understand
a
specific branch as biased or un-biased, or deal with our confidence.
They're just trying to understand how "important" a specific basic
block is
to the execution of the function.

> > This would immediately make the analysis useful to consumers such as
the
> vectorizer, unroller, or when we have the capability, the inliner and an
> outliner which respect cold regions of functions.
>
> I'm a little concerned that you're adding complexity that may not
be
> necessary. Algorithms like if-conversion make use of the fact that the
> block weights are additive.

Are you talking about branch probabilities here? I don't believe we use
block frequencies in anything other than:
1) MachineBlockPlacement
2) Spill cost, placement, and generally the register allocator
3) StackSlotColoring (but really only for spill weight computation)
4) LoopVectorizer (disabled currently)

I just want to make sure we're on the same page. I don't think it really
invalidates your concern.

I'm sure they could be adapted to something more sophisticated, but
these> heuristics are notoriously hard to tune. There is value in a simple model.
> You say that your approach is simple, but I would have to think hard about
> handling block frequencies that are inconsistent with their immediate
> branch probabilities.
>
If they were inconsistent, I would agree. But they are in fact consistent
and additive. They even preserve information during CFG merge points. The
only difference is that they scale (both up and down) relative to the bias
of a branch (probability's delta from the mean) rather than relative to the
flat probability.

So I don't think this makes the computation, propagation or mathematical
model more complex. What it does is shift it from a flow frequency modeling
to biased branch modeling. That is certainly a change and worth justifying,
I just don't want you to imagine it as having more impact than it does.

So the question is: why do you really need more
sophistication?>
> Part of you problem is that you're looking at frequency relative to the
> entry block when you should be looking at frequency relative to the region
> being optimized. When deciding to unroll, compare the loop header with the
> preheader.
>
Actually, I did that already. ;] I really thought that would work back when
we were first discussing how to use block frequencies and have not
forgotten it. However, that doesn't actually capture the the frequency
relative to the region. I think an example is best. Let's imagine that in
my CFG above "E1" is a loop preheader. There is some loop header below
it
in the CFG that I hadn't reached yet.

The first thing to realize is that the heuristic we want is not how hot (or
cold) the loop header is relative to the preheader. That tells us, once we
execute this loop, how many times are we likely to take the backedge (or
for a nested loop, one of the backedges). But this number may be correctly
be arbitrarily large if we have a loop in a cold region which when executed
loops for a very long time. What we actually want to check for is how
frequently we enter the loop from outside the loop at all.

So the test is how "hot" is the loop preheader relative to its
"region".
Now, if the loop preheader is nested inside of another loop, we might
usefully compare the inner loop's preheader to the outer loop's header
and
get a useful metric of "hot" or "cold". But what happens
when we are trying
to test for this in the outer most loop? We need some way to define a
"region" of the function body itself then. There are a bunch of viable
definitions. The one which maps to the loop header for a loop nest is the
function entry block. However, that is the precise circumstance that
doesn't work -- in a branch program, most blocks are "cold"
relative to the
function entry if we simply trust the block frequency.

Now, there are other ways to define a "region". I mentioned one as an
alternative to my proposal -- we could look at block frequencies only
relative to the root of their domtree. Another would be to use SESE
regions, or any number of others. However, they all have the problem that
we must do substantial work and add substantial complexity and
sophistication to compute the region, the frequency for that region, and
then compare ours against it.

As a practical matter, they also have the problem of accumulating
significant error rapidly due to continual division of the entry frequency.
My proposal actually mitigates such problems significantly.

> I think you're trying to avoid unrolling even high trip count loops
that
> are  seldom reached, and you're using a non-negligible frequency for
"cold"
> code (e.g. > 2^-10). In that case it sounds like we need a static
heuristic
> to handle the pattern of chains of branches with no CFG merge. The
> fall-thru path should be > 0.5.
>
I'm really not looking at a single problem or solution. There are a wide
variety of consumers of profile information that want to specially handle
both cold and hot regions of a function. Looking at the frequencies as they
exist today, these transformations cannot conclude either cold or hot for a
region of code from them. We need something better.

If we had any static heuristic that was good at predicting *which* long
chain should be >0.5 probability, we would be using it already. The problem
is that there is no single idea of "fallthrough". We canonicalize the
branches, and even programmers cannot make up their mind. Is the early exit
the fast-path or is the early exit the error handling? Both patterns exist
widely. If we have raw profile information, then yes, this is *much* less
of a problem because we will actually know what direction all of the
predictable branches in the program actually take. But our analyses
currently are deceptive in the absence of strong profile information by
giving the impression that the resulting branches are *unpredictable*
instead of being predictable and unknown.

So, I think this is a cool idea. If I better understood how it worked
I> might be less concerned about the complexity. I'm not very convinced
that
> it is necessary yet.
>
Ok, for the necessity, simply take any function from a simple benchmark,
and look at its static block frequencies. Arnold posted a great example
from libquantum. Here we have the most boring code (1 loop in a tiny basic
block) and in a boring benchmark that is completely predictable. Yet still,
we think that the loop is reached less than 20% of the times that the
function is called. That just doesn't make sense.

If you want to take a more complex approach, we could definitely add a
confidence dimension to the branch probability. This would allow us to
express a high-confidence un-predictable branch and optimize based on that.
However, we would *still* want block weights to be computed more like I
have proposed. Instead of only branch probabilities that were far outside
of 50/50 influencing it, we would want branch probabilities that were
highly confident to influence it. The practical result would be exactly the
same as what I have proposed, it is just that I'm making the simplifying
assumption that at the moment, there are no evenly distributed branch
probabilities which we have high confidence in.

My suspicion is that we will find even with direct profile information,
this will still be true. Mostly, I suspect there are relatively few cases
where it is important we model an even distribution of branches with high
probability. I'm happy to be wrong about this, but *that* is where I want
to wait for more data to make a decision that adds complexity to the system.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140202/d6e6d889/attachment.html>

Owen Anderson

2014-Feb-03 05:02 UTC

head link

[LLVMdev] [RFC] BlockFrequency is the wrong metric; we need a new one

On Feb 2, 2014, at 6:18 PM, Andrew Trick <atrick at apple.com> wrote:
>> The result of such a system would produce weights for every block in
the above CFG as '1.0', or equivalent to the entry block weight. This to
me is a really useful metric -- it indicates that no block in the CFG is really
more or less likely than any other. Only *biases* in a specific direction would
cause the block weights to go up or down significantly.
> 
> I don't like this statement, or don't understand it. It is useful
to know a branch is unbiased. Currently we assume branches are unbiased then
optimize conservatively in those cases (do no harm). But if we had greater
confidence in unbiased branches (because the branch was actually profiled), we
could if-convert much more aggressively
As an aside, I’d like to point out that knowing that a branch is unbiased is not
sufficient to know that it is poorly predicted and therefore profitable to
if-convert.   Consider a case where a branch is evaluated 100 times, going to
the true destination 50 times and the false destination 50 times.  If the cases
where it goes each particular direction is strongly correlated with prior
branches in the execution trace, the branch predictor’s history table will pick
up on that and successfully predict this branch despite it being unbiased when
considered in isolation.

—Owen
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140202/e0247ca0/attachment.html>

Diego Novillo

2014-Feb-03 15:47 UTC

head link

[LLVMdev] [RFC] BlockFrequency is the wrong metric; we need a new one

On Sun, Feb 2, 2014 at 5:13 AM, Chandler Carruth <chandlerc at gmail.com>
wrote:
> ---- Block Freqs ----
>  a = 1.0
>   a -> b1 = 0.5
>   a -> b2 = 0.5
>  b1 = 0.5
>   b1 -> c1 = 0.25
>   b1 -> c2 = 0.25
But this looks as if none of the edge probabilities were taking static
indicators into account? An edge going into a return, abort(), etc.
> BlockWeights instead of BlockFrequencies. My idea is that we don't
really
> care about the depth of the control dependence for a particular basic
block.
I agree. I naively thought that's what it's already done today. If you
follow this, then the relative hotness of blocks/regions can be
computed based on the aggregate weight by the paths into them.

Now, when we have sample information, we have raw block counts.
Wouldn't it make sense to just use them in that case?


Diego.

Duncan P. N. Exon Smith

2014-Feb-05 00:23 UTC

head link

[LLVMdev] [RFC] BlockFrequency is the wrong metric; we need a new one

On Feb 3, 2014, at 7:47 AM, Diego Novillo <dnovillo at google.com> wrote:
> Now, when we have sample information, we have raw block counts.
> Wouldn't it make sense to just use them in that case?
Raw block counts will give the same type of information as the current metric of
block frequency (although more accurate for the profiled use case).  If you have
profile data, running BranchProbabilityInfo followed by BlockFrequencyInfo
should recreate block frequencies roughly equivalent to the raw data scaled to a
“standard” entry frequency for each function, modulo adjustments for sanity and
precision.

Where this metric is unhelpful, raw block counts will also be unhelpful.

Possibly Parallel Threads

Search for more seemingly similar threads

llvm dev - Feb 2014 - [LLVMdev] [RFC] BlockFrequency is the wrong metric; we need a new one

[LLVMdev] [RFC] BlockFrequency is the wrong metric; we need a new one

[LLVMdev] [RFC] BlockFrequency is the wrong metric; we need a new one

[LLVMdev] [RFC] BlockFrequency is the wrong metric; we need a new one

[LLVMdev] [RFC] BlockFrequency is the wrong metric; we need a new one

[LLVMdev] [RFC] BlockFrequency is the wrong metric; we need a new one

[LLVMdev] [RFC] BlockFrequency is the wrong metric; we need a new one

Possibly Parallel Threads