Xinliang David Li via llvm-dev
2017-Dec-15 19:27 UTC
[llvm-dev] RFC: Synthetic function entry counts
On Fri, Dec 15, 2017 at 11:13 AM, Sean Silva <chisophugis at gmail.com> wrote:

> On Fri, Dec 15, 2017 at 10:22 AM, Easwaran Raman <eraman at google.com> wrote:
>
>> On Fri, Dec 15, 2017 at 12:22 AM, Sean Silva <chisophugis at gmail.com> wrote:
>>
>>> IIUC, this proposal is just saying that we should infer a static
>>> profile for entry counts, just as we do for branch probabilities. In
>>> the case of entry counts we do not hide that information behind an
>>> analysis like BPI, so simply annotating synthetic PGO entry counts is a
>>> straightforward solution that piggybacks on the PGO mechanism and Just
>>> Works.
>>>
>>> If that is correct, then this makes perfect sense to me.
>>
>> Yes, that is the proposal.
>>
>>> It could be argued that we ought to refactor things so that the raw
>>> PGO metadata is only ever accessed via a wrapper CGSCC analysis that
>>> falls back to interprocedural analysis (i.e. static profile heuristics)
>>> when the entry count metadata is missing, just as BPI does with static
>>> intraprocedural analysis and branch weight metadata. However, we
>>> probably don't want to do that while folks still depend on the old pass
>>> manager in production, since CGSCC analyses don't exist there, which
>>> would force us to maintain both an old and a new way of doing it.
>>>
>>> (Also, it sounds like you want to compute this with a top-down CGSCC
>>> traversal, so it might not actually be computable incrementally as a
>>> bottom-up CGSCC analysis, which is what CGSCC analyses currently do; an
>>> auxiliary module analysis for the top-down part might work around this,
>>> though.)
>>
>> Wrapping function entry counts behind an analysis might be doable but
>> requires more careful thought.
>
> Definitely.
>
>> Right now getEntryCount is called from various passes: the inliner,
>> loop passes (unroll, loop sink), CodeGenPrepare (for function layout),
>> machine block placement, module summary analysis, and so on. This might
>> be worth attempting when we fully migrate to the new pass manager, but
>> not now.
>
> It seems like many of the places that directly reference getEntryCount
> are using it to check "HasProfileData". So adding synthetic values would
> cause us to take a lot of code paths in the compiler that were intended
> to use real profile data.
>
> As a simple example, MachineBlockPlacement::collectLoopBlockSet uses
> MBFI and MBPI guarded by a check of getEntryCount to see if PGO data is
> present (the comment even says "This needs precise profile data and we
> only do this when profile data is available."). Do we need to refactor
> things first to use a more semantically meaningful check, like
> "hasRealProfileData", so that synthetic counts don't fool this code into
> thinking it has real profile data? At least, comments like the one I
> quoted need to be updated to indicate that a static profile is considered
> precise enough.

I think the existing code that checks hasEntryCount() as an
"is profile available" test will need to be refactored to query an
"is real profile available" check instead.

> Also, one other question: in the case of FullLTO, how do we ensure that
> the synthetic profile data from static heuristics is consistent across
> TUs? Do we need to recalculate the synthetic profile data? If we need to
> recalculate it, presumably we need to know which profile data is
> synthetic and which is not. E.g. in a scenario where most TUs are
> compiled with profile data but some are not (e.g. a vendor or runtime
> library delivered as bitcode), recalculated synthetic counts at FullLTO
> time should honor real profile counts but ignore synthetic counts from
> the per-TU compilation.
>
> The PGO+FullLTO case with a runtime library delivered as bitcode is not
> hypothetical; this was actually a common configuration used in production
> while I was at PlayStation. CC'ing Davide and River to confirm that they
> still care about this case.
>
> Also, more generally, can you clarify how your static entry count
> calculation algorithm is expected to work in the presence of partial
> profile data?

I think the BFI information from the modules with real profile data can be
used for the IPA frequency propagation, but the "real" entry count
information will need to be dropped. The entry counts for those library
functions are collected and aggregated from the contexts of different
hosting binaries, so their usefulness is very questionable. They need to
be "normalized" in the context of the target binary, which is exactly what
the synthetic entry count computation does.

David

> -- Sean Silva
>
>>> Also, the need to run this logic (or similar logic) as a "ThinLTO
>>> analysis" suggests not wedding it too much to the intricacies of the
>>> IR-level pass management (although admittedly we already do that with
>>> the inliner, and ThinLTO then has to approximate those inlining
>>> decisions, so it might not be the end of the world to have some
>>> divergence).
>>>
>>> -- Sean Silva
>>>
>>> On Dec 12, 2017 5:02 PM, "Easwaran Raman via llvm-dev"
>>> <llvm-dev at lists.llvm.org> wrote:
>>>
>>>> Functions in LLVM IR have a function_entry_count metadata that is
>>>> attached in PGO compilation. Using the entry count together with the
>>>> block frequency info, the compiler computes the profile count of call
>>>> instructions, from which the hotness/coldness of callsites can be
>>>> determined. Experiments have shown that using a higher threshold for
>>>> hot callsites results in improved runtime performance of the generated
>>>> code without significant code size increases. We propose to generate
>>>> synthetic function counts for non-PGO compilation and use the counts
>>>> to boost hot callsites during inlining.
>>>>
>>>> Synthetic entry counts of functions are initialized based on
>>>> properties of the function, such as whether it is visible outside the
>>>> module, whether it has an inline keyword, and so on. Then the
>>>> callgraph SCCs are traversed in reverse post-order. Counts of
>>>> callsites are determined from the entry count and the block frequency
>>>> of the callsite, and the callsite count gets added to the entry count
>>>> of the callee. For targets of indirect calls, we will use the !callees
>>>> metadata to find the possible targets and distribute the count equally
>>>> among them. For functions in a non-trivial SCC, the algorithm has to
>>>> ensure that the counts are stable and deterministic.
>>>>
>>>> In ThinLTO mode, the function summary contains the list of call edges
>>>> from the function. We propose to add the relative block frequency on
>>>> these edges. During the thin-link phase, we propagate the function
>>>> counts over the entire call graph and update the function summary with
>>>> the synthetic counts. Additionally, we plan to use the computed counts
>>>> to drive the importing decisions.
>>>>
>>>> Alternative approach
>>>> -----------------------------
>>>>
>>>> An alternative to generating synthetic counts is to make block
>>>> frequency info an inter-procedural analysis. Such an analysis would
>>>> allow comparing the BFI of callsites in two different functions. This
>>>> has several downsides:
>>>>
>>>> - The inter-procedural BFI computation is likely to be more expensive
>>>>   in terms of compile time.
>>>> - Many function passes invalidate the BFI. This would require
>>>>   selective invalidation of function BFIs.
>>>> - The inliner correctly updates the function count of a callee after
>>>>   a callsite is inlined. We can piggyback on this mechanism by using
>>>>   synthetic counts.
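[Editor's note: for readers following the propagation scheme quoted above, here is a minimal, self-contained C++ sketch of the core step: seed each function's synthetic count from its properties, then visit the call graph in reverse post-order, scale the caller's count by each callsite's relative block frequency, and accumulate the result into the callee. All names, seed values, and the flat FunctionNode representation are hypothetical illustrations; the actual patch would operate on LLVM's call graph, Function entry counts, and BlockFrequencyInfo, and would need additional handling for non-trivial SCCs to keep the result stable and deterministic.]

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified model of a call graph node. These are not the
// actual LLVM data structures, and the seed constants are illustrative only.
struct CallEdge {
  unsigned Callee;      // index of the callee in the graph
  double RelativeFreq;  // callsite block frequency / entry block frequency
};

struct FunctionNode {
  bool ExternallyVisible = false;
  bool HasInlineHint = false;
  uint64_t SyntheticCount = 0;
  std::vector<CallEdge> Calls;
};

// Propagate synthetic entry counts over the call graph in reverse
// post-order (callers before callees; back edges are ignored for brevity).
void propagateSyntheticCounts(std::vector<FunctionNode> &Graph,
                              const std::vector<unsigned> &RPO) {
  // Step 1: seed each function with an initial count based on its
  // properties (the real proposal's seeds may differ).
  for (FunctionNode &F : Graph) {
    F.SyntheticCount = F.ExternallyVisible ? 100 : 10;
    if (F.HasInlineHint)
      F.SyntheticCount /= 2;
  }

  // Step 2: a callsite's count is the caller's entry count scaled by the
  // callsite's relative block frequency; it is added to the callee's count.
  for (unsigned Idx : RPO) {
    const FunctionNode &Caller = Graph[Idx];
    for (const CallEdge &E : Caller.Calls) {
      uint64_t CallsiteCount =
          static_cast<uint64_t>(Caller.SyntheticCount * E.RelativeFreq);
      Graph[E.Callee].SyntheticCount += CallsiteCount;
    }
  }
}
```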
Sean Silva via llvm-dev
2017-Dec-15 19:56 UTC
[llvm-dev] RFC: Synthetic function entry counts
On Fri, Dec 15, 2017 at 11:27 AM, Xinliang David Li <davidxl at google.com> wrote:

>> Also, more generally, can you clarify how your static entry count
>> calculation algorithm is expected to work in the presence of partial
>> profile data?
>
> I think the BFI information from the modules with real profile data can
> be used for the IPA frequency propagation, but the "real" entry count
> information will need to be dropped. The entry counts for those library
> functions are collected and aggregated from the contexts of different
> hosting binaries, so their usefulness is very questionable.

The case at PlayStation was one where the vendor bitcode libraries simply
did not have PGO data at all.

> They need to be "normalized" in the context of the target binary, which
> is exactly what the synthetic entry count computation does.

Real entry counts might still be useful. For example, in a GUI or web
server app there may be a large number of functions that are called only
indirectly via an event loop that might be in a system library (e.g.
inside a libevent or Qt .so file). The collected branch probabilities are
not enough to decide that one handler is much hotter than another. Is the
plan to not support this case for now? (I.e., we would require you to
build the event loop library with PGO so that indirect call profiling can
infer the hotness data.)

-- Sean Silva

> David
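[Editor's note: the "hasRealProfileData" refactoring discussed above could look roughly like the following sketch. The names (ProfileCountKind, hasRealProfileData, FunctionInfo) are hypothetical; the actual change would presumably live on llvm::Function's entry-count accessors, so that passes such as MachineBlockPlacement::collectLoopBlockSet can ask for measured data rather than merely the presence of a count.]

```cpp
#include <cstdint>
#include <optional>

// Hypothetical tagging of entry counts so passes can distinguish real PGO
// data from synthetically generated counts.
enum class ProfileCountKind { Real, Synthetic };

struct EntryCount {
  uint64_t Count;
  ProfileCountKind Kind;
};

struct FunctionInfo {
  std::optional<EntryCount> Entry;

  // Today's checks effectively ask "is there an entry count at all?".
  bool hasEntryCount() const { return Entry.has_value(); }

  // Passes that genuinely need measured data would query this instead, so
  // synthetic counts do not send them down PGO-only code paths.
  bool hasRealProfileData() const {
    return Entry && Entry->Kind == ProfileCountKind::Real;
  }
};
```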
Xinliang David Li via llvm-dev
2017-Dec-15 21:28 UTC
[llvm-dev] RFC: Synthetic function entry counts
On Fri, Dec 15, 2017 at 11:56 AM, Sean Silva <chisophugis at gmail.com> wrote:

> Real entry counts might still be useful. For example, in a GUI or web
> server app there may be a large number of functions that are called only
> indirectly via an event loop that might be in a system library (e.g.
> inside a libevent or Qt .so file). The collected branch probabilities are
> not enough to decide that one handler is much hotter than another. Is the
> plan to not support this case for now? (I.e., we would require you to
> build the event loop library with PGO so that indirect call profiling can
> infer the hotness data.)

Unless we can indirect-call promote those handlers and inline them, the
global entry info for these handlers won't matter much, though.

David
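[Editor's note: as an illustration of why the event-loop case is hard for synthetic counts, here is a minimal sketch of the equal-split rule for indirect calls described in the RFC. The names are hypothetical and PossibleTargets stands in for the functions named by the !callees metadata; without a value profile every possible target receives the same share, so a genuinely hot handler cannot be singled out unless the dispatching library is itself profiled or the call is promoted and inlined.]

```cpp
#include <cstdint>
#include <vector>

// Hypothetical per-function synthetic count.
struct SyntheticCount {
  uint64_t Value = 0;
};

// Distribute an indirect callsite's count equally among its possible
// targets (as listed, e.g., by !callees metadata).
void distributeIndirectCallCount(uint64_t CallsiteCount,
                                 std::vector<SyntheticCount *> &PossibleTargets) {
  if (PossibleTargets.empty())
    return;
  // With no value profile there is no basis for weighting one possible
  // target over another, so the count is split evenly.
  uint64_t Share = CallsiteCount / PossibleTargets.size();
  for (SyntheticCount *Target : PossibleTargets)
    Target->Value += Share;
}
```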