Fedor Sergeev via llvm-dev
2019-Mar-13 23:20 UTC
[llvm-dev] RFC: Getting ProfileSummaryInfo and BlockFrequencyInfo from various types of passes under the new pass manager
On 3/14/19 2:04 AM, Hiroshi Yamauchi wrote:

>>> - Add a new proxy ModuleAnalysisManagerLoopProxy for a loop pass to be able to get to the ModuleAnalysisManager in one step and PSI through it.
>>
>> This is just a compile-time optimization; it saves one indirection through the FunctionAnalysisManager.
>> I'm not even sure if it is worth the effort. And it is definitely not crucial for the overall idea.
>
> This should probably be clarified to something like:
>
> - Add a new proxy ModuleAnalysisManagerLoopProxy for a loop pass to be able to get to the ModuleAnalysisManager and PSI, because that may not always be possible through the (const) FunctionAnalysisManager, unless ModuleAnalysisManagerFunctionProxy is already cached.
>
> Since the FunctionAnalysisManager we can get from the LoopAnalysisManager is a const ref, we cannot call getResult on it, so we cannot always get the ModuleAnalysisManager and PSI (see below). This actually happens in my experiment.
>
> SomeLoopPass::run(Loop &L, LoopAnalysisManager &LAM, …) {
>   auto &FAM = LAM.getResult<FunctionAnalysisManagerLoopProxy>(L, AR).getManager();
>   auto *MAMProxy = FAM.getCachedResult<ModuleAnalysisManagerFunctionProxy>(
>       *L.getHeader()->getParent());  // Can be null

Oh... well...

>   if (MAMProxy) {
>     auto &MAM = MAMProxy->getManager();
>     auto *PSI = MAM.getCachedResult<ProfileSummaryAnalysis>(
>         *L.getHeader()->getParent()->getParent());
>   } else {
>     // Can't get MAM and PSI.
>   }
>   ...
>
> ->
>
> SomeLoopPass::run(Loop &L, LoopAnalysisManager &LAM, …) {
>   auto &MAM = LAM.getResult<ModuleAnalysisManagerLoopProxy>(L, AR).getManager();  // Not null
>   auto *PSI = MAM.getCachedResult<ProfileSummaryAnalysis>(
>       *L.getHeader()->getParent()->getParent());
>   ...
>
> AFAICT, adding ModuleAnalysisManagerLoopProxy seems to be as simple as:
>
> /// A proxy from a \c ModuleAnalysisManager to a \c Loop.
> typedef OuterAnalysisManagerProxy<ModuleAnalysisManager, Loop,
>                                   LoopStandardAnalysisResults &>
>     ModuleAnalysisManagerLoopProxy;

It also needs to be added to PassBuilder::crossRegisterProxies...
But yes, that appears to be a required action.

regards,
  Fedor.

> regards,
> Fedor.
>
>> On Mon, Mar 4, 2019 at 2:05 PM Fedor Sergeev <fedor.sergeev at azul.com> wrote:
>>
>> On 3/4/19 10:49 PM, Hiroshi Yamauchi wrote:
>>>
>>> On Mon, Mar 4, 2019 at 10:55 AM Hiroshi Yamauchi <yamauchi at google.com> wrote:
>>>
>>> On Sat, Mar 2, 2019 at 12:58 AM Fedor Sergeev <fedor.sergeev at azul.com> wrote:
>>>
>>> On 3/2/19 2:38 AM, Hiroshi Yamauchi wrote:
>>>> Here's a sketch of the proposed approach for just one pass (but imagine more):
>>>>
>>>> https://reviews.llvm.org/D58845
>>>>
>>>> On Fri, Mar 1, 2019 at 12:54 PM Fedor Sergeev via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>>>
>>>> On 2/28/19 12:47 AM, Hiroshi Yamauchi via llvm-dev wrote:
>>>>> Hi all,
>>>>>
>>>>> To implement more profile-guided optimizations, we’d like to use ProfileSummaryInfo (PSI) and BlockFrequencyInfo (BFI) from more passes of various types, under the new pass manager.
>>>>>
>>>>> The following is what we came up with. Would appreciate feedback. Thanks.
>>>>>
>>>>> Issue
>>>>>
>>>>> It’s not obvious (to me) how to best do this, given that we cannot request an outer-scope analysis result from an inner-scope pass through analysis managers [1], and that we might end up unnecessarily running some analyses unless we conditionally build pass pipelines for PGO cases.
>>>>
>>>> Indeed, this is an intentional restriction in the new pass manager, which is more or less a reflection of a fundamental property of the outer-inner IRUnit relationship and of the transformations/analyses run on those units. The main intent of having those inner IRUnits (e.g. Loops) is to run local transformations and save compile time by staying local to a particular small piece of IR. The loop pass manager allows you to run a whole pipeline of different transformations, still locally, amplifying the savings.
>>>> As soon as you run a function-level analysis from within the loop pipeline, you essentially break this pipelining. Say, as your loop transformation runs, it modifies the loop (and the function) and potentially invalidates the analysis, so you have to rerun the analysis again and again. Hence instead of saving compile time it ends up increasing it.
>>>>
>>>> Exactly.
>>>>
>>>> I have hit this issue somewhat recently with the dependency of loop passes on BranchProbabilityInfo (some loop passes, like IRCE, can use it for profitability analysis).
>>>> The only solution that appears to be reasonable there is to teach all the loop passes that need to be pipelined to preserve BPI (or any other module/function-level analyses), similar to how they preserve DominatorTree and the other "LoopStandard" analyses.
>>>>
>>>> Is this implemented - do the loop passes preserve BPI?
>>> Nope, not implemented right now.
>>> One of the problems is that even the loop canonicalization passes run at the start of the loop pass manager don't preserve it (and at least LoopSimplifyCFG does change control flow).
>>>>
>>>> In buildFunctionSimplificationPipeline (where LoopFullUnrollPass is added, as in the sketch), LateLoopOptimizationsEPCallbacks and LoopOptimizerEndEPCallbacks seem to allow some arbitrary loop passes to be inserted into the pipelines (via flags)?
>>>>
>>>> I wonder how hard it'd be to teach all the relevant loop passes to preserve BFI (or BPI)..
>>> Well, each time you restructure control flow around the loops you will have to update those extra analyses, pretty much the same way as the DT is being updated through DomTreeUpdater.
>>> The trick is to design a proper update interface (and then implement it ;) ).
>>> And I have not spent enough time on this issue to get a good idea of what that interface would be.
>>>
>>> Hm, sounds non-trivial :) noting BFI depends on BPI.
>>>
>>> To step back, it looks like:
>>>
>>> want to use profiles from more passes -> need to get BFI (from loop passes) -> need all the loop passes to preserve BFI.
>>>
>>> I wonder if there's no way around this.
>> Indeed. I believe this is a general consensus here.
>>
>> regards,
>> Fedor.
>>
>>> regards,
>>> Fedor.
>>>
>>>>
>>>>> It seems that for different types of passes to be able to get PSI and BFI, we’d need to ensure PSI is cached for a non-module pass, and that PSI, BFI and the ModuleAnalysisManager proxy are cached for a loop pass, in the pass pipelines. This may mean potentially needing to insert BFI/PSI in front of many passes [2]. It seems not obvious how to conditionally insert BFI for PGO pipelines, because there isn’t always a good flag to detect PGO cases [3], or we tend to build pass pipelines before examining the code (or without propagating enough info down) [4].
>>>>>
>>>>> Proposed approach
>>>>>
>>>>> - Cache PSI right after the profile summary in the IR is written in the pass pipeline [5]. This would avoid the need to insert RequireAnalysisPass for PSI before each non-module pass that needs it. PSI can technically be invalidated, but that is unlikely. If it is, we insert another RequireAnalysisPass [6].
>>>>>
>>>>> - Conditionally insert RequireAnalysisPass for BFI, if PGO, right before each loop pass that needs it. This doesn't seem avoidable because BFI can be invalidated whenever the CFG changes. We detect PGO based on the command-line flags and/or whether the module has the profile summary info (we may need to pass the module to more functions.)
>>>>>
>>>>> - Add a new proxy ModuleAnalysisManagerLoopProxy for a loop pass to be able to get to the ModuleAnalysisManager in one step and PSI through it.
>>>>>
>>>>> Alternative approaches
>>>>>
>>>>> Dropping BFI and using PSI only
>>>>> We could consider not using BFI and relying solely on PSI and function-level profiles (as opposed to block-level), but profile precision would suffer.
>>>>>
>>>>> Computing BFI in-place
>>>>> We could consider computing BFI “in-place” by directly running BFI outside of the pass manager [7]. This would let us avoid the analysis manager constraints, but it would still involve running an outer-scope analysis from an inner-scope pass and could potentially cause problems in terms of pass pipelining and concurrency. Moreover, a potential downside of running analyses in-place is that it won’t take advantage of cached analysis results provided by the pass manager.
>>>>>
>>>>> Adding inner-scope versions of PSI and BFI
>>>>> We could consider adding function-level and loop-level PSI and a loop-level BFI, which would internally act like their outer-scope versions but provide inner-scope results only. This way, we could always call getResult for PSI and BFI. However, this would still involve running an outer-scope analysis from an inner-scope pass.
>>>>>
>>>>> Caching the FAM and the MAM proxies
>>>>> We could consider caching the FunctionAnalysisManager and the ModuleAnalysisManager proxies once early on, instead of adding a new proxy. But this seems unlikely to work well, because the analysis cache key type includes the function or the module, and some pass may add a new function for which the proxy wouldn’t be cached. We’d need to write and insert a pass in select locations just to fill the cache. Adding the new proxy would take care of these with a three-line change.
>>>>>
>>>>> Conditional BFI
>>>>> We could consider adding a conditional BFI analysis that is a wrapper around BFI and computes BFI only if profiles are available (either checking that the module has a profile summary, or depending on PSI.) With this, we wouldn’t need to conditionally build pass pipelines, and it may work for the new pass manager. But a similar approach wouldn’t work for the old pass manager, because we cannot conditionally depend on an analysis under it.
>>>> There is LazyBlockFrequencyInfo.
>>>> Not sure how well it fits this idea.
>>>>
>>>> Good point. LazyBlockFrequencyInfo seems usable with the old pass manager (to save unnecessary BFI/BPI computation) and would work for function passes. I think the restriction still applies - a loop pass still cannot request (outer-scope) BFI, lazy or not, new or old (pass manager). Another assumption is that it'd be cheap and safe to unconditionally depend on PSI or to check the module's profile summary.
>>>>
>>>> regards,
>>>> Fedor.
>>>>>
>>>>> [1] We cannot call AnalysisManager::getResult for an outer scope, only getCachedResult; probably because of potential pipelining or concurrency issues.
>>>>> [2] For example, potentially breaking up multiple pipelined loop passes and inserting RequireAnalysisPass<BlockFrequencyAnalysis> in front of each of them.
>>>>> [3] For example, -fprofile-instr-use and -fprofile-sample-use aren’t present in ThinLTO post-link builds.
>>>>> [4] For example, we could check whether the module has the profile summary metadata annotated when building pass pipelines, but we don’t always pass the module down to the place where we build pass pipelines.
>>>>> [5] By inserting RequireAnalysisPass<ProfileSummaryAnalysis> after the PGOInstrumentationUse and SampleProfileLoaderPass passes (and around the PGOIndirectCallPromotion pass for the ThinLTO post-link pipeline.)
>>>>> [6] For example, for context-sensitive PGO.
>>>>> [7] Directly calling its constructor along with the dependent analyses' results, e.g. the jump threading pass.
>>>>>
>>>>> _______________________________________________
>>>>> LLVM Developers mailing list
>>>>> llvm-dev at lists.llvm.org
>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
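For context, here is a minimal sketch of the pipeline-side half of the proposal quoted above: caching PSI once right after the profile summary is written into the IR, and conditionally requiring BFI right before a loop pass that wants it. This is not the actual PassBuilder code; the helper name addSketchedPGOPasses, the IsPGO flag, and the ProfileFile argument are made up for illustration.

#include <string>
#include "llvm/Analysis/BlockFrequencyInfo.h"
#include "llvm/Analysis/ProfileSummaryInfo.h"
#include "llvm/IR/PassManager.h"
#include "llvm/Transforms/Instrumentation/PGOInstrumentation.h"
#include "llvm/Transforms/Scalar/LoopPassManager.h"
#include "llvm/Transforms/Scalar/LoopUnrollPass.h"

using namespace llvm;

// Sketch: wire the PGO-related passes so that PSI/BFI are cached where the
// proposal above needs them.
static void addSketchedPGOPasses(ModulePassManager &MPM, bool IsPGO,
                                 std::string ProfileFile) {
  if (IsPGO) {
    // Load the profile and write the profile summary into the IR...
    MPM.addPass(PGOInstrumentationUse(ProfileFile));
    // ...then compute and cache PSI once, right after the summary is written,
    // so later non-module passes can rely on
    // getCachedResult<ProfileSummaryAnalysis>.
    MPM.addPass(RequireAnalysisPass<ProfileSummaryAnalysis, Module>());
  }

  FunctionPassManager FPM;
  // BFI is invalidated whenever the CFG changes, so it is (re)computed right
  // before each loop pass that needs it -- and only in PGO builds.
  if (IsPGO)
    FPM.addPass(RequireAnalysisPass<BlockFrequencyAnalysis, Function>());
  FPM.addPass(createFunctionToLoopPassAdaptor(LoopFullUnrollPass(/*OptLevel=*/2)));
  MPM.addPass(createModuleToFunctionPassAdaptor(std::move(FPM)));
}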
Hiroshi Yamauchi via llvm-dev
2019-Mar-13 23:52 UTC
[llvm-dev] RFC: Getting ProfileSummaryInfo and BlockFrequencyInfo from various types of passes under the new pass manager
On Wed, Mar 13, 2019 at 4:20 PM Fedor Sergeev <fedor.sergeev at azul.com> wrote:

> On 3/14/19 2:04 AM, Hiroshi Yamauchi wrote:
>
> On Wed, Mar 13, 2019 at 2:37 PM Fedor Sergeev <fedor.sergeev at azul.com> wrote:
>
>> - Add a new proxy ModuleAnalysisManagerLoopProxy for a loop pass to be
>> able to get to the ModuleAnalysisManager in one step and PSI through it.
>>
>> This is just a compile-time optimization; it saves one indirection
>> through the FunctionAnalysisManager.
>> I'm not even sure if it is worth the effort. And it is definitely not
>> crucial for the overall idea.
>
> This should probably be clarified to something like:
>
> - Add a new proxy ModuleAnalysisManagerLoopProxy for a loop pass to be
> able to get to the ModuleAnalysisManager and PSI, because that may not
> always be possible through the (const) FunctionAnalysisManager, unless
> ModuleAnalysisManagerFunctionProxy is already cached.
>
> Since the FunctionAnalysisManager we can get from the LoopAnalysisManager
> is a const ref, we cannot call getResult on it, so we cannot always get
> the ModuleAnalysisManager and PSI (see below). This actually happens in my
> experiment.
>
> SomeLoopPass::run(Loop &L, LoopAnalysisManager &LAM, …) {
>   auto &FAM = LAM.getResult<FunctionAnalysisManagerLoopProxy>(L, AR).getManager();
>   auto *MAMProxy = FAM.getCachedResult<ModuleAnalysisManagerFunctionProxy>(
>       *L.getHeader()->getParent());  // Can be null
>
> Oh... well...
>
>   if (MAMProxy) {
>     auto &MAM = MAMProxy->getManager();
>     auto *PSI = MAM.getCachedResult<ProfileSummaryAnalysis>(
>         *L.getHeader()->getParent()->getParent());
>   } else {
>     // Can't get MAM and PSI.
>   }
>   ...
>
> ->
>
> SomeLoopPass::run(Loop &L, LoopAnalysisManager &LAM, …) {
>   auto &MAM = LAM.getResult<ModuleAnalysisManagerLoopProxy>(L, AR).getManager();  // Not null
>   auto *PSI = MAM.getCachedResult<ProfileSummaryAnalysis>(
>       *L.getHeader()->getParent()->getParent());
>   ...
>
> AFAICT, adding ModuleAnalysisManagerLoopProxy seems to be as simple as:
>
> /// A proxy from a \c ModuleAnalysisManager to a \c Loop.
> typedef OuterAnalysisManagerProxy<ModuleAnalysisManager, Loop,
>                                   LoopStandardAnalysisResults &>
>     ModuleAnalysisManagerLoopProxy;
>
> It also needs to be added to PassBuilder::crossRegisterProxies...
> But yes, that appears to be a required action.

Yes, this line:

LAM.registerPass([&] { return ModuleAnalysisManagerLoopProxy(MAM); });

> regards,
> Fedor.

[earlier quoted text snipped]
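Putting the pieces from the two messages above together, here is a self-contained sketch of the proxy wiring: the proposed typedef, the registration line that would sit next to the existing ones in PassBuilder::crossRegisterProxies, and a hypothetical loop pass that uses the proxy to reach the cached PSI. SomeLoopPass and registerSketchedProxies are illustrative names, not existing LLVM code; the typedef itself is the proposed addition, not something already in tree.

#include "llvm/Analysis/LoopAnalysisManager.h"
#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/ProfileSummaryInfo.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/PassManager.h"
#include "llvm/Transforms/Scalar/LoopPassManager.h"

using namespace llvm;

/// A proxy from a \c ModuleAnalysisManager to a \c Loop (the proposed addition).
typedef OuterAnalysisManagerProxy<ModuleAnalysisManager, Loop,
                                  LoopStandardAnalysisResults &>
    ModuleAnalysisManagerLoopProxy;

// Registration, as it would appear in PassBuilder::crossRegisterProxies next
// to the existing proxy registrations.
static void registerSketchedProxies(LoopAnalysisManager &LAM,
                                    ModuleAnalysisManager &MAM) {
  LAM.registerPass([&] { return ModuleAnalysisManagerLoopProxy(MAM); });
}

// A hypothetical loop pass that reaches PSI in one step through the new proxy.
struct SomeLoopPass : PassInfoMixin<SomeLoopPass> {
  PreservedAnalyses run(Loop &L, LoopAnalysisManager &LAM,
                        LoopStandardAnalysisResults &AR, LPMUpdater &) {
    auto &MAM =
        LAM.getResult<ModuleAnalysisManagerLoopProxy>(L, AR).getManager();
    Module &M = *L.getHeader()->getParent()->getParent();
    // PSI must already be cached (see the RequireAnalysisPass placement
    // discussed in the proposal); otherwise this returns null.
    auto *PSI = MAM.getCachedResult<ProfileSummaryAnalysis>(M);
    if (PSI && PSI->hasProfileSummary()) {
      // Profile-guided heuristics would go here.
    }
    return PreservedAnalyses::all();
  }
};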