Hiroshi Yamauchi via llvm-dev
2019-Mar-04 18:55 UTC
[llvm-dev] RFC: Getting ProfileSummaryInfo and BlockFrequencyInfo from various types of passes under the new pass manager
On Sat, Mar 2, 2019 at 12:58 AM Fedor Sergeev <fedor.sergeev at azul.com> wrote:

> On 3/2/19 2:38 AM, Hiroshi Yamauchi wrote:
>
> Here's a sketch of the proposed approach for just one pass (but imagine more):
>
> https://reviews.llvm.org/D58845
>
> On Fri, Mar 1, 2019 at 12:54 PM Fedor Sergeev via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
>> On 2/28/19 12:47 AM, Hiroshi Yamauchi via llvm-dev wrote:
>>
>> Hi all,
>>
>> To implement more profile-guided optimizations, we'd like to use ProfileSummaryInfo (PSI) and BlockFrequencyInfo (BFI) from more passes of various types, under the new pass manager.
>>
>> The following is what we came up with. Would appreciate feedback. Thanks.
>>
>> Issue
>>
>> It's not obvious (to me) how to best do this, given that we cannot request an outer-scope analysis result from an inner-scope pass through analysis managers [1], and that we might unnecessarily run some analyses unless we conditionally build pass pipelines for PGO cases.
>>
>> Indeed, this is an intentional restriction in the new pass manager, which is more or less a reflection of a fundamental property of the outer-inner IRUnit relationship and the transformations/analyses run on those units. The main intent of having those inner IRUnits (e.g. Loops) is to run local transformations and save compile time by staying local to a particular small piece of IR. The loop pass manager allows you to run a whole pipeline of different transformations still locally, amplifying the savings. As soon as you run a function-level analysis from within the loop pipeline you essentially break this pipelining. Say, as your loop transformation runs, it modifies the loop (and the function) and potentially invalidates the analysis, so you have to rerun your analysis again and again. Hence instead of saving compile time it ends up increasing it.
>
> Exactly.
>
>> I have hit this issue somewhat recently with the dependency of loop passes on BranchProbabilityInfo (some loop passes, like IRCE, can use it for profitability analysis). The only solution that appears to be reasonable there is to teach all the loop passes that need to be pipelined to preserve BPI (or any other module/function-level analyses), similar to how they preserve DominatorTree and the other "LoopStandard" analyses.
>
> Is this implemented - do the loop passes preserve BPI?
>
> Nope, not implemented right now. One of the problems is that even the loop canonicalization passes run at the start of the loop pass manager don't preserve it (and at least LoopSimplifyCFG does change control flow).
>
> In buildFunctionSimplificationPipeline (where LoopFullUnrollPass is added, as in the sketch), LateLoopOptimizationsEPCallbacks and LoopOptimizerEndEPCallbacks seem to allow some arbitrary loop passes to be inserted into the pipelines (via flags)?
>
> I wonder how hard it'd be to teach all the relevant loop passes to preserve BFI (or BPI)...
>
> Well, each time you restructure control flow around the loops you will have to update those extra analyses, pretty much the same way as the DT is being updated through DomTreeUpdater. The trick is to design a proper update interface (and then implement it ;) ). And I have not spent enough time on this issue to get a good idea of what that interface would be.

Hm, sounds non-trivial :) noting BFI depends on BPI.

> regards,
> Fedor.
>
>> It seems that for different types of passes to be able to get PSI and BFI, we'd need to ensure PSI is cached for a non-module pass, and that PSI, BFI and the ModuleAnalysisManager proxy are cached for a loop pass, in the pass pipelines. This may mean potentially needing to insert BFI/PSI in front of many passes [2]. It is not obvious how to conditionally insert BFI for PGO pipelines, because there isn't always a good flag to detect PGO cases [3], or we tend to build pass pipelines before examining the code (or without propagating enough info down) [4].
>>
>> Proposed approach
>>
>> - Cache PSI right after the point in the pass pipeline where the profile summary is written into the IR [5]. This would avoid the need to insert a RequireAnalysisPass for PSI before each non-module pass that needs it. PSI technically can be invalidated, but that is unlikely; if it is, we insert another RequireAnalysisPass [6].
>>
>> - Conditionally insert a RequireAnalysisPass for BFI, if PGO, right before each loop pass that needs it. This doesn't seem avoidable because BFI can be invalidated whenever the CFG changes. We detect PGO based on the command line flags and/or whether the module has the profile summary info (we may need to pass the module to more functions).
>>
>> - Add a new proxy, ModuleAnalysisManagerLoopProxy, for a loop pass to be able to get to the ModuleAnalysisManager in one step, and PSI through it.
>>
>> Alternative approaches
>>
>> Dropping BFI and using PSI only
>> We could consider not using BFI and relying solely on PSI and function-level profiles (as opposed to block-level ones), but profile precision would suffer.
>>
>> Computing BFI in-place
>> We could consider computing BFI "in-place" by directly running BFI outside of the pass manager [7]. This would let us sidestep the analysis manager constraints, but it would still involve running an outer-scope analysis from an inner-scope pass and could potentially cause problems in terms of pass pipelining and concurrency. Moreover, a potential downside of running analyses in-place is that it won't take advantage of cached analysis results provided by the pass manager.
>>
>> Adding inner-scope versions of PSI and BFI
>> We could consider adding function-level and loop-level PSI and a loop-level BFI, which internally act like their outer-scope versions but provide inner-scope results only. This way, we could always call getResult for PSI and BFI. However, this would still involve running an outer-scope analysis from an inner-scope pass.
>>
>> Caching the FAM and the MAM proxies
>> We could consider caching the FunctionAnalysisManager and ModuleAnalysisManager proxies once early on, instead of adding a new proxy. But that seems unlikely to work well, because the analysis cache key type includes the function or the module, and some pass may add a new function for which the proxy wouldn't be cached. We'd need to write and insert a pass at select locations just to fill the cache. Adding the new proxy would take care of this with a three-line change.
>>
>> Conditional BFI
>> We could consider adding a conditional BFI analysis that is a wrapper around BFI and computes BFI only if profiles are available (either by checking that the module has a profile summary or by depending on PSI). With this, we wouldn't need to conditionally build pass pipelines, and it may work for the new pass manager. But a similar approach wouldn't work for the old pass manager, because we cannot conditionally depend on an analysis under it.
>
> There is LazyBlockFrequencyInfo. Not sure how well it fits this idea.
>
> Good point. LazyBlockFrequencyInfo seems usable with the old pass manager (to save unnecessary BFI/BPI computation) and would work for function passes. I think the restriction still applies - a loop pass still cannot request (outer-scope) BFI, lazy or not, new or old (pass manager). Another assumption is that it'd be cheap and safe to unconditionally depend on PSI or check the module's profile summary.
>
>> regards,
>> Fedor.
>>
>> [1] We cannot call AnalysisManager::getResult for an outer scope, only getCachedResult, probably because of potential pipelining or concurrency issues.
>> [2] For example, potentially breaking up multiple pipelined loop passes and inserting RequireAnalysisPass<BlockFrequencyAnalysis> in front of each of them.
>> [3] For example, -fprofile-instr-use and -fprofile-sample-use aren't present in ThinLTO post-link builds.
>> [4] For example, we could check whether the module has the profile summary metadata annotated when building pass pipelines, but we don't always pass the module down to the place where we build pass pipelines.
>> [5] By inserting RequireAnalysisPass<ProfileSummaryAnalysis> after the PGOInstrumentationUse and SampleProfileLoaderPass passes (and around the PGOIndirectCallPromotion pass for the ThinLTO post-link pipeline).
>> [6] For example, for context-sensitive PGO.
>> [7] Directly calling its constructor along with the dependent analysis results, as e.g. the jump threading pass does.
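For concreteness, one plausible spelling of the proposed ModuleAnalysisManagerLoopProxy, by analogy with the existing FunctionAnalysisManagerLoopProxy, together with a hypothetical loop pass using it to reach a cached PSI. This is an illustrative sketch, not the D58845 patch; the exact proxy accessor API differs across LLVM versions, and analysis-key registration is omitted.

  // Sketch only: a plausible spelling of the proposed proxy, modeled on the
  // existing FunctionAnalysisManagerLoopProxy in
  // llvm/Analysis/LoopAnalysisManager.h.
  #include "llvm/Analysis/LoopAnalysisManager.h"
  #include "llvm/Analysis/ProfileSummaryInfo.h"
  #include "llvm/IR/Module.h"
  #include "llvm/IR/PassManager.h"
  #include "llvm/Transforms/Scalar/LoopPassManager.h"

  namespace llvm {
  // One-step access from a loop pass to the ModuleAnalysisManager. From an
  // inner scope only *cached* module-level results may be queried.
  using ModuleAnalysisManagerLoopProxy =
      OuterAnalysisManagerProxy<ModuleAnalysisManager, Loop,
                                LoopStandardAnalysisResults &>;
  } // namespace llvm

  using namespace llvm;

  // Hypothetical loop pass consuming a PSI that the pipeline cached earlier
  // (e.g. via RequireAnalysisPass<ProfileSummaryAnalysis, Module> inserted
  // right after the profile summary was written to the IR).
  struct SomeProfileGuidedLoopPass
      : PassInfoMixin<SomeProfileGuidedLoopPass> {
    PreservedAnalyses run(Loop &L, LoopAnalysisManager &AM,
                          LoopStandardAnalysisResults &AR, LPMUpdater &U) {
      Module &M = *L.getHeader()->getModule();
      auto &MAMProxy = AM.getResult<ModuleAnalysisManagerLoopProxy>(L, AR);
      // getCachedResult, not getResult: an outer-scope analysis cannot be
      // computed on demand from an inner-scope pass.
      ProfileSummaryInfo *PSI =
          MAMProxy.getCachedResult<ProfileSummaryAnalysis>(M);
      if (!PSI || !PSI->hasProfileSummary())
        return PreservedAnalyses::all(); // no profile data; nothing to do
      // ... use PSI (and, if cached, BFI) to guide the transformation ...
      return PreservedAnalyses::all();
    }
  };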
Hiroshi Yamauchi via llvm-dev
2019-Mar-04 19:49 UTC
[llvm-dev] RFC: Getting ProfileSummaryInfo and BlockFrequencyInfo from various types of passes under the new pass manager
On Mon, Mar 4, 2019 at 10:55 AM Hiroshi Yamauchi <yamauchi at google.com> wrote:

> [...]
>
>> Well, each time you restructure control flow around the loops you will have to update those extra analyses, pretty much the same way as the DT is being updated through DomTreeUpdater. The trick is to design a proper update interface (and then implement it ;) ). And I have not spent enough time on this issue to get a good idea of what that interface would be.
>
> Hm, sounds non-trivial :) noting BFI depends on BPI.

To step back, it looks like:

want to use profiles from more passes -> need to get BFI (from loop passes) -> need all the loop passes to preserve BFI.

I wonder if there's no way around this.
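To make the "cache PSI once, conditionally require BFI" part of the proposal concrete, a rough PassBuilder-style sketch follows. It is illustrative only, not the actual D58845 change; the IsPGO flag and the addProfileAwarePasses wrapper are placeholders for however the pipeline builder learns that profiles are in use, which the RFC notes is itself part of the problem.

  // Illustrative only: roughly where the proposal would cache PSI and
  // conditionally require BFI in a PassBuilder-style pipeline.
  #include <utility>
  #include "llvm/Analysis/BlockFrequencyInfo.h"
  #include "llvm/Analysis/ProfileSummaryInfo.h"
  #include "llvm/IR/PassManager.h"
  #include "llvm/Transforms/Scalar/LoopPassManager.h"
  #include "llvm/Transforms/Scalar/LoopUnrollPass.h"

  using namespace llvm;

  void addProfileAwarePasses(ModulePassManager &MPM, bool IsPGO) {
    // 1. Cache PSI once, right after the point where the profile summary has
    //    been written to the IR (i.e. after PGOInstrumentationUse or
    //    SampleProfileLoaderPass in the real pipeline).
    MPM.addPass(RequireAnalysisPass<ProfileSummaryAnalysis, Module>());

    FunctionPassManager FPM;
    // 2. Only in PGO builds, force BFI to be computed and cached immediately
    //    before a loop pass that wants it. Since any CFG change can invalidate
    //    BFI, this may have to be repeated before each such loop pass, which
    //    breaks up the loop pass pipeline.
    if (IsPGO)
      FPM.addPass(RequireAnalysisPass<BlockFrequencyAnalysis, Function>());
    FPM.addPass(
        createFunctionToLoopPassAdaptor(LoopFullUnrollPass(/*OptLevel=*/2)));

    MPM.addPass(createModuleToFunctionPassAdaptor(std::move(FPM)));
  }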
Fedor Sergeev via llvm-dev
2019-Mar-04 22:05 UTC
[llvm-dev] RFC: Getting ProfileSummaryInfo and BlockFrequencyInfo from various types of passes under the new pass manager
On 3/4/19 10:49 PM, Hiroshi Yamauchi wrote:

> [...]
>
> To step back, it looks like:
>
> want to use profiles from more passes -> need to get BFI (from loop passes) -> need all the loop passes to preserve BFI.
>
> I wonder if there's no way around this.

Indeed. I believe this is the general consensus here.

regards,
Fedor.
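As for what "teaching loop passes to preserve BFI" would mean mechanically, a minimal sketch at the PreservedAnalyses level might look like the following. SomeBFIPreservingLoopPass is hypothetical, and the genuinely hard part discussed above, incrementally updating BPI/BFI as the CFG changes (the analogue of DomTreeUpdater), is only indicated by a comment.

  // Sketch only: the bookkeeping side of preserving BPI/BFI from a loop pass.
  // Claiming preservation without actually keeping the data up to date would
  // leave stale analysis results behind.
  #include "llvm/Analysis/BlockFrequencyInfo.h"
  #include "llvm/Analysis/BranchProbabilityInfo.h"
  #include "llvm/Analysis/LoopAnalysisManager.h"
  #include "llvm/IR/PassManager.h"
  #include "llvm/Transforms/Scalar/LoopPassManager.h"

  using namespace llvm;

  struct SomeBFIPreservingLoopPass
      : PassInfoMixin<SomeBFIPreservingLoopPass> {
    PreservedAnalyses run(Loop &L, LoopAnalysisManager &AM,
                          LoopStandardAnalysisResults &AR, LPMUpdater &U) {
      bool Changed = false;
      // ... transform the loop here, incrementally updating BPI/BFI as each
      // CFG change is made (this update interface is the missing piece the
      // thread discusses) ...
      if (!Changed)
        return PreservedAnalyses::all();

      // Preserve the standard loop analyses (DT, LI, SE, ...) as loop passes
      // normally do, and additionally claim BPI/BFI are preserved; this is
      // only legitimate if the pass really kept them up to date above.
      auto PA = getLoopPassPreservedAnalyses();
      PA.preserve<BranchProbabilityAnalysis>();
      PA.preserve<BlockFrequencyAnalysis>();
      return PA;
    }
  };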