Fedor Sergeev via llvm-dev
2019-Mar-13 23:20 UTC
[llvm-dev] RFC: Getting ProfileSummaryInfo and BlockFrequencyInfo from various types of passes under the new pass manager
On 3/14/19 2:04 AM, Hiroshi Yamauchi wrote:

>>> - Add a new proxy ModuleAnalysisManagerLoopProxy for a loop pass to be able to get to the ModuleAnalysisManager in one step and PSI through it.
>>
>> This is just a compile-time optimization; it saves one indirection through the FunctionAnalysisManager.
>> I'm not even sure if it is worth the effort. And it is definitely not crucial for the overall idea.
>
> This should probably be clarified to something like:
>
> - Add a new proxy ModuleAnalysisManagerLoopProxy for a loop pass to be able to get to the ModuleAnalysisManager and PSI, because that may not always be possible through the (const) FunctionAnalysisManager, unless ModuleAnalysisManagerFunctionProxy is already cached.
>
> Since the FunctionAnalysisManager we can get from the LoopAnalysisManager is a const ref, we cannot call getResult on it, so we cannot always get the ModuleAnalysisManager and PSI (see below). This actually happens in my experiment.
>
> SomeLoopPass::run(Loop &L, LoopAnalysisManager &LAM, …) {
>   auto &FAM = LAM.getResult<FunctionAnalysisManagerLoopProxy>(L, AR).getManager();
>   auto *MAMProxy = FAM.getCachedResult<ModuleAnalysisManagerFunctionProxy>(
>       *L.getHeader()->getParent());  // Can be null

Oh... well...

>   if (MAMProxy) {
>     auto &MAM = MAMProxy->getManager();
>     auto *PSI = MAM.getCachedResult<ProfileSummaryAnalysis>(
>         *L.getHeader()->getParent()->getParent());
>   } else {
>     // Can't get MAM and PSI.
>   }
>   ...
>
> ->
>
> SomeLoopPass::run(Loop &L, LoopAnalysisManager &LAM, …) {
>   auto &MAM = LAM.getResult<ModuleAnalysisManagerLoopProxy>(L, AR).getManager();  // Not null
>   auto *PSI = MAM.getCachedResult<ProfileSummaryAnalysis>(
>       *L.getHeader()->getParent()->getParent());
>   ...
>
> AFAICT, adding ModuleAnalysisManagerLoopProxy seems to be as simple as:
>
> /// A proxy from a \c ModuleAnalysisManager to a \c Loop.
> typedef OuterAnalysisManagerProxy<ModuleAnalysisManager, Loop,
>                                   LoopStandardAnalysisResults &>
>     ModuleAnalysisManagerLoopProxy;

It also needs to be added to PassBuilder::crossRegisterProxies...
But yes, that appears to be a required action.

regards,
  Fedor.

> regards,
> Fedor.
>
>> On Mon, Mar 4, 2019 at 2:05 PM Fedor Sergeev <fedor.sergeev at azul.com> wrote:
>>
>> On 3/4/19 10:49 PM, Hiroshi Yamauchi wrote:
>>>
>>> On Mon, Mar 4, 2019 at 10:55 AM Hiroshi Yamauchi <yamauchi at google.com> wrote:
>>>
>>> On Sat, Mar 2, 2019 at 12:58 AM Fedor Sergeev <fedor.sergeev at azul.com> wrote:
>>>
>>> On 3/2/19 2:38 AM, Hiroshi Yamauchi wrote:
>>>> Here's a sketch of the proposed approach for just one pass (but imagine more):
>>>>
>>>> https://reviews.llvm.org/D58845
>>>>
>>>> On Fri, Mar 1, 2019 at 12:54 PM Fedor Sergeev via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>>>
>>>> On 2/28/19 12:47 AM, Hiroshi Yamauchi via llvm-dev wrote:
>>>>> Hi all,
>>>>>
>>>>> To implement more profile-guided optimizations, we’d like to use ProfileSummaryInfo (PSI) and BlockFrequencyInfo (BFI) from more passes of various types, under the new pass manager.
>>>>>
>>>>> The following is what we came up with. Would appreciate feedback. Thanks.
>>>>>
>>>>> Issue
>>>>>
>>>>> It’s not obvious (to me) how to best do this, given that we cannot request an outer-scope analysis result from an inner-scope pass through analysis managers [1], and that we might end up unnecessarily running some analyses unless we conditionally build pass pipelines for PGO cases.
>>>>
>>>> Indeed, this is an intentional restriction in the new pass manager, which is more or less a reflection of a fundamental property of the outer-inner IRUnit relationship and of the transformations/analyses run on those units. The main intent of having those inner IRUnits (e.g. Loops) is to run local transformations and save compile time by staying local to a particular small piece of IR. The loop pass manager allows you to run a whole pipeline of different transformations, still locally, amplifying the savings.
>>>> As soon as you run a function-level analysis from within the loop pipeline, you essentially break this pipelining. Say, as your loop transformation runs, it modifies the loop (and the function) and potentially invalidates the analysis, so you have to rerun the analysis again and again. Hence instead of saving compile time it ends up increasing it.
>>>>
>>>> Exactly.
>>>>
>>>> I have hit this issue somewhat recently with the dependency of loop passes on BranchProbabilityInfo (some loop passes, like IRCE, can use it for profitability analysis).
>>>> The only solution that appears to be reasonable there is to teach all the loop passes that need to be pipelined to preserve BPI (or any other module/function-level analyses), similar to how they preserve DominatorTree and the other "LoopStandard" analyses.
>>>>
>>>> Is this implemented - do the loop passes preserve BPI?
>>> Nope, not implemented right now.
>>> One of the problems is that even the loop canonicalization passes run at the start of the loop pass manager don't preserve it (and at least LoopSimplifyCFG does change control flow).
>>>>
>>>> In buildFunctionSimplificationPipeline (where LoopFullUnrollPass is added, as in the sketch), LateLoopOptimizationsEPCallbacks and LoopOptimizerEndEPCallbacks seem to allow some arbitrary loop passes to be inserted into the pipelines (via flags)?
>>>>
>>>> I wonder how hard it'd be to teach all the relevant loop passes to preserve BFI (or BPI)..
>>> Well, each time you restructure control flow around the loops you will have to update those extra analyses, pretty much the same way as the DT is being updated through DomTreeUpdater.
>>> The trick is to design a proper update interface (and then implement it ;) ).
>>> And I have not spent enough time on this issue to get a good idea of what that interface would be.
>>>
>>> Hm, sounds non-trivial :) noting BFI depends on BPI.
>>>
>>> To step back, it looks like:
>>>
>>> want to use profiles from more passes -> need to get BFI (from loop passes) -> need all the loop passes to preserve BFI.
>>>
>>> I wonder if there's no way around this.
>> Indeed. I believe this is a general consensus here.
>>
>> regards,
>> Fedor.
>>
>>> regards,
>>> Fedor.
>>>
>>>>
>>>>> It seems that for different types of passes to be able to get PSI and BFI, we’d need to ensure PSI is cached for a non-module pass, and that PSI, BFI and the ModuleAnalysisManager proxy are cached for a loop pass, in the pass pipelines. This may mean potentially needing to insert BFI/PSI in front of many passes [2]. It seems not obvious how to conditionally insert BFI for PGO pipelines, because there isn’t always a good flag to detect PGO cases [3], or we tend to build pass pipelines before examining the code (or without propagating enough info down) [4].
>>>>>
>>>>> Proposed approach
>>>>>
>>>>> - Cache PSI right after the profile summary in the IR is written in the pass pipeline [5]. This would avoid the need to insert RequireAnalysisPass for PSI before each non-module pass that needs it. PSI can technically be invalidated, but that is unlikely. If it is, we insert another RequireAnalysisPass [6].
>>>>>
>>>>> - Conditionally insert RequireAnalysisPass for BFI, if PGO, right before each loop pass that needs it. This doesn't seem avoidable because BFI can be invalidated whenever the CFG changes. We detect PGO based on the command-line flags and/or whether the module has the profile summary info (we may need to pass the module to more functions.)
>>>>>
>>>>> - Add a new proxy ModuleAnalysisManagerLoopProxy for a loop pass to be able to get to the ModuleAnalysisManager in one step and PSI through it.
>>>>>
>>>>> Alternative approaches
>>>>>
>>>>> Dropping BFI and using PSI only
>>>>> We could consider not using BFI and relying solely on PSI and function-level profiles (as opposed to block-level), but profile precision would suffer.
>>>>>
>>>>> Computing BFI in-place
>>>>> We could consider computing BFI “in-place” by directly running BFI outside of the pass manager [7]. This would let us avoid the analysis manager constraints, but it would still involve running an outer-scope analysis from an inner-scope pass and could potentially cause problems in terms of pass pipelining and concurrency. Moreover, a potential downside of running analyses in-place is that it won’t take advantage of cached analysis results provided by the pass manager.
>>>>>
>>>>> Adding inner-scope versions of PSI and BFI
>>>>> We could consider adding function-level and loop-level PSI and a loop-level BFI, which would internally act like their outer-scope versions but provide inner-scope results only. This way, we could always call getResult for PSI and BFI. However, this would still involve running an outer-scope analysis from an inner-scope pass.
>>>>>
>>>>> Caching the FAM and the MAM proxies
>>>>> We could consider caching the FunctionAnalysisManager and the ModuleAnalysisManager proxies once early on, instead of adding a new proxy. But this seems unlikely to work well, because the analysis cache key type includes the function or the module, and some pass may add a new function for which the proxy wouldn’t be cached. We’d need to write and insert a pass in select locations just to fill the cache. Adding the new proxy would take care of these with a three-line change.
>>>>>
>>>>> Conditional BFI
>>>>> We could consider adding a conditional BFI analysis that is a wrapper around BFI and computes BFI only if profiles are available (either checking that the module has a profile summary, or depending on PSI.) With this, we wouldn’t need to conditionally build pass pipelines, and it may work for the new pass manager. But a similar approach wouldn’t work for the old pass manager, because we cannot conditionally depend on an analysis under it.
>>>> There is LazyBlockFrequencyInfo.
>>>> Not sure how well it fits this idea.
>>>>
>>>> Good point. LazyBlockFrequencyInfo seems usable with the old pass manager (to save unnecessary BFI/BPI computation) and would work for function passes. I think the restriction still applies - a loop pass still cannot request (outer-scope) BFI, lazy or not, new or old (pass manager). Another assumption is that it'd be cheap and safe to unconditionally depend on PSI or to check the module's profile summary.
>>>>
>>>> regards,
>>>> Fedor.
>>>>>
>>>>> [1] We cannot call AnalysisManager::getResult for an outer scope, only getCachedResult; probably because of potential pipelining or concurrency issues.
>>>>> [2] For example, potentially breaking up multiple pipelined loop passes and inserting RequireAnalysisPass<BlockFrequencyAnalysis> in front of each of them.
>>>>> [3] For example, -fprofile-instr-use and -fprofile-sample-use aren’t present in ThinLTO post-link builds.
>>>>> [4] For example, we could check whether the module has the profile summary metadata annotated when building pass pipelines, but we don’t always pass the module down to the place where we build pass pipelines.
>>>>> [5] By inserting RequireAnalysisPass<ProfileSummaryAnalysis> after the PGOInstrumentationUse and SampleProfileLoaderPass passes (and around the PGOIndirectCallPromotion pass for the ThinLTO post-link pipeline.)
>>>>> [6] For example, for context-sensitive PGO.
>>>>> [7] Directly calling its constructor along with the dependent analyses' results, e.g. the jump threading pass.
>>>>>
>>>>> _______________________________________________
>>>>> LLVM Developers mailing list
>>>>> llvm-dev at lists.llvm.org
>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
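For context, here is a minimal sketch of the pipeline-side half of the proposal quoted above: caching PSI once right after the profile summary is written into the IR, and conditionally requiring BFI right before a loop pass that wants it. This is not the actual PassBuilder code; the helper name addSketchedPGOPasses, the IsPGO flag, and the ProfileFile argument are made up for illustration.

#include <string>
#include "llvm/Analysis/BlockFrequencyInfo.h"
#include "llvm/Analysis/ProfileSummaryInfo.h"
#include "llvm/IR/PassManager.h"
#include "llvm/Transforms/Instrumentation/PGOInstrumentation.h"
#include "llvm/Transforms/Scalar/LoopPassManager.h"
#include "llvm/Transforms/Scalar/LoopUnrollPass.h"

using namespace llvm;

// Sketch: wire the PGO-related passes so that PSI/BFI are cached where the
// proposal above needs them.
static void addSketchedPGOPasses(ModulePassManager &MPM, bool IsPGO,
                                 std::string ProfileFile) {
  if (IsPGO) {
    // Load the profile and write the profile summary into the IR...
    MPM.addPass(PGOInstrumentationUse(ProfileFile));
    // ...then compute and cache PSI once, right after the summary is written,
    // so later non-module passes can rely on
    // getCachedResult<ProfileSummaryAnalysis>.
    MPM.addPass(RequireAnalysisPass<ProfileSummaryAnalysis, Module>());
  }

  FunctionPassManager FPM;
  // BFI is invalidated whenever the CFG changes, so it is (re)computed right
  // before each loop pass that needs it -- and only in PGO builds.
  if (IsPGO)
    FPM.addPass(RequireAnalysisPass<BlockFrequencyAnalysis, Function>());
  FPM.addPass(createFunctionToLoopPassAdaptor(LoopFullUnrollPass(/*OptLevel=*/2)));
  MPM.addPass(createModuleToFunctionPassAdaptor(std::move(FPM)));
}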
Hiroshi Yamauchi via llvm-dev
2019-Mar-13 23:52 UTC
[llvm-dev] RFC: Getting ProfileSummaryInfo and BlockFrequencyInfo from various types of passes under the new pass manager
On Wed, Mar 13, 2019 at 4:20 PM Fedor Sergeev <fedor.sergeev at azul.com> wrote:

> On 3/14/19 2:04 AM, Hiroshi Yamauchi wrote:
>
> On Wed, Mar 13, 2019 at 2:37 PM Fedor Sergeev <fedor.sergeev at azul.com> wrote:
>
>> - Add a new proxy ModuleAnalysisManagerLoopProxy for a loop pass to be
>> able to get to the ModuleAnalysisManager in one step and PSI through it.
>>
>> This is just a compile-time optimization; it saves one indirection
>> through the FunctionAnalysisManager.
>> I'm not even sure if it is worth the effort. And it is definitely not
>> crucial for the overall idea.
>
> This should probably be clarified to something like:
>
> - Add a new proxy ModuleAnalysisManagerLoopProxy for a loop pass to be
> able to get to the ModuleAnalysisManager and PSI, because that may not
> always be possible through the (const) FunctionAnalysisManager, unless
> ModuleAnalysisManagerFunctionProxy is already cached.
>
> Since the FunctionAnalysisManager we can get from the LoopAnalysisManager
> is a const ref, we cannot call getResult on it, so we cannot always get
> the ModuleAnalysisManager and PSI (see below). This actually happens in my
> experiment.
>
> SomeLoopPass::run(Loop &L, LoopAnalysisManager &LAM, …) {
>   auto &FAM = LAM.getResult<FunctionAnalysisManagerLoopProxy>(L, AR).getManager();
>   auto *MAMProxy = FAM.getCachedResult<ModuleAnalysisManagerFunctionProxy>(
>       *L.getHeader()->getParent());  // Can be null
>
> Oh... well...
>
>   if (MAMProxy) {
>     auto &MAM = MAMProxy->getManager();
>     auto *PSI = MAM.getCachedResult<ProfileSummaryAnalysis>(
>         *L.getHeader()->getParent()->getParent());
>   } else {
>     // Can't get MAM and PSI.
>   }
>   ...
>
> ->
>
> SomeLoopPass::run(Loop &L, LoopAnalysisManager &LAM, …) {
>   auto &MAM = LAM.getResult<ModuleAnalysisManagerLoopProxy>(L, AR).getManager();  // Not null
>   auto *PSI = MAM.getCachedResult<ProfileSummaryAnalysis>(
>       *L.getHeader()->getParent()->getParent());
>   ...
>
> AFAICT, adding ModuleAnalysisManagerLoopProxy seems to be as simple as:
>
> /// A proxy from a \c ModuleAnalysisManager to a \c Loop.
> typedef OuterAnalysisManagerProxy<ModuleAnalysisManager, Loop,
>                                   LoopStandardAnalysisResults &>
>     ModuleAnalysisManagerLoopProxy;
>
> It also needs to be added to PassBuilder::crossRegisterProxies...
> But yes, that appears to be a required action.

Yes, this line:

LAM.registerPass([&] { return ModuleAnalysisManagerLoopProxy(MAM); });

> regards,
> Fedor.

[earlier quoted text snipped]
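Putting the pieces from the two messages above together, here is a self-contained sketch of the proxy wiring: the proposed typedef, the registration line that would sit next to the existing ones in PassBuilder::crossRegisterProxies, and a hypothetical loop pass that uses the proxy to reach the cached PSI. SomeLoopPass and registerSketchedProxies are illustrative names, not existing LLVM code; the typedef itself is the proposed addition, not something already in tree.

#include "llvm/Analysis/LoopAnalysisManager.h"
#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/ProfileSummaryInfo.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/PassManager.h"
#include "llvm/Transforms/Scalar/LoopPassManager.h"

using namespace llvm;

/// A proxy from a \c ModuleAnalysisManager to a \c Loop (the proposed addition).
typedef OuterAnalysisManagerProxy<ModuleAnalysisManager, Loop,
                                  LoopStandardAnalysisResults &>
    ModuleAnalysisManagerLoopProxy;

// Registration, as it would appear in PassBuilder::crossRegisterProxies next
// to the existing proxy registrations.
static void registerSketchedProxies(LoopAnalysisManager &LAM,
                                    ModuleAnalysisManager &MAM) {
  LAM.registerPass([&] { return ModuleAnalysisManagerLoopProxy(MAM); });
}

// A hypothetical loop pass that reaches PSI in one step through the new proxy.
struct SomeLoopPass : PassInfoMixin<SomeLoopPass> {
  PreservedAnalyses run(Loop &L, LoopAnalysisManager &LAM,
                        LoopStandardAnalysisResults &AR, LPMUpdater &) {
    auto &MAM =
        LAM.getResult<ModuleAnalysisManagerLoopProxy>(L, AR).getManager();
    Module &M = *L.getHeader()->getParent()->getParent();
    // PSI must already be cached (see the RequireAnalysisPass placement
    // discussed in the proposal); otherwise this returns null.
    auto *PSI = MAM.getCachedResult<ProfileSummaryAnalysis>(M);
    if (PSI && PSI->hasProfileSummary()) {
      // Profile-guided heuristics would go here.
    }
    return PreservedAnalyses::all();
  }
};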