Sean Silva via llvm-dev
2016-Jun-13 22:05 UTC
[llvm-dev] The state of IRPGO (3 remaining work items)
Quick update. I've gotten derailed from posting a patch for this due to focusing on higher priority PGO inlining work. No ETA. -- Sean Silva On Fri, Jun 3, 2016 at 6:06 PM, Sean Silva <chisophugis at gmail.com> wrote:> > > On Thu, Jun 2, 2016 at 6:41 PM, Xinliang David Li <davidxl at google.com> > wrote: > >> >> >> On Thu, Jun 2, 2016 at 5:30 PM, Sean Silva <chisophugis at gmail.com> wrote: >> >>> >>> >>> On Thu, Jun 2, 2016 at 2:51 PM, Frédéric Riss <friss at apple.com> wrote: >>> >>>> >>>> On Jun 2, 2016, at 12:10 AM, Sean Silva <chisophugis at gmail.com> wrote: >>>> >>>> >>>> >>>> On Wed, Jun 1, 2016 at 5:46 PM, Frédéric Riss <friss at apple.com> wrote: >>>> >>>>> >>>>> On Jun 1, 2016, at 1:46 PM, Sean Silva <chisophugis at gmail.com> wrote: >>>>> >>>>> >>>>> >>>>> On Tue, May 31, 2016 at 6:02 PM, Frédéric Riss <friss at apple.com> >>>>> wrote: >>>>> >>>>>> >>>>>> On May 24, 2016, at 5:21 PM, Sean Silva <chisophugis at gmail.com> >>>>>> wrote: >>>>>> >>>>>> >>>>>> >>>>>> On Tue, May 24, 2016 at 3:41 PM, Vedant Kumar <vsk at apple.com> wrote: >>>>>> >>>>>>> >>>>>>> > On May 23, 2016, at 8:56 PM, Xinliang David Li <davidxl at google.com> >>>>>>> wrote: >>>>>>> > >>>>>>> > On Mon, May 23, 2016 at 8:23 PM, Sean Silva <chisophugis at gmail.com> >>>>>>> wrote: >>>>>>> > Jake and I have been integrating IRPGO on PS4, and we've >>>>>>> identified 3 remaining work items. >>>>>>> > >>>>>>> > Sean, thanks for the write up. It matches very well with what we >>>>>>> think as well. >>>>>>> >>>>>>> + 1 >>>>>>> >>>>>>> >>>>>>> > - Driver changes >>>>>>> > >>>>>>> > We'd like to make IRPGO the default on PS4. We also think that it >>>>>>> would be beneficial to make IRPGO the default PGO on all platforms >>>>>>> (coverage would continue to use FE instr as it does currently, of course). >>>>>>> In previous conversations (e.g. http://reviews.llvm.org/D15829) it >>>>>>> has come up that Apple have requirements that would prevent them from >>>>>>> moving to IRPGO as the default PGO, at least without a deprecation period >>>>>>> of one or two releases. >>>>>>> >>>>>>> Sean pointed out the problematic scenario in D15829 (in plan "C"): >>>>>>> >>>>>>> ``` >>>>>>> All existing user workflows continue to work, except for workflows >>>>>>> that attempt to llvm-profdata merge some old frontend profile data (e.g. >>>>>>> they have checked-in to version control and represents some special >>>>>>> workload) with the profile data from new binaries. >>>>>>> ``` >>>>>>> >>>>>>> We can address this issue by (1) making sure llvm-profdata emits a >>>>>>> helpful warning when merging an FE-based profile with an IR-based one, and >>>>>>> (2) keeping an option to use FE instrumentation for PGO. Having (2) helps >>>>>>> people who can't (or don't want) to switch to IR PGO. >>>>>>> >>>>>>> >>>>>>> > I'd like to get consensus on a path forward. >>>>>>> > As a point of discussion, how about we make IRPGO the default on >>>>>>> all platforms except Apple platforms. >>>>>>> >>>>>>> I'd really rather not introduce this inconsistency. I'm worried that >>>>>>> it might lead to Darwin becoming a second-tier platform for PGO. >>>>>>> >>>>>>> Fred (CC'd) is following up with some of our internal users to check >>>>>>> if we can change the default behavior of -fprofile-instr-generate. He >>>>>>> should be able to chime in on this soon. >>>>>>> >>>>>> >>>>>> Sorry it took me so long. >>>>>> >>>>> >>>>> Hi Fred, >>>>> >>>>> My understanding is that you were specifically investigating whether >>>>> Apple needed compatibility for merging indexed profiles. Is that >>>>> compatibility needed? The only compelling argument I have heard to continue >>>>> to expose FEPGO is that Apple may have a compatibility requirement for >>>>> merging indexed profiles from previous compiler versions. >>>>> >>>>> >>>>> Sorry no, my comment had nothing to do with merging profiles. I >>>>> understand that this will break, and it might very well be an issue for us, >>>>> but I think there is a more fundamental issue with the proposed plan. As >>>>> you bring it up though, this is a user visible breakage that shouldn’t be >>>>> disregarded completely. >>>>> >>>> >>>> Merging with existing indexed profiles is the only user-visible >>>> breakage AFAIK (this was discussed at length in >>>> http://reviews.llvm.org/D15829 and the corresponding email thread). >>>> Please provide concrete examples where things would break. >>>> >>>> >>>> I feel like we are talking past each other. It is a fact that “merging >>>> existing indexed profiles” will break. You consider it to be a reasonable >>>> breakage, I’m just saying that we shouldn't ignore this completely. It is a >>>> known breakage so it should be considered when making the decision about >>>> the new flags’ behavior. >>>> We advise users to commit their profiles to VCS, so this could actually >>>> be a real issue for them (Xcode won’t try to merge old/new profiles by >>>> itself though, but users could do this). For those kind of changes where >>>> the alternative is not a full drop-in replacement, I think deprecating the >>>> option and replacing it with a new one is a better methodology. >>>> >>>> >>>> >>>>> >>>>> Even if this is a requirement, then I still intend to make IRPGO the >>>>> default and only PGO going forward, at least on PS4. I think that doing the >>>>> same for all platforms in the upstream compiler probably makes sense as >>>>> well, since an internal Apple vendor compatibility requirement should not >>>>> penalize all users of the open source project. >>>>> >>>>> >>>>> Again, I’m not expressing an Apple requirement, just trying to discuss >>>>> the specifics of the proposed implementation. My goal is not to hinder >>>>> anything, and I want our platforms to be able to use IRPGO reliably if >>>>> users see the need for it. >>>>> >>>> >>>> What I'm saying is that besides reduced training overhead (and the >>>> inability to merge with older indexed profiles, which AFAIK is the only >>>> actual potential requirement that would need a deprecation period for >>>> FEPGO), IRPGO is basically "just a better PGO", so adding a frontend one >>>> (except as something purely during a deprecation period) is >>>> pointless. "just a better PGO" is what IRPGO is for my users. I don't want >>>> to have to have them deal with (and I don't want to support) FEPGO. >>>> >>>> >>>> Well we want to support FEPGO because it fits Xcode's workflow better >>>> (mainly it allows to do PGO + coverage at the same time). >>>> >>>> Anything that will cause the existing flag to continue to produce FEPGO >>>> on PS4 is not something that I'm really okay with. The reduced overhead of >>>> IRPGO is really important on PS4 (i.e. the difference between the >>>> instrumented game being playable or not). I really don't want to have to >>>> test the triple to determine the meaning of `-fprofile-instr-generate` >>>> (without `-fcoverage-mapping`). >>>> >>>> >>>> And I agree that IRPGO is better for your users, but the best workflow >>>> for your users is not necessarily the best workflow for every clang user. I >>>> totally agree with you that testing the triple to decide the meaning of the >>>> option is wrong to do upstream. The best default is not a platform choice, >>>> it’s a workflow decision. >>>> >>>> >>>>> I’ve discussed the change in behavior quiet extensively, and I after >>>>>> having changed my mind a couple times, I would argue in favor of keeping >>>>>> the current behavior for the existing flags. I think adding a new switch >>>>>> for IRPGO is a better option. The argument that weighted most on my opinion >>>>>> is the proposed interaction with -fcoverage-mapping, and it is not at all >>>>>> platform specific. With the proposed new behavior, turning coverage on and >>>>>> off in your build system will generate a binary with different performance >>>>>> characteristics and this feels really wrong. >>>>>> >>>>> >>>>> Bob already mentioned in the other thread that >>>>> `-fprofile-instr-generate -fcoverage-mapping` was sufficiently different >>>>> from `-fprofile-instr-generate` that `-fprofile-instr-generate >>>>> -fcoverage-mapping` was not an acceptable workaround that could be used for >>>>> enabling FEPGO during a transitionary period, so I'm not convinced that >>>>> your argument here makes sense. >>>>> >>>>> >>>>> I’m not sure what you’re referring to here, and I have a hard time >>>>> parsing the sentence. I suppose “was not an acceptable” should read “was an >>>>> acceptable”? I would be surprised that Bob ever agreed to completely >>>>> transition away from FEPGO. I didn’t even understand that getting rid of >>>>> FEPGO was on table as you seem to imply bellow. >>>>> >>>> >>>> No, it is written as intended. The backstory is in >>>> http://reviews.llvm.org/D15829 (and the corresponding email thread). >>>> The paragraph starting with "The coverage mapping adds considerable cost.”. >>>> >>>> >>>> Thanks for the pointer, it turns out this part of the thread never made >>>> it to my mailbox. What Bob is saying is that we need an option to turn on >>>> FEPGO without coverage, and this is basically exactly what I think we >>>> should do: add an explicit option to choose the instrumentation type, so >>>> that we can use FEPGO even without coverage in the new IRPGO world. >>>> >>>> >>>>> I also share David's opinion that this is not going to be an issue in >>>>> practice. I think it makes sense for PGO and coverage to have different >>>>> overheads. Coverage inherently has to trace all locations at source level, >>>>> while PGO has more freedom. >>>>> >>>>> >>>>> I’m sorry if I wasn’t clear, but I’m not talking about instrumentation >>>>> overhead, I’m talking about the performance of the binary generated using >>>>> the profiles. If we go the route of making the meaning of >>>>> -fprofile-instr-generate depend on whether -fcoverage-mapping gets passed, >>>>> then we change the kind of instrumentation and thus the input to the >>>>> optimizations behind the user’s back. I wouldn’t be surprised that using >>>>> profiles generated by FEPGO and IRPGO give you a final executable with >>>>> measurably different performance characteristics. >>>>> >>>> >>>> I think the point is that given the effort being put into IRPGO, the >>>> IRPGO version will always be a faster final executable. >>>> >>>> >>>> 2 things: >>>> - Do you have strong evidence of this being true in all cases? I’m not >>>> looking forward to when I’ll transition Apple’s clang to IRPGO, because I >>>> know that we’ll have to spend time characterizing the performance of the >>>> compiler. It might be faster in all but a couple benchmarks, we still will >>>> need to investigate what happened there to see if we can fix it. I’m not >>>> saying it’s bad (I’m actually also looking forward to evaluate it, because >>>> it might give better results for us), but it’s definitely not a benign >>>> change that I would do without double checking the results. And as such, I >>>> also wouldn’t want the compiler to silently do this change behind my back. >>>> - Even if it is 100% true, then users will see a degradation in >>>> performance if they add -fcoverage-mapping to their CFLAGS. I think this is >>>> just bad user experience. The user should be able to decide if he wants to >>>> be able to compromise some performance to be able to do PGO+coverage at the >>>> same time. >>>> >>>> Why provide a "worse" PGO option? >>>> >>>> >>>> Worse from your point of view. The resilience to compiler changes that >>>> FEPGO currently offers is a real feature even if it doesn’t have a place in >>>> Sony’s workflow. >>>> >>>> >>>> >>>>> If you’re tracking your performance, this can be really painful. >>>>> Recently we wasted days investigating performance regressions that were due >>>>> to buggy profiles. I strongly believe having an option seemingly unrelated >>>>> to PGO change this behavior is wrong and can cause actual pain for our end >>>>> users. >>>>> >>>> >>>> After a deprecation period we can force `-fprofile-instr-generate` and >>>> `-fcoverage-mapping` to be mutually exclusive if necessary. Does this solve >>>> your problem? >>>> >>>> >>>> We don’t want a “deprecation period”. Unless IRPGO evolves in something >>>> that can do precise coverage and PGO in the same run, FEPGO can’t be >>>> deprecated (as in slated for removal). >>>> >>>> Actually, I think it makes a lot of sense in some respects for >>>> `-fprofile-instr-generate` and `-fprofile-instr-generate >>>> -fcoverage-mapping` to be IRPGO and FEPGO/coverage. The difference from a >>>> user's perspective is basically "is the instrumentation inserted by the >>>> compiler constrained to have source-level coverage, or does the compiler >>>> not have this restriction". Although as I've said, I'm not a fan of >>>> supporting FEPGO in the long-term due to maintenance issues. >>>> >>>> Also note that things like the context-sensitivity obtained through >>>> pre-inlining (see Rong's original RFC) is simply not obtainable within a >>>> source-level instrumentation paradigm (even if we did something like the >>>> counter fusion discussed in "[llvm-dev] RFC: Pass to prune redundant >>>> profiling instrumentation" to reduce the overhead to that of IRPGO with >>>> pre-inlining). Thus FEPGO a.k.a. "coverage-level PGO" would nonetheless be >>>> at an inherent disadvantage. >>>> >>>> >>>>> Also, David's point about redundant work on FEPGO is a good one. We >>>>> don't want to continue maintaining two different PGO’s. >>>>> >>>>> >>>>> Are you implying that LLVM should drop FEPGO? It’s a totally sensible >>>>> thing to do to use your tests as training data for your profile generation. >>>>> It’s also a very valid thing to do to use your tests to do coverage. Xcode >>>>> does both of these things. I would see it a a big regression to not support >>>>> doing both at the same time (this would mean doubling compile+testing time >>>>> for users of both). >>>>> >>>> >>>> As David pointed out, training runs for PGO and coverage have different >>>> goals. I'm very skeptical of any testing that tries to do both at the same >>>> time, but this will continue to work (albeit without benefitting from any >>>> of the effort being put into IRPGO). >>>> >>>> >>>> For a high-end game, sure. Small apps are a different beast and it’s a >>>> reasonable thing to use your set of tests as training data. Again, all the >>>> workflows are not equivalent. >>>> >>>> As the instrumentation needs to stay there for coverage anyway, I >>>>> expect FEPGO to stay there and maintained (we care a lot about coverage). >>>>> I’m not saying that all the work going into IRPGO should be duplicated in >>>>> FEPGO, but what’s there and working should keep working. >>>>> >>>> >>>> For my users the reduced overhead of IRPGO is an important feature, and >>>> making it the default is important for that reason. Since most of the >>>> effort going into PGO is focused on IRPGO, this will lead to users using >>>> FEPGO ending up as a second-tier PGO, which Vedant said he specifically >>>> wanted to avoid. The only option to avoid this is for users to not be using >>>> FEPGO. >>>> >>>> >>>> I’d love to have everybody be able to enjoy the lower overhead, but if >>>> the answer to this is that they need to compile+train twice to get coverage >>>> and PGO, then the trade-off is wrong. I suppose you can agree that every >>>> user doesn’t have the same requirements as game developers. At least users >>>> should have the choice. >>>> >>>> Also, FEPGO has a lot of nice characteristics like resilience to IRGEN >>>>> changes. If you have archived profiles, then when you switch compilers your >>>>> performance shouldn’t degrade with FEPGO (modulo optimization bugs), while >>>>> it’s much more likely to degrade with IRPGO. >>>>> >>>> >>>> Note that this use case continues to work. I.e. we continue to apply >>>> existing frontend profiles correctly. including frontend profiles generated >>>> with -fcoverage-mapping, so that collecting coverage and PGO at the same >>>> time (although not advisable) still works. The only use case that breaks is >>>> merging existing indexed profiles, which is why we are specifically waiting >>>> for an answer on whether this is a requirement for you guys at Apple, which >>>> will determine what kind of deprecation period etc. will be needed before >>>> we can default it. >>>> >>>> >>>>> It overall looks like a much better option for people who do not need >>>>> the lower instrumentation overhead. >>>>> >>>> >>>> This is not just about lower instrumentation overhead. Things like the >>>> recently added static VP node allocation (which will e.g. make indirect >>>> callsite promotion for LTO'd kernels work) are other things are being >>>> missed out on. >>>> >>>> >>>> >>>>> I would actually make the IRPGO mode completely incompatible with the >>>>>> -fcoverage-mapping flag. >>>>>> >>>>> >>>>> I'm not sure what you mean by this. Nobody is proposing anything that >>>>> would make -fcoverage-mapping do anything related to IRPGO. >>>>> >>>>> >>>>> What I mean is that -f<whatever enables IRPGO> should error out when >>>>> passed at the same time as -fcoverage-mapping. >>>>> >>>> >>>> I think you're coming into this with the mindset that FEPGO will still >>>> be a possibility (outside of a build that is used for coverage mapping). >>>> I'm not convinced that we actually need to continue exposing that except as >>>> a weird thing in conjunction with coverage (and possibly for a deprecation >>>> period if users want to merge indexed profiles). >>>> >>>> >>>> FEPGO needs to be a possibility. And the instrumentation code needs to >>>> be there anyway for the coverage. Here’s a proposal: >>>> - Add -fpgo-instr=[llvm|clang] to choose which kind of PGO >>>> instrumentation you want >>>> - make -fpgo-instr=llvm incompatible with with -fcoverage-mapping >>>> - deprecate -fprofile-instr-generate, and remove it after a couple >>>> releases >>>> >>>> (I really don’t care about the names) >>>> >>>> I think it is a better way forward to have -fprofile-instr-generate >>>> keep it’s current meaning and to have downstream toolchain providers have >>>> an internal patch to make it an alias to the one they prefer. Just because >>>> changing the meaning of options is a bad thing. >>>> >>> >>> Sure, I can deal with that. And actually this discussion has suggested >>> some good user-understandable names for the two different PGO's. I think >>> that "stable PGO" or "coverage-based PGO" or "source-level PGO" is a really >>> good name for FEPGO. "frontend" and "IR" aren't really meaningful to users. >>> >>> Based on what I have seen, FEPGO has the following reasons to keep it >>> around: >>> - it currently can be used together with coverage >>> - it has certain compatibility guarantees about profiles across compiler >>> versions >>> >>> >>> >>>> This also means that if the consensus is that -fprofile-instr-generate >>>> should really change its meaning to mean IRPGO, I’m open to having this >>>> internal patch on our side. >>>> >>> >>> Yeah, it sounds like someone is going to have to keep a "private patch" >>> to change the default. At that point doing a switch on the triple in >>> upstream seems preferable though :/ >>> >>> >>> So I propose the following (which is equivalent to what you proposed, >>> but starting to put specific option names): >>> 1. Add -fprofile-instr-generate=stable and >>> -fprofile-instr-generate=unstable >>> a) Indexed profiles for -fprofile-instr-generate=stable are guaranteed >>> to be compatible across compiler releases (which will include existing >>> releases). Additionally, -fprofile-instr-generate=stable can be used in >>> combination with -fcoverage-mapping. >>> b) Indexed profiles for -fprofile-instr-generate=unstable are not >>> guaranteed to be compatible across compiler releases. It is an error to use >>> this in conjunction with -fcoverage-mapping >>> c) -fprofile-instr-generate defaults to one of the two in a way that >>> meets vendor needs, via private patches or an upstream switch on the triple >>> or whatever. It can eventually be deprecated or whatever. >>> >>> -fprofile-instr-generate=stable would basically correspond to FEPGO / FE >>> coverage instrumentation. -fprofile-instr-generate=unstable would basically >>> correspond to IRPGO. >>> >>> What do you think? >>> >> >> >> Sounds fine to me, though I am not a fan of using unstable in the option. >> I think a more meaningful way (that capture the essence of the difference) >> is the following naming: >> 1) FEPGO: -fprofile-instr-generate=source or >> -fprofile-instr-generate=region >> 2) IR: -fprofile-instr-generate=cfg or -fprofile-instr-generate=graph >> >> Also since -fprofile-instr-generate= form is already used to specify raw >> profile path, we may need a different driver option. >> > > Oops! > > >> Alternatives include >> 1) -fprofile-instrument=<...> -- this maps directly to the cc1 option we >> have today >> or >> 2) -fpgo-instr=<> -- suggested by Fred or >> 3) -fpgo-instr-method=<...> >> > > I will probably use one of these in the patch. > > -- Sean Silva > > >> >> >> Having the driver level option is the first good step forward. In the >> near future, when performance of IRPGO is further tuned (e.g, better >> integration with inliner, more runtime perf win with -flto=thin, -flto >> etc), and the interests of the IRPGO is better aligned, we can revisit the >> default. >> >> thanks, >> >> David >> >> >> >> >>> >>> -- Sean Silva >>> >>> >>>> >>>> I also think a new option is cleaner (again I don’t care about the >>>> name), but if people feel that -fprofile-instr-generate=[llvm|clang] is >>>> easier, then this works for me too. >>>> >>>> Fred >>>> >>>> >>>> -- Sean Silva >>>> >>>> >>>>> >>>>> Thanks! >>>>> Fred >>>>> >>>>> -- Sean Silva >>>>> >>>>> >>>>> >>>>>> >>>>>> Fred >>>>>> >>>>>> >>>>>> >>>>>>> >>>>>>> At its core I don't think -fprofile-instr-generate *implies* >>>>>>> FE-based instrumentation. So, I'd like to see the driver do this (on all >>>>>>> platforms): >>>>>>> >>>>>>> * -fprofile-instr-generate: IR instrumentation >>>>>>> * -fprofile-instr-generate=IR: IR instrumentation >>>>>>> * -fprofile-instr-generate=FE: FE instrumentation >>>>>>> * -fprofile-instr-generate -fcoverage-mapping: FE + coverage >>>>>>> instrumentation >>>>>>> >>>>>>> It's a bit ugly because the meaning of -fprofile-instr-generate >>>>>>> becomes context-sensitive. But, (1) it doesn't break existing common >>>>>>> workflows and (2) it makes it easier to ship IRPGO. The big caveat here is >>>>>>> that we'll need to wait a bit and see if our internal users are OK with >>>>>>> this. >>>>>>> >>>>>> >>>>>> Is there a reason to even have the possibility for FEPGO in the long >>>>>> run? From what I can tell, at most we would add a >>>>>> -fuse-the-old-pgo-because-i-want-to-merge-with-old-profiles option >>>>>> to hold people over until they can regenerate their profiles with the >>>>>> current compiler. We can add a flag to control what pre-instrumentation is >>>>>> done to retain the source-level robustness of FEPGO (e.g. >>>>>> -fpgo-no-simplify-before-instrumenting or something). >>>>>> >>>>>> >>>>>>> One alternative is to introduce a separate driver flag for IRPGO. >>>>>>> This might not work well for Sony's existing users. I'd be interested in >>>>>>> any feedback about this approach. >>>>>>> >>>>>> >>>>>> Personally, I would prefer to maintaining command line compatibility >>>>>> for PGO in Clang (i.e. users don't have to modify their build systems). >>>>>> >>>>>> >>>>>> -- Sean Silva >>>>>> >>>>>> >>>>>>> >>>>>>> >>>>>>> > I really don't like fragmenting things like this (e.g. if a >>>>>>> third-party tests "clang's" PGO they will get something different depending >>>>>>> on the platform), but I don't see another way given Apple's constraints. >>>>>>> > >>>>>>> > I'd like to see IRPGO to be the default as well, but the first >>>>>>> thing we need is a driver level option to make the switch (prof-gen) -- >>>>>>> currently we rely on -Xclang option to switch between two modes, which is >>>>>>> less than ideal. >>>>>>> > >>>>>>> > If the concern from Apple is that the old profile still need to >>>>>>> work, then this is problem already solved. The reason is that >>>>>>> -fprofile-instr-use can automatically detect the type of the profile and >>>>>>> switch the mode. >>>>>>> >>>>>>> It's not just that. As Sean pointed out, we're concerned about old >>>>>>> profiles inter-operating poorly with new ones. >>>>>>> >>>>>>> thanks, >>>>>>> vedant >>>>>>> >>>>>>> >>>>>>> > - Pre-instrumentation passes >>>>>>> > >>>>>>> > Pre-instrumentation optimization has been critical for reducing >>>>>>> the overhead of PGO for the PS4 games we tested (as expected). However, in >>>>>>> our measurements (and we are glad to provide more info) the main benefit >>>>>>> was inlining (also as expected). A simple pass of inlining at threshold 100 >>>>>>> appeared to give all the benefits. Even inlining at threshold 0 gave almost >>>>>>> all the benefits. For example, the passes initially proposed in >>>>>>> http://reviews.llvm.org/D15828did not improve over just inlining >>>>>>> with threshold 100. >>>>>>> > >>>>>>> > (due to PR27299 we also need to add simplifycfg after inlining to >>>>>>> clean up, but this doesn't affect the instrumentation overhead in our >>>>>>> measurements) >>>>>>> > >>>>>>> > Bottom line: for our use cases, inlining does all the work, but >>>>>>> we're not opposed to having more passes, which might be beneficial for >>>>>>> non-game workloads (which is most code). >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > Yes, Rong is re-collecting performance data before submitting the >>>>>>> patch. >>>>>>> > >>>>>>> > - Warnings >>>>>>> > >>>>>>> > We identified 3 classes of issues which manifest as spammy >>>>>>> warnings when applying profile data with IRPGO (these affect FEPGO also I >>>>>>> believe, but we looked in depth at IRPGO): >>>>>>> > >>>>>>> > 1. The main concerning one is that getPGOFuncName mangles the >>>>>>> filename into the counter name. This causes us to get >>>>>>> instrprof_error::unknown_function when the pgo-use build is done in a >>>>>>> different build directory from the training build (which is a reasonable >>>>>>> thing to support). In this situation, PGO data is useless for all `static` >>>>>>> functions (and as a byproduct results in a huge volume of warnings). >>>>>>> > >>>>>>> > This can be enhanced with an user option to override the behavior. >>>>>>> Can you help filing a tracking bug? >>>>>>> > >>>>>>> > >>>>>>> > 2. In different TU's, pre-instr inlining might make different >>>>>>> inlining decisions (for example, different functions may be available for >>>>>>> inlining), causing hash mismatch errors (instrprof_error::hash_mismatch). >>>>>>> In building a large game, we only saw 8 instance of this, so it is not as >>>>>>> severe as 1, but would be good to fix. >>>>>>> > >>>>>>> > >>>>>>> > Rong has a patch addressing that -- will submit after cleanup pass >>>>>>> change is done. >>>>>>> > >>>>>>> > >>>>>>> > 3. A .cpp file may be compiled and put into an archive, but then >>>>>>> not selected by the linker and will therefore not result in a counter in >>>>>>> the profraw. When compiling this file with pgo-use, >>>>>>> instrprof_error::unknown_function will result and a warning will be emitted. >>>>>>> > >>>>>>> > yes -- this is a common problem to other compilers as well. >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > Case 1 can be fixed using a function hash or other unique >>>>>>> identifier instead of a file path. David, in D20195 you mentioned that Rong >>>>>>> was working on a patch that would fix 2; we are looking forward to that. >>>>>>> > >>>>>>> > >>>>>>> > Right. >>>>>>> > >>>>>>> > For 3, I unfortunately do not know of any solution. I don't think >>>>>>> there is a way for us to make this warning reliable in the face of this >>>>>>> circumstance. So my conclusion is that instrprof_error::unknown_function at >>>>>>> least must be defaulted to off unfortunately. >>>>>>> > >>>>>>> > yes, this can be annoying. If the warnings can be buffered, then >>>>>>> the compiler can check if this is due to missing profile for the whole file >>>>>>> and can reduce the warnings into one single warning (source file has no >>>>>>> profile data). Making it off by default sounds fine to me too if it is too >>>>>>> noisy. >>>>>>> > >>>>>>> > thanks, >>>>>>> > >>>>>>> > David >>>>>>> > >>>>>>> > >>>>>>> > -- Sean Silva >>>>>>> >>>>>> >>>> >>> >> >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160613/f257e341/attachment-0001.html>
Xinliang David Li via llvm-dev
2016-Jun-13 22:07 UTC
[llvm-dev] The state of IRPGO (3 remaining work items)
No problem. Inliner work is certainly higher priority. Rong can help with the option related work. thanks, David On Mon, Jun 13, 2016 at 3:05 PM, Sean Silva <chisophugis at gmail.com> wrote:> Quick update. I've gotten derailed from posting a patch for this due to > focusing on higher priority PGO inlining work. No ETA. > > -- Sean Silva > > > On Fri, Jun 3, 2016 at 6:06 PM, Sean Silva <chisophugis at gmail.com> wrote: > >> >> >> On Thu, Jun 2, 2016 at 6:41 PM, Xinliang David Li <davidxl at google.com> >> wrote: >> >>> >>> >>> On Thu, Jun 2, 2016 at 5:30 PM, Sean Silva <chisophugis at gmail.com> >>> wrote: >>> >>>> >>>> >>>> On Thu, Jun 2, 2016 at 2:51 PM, Frédéric Riss <friss at apple.com> wrote: >>>> >>>>> >>>>> On Jun 2, 2016, at 12:10 AM, Sean Silva <chisophugis at gmail.com> wrote: >>>>> >>>>> >>>>> >>>>> On Wed, Jun 1, 2016 at 5:46 PM, Frédéric Riss <friss at apple.com> wrote: >>>>> >>>>>> >>>>>> On Jun 1, 2016, at 1:46 PM, Sean Silva <chisophugis at gmail.com> wrote: >>>>>> >>>>>> >>>>>> >>>>>> On Tue, May 31, 2016 at 6:02 PM, Frédéric Riss <friss at apple.com> >>>>>> wrote: >>>>>> >>>>>>> >>>>>>> On May 24, 2016, at 5:21 PM, Sean Silva <chisophugis at gmail.com> >>>>>>> wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Tue, May 24, 2016 at 3:41 PM, Vedant Kumar <vsk at apple.com> wrote: >>>>>>> >>>>>>>> >>>>>>>> > On May 23, 2016, at 8:56 PM, Xinliang David Li < >>>>>>>> davidxl at google.com> wrote: >>>>>>>> > >>>>>>>> > On Mon, May 23, 2016 at 8:23 PM, Sean Silva < >>>>>>>> chisophugis at gmail.com> wrote: >>>>>>>> > Jake and I have been integrating IRPGO on PS4, and we've >>>>>>>> identified 3 remaining work items. >>>>>>>> > >>>>>>>> > Sean, thanks for the write up. It matches very well with what we >>>>>>>> think as well. >>>>>>>> >>>>>>>> + 1 >>>>>>>> >>>>>>>> >>>>>>>> > - Driver changes >>>>>>>> > >>>>>>>> > We'd like to make IRPGO the default on PS4. We also think that it >>>>>>>> would be beneficial to make IRPGO the default PGO on all platforms >>>>>>>> (coverage would continue to use FE instr as it does currently, of course). >>>>>>>> In previous conversations (e.g. http://reviews.llvm.org/D15829) it >>>>>>>> has come up that Apple have requirements that would prevent them from >>>>>>>> moving to IRPGO as the default PGO, at least without a deprecation period >>>>>>>> of one or two releases. >>>>>>>> >>>>>>>> Sean pointed out the problematic scenario in D15829 (in plan "C"): >>>>>>>> >>>>>>>> ``` >>>>>>>> All existing user workflows continue to work, except for workflows >>>>>>>> that attempt to llvm-profdata merge some old frontend profile data (e.g. >>>>>>>> they have checked-in to version control and represents some special >>>>>>>> workload) with the profile data from new binaries. >>>>>>>> ``` >>>>>>>> >>>>>>>> We can address this issue by (1) making sure llvm-profdata emits a >>>>>>>> helpful warning when merging an FE-based profile with an IR-based one, and >>>>>>>> (2) keeping an option to use FE instrumentation for PGO. Having (2) helps >>>>>>>> people who can't (or don't want) to switch to IR PGO. >>>>>>>> >>>>>>>> >>>>>>>> > I'd like to get consensus on a path forward. >>>>>>>> > As a point of discussion, how about we make IRPGO the default on >>>>>>>> all platforms except Apple platforms. >>>>>>>> >>>>>>>> I'd really rather not introduce this inconsistency. I'm worried >>>>>>>> that it might lead to Darwin becoming a second-tier platform for PGO. >>>>>>>> >>>>>>>> Fred (CC'd) is following up with some of our internal users to >>>>>>>> check if we can change the default behavior of -fprofile-instr-generate. He >>>>>>>> should be able to chime in on this soon. >>>>>>>> >>>>>>> >>>>>>> Sorry it took me so long. >>>>>>> >>>>>> >>>>>> Hi Fred, >>>>>> >>>>>> My understanding is that you were specifically investigating whether >>>>>> Apple needed compatibility for merging indexed profiles. Is that >>>>>> compatibility needed? The only compelling argument I have heard to continue >>>>>> to expose FEPGO is that Apple may have a compatibility requirement for >>>>>> merging indexed profiles from previous compiler versions. >>>>>> >>>>>> >>>>>> Sorry no, my comment had nothing to do with merging profiles. I >>>>>> understand that this will break, and it might very well be an issue for us, >>>>>> but I think there is a more fundamental issue with the proposed plan. As >>>>>> you bring it up though, this is a user visible breakage that shouldn’t be >>>>>> disregarded completely. >>>>>> >>>>> >>>>> Merging with existing indexed profiles is the only user-visible >>>>> breakage AFAIK (this was discussed at length in >>>>> http://reviews.llvm.org/D15829 and the corresponding email thread). >>>>> Please provide concrete examples where things would break. >>>>> >>>>> >>>>> I feel like we are talking past each other. It is a fact that “merging >>>>> existing indexed profiles” will break. You consider it to be a reasonable >>>>> breakage, I’m just saying that we shouldn't ignore this completely. It is a >>>>> known breakage so it should be considered when making the decision about >>>>> the new flags’ behavior. >>>>> We advise users to commit their profiles to VCS, so this could >>>>> actually be a real issue for them (Xcode won’t try to merge old/new >>>>> profiles by itself though, but users could do this). For those kind of >>>>> changes where the alternative is not a full drop-in replacement, I think >>>>> deprecating the option and replacing it with a new one is a better >>>>> methodology. >>>>> >>>>> >>>>> >>>>>> >>>>>> Even if this is a requirement, then I still intend to make IRPGO the >>>>>> default and only PGO going forward, at least on PS4. I think that doing the >>>>>> same for all platforms in the upstream compiler probably makes sense as >>>>>> well, since an internal Apple vendor compatibility requirement should not >>>>>> penalize all users of the open source project. >>>>>> >>>>>> >>>>>> Again, I’m not expressing an Apple requirement, just trying to >>>>>> discuss the specifics of the proposed implementation. My goal is not to >>>>>> hinder anything, and I want our platforms to be able to use IRPGO reliably >>>>>> if users see the need for it. >>>>>> >>>>> >>>>> What I'm saying is that besides reduced training overhead (and the >>>>> inability to merge with older indexed profiles, which AFAIK is the only >>>>> actual potential requirement that would need a deprecation period for >>>>> FEPGO), IRPGO is basically "just a better PGO", so adding a frontend one >>>>> (except as something purely during a deprecation period) is >>>>> pointless. "just a better PGO" is what IRPGO is for my users. I don't want >>>>> to have to have them deal with (and I don't want to support) FEPGO. >>>>> >>>>> >>>>> Well we want to support FEPGO because it fits Xcode's workflow better >>>>> (mainly it allows to do PGO + coverage at the same time). >>>>> >>>>> Anything that will cause the existing flag to continue to produce >>>>> FEPGO on PS4 is not something that I'm really okay with. The reduced >>>>> overhead of IRPGO is really important on PS4 (i.e. the difference between >>>>> the instrumented game being playable or not). I really don't want to have >>>>> to test the triple to determine the meaning of `-fprofile-instr-generate` >>>>> (without `-fcoverage-mapping`). >>>>> >>>>> >>>>> And I agree that IRPGO is better for your users, but the best workflow >>>>> for your users is not necessarily the best workflow for every clang user. I >>>>> totally agree with you that testing the triple to decide the meaning of the >>>>> option is wrong to do upstream. The best default is not a platform choice, >>>>> it’s a workflow decision. >>>>> >>>>> >>>>>> I’ve discussed the change in behavior quiet extensively, and I after >>>>>>> having changed my mind a couple times, I would argue in favor of keeping >>>>>>> the current behavior for the existing flags. I think adding a new switch >>>>>>> for IRPGO is a better option. The argument that weighted most on my opinion >>>>>>> is the proposed interaction with -fcoverage-mapping, and it is not at all >>>>>>> platform specific. With the proposed new behavior, turning coverage on and >>>>>>> off in your build system will generate a binary with different performance >>>>>>> characteristics and this feels really wrong. >>>>>>> >>>>>> >>>>>> Bob already mentioned in the other thread that >>>>>> `-fprofile-instr-generate -fcoverage-mapping` was sufficiently different >>>>>> from `-fprofile-instr-generate` that `-fprofile-instr-generate >>>>>> -fcoverage-mapping` was not an acceptable workaround that could be used for >>>>>> enabling FEPGO during a transitionary period, so I'm not convinced that >>>>>> your argument here makes sense. >>>>>> >>>>>> >>>>>> I’m not sure what you’re referring to here, and I have a hard time >>>>>> parsing the sentence. I suppose “was not an acceptable” should read “was an >>>>>> acceptable”? I would be surprised that Bob ever agreed to completely >>>>>> transition away from FEPGO. I didn’t even understand that getting rid of >>>>>> FEPGO was on table as you seem to imply bellow. >>>>>> >>>>> >>>>> No, it is written as intended. The backstory is in >>>>> http://reviews.llvm.org/D15829 (and the corresponding email thread). >>>>> The paragraph starting with "The coverage mapping adds considerable cost.”. >>>>> >>>>> >>>>> Thanks for the pointer, it turns out this part of the thread never >>>>> made it to my mailbox. What Bob is saying is that we need an option to turn >>>>> on FEPGO without coverage, and this is basically exactly what I think we >>>>> should do: add an explicit option to choose the instrumentation type, so >>>>> that we can use FEPGO even without coverage in the new IRPGO world. >>>>> >>>>> >>>>>> I also share David's opinion that this is not going to be an issue in >>>>>> practice. I think it makes sense for PGO and coverage to have different >>>>>> overheads. Coverage inherently has to trace all locations at source level, >>>>>> while PGO has more freedom. >>>>>> >>>>>> >>>>>> I’m sorry if I wasn’t clear, but I’m not talking about >>>>>> instrumentation overhead, I’m talking about the performance of the binary >>>>>> generated using the profiles. If we go the route of making the meaning of >>>>>> -fprofile-instr-generate depend on whether -fcoverage-mapping gets passed, >>>>>> then we change the kind of instrumentation and thus the input to the >>>>>> optimizations behind the user’s back. I wouldn’t be surprised that using >>>>>> profiles generated by FEPGO and IRPGO give you a final executable with >>>>>> measurably different performance characteristics. >>>>>> >>>>> >>>>> I think the point is that given the effort being put into IRPGO, the >>>>> IRPGO version will always be a faster final executable. >>>>> >>>>> >>>>> 2 things: >>>>> - Do you have strong evidence of this being true in all cases? I’m >>>>> not looking forward to when I’ll transition Apple’s clang to IRPGO, because >>>>> I know that we’ll have to spend time characterizing the performance of the >>>>> compiler. It might be faster in all but a couple benchmarks, we still will >>>>> need to investigate what happened there to see if we can fix it. I’m not >>>>> saying it’s bad (I’m actually also looking forward to evaluate it, because >>>>> it might give better results for us), but it’s definitely not a benign >>>>> change that I would do without double checking the results. And as such, I >>>>> also wouldn’t want the compiler to silently do this change behind my back. >>>>> - Even if it is 100% true, then users will see a degradation in >>>>> performance if they add -fcoverage-mapping to their CFLAGS. I think this is >>>>> just bad user experience. The user should be able to decide if he wants to >>>>> be able to compromise some performance to be able to do PGO+coverage at the >>>>> same time. >>>>> >>>>> Why provide a "worse" PGO option? >>>>> >>>>> >>>>> Worse from your point of view. The resilience to compiler changes that >>>>> FEPGO currently offers is a real feature even if it doesn’t have a place in >>>>> Sony’s workflow. >>>>> >>>>> >>>>> >>>>>> If you’re tracking your performance, this can be really painful. >>>>>> Recently we wasted days investigating performance regressions that were due >>>>>> to buggy profiles. I strongly believe having an option seemingly unrelated >>>>>> to PGO change this behavior is wrong and can cause actual pain for our end >>>>>> users. >>>>>> >>>>> >>>>> After a deprecation period we can force `-fprofile-instr-generate` and >>>>> `-fcoverage-mapping` to be mutually exclusive if necessary. Does this solve >>>>> your problem? >>>>> >>>>> >>>>> We don’t want a “deprecation period”. Unless IRPGO evolves in >>>>> something that can do precise coverage and PGO in the same run, FEPGO can’t >>>>> be deprecated (as in slated for removal). >>>>> >>>>> Actually, I think it makes a lot of sense in some respects for >>>>> `-fprofile-instr-generate` and `-fprofile-instr-generate >>>>> -fcoverage-mapping` to be IRPGO and FEPGO/coverage. The difference from a >>>>> user's perspective is basically "is the instrumentation inserted by the >>>>> compiler constrained to have source-level coverage, or does the compiler >>>>> not have this restriction". Although as I've said, I'm not a fan of >>>>> supporting FEPGO in the long-term due to maintenance issues. >>>>> >>>>> Also note that things like the context-sensitivity obtained through >>>>> pre-inlining (see Rong's original RFC) is simply not obtainable within a >>>>> source-level instrumentation paradigm (even if we did something like the >>>>> counter fusion discussed in "[llvm-dev] RFC: Pass to prune redundant >>>>> profiling instrumentation" to reduce the overhead to that of IRPGO with >>>>> pre-inlining). Thus FEPGO a.k.a. "coverage-level PGO" would nonetheless be >>>>> at an inherent disadvantage. >>>>> >>>>> >>>>>> Also, David's point about redundant work on FEPGO is a good one. We >>>>>> don't want to continue maintaining two different PGO’s. >>>>>> >>>>>> >>>>>> Are you implying that LLVM should drop FEPGO? It’s a totally sensible >>>>>> thing to do to use your tests as training data for your profile generation. >>>>>> It’s also a very valid thing to do to use your tests to do coverage. Xcode >>>>>> does both of these things. I would see it a a big regression to not support >>>>>> doing both at the same time (this would mean doubling compile+testing time >>>>>> for users of both). >>>>>> >>>>> >>>>> As David pointed out, training runs for PGO and coverage have >>>>> different goals. I'm very skeptical of any testing that tries to do both at >>>>> the same time, but this will continue to work (albeit without benefitting >>>>> from any of the effort being put into IRPGO). >>>>> >>>>> >>>>> For a high-end game, sure. Small apps are a different beast and it’s a >>>>> reasonable thing to use your set of tests as training data. Again, all the >>>>> workflows are not equivalent. >>>>> >>>>> As the instrumentation needs to stay there for coverage anyway, I >>>>>> expect FEPGO to stay there and maintained (we care a lot about coverage). >>>>>> I’m not saying that all the work going into IRPGO should be duplicated in >>>>>> FEPGO, but what’s there and working should keep working. >>>>>> >>>>> >>>>> For my users the reduced overhead of IRPGO is an important feature, >>>>> and making it the default is important for that reason. Since most of the >>>>> effort going into PGO is focused on IRPGO, this will lead to users using >>>>> FEPGO ending up as a second-tier PGO, which Vedant said he specifically >>>>> wanted to avoid. The only option to avoid this is for users to not be using >>>>> FEPGO. >>>>> >>>>> >>>>> I’d love to have everybody be able to enjoy the lower overhead, but if >>>>> the answer to this is that they need to compile+train twice to get coverage >>>>> and PGO, then the trade-off is wrong. I suppose you can agree that every >>>>> user doesn’t have the same requirements as game developers. At least users >>>>> should have the choice. >>>>> >>>>> Also, FEPGO has a lot of nice characteristics like resilience to IRGEN >>>>>> changes. If you have archived profiles, then when you switch compilers your >>>>>> performance shouldn’t degrade with FEPGO (modulo optimization bugs), while >>>>>> it’s much more likely to degrade with IRPGO. >>>>>> >>>>> >>>>> Note that this use case continues to work. I.e. we continue to apply >>>>> existing frontend profiles correctly. including frontend profiles generated >>>>> with -fcoverage-mapping, so that collecting coverage and PGO at the same >>>>> time (although not advisable) still works. The only use case that breaks is >>>>> merging existing indexed profiles, which is why we are specifically waiting >>>>> for an answer on whether this is a requirement for you guys at Apple, which >>>>> will determine what kind of deprecation period etc. will be needed before >>>>> we can default it. >>>>> >>>>> >>>>>> It overall looks like a much better option for people who do not need >>>>>> the lower instrumentation overhead. >>>>>> >>>>> >>>>> This is not just about lower instrumentation overhead. Things like the >>>>> recently added static VP node allocation (which will e.g. make indirect >>>>> callsite promotion for LTO'd kernels work) are other things are being >>>>> missed out on. >>>>> >>>>> >>>>> >>>>>> I would actually make the IRPGO mode completely incompatible with the >>>>>>> -fcoverage-mapping flag. >>>>>>> >>>>>> >>>>>> I'm not sure what you mean by this. Nobody is proposing anything that >>>>>> would make -fcoverage-mapping do anything related to IRPGO. >>>>>> >>>>>> >>>>>> What I mean is that -f<whatever enables IRPGO> should error out when >>>>>> passed at the same time as -fcoverage-mapping. >>>>>> >>>>> >>>>> I think you're coming into this with the mindset that FEPGO will still >>>>> be a possibility (outside of a build that is used for coverage mapping). >>>>> I'm not convinced that we actually need to continue exposing that except as >>>>> a weird thing in conjunction with coverage (and possibly for a deprecation >>>>> period if users want to merge indexed profiles). >>>>> >>>>> >>>>> FEPGO needs to be a possibility. And the instrumentation code needs to >>>>> be there anyway for the coverage. Here’s a proposal: >>>>> - Add -fpgo-instr=[llvm|clang] to choose which kind of PGO >>>>> instrumentation you want >>>>> - make -fpgo-instr=llvm incompatible with with -fcoverage-mapping >>>>> - deprecate -fprofile-instr-generate, and remove it after a couple >>>>> releases >>>>> >>>>> (I really don’t care about the names) >>>>> >>>>> I think it is a better way forward to have -fprofile-instr-generate >>>>> keep it’s current meaning and to have downstream toolchain providers have >>>>> an internal patch to make it an alias to the one they prefer. Just because >>>>> changing the meaning of options is a bad thing. >>>>> >>>> >>>> Sure, I can deal with that. And actually this discussion has suggested >>>> some good user-understandable names for the two different PGO's. I think >>>> that "stable PGO" or "coverage-based PGO" or "source-level PGO" is a really >>>> good name for FEPGO. "frontend" and "IR" aren't really meaningful to users. >>>> >>>> Based on what I have seen, FEPGO has the following reasons to keep it >>>> around: >>>> - it currently can be used together with coverage >>>> - it has certain compatibility guarantees about profiles across >>>> compiler versions >>>> >>>> >>>> >>>>> This also means that if the consensus is that -fprofile-instr-generate >>>>> should really change its meaning to mean IRPGO, I’m open to having this >>>>> internal patch on our side. >>>>> >>>> >>>> Yeah, it sounds like someone is going to have to keep a "private patch" >>>> to change the default. At that point doing a switch on the triple in >>>> upstream seems preferable though :/ >>>> >>>> >>>> So I propose the following (which is equivalent to what you proposed, >>>> but starting to put specific option names): >>>> 1. Add -fprofile-instr-generate=stable and >>>> -fprofile-instr-generate=unstable >>>> a) Indexed profiles for -fprofile-instr-generate=stable are guaranteed >>>> to be compatible across compiler releases (which will include existing >>>> releases). Additionally, -fprofile-instr-generate=stable can be used in >>>> combination with -fcoverage-mapping. >>>> b) Indexed profiles for -fprofile-instr-generate=unstable are not >>>> guaranteed to be compatible across compiler releases. It is an error to use >>>> this in conjunction with -fcoverage-mapping >>>> c) -fprofile-instr-generate defaults to one of the two in a way that >>>> meets vendor needs, via private patches or an upstream switch on the triple >>>> or whatever. It can eventually be deprecated or whatever. >>>> >>>> -fprofile-instr-generate=stable would basically correspond to FEPGO / >>>> FE coverage instrumentation. -fprofile-instr-generate=unstable would >>>> basically correspond to IRPGO. >>>> >>>> What do you think? >>>> >>> >>> >>> Sounds fine to me, though I am not a fan of using unstable in the >>> option. I think a more meaningful way (that capture the essence of the >>> difference) is the following naming: >>> 1) FEPGO: -fprofile-instr-generate=source or >>> -fprofile-instr-generate=region >>> 2) IR: -fprofile-instr-generate=cfg or -fprofile-instr-generate=graph >>> >>> Also since -fprofile-instr-generate= form is already used to specify raw >>> profile path, we may need a different driver option. >>> >> >> Oops! >> >> >>> Alternatives include >>> 1) -fprofile-instrument=<...> -- this maps directly to the cc1 option we >>> have today >>> or >>> 2) -fpgo-instr=<> -- suggested by Fred or >>> 3) -fpgo-instr-method=<...> >>> >> >> I will probably use one of these in the patch. >> >> -- Sean Silva >> >> >>> >>> >>> Having the driver level option is the first good step forward. In the >>> near future, when performance of IRPGO is further tuned (e.g, better >>> integration with inliner, more runtime perf win with -flto=thin, -flto >>> etc), and the interests of the IRPGO is better aligned, we can revisit the >>> default. >>> >>> thanks, >>> >>> David >>> >>> >>> >>> >>>> >>>> -- Sean Silva >>>> >>>> >>>>> >>>>> I also think a new option is cleaner (again I don’t care about the >>>>> name), but if people feel that -fprofile-instr-generate=[llvm|clang] is >>>>> easier, then this works for me too. >>>>> >>>>> Fred >>>>> >>>>> >>>>> -- Sean Silva >>>>> >>>>> >>>>>> >>>>>> Thanks! >>>>>> Fred >>>>>> >>>>>> -- Sean Silva >>>>>> >>>>>> >>>>>> >>>>>>> >>>>>>> Fred >>>>>>> >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> At its core I don't think -fprofile-instr-generate *implies* >>>>>>>> FE-based instrumentation. So, I'd like to see the driver do this (on all >>>>>>>> platforms): >>>>>>>> >>>>>>>> * -fprofile-instr-generate: IR instrumentation >>>>>>>> * -fprofile-instr-generate=IR: IR instrumentation >>>>>>>> * -fprofile-instr-generate=FE: FE instrumentation >>>>>>>> * -fprofile-instr-generate -fcoverage-mapping: FE + coverage >>>>>>>> instrumentation >>>>>>>> >>>>>>>> It's a bit ugly because the meaning of -fprofile-instr-generate >>>>>>>> becomes context-sensitive. But, (1) it doesn't break existing common >>>>>>>> workflows and (2) it makes it easier to ship IRPGO. The big caveat here is >>>>>>>> that we'll need to wait a bit and see if our internal users are OK with >>>>>>>> this. >>>>>>>> >>>>>>> >>>>>>> Is there a reason to even have the possibility for FEPGO in the long >>>>>>> run? From what I can tell, at most we would add a >>>>>>> -fuse-the-old-pgo-because-i-want-to-merge-with-old-profiles option >>>>>>> to hold people over until they can regenerate their profiles with the >>>>>>> current compiler. We can add a flag to control what pre-instrumentation is >>>>>>> done to retain the source-level robustness of FEPGO (e.g. >>>>>>> -fpgo-no-simplify-before-instrumenting or something). >>>>>>> >>>>>>> >>>>>>>> One alternative is to introduce a separate driver flag for IRPGO. >>>>>>>> This might not work well for Sony's existing users. I'd be interested in >>>>>>>> any feedback about this approach. >>>>>>>> >>>>>>> >>>>>>> Personally, I would prefer to maintaining command line compatibility >>>>>>> for PGO in Clang (i.e. users don't have to modify their build systems). >>>>>>> >>>>>>> >>>>>>> -- Sean Silva >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> > I really don't like fragmenting things like this (e.g. if a >>>>>>>> third-party tests "clang's" PGO they will get something different depending >>>>>>>> on the platform), but I don't see another way given Apple's constraints. >>>>>>>> > >>>>>>>> > I'd like to see IRPGO to be the default as well, but the first >>>>>>>> thing we need is a driver level option to make the switch (prof-gen) -- >>>>>>>> currently we rely on -Xclang option to switch between two modes, which is >>>>>>>> less than ideal. >>>>>>>> > >>>>>>>> > If the concern from Apple is that the old profile still need to >>>>>>>> work, then this is problem already solved. The reason is that >>>>>>>> -fprofile-instr-use can automatically detect the type of the profile and >>>>>>>> switch the mode. >>>>>>>> >>>>>>>> It's not just that. As Sean pointed out, we're concerned about old >>>>>>>> profiles inter-operating poorly with new ones. >>>>>>>> >>>>>>>> thanks, >>>>>>>> vedant >>>>>>>> >>>>>>>> >>>>>>>> > - Pre-instrumentation passes >>>>>>>> > >>>>>>>> > Pre-instrumentation optimization has been critical for reducing >>>>>>>> the overhead of PGO for the PS4 games we tested (as expected). However, in >>>>>>>> our measurements (and we are glad to provide more info) the main benefit >>>>>>>> was inlining (also as expected). A simple pass of inlining at threshold 100 >>>>>>>> appeared to give all the benefits. Even inlining at threshold 0 gave almost >>>>>>>> all the benefits. For example, the passes initially proposed in >>>>>>>> http://reviews.llvm.org/D15828did not improve over just inlining >>>>>>>> with threshold 100. >>>>>>>> > >>>>>>>> > (due to PR27299 we also need to add simplifycfg after inlining to >>>>>>>> clean up, but this doesn't affect the instrumentation overhead in our >>>>>>>> measurements) >>>>>>>> > >>>>>>>> > Bottom line: for our use cases, inlining does all the work, but >>>>>>>> we're not opposed to having more passes, which might be beneficial for >>>>>>>> non-game workloads (which is most code). >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > Yes, Rong is re-collecting performance data before submitting the >>>>>>>> patch. >>>>>>>> > >>>>>>>> > - Warnings >>>>>>>> > >>>>>>>> > We identified 3 classes of issues which manifest as spammy >>>>>>>> warnings when applying profile data with IRPGO (these affect FEPGO also I >>>>>>>> believe, but we looked in depth at IRPGO): >>>>>>>> > >>>>>>>> > 1. The main concerning one is that getPGOFuncName mangles the >>>>>>>> filename into the counter name. This causes us to get >>>>>>>> instrprof_error::unknown_function when the pgo-use build is done in a >>>>>>>> different build directory from the training build (which is a reasonable >>>>>>>> thing to support). In this situation, PGO data is useless for all `static` >>>>>>>> functions (and as a byproduct results in a huge volume of warnings). >>>>>>>> > >>>>>>>> > This can be enhanced with an user option to override the >>>>>>>> behavior. Can you help filing a tracking bug? >>>>>>>> > >>>>>>>> > >>>>>>>> > 2. In different TU's, pre-instr inlining might make different >>>>>>>> inlining decisions (for example, different functions may be available for >>>>>>>> inlining), causing hash mismatch errors (instrprof_error::hash_mismatch). >>>>>>>> In building a large game, we only saw 8 instance of this, so it is not as >>>>>>>> severe as 1, but would be good to fix. >>>>>>>> > >>>>>>>> > >>>>>>>> > Rong has a patch addressing that -- will submit after cleanup >>>>>>>> pass change is done. >>>>>>>> > >>>>>>>> > >>>>>>>> > 3. A .cpp file may be compiled and put into an archive, but then >>>>>>>> not selected by the linker and will therefore not result in a counter in >>>>>>>> the profraw. When compiling this file with pgo-use, >>>>>>>> instrprof_error::unknown_function will result and a warning will be emitted. >>>>>>>> > >>>>>>>> > yes -- this is a common problem to other compilers as well. >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > Case 1 can be fixed using a function hash or other unique >>>>>>>> identifier instead of a file path. David, in D20195 you mentioned that Rong >>>>>>>> was working on a patch that would fix 2; we are looking forward to that. >>>>>>>> > >>>>>>>> > >>>>>>>> > Right. >>>>>>>> > >>>>>>>> > For 3, I unfortunately do not know of any solution. I don't think >>>>>>>> there is a way for us to make this warning reliable in the face of this >>>>>>>> circumstance. So my conclusion is that instrprof_error::unknown_function at >>>>>>>> least must be defaulted to off unfortunately. >>>>>>>> > >>>>>>>> > yes, this can be annoying. If the warnings can be buffered, then >>>>>>>> the compiler can check if this is due to missing profile for the whole file >>>>>>>> and can reduce the warnings into one single warning (source file has no >>>>>>>> profile data). Making it off by default sounds fine to me too if it is too >>>>>>>> noisy. >>>>>>>> > >>>>>>>> > thanks, >>>>>>>> > >>>>>>>> > David >>>>>>>> > >>>>>>>> > >>>>>>>> > -- Sean Silva >>>>>>>> >>>>>>> >>>>> >>>> >>> >> >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160613/51b1d673/attachment.html>
Jake VanAdrighem via llvm-dev
2016-Jun-23 08:39 UTC
[llvm-dev] The state of IRPGO (3 remaining work items)
On Mon, Jun 13, 2016 at 3:07 PM, Xinliang David Li <davidxl at google.com> wrote:> No problem. Inliner work is certainly higher priority. Rong can help with > the option related work. > >FYI I've started working on a patch for the driver changes and would be happy to take care of it. I'll post the patch in the next week or so. Jake> thanks, > David > > On Mon, Jun 13, 2016 at 3:05 PM, Sean Silva <chisophugis at gmail.com> wrote: > >> Quick update. I've gotten derailed from posting a patch for this due to >> focusing on higher priority PGO inlining work. No ETA. >> >> -- Sean Silva >> >> >> On Fri, Jun 3, 2016 at 6:06 PM, Sean Silva <chisophugis at gmail.com> wrote: >> >>> >>> >>> On Thu, Jun 2, 2016 at 6:41 PM, Xinliang David Li <davidxl at google.com> >>> wrote: >>> >>>> >>>> >>>> On Thu, Jun 2, 2016 at 5:30 PM, Sean Silva <chisophugis at gmail.com> >>>> wrote: >>>> >>>>> >>>>> >>>>> On Thu, Jun 2, 2016 at 2:51 PM, Frédéric Riss <friss at apple.com> wrote: >>>>> >>>>>> >>>>>> On Jun 2, 2016, at 12:10 AM, Sean Silva <chisophugis at gmail.com> >>>>>> wrote: >>>>>> >>>>>> >>>>>> >>>>>> On Wed, Jun 1, 2016 at 5:46 PM, Frédéric Riss <friss at apple.com> >>>>>> wrote: >>>>>> >>>>>>> >>>>>>> On Jun 1, 2016, at 1:46 PM, Sean Silva <chisophugis at gmail.com> >>>>>>> wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Tue, May 31, 2016 at 6:02 PM, Frédéric Riss <friss at apple.com> >>>>>>> wrote: >>>>>>> >>>>>>>> >>>>>>>> On May 24, 2016, at 5:21 PM, Sean Silva <chisophugis at gmail.com> >>>>>>>> wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Tue, May 24, 2016 at 3:41 PM, Vedant Kumar <vsk at apple.com> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> >>>>>>>>> > On May 23, 2016, at 8:56 PM, Xinliang David Li < >>>>>>>>> davidxl at google.com> wrote: >>>>>>>>> > >>>>>>>>> > On Mon, May 23, 2016 at 8:23 PM, Sean Silva < >>>>>>>>> chisophugis at gmail.com> wrote: >>>>>>>>> > Jake and I have been integrating IRPGO on PS4, and we've >>>>>>>>> identified 3 remaining work items. >>>>>>>>> > >>>>>>>>> > Sean, thanks for the write up. It matches very well with what we >>>>>>>>> think as well. >>>>>>>>> >>>>>>>>> + 1 >>>>>>>>> >>>>>>>>> >>>>>>>>> > - Driver changes >>>>>>>>> > >>>>>>>>> > We'd like to make IRPGO the default on PS4. We also think that >>>>>>>>> it would be beneficial to make IRPGO the default PGO on all platforms >>>>>>>>> (coverage would continue to use FE instr as it does currently, of course). >>>>>>>>> In previous conversations (e.g. http://reviews.llvm.org/D15829) >>>>>>>>> it has come up that Apple have requirements that would prevent them from >>>>>>>>> moving to IRPGO as the default PGO, at least without a deprecation period >>>>>>>>> of one or two releases. >>>>>>>>> >>>>>>>>> Sean pointed out the problematic scenario in D15829 (in plan "C"): >>>>>>>>> >>>>>>>>> ``` >>>>>>>>> All existing user workflows continue to work, except for workflows >>>>>>>>> that attempt to llvm-profdata merge some old frontend profile data (e.g. >>>>>>>>> they have checked-in to version control and represents some special >>>>>>>>> workload) with the profile data from new binaries. >>>>>>>>> ``` >>>>>>>>> >>>>>>>>> We can address this issue by (1) making sure llvm-profdata emits a >>>>>>>>> helpful warning when merging an FE-based profile with an IR-based one, and >>>>>>>>> (2) keeping an option to use FE instrumentation for PGO. Having (2) helps >>>>>>>>> people who can't (or don't want) to switch to IR PGO. >>>>>>>>> >>>>>>>>> >>>>>>>>> > I'd like to get consensus on a path forward. >>>>>>>>> > As a point of discussion, how about we make IRPGO the default on >>>>>>>>> all platforms except Apple platforms. >>>>>>>>> >>>>>>>>> I'd really rather not introduce this inconsistency. I'm worried >>>>>>>>> that it might lead to Darwin becoming a second-tier platform for PGO. >>>>>>>>> >>>>>>>>> Fred (CC'd) is following up with some of our internal users to >>>>>>>>> check if we can change the default behavior of -fprofile-instr-generate. He >>>>>>>>> should be able to chime in on this soon. >>>>>>>>> >>>>>>>> >>>>>>>> Sorry it took me so long. >>>>>>>> >>>>>>> >>>>>>> Hi Fred, >>>>>>> >>>>>>> My understanding is that you were specifically investigating whether >>>>>>> Apple needed compatibility for merging indexed profiles. Is that >>>>>>> compatibility needed? The only compelling argument I have heard to continue >>>>>>> to expose FEPGO is that Apple may have a compatibility requirement for >>>>>>> merging indexed profiles from previous compiler versions. >>>>>>> >>>>>>> >>>>>>> Sorry no, my comment had nothing to do with merging profiles. I >>>>>>> understand that this will break, and it might very well be an issue for us, >>>>>>> but I think there is a more fundamental issue with the proposed plan. As >>>>>>> you bring it up though, this is a user visible breakage that shouldn’t be >>>>>>> disregarded completely. >>>>>>> >>>>>> >>>>>> Merging with existing indexed profiles is the only user-visible >>>>>> breakage AFAIK (this was discussed at length in >>>>>> http://reviews.llvm.org/D15829 and the corresponding email thread). >>>>>> Please provide concrete examples where things would break. >>>>>> >>>>>> >>>>>> I feel like we are talking past each other. It is a fact that >>>>>> “merging existing indexed profiles” will break. You consider it to be a >>>>>> reasonable breakage, I’m just saying that we shouldn't ignore this >>>>>> completely. It is a known breakage so it should be considered when making >>>>>> the decision about the new flags’ behavior. >>>>>> We advise users to commit their profiles to VCS, so this could >>>>>> actually be a real issue for them (Xcode won’t try to merge old/new >>>>>> profiles by itself though, but users could do this). For those kind of >>>>>> changes where the alternative is not a full drop-in replacement, I think >>>>>> deprecating the option and replacing it with a new one is a better >>>>>> methodology. >>>>>> >>>>>> >>>>>> >>>>>>> >>>>>>> Even if this is a requirement, then I still intend to make IRPGO the >>>>>>> default and only PGO going forward, at least on PS4. I think that doing the >>>>>>> same for all platforms in the upstream compiler probably makes sense as >>>>>>> well, since an internal Apple vendor compatibility requirement should not >>>>>>> penalize all users of the open source project. >>>>>>> >>>>>>> >>>>>>> Again, I’m not expressing an Apple requirement, just trying to >>>>>>> discuss the specifics of the proposed implementation. My goal is not to >>>>>>> hinder anything, and I want our platforms to be able to use IRPGO reliably >>>>>>> if users see the need for it. >>>>>>> >>>>>> >>>>>> What I'm saying is that besides reduced training overhead (and the >>>>>> inability to merge with older indexed profiles, which AFAIK is the only >>>>>> actual potential requirement that would need a deprecation period for >>>>>> FEPGO), IRPGO is basically "just a better PGO", so adding a frontend one >>>>>> (except as something purely during a deprecation period) is >>>>>> pointless. "just a better PGO" is what IRPGO is for my users. I don't want >>>>>> to have to have them deal with (and I don't want to support) FEPGO. >>>>>> >>>>>> >>>>>> Well we want to support FEPGO because it fits Xcode's workflow better >>>>>> (mainly it allows to do PGO + coverage at the same time). >>>>>> >>>>>> Anything that will cause the existing flag to continue to produce >>>>>> FEPGO on PS4 is not something that I'm really okay with. The reduced >>>>>> overhead of IRPGO is really important on PS4 (i.e. the difference between >>>>>> the instrumented game being playable or not). I really don't want to have >>>>>> to test the triple to determine the meaning of `-fprofile-instr-generate` >>>>>> (without `-fcoverage-mapping`). >>>>>> >>>>>> >>>>>> And I agree that IRPGO is better for your users, but the best >>>>>> workflow for your users is not necessarily the best workflow for every >>>>>> clang user. I totally agree with you that testing the triple to decide the >>>>>> meaning of the option is wrong to do upstream. The best default is not a >>>>>> platform choice, it’s a workflow decision. >>>>>> >>>>>> >>>>>>> I’ve discussed the change in behavior quiet extensively, and I after >>>>>>>> having changed my mind a couple times, I would argue in favor of keeping >>>>>>>> the current behavior for the existing flags. I think adding a new switch >>>>>>>> for IRPGO is a better option. The argument that weighted most on my opinion >>>>>>>> is the proposed interaction with -fcoverage-mapping, and it is not at all >>>>>>>> platform specific. With the proposed new behavior, turning coverage on and >>>>>>>> off in your build system will generate a binary with different performance >>>>>>>> characteristics and this feels really wrong. >>>>>>>> >>>>>>> >>>>>>> Bob already mentioned in the other thread that >>>>>>> `-fprofile-instr-generate -fcoverage-mapping` was sufficiently different >>>>>>> from `-fprofile-instr-generate` that `-fprofile-instr-generate >>>>>>> -fcoverage-mapping` was not an acceptable workaround that could be used for >>>>>>> enabling FEPGO during a transitionary period, so I'm not convinced that >>>>>>> your argument here makes sense. >>>>>>> >>>>>>> >>>>>>> I’m not sure what you’re referring to here, and I have a hard time >>>>>>> parsing the sentence. I suppose “was not an acceptable” should read “was an >>>>>>> acceptable”? I would be surprised that Bob ever agreed to completely >>>>>>> transition away from FEPGO. I didn’t even understand that getting rid of >>>>>>> FEPGO was on table as you seem to imply bellow. >>>>>>> >>>>>> >>>>>> No, it is written as intended. The backstory is in >>>>>> http://reviews.llvm.org/D15829 (and the corresponding email thread). >>>>>> The paragraph starting with "The coverage mapping adds considerable cost.”. >>>>>> >>>>>> >>>>>> Thanks for the pointer, it turns out this part of the thread never >>>>>> made it to my mailbox. What Bob is saying is that we need an option to turn >>>>>> on FEPGO without coverage, and this is basically exactly what I think we >>>>>> should do: add an explicit option to choose the instrumentation type, so >>>>>> that we can use FEPGO even without coverage in the new IRPGO world. >>>>>> >>>>>> >>>>>>> I also share David's opinion that this is not going to be an issue >>>>>>> in practice. I think it makes sense for PGO and coverage to have different >>>>>>> overheads. Coverage inherently has to trace all locations at source level, >>>>>>> while PGO has more freedom. >>>>>>> >>>>>>> >>>>>>> I’m sorry if I wasn’t clear, but I’m not talking about >>>>>>> instrumentation overhead, I’m talking about the performance of the binary >>>>>>> generated using the profiles. If we go the route of making the meaning of >>>>>>> -fprofile-instr-generate depend on whether -fcoverage-mapping gets passed, >>>>>>> then we change the kind of instrumentation and thus the input to the >>>>>>> optimizations behind the user’s back. I wouldn’t be surprised that using >>>>>>> profiles generated by FEPGO and IRPGO give you a final executable with >>>>>>> measurably different performance characteristics. >>>>>>> >>>>>> >>>>>> I think the point is that given the effort being put into IRPGO, the >>>>>> IRPGO version will always be a faster final executable. >>>>>> >>>>>> >>>>>> 2 things: >>>>>> - Do you have strong evidence of this being true in all cases? I’m >>>>>> not looking forward to when I’ll transition Apple’s clang to IRPGO, because >>>>>> I know that we’ll have to spend time characterizing the performance of the >>>>>> compiler. It might be faster in all but a couple benchmarks, we still will >>>>>> need to investigate what happened there to see if we can fix it. I’m not >>>>>> saying it’s bad (I’m actually also looking forward to evaluate it, because >>>>>> it might give better results for us), but it’s definitely not a benign >>>>>> change that I would do without double checking the results. And as such, I >>>>>> also wouldn’t want the compiler to silently do this change behind my back. >>>>>> - Even if it is 100% true, then users will see a degradation in >>>>>> performance if they add -fcoverage-mapping to their CFLAGS. I think this is >>>>>> just bad user experience. The user should be able to decide if he wants to >>>>>> be able to compromise some performance to be able to do PGO+coverage at the >>>>>> same time. >>>>>> >>>>>> Why provide a "worse" PGO option? >>>>>> >>>>>> >>>>>> Worse from your point of view. The resilience to compiler changes >>>>>> that FEPGO currently offers is a real feature even if it doesn’t have a >>>>>> place in Sony’s workflow. >>>>>> >>>>>> >>>>>> >>>>>>> If you’re tracking your performance, this can be really painful. >>>>>>> Recently we wasted days investigating performance regressions that were due >>>>>>> to buggy profiles. I strongly believe having an option seemingly unrelated >>>>>>> to PGO change this behavior is wrong and can cause actual pain for our end >>>>>>> users. >>>>>>> >>>>>> >>>>>> After a deprecation period we can force `-fprofile-instr-generate` >>>>>> and `-fcoverage-mapping` to be mutually exclusive if necessary. Does this >>>>>> solve your problem? >>>>>> >>>>>> >>>>>> We don’t want a “deprecation period”. Unless IRPGO evolves in >>>>>> something that can do precise coverage and PGO in the same run, FEPGO can’t >>>>>> be deprecated (as in slated for removal). >>>>>> >>>>>> Actually, I think it makes a lot of sense in some respects for >>>>>> `-fprofile-instr-generate` and `-fprofile-instr-generate >>>>>> -fcoverage-mapping` to be IRPGO and FEPGO/coverage. The difference from a >>>>>> user's perspective is basically "is the instrumentation inserted by the >>>>>> compiler constrained to have source-level coverage, or does the compiler >>>>>> not have this restriction". Although as I've said, I'm not a fan of >>>>>> supporting FEPGO in the long-term due to maintenance issues. >>>>>> >>>>>> Also note that things like the context-sensitivity obtained through >>>>>> pre-inlining (see Rong's original RFC) is simply not obtainable within a >>>>>> source-level instrumentation paradigm (even if we did something like the >>>>>> counter fusion discussed in "[llvm-dev] RFC: Pass to prune redundant >>>>>> profiling instrumentation" to reduce the overhead to that of IRPGO with >>>>>> pre-inlining). Thus FEPGO a.k.a. "coverage-level PGO" would nonetheless be >>>>>> at an inherent disadvantage. >>>>>> >>>>>> >>>>>>> Also, David's point about redundant work on FEPGO is a good one. We >>>>>>> don't want to continue maintaining two different PGO’s. >>>>>>> >>>>>>> >>>>>>> Are you implying that LLVM should drop FEPGO? It’s a totally >>>>>>> sensible thing to do to use your tests as training data for your profile >>>>>>> generation. It’s also a very valid thing to do to use your tests to do >>>>>>> coverage. Xcode does both of these things. I would see it a a big >>>>>>> regression to not support doing both at the same time (this would mean >>>>>>> doubling compile+testing time for users of both). >>>>>>> >>>>>> >>>>>> As David pointed out, training runs for PGO and coverage have >>>>>> different goals. I'm very skeptical of any testing that tries to do both at >>>>>> the same time, but this will continue to work (albeit without benefitting >>>>>> from any of the effort being put into IRPGO). >>>>>> >>>>>> >>>>>> For a high-end game, sure. Small apps are a different beast and it’s >>>>>> a reasonable thing to use your set of tests as training data. Again, all >>>>>> the workflows are not equivalent. >>>>>> >>>>>> As the instrumentation needs to stay there for coverage anyway, I >>>>>>> expect FEPGO to stay there and maintained (we care a lot about coverage). >>>>>>> I’m not saying that all the work going into IRPGO should be duplicated in >>>>>>> FEPGO, but what’s there and working should keep working. >>>>>>> >>>>>> >>>>>> For my users the reduced overhead of IRPGO is an important feature, >>>>>> and making it the default is important for that reason. Since most of the >>>>>> effort going into PGO is focused on IRPGO, this will lead to users using >>>>>> FEPGO ending up as a second-tier PGO, which Vedant said he specifically >>>>>> wanted to avoid. The only option to avoid this is for users to not be using >>>>>> FEPGO. >>>>>> >>>>>> >>>>>> I’d love to have everybody be able to enjoy the lower overhead, but >>>>>> if the answer to this is that they need to compile+train twice to get >>>>>> coverage and PGO, then the trade-off is wrong. I suppose you can agree that >>>>>> every user doesn’t have the same requirements as game developers. At least >>>>>> users should have the choice. >>>>>> >>>>>> Also, FEPGO has a lot of nice characteristics like resilience to >>>>>>> IRGEN changes. If you have archived profiles, then when you switch >>>>>>> compilers your performance shouldn’t degrade with FEPGO (modulo >>>>>>> optimization bugs), while it’s much more likely to degrade with IRPGO. >>>>>>> >>>>>> >>>>>> Note that this use case continues to work. I.e. we continue to apply >>>>>> existing frontend profiles correctly. including frontend profiles generated >>>>>> with -fcoverage-mapping, so that collecting coverage and PGO at the same >>>>>> time (although not advisable) still works. The only use case that breaks is >>>>>> merging existing indexed profiles, which is why we are specifically waiting >>>>>> for an answer on whether this is a requirement for you guys at Apple, which >>>>>> will determine what kind of deprecation period etc. will be needed before >>>>>> we can default it. >>>>>> >>>>>> >>>>>>> It overall looks like a much better option for people who do not >>>>>>> need the lower instrumentation overhead. >>>>>>> >>>>>> >>>>>> This is not just about lower instrumentation overhead. Things like >>>>>> the recently added static VP node allocation (which will e.g. make indirect >>>>>> callsite promotion for LTO'd kernels work) are other things are being >>>>>> missed out on. >>>>>> >>>>>> >>>>>> >>>>>>> I would actually make the IRPGO mode completely incompatible with >>>>>>>> the -fcoverage-mapping flag. >>>>>>>> >>>>>>> >>>>>>> I'm not sure what you mean by this. Nobody is proposing anything >>>>>>> that would make -fcoverage-mapping do anything related to IRPGO. >>>>>>> >>>>>>> >>>>>>> What I mean is that -f<whatever enables IRPGO> should error out when >>>>>>> passed at the same time as -fcoverage-mapping. >>>>>>> >>>>>> >>>>>> I think you're coming into this with the mindset that FEPGO will >>>>>> still be a possibility (outside of a build that is used for coverage >>>>>> mapping). I'm not convinced that we actually need to continue exposing that >>>>>> except as a weird thing in conjunction with coverage (and possibly for a >>>>>> deprecation period if users want to merge indexed profiles). >>>>>> >>>>>> >>>>>> FEPGO needs to be a possibility. And the instrumentation code needs >>>>>> to be there anyway for the coverage. Here’s a proposal: >>>>>> - Add -fpgo-instr=[llvm|clang] to choose which kind of PGO >>>>>> instrumentation you want >>>>>> - make -fpgo-instr=llvm incompatible with with -fcoverage-mapping >>>>>> - deprecate -fprofile-instr-generate, and remove it after a couple >>>>>> releases >>>>>> >>>>>> (I really don’t care about the names) >>>>>> >>>>>> I think it is a better way forward to have -fprofile-instr-generate >>>>>> keep it’s current meaning and to have downstream toolchain providers have >>>>>> an internal patch to make it an alias to the one they prefer. Just because >>>>>> changing the meaning of options is a bad thing. >>>>>> >>>>> >>>>> Sure, I can deal with that. And actually this discussion has suggested >>>>> some good user-understandable names for the two different PGO's. I think >>>>> that "stable PGO" or "coverage-based PGO" or "source-level PGO" is a really >>>>> good name for FEPGO. "frontend" and "IR" aren't really meaningful to users. >>>>> >>>>> Based on what I have seen, FEPGO has the following reasons to keep it >>>>> around: >>>>> - it currently can be used together with coverage >>>>> - it has certain compatibility guarantees about profiles across >>>>> compiler versions >>>>> >>>>> >>>>> >>>>>> This also means that if the consensus is that >>>>>> -fprofile-instr-generate should really change its meaning to mean IRPGO, >>>>>> I’m open to having this internal patch on our side. >>>>>> >>>>> >>>>> Yeah, it sounds like someone is going to have to keep a "private >>>>> patch" to change the default. At that point doing a switch on the triple in >>>>> upstream seems preferable though :/ >>>>> >>>>> >>>>> So I propose the following (which is equivalent to what you proposed, >>>>> but starting to put specific option names): >>>>> 1. Add -fprofile-instr-generate=stable and >>>>> -fprofile-instr-generate=unstable >>>>> a) Indexed profiles for -fprofile-instr-generate=stable are guaranteed >>>>> to be compatible across compiler releases (which will include existing >>>>> releases). Additionally, -fprofile-instr-generate=stable can be used in >>>>> combination with -fcoverage-mapping. >>>>> b) Indexed profiles for -fprofile-instr-generate=unstable are not >>>>> guaranteed to be compatible across compiler releases. It is an error to use >>>>> this in conjunction with -fcoverage-mapping >>>>> c) -fprofile-instr-generate defaults to one of the two in a way that >>>>> meets vendor needs, via private patches or an upstream switch on the triple >>>>> or whatever. It can eventually be deprecated or whatever. >>>>> >>>>> -fprofile-instr-generate=stable would basically correspond to FEPGO / >>>>> FE coverage instrumentation. -fprofile-instr-generate=unstable would >>>>> basically correspond to IRPGO. >>>>> >>>>> What do you think? >>>>> >>>> >>>> >>>> Sounds fine to me, though I am not a fan of using unstable in the >>>> option. I think a more meaningful way (that capture the essence of the >>>> difference) is the following naming: >>>> 1) FEPGO: -fprofile-instr-generate=source or >>>> -fprofile-instr-generate=region >>>> 2) IR: -fprofile-instr-generate=cfg or -fprofile-instr-generate=graph >>>> >>>> Also since -fprofile-instr-generate= form is already used to specify >>>> raw profile path, we may need a different driver option. >>>> >>> >>> Oops! >>> >>> >>>> Alternatives include >>>> 1) -fprofile-instrument=<...> -- this maps directly to the cc1 option >>>> we have today >>>> or >>>> 2) -fpgo-instr=<> -- suggested by Fred or >>>> 3) -fpgo-instr-method=<...> >>>> >>> >>> I will probably use one of these in the patch. >>> >>> -- Sean Silva >>> >>> >>>> >>>> >>>> Having the driver level option is the first good step forward. In the >>>> near future, when performance of IRPGO is further tuned (e.g, better >>>> integration with inliner, more runtime perf win with -flto=thin, -flto >>>> etc), and the interests of the IRPGO is better aligned, we can revisit the >>>> default. >>>> >>>> thanks, >>>> >>>> David >>>> >>>> >>>> >>>> >>>>> >>>>> -- Sean Silva >>>>> >>>>> >>>>>> >>>>>> I also think a new option is cleaner (again I don’t care about the >>>>>> name), but if people feel that -fprofile-instr-generate=[llvm|clang] is >>>>>> easier, then this works for me too. >>>>>> >>>>>> Fred >>>>>> >>>>>> >>>>>> -- Sean Silva >>>>>> >>>>>> >>>>>>> >>>>>>> Thanks! >>>>>>> Fred >>>>>>> >>>>>>> -- Sean Silva >>>>>>> >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> Fred >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> At its core I don't think -fprofile-instr-generate *implies* >>>>>>>>> FE-based instrumentation. So, I'd like to see the driver do this (on all >>>>>>>>> platforms): >>>>>>>>> >>>>>>>>> * -fprofile-instr-generate: IR instrumentation >>>>>>>>> * -fprofile-instr-generate=IR: IR instrumentation >>>>>>>>> * -fprofile-instr-generate=FE: FE instrumentation >>>>>>>>> * -fprofile-instr-generate -fcoverage-mapping: FE + coverage >>>>>>>>> instrumentation >>>>>>>>> >>>>>>>>> It's a bit ugly because the meaning of -fprofile-instr-generate >>>>>>>>> becomes context-sensitive. But, (1) it doesn't break existing common >>>>>>>>> workflows and (2) it makes it easier to ship IRPGO. The big caveat here is >>>>>>>>> that we'll need to wait a bit and see if our internal users are OK with >>>>>>>>> this. >>>>>>>>> >>>>>>>> >>>>>>>> Is there a reason to even have the possibility for FEPGO in the >>>>>>>> long run? From what I can tell, at most we would add a >>>>>>>> -fuse-the-old-pgo-because-i-want-to-merge-with-old-profiles option >>>>>>>> to hold people over until they can regenerate their profiles with the >>>>>>>> current compiler. We can add a flag to control what pre-instrumentation is >>>>>>>> done to retain the source-level robustness of FEPGO (e.g. >>>>>>>> -fpgo-no-simplify-before-instrumenting or something). >>>>>>>> >>>>>>>> >>>>>>>>> One alternative is to introduce a separate driver flag for IRPGO. >>>>>>>>> This might not work well for Sony's existing users. I'd be interested in >>>>>>>>> any feedback about this approach. >>>>>>>>> >>>>>>>> >>>>>>>> Personally, I would prefer to maintaining command line >>>>>>>> compatibility for PGO in Clang (i.e. users don't have to modify their build >>>>>>>> systems). >>>>>>>> >>>>>>>> >>>>>>>> -- Sean Silva >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> > I really don't like fragmenting things like this (e.g. if a >>>>>>>>> third-party tests "clang's" PGO they will get something different depending >>>>>>>>> on the platform), but I don't see another way given Apple's constraints. >>>>>>>>> > >>>>>>>>> > I'd like to see IRPGO to be the default as well, but the first >>>>>>>>> thing we need is a driver level option to make the switch (prof-gen) -- >>>>>>>>> currently we rely on -Xclang option to switch between two modes, which is >>>>>>>>> less than ideal. >>>>>>>>> > >>>>>>>>> > If the concern from Apple is that the old profile still need to >>>>>>>>> work, then this is problem already solved. The reason is that >>>>>>>>> -fprofile-instr-use can automatically detect the type of the profile and >>>>>>>>> switch the mode. >>>>>>>>> >>>>>>>>> It's not just that. As Sean pointed out, we're concerned about old >>>>>>>>> profiles inter-operating poorly with new ones. >>>>>>>>> >>>>>>>>> thanks, >>>>>>>>> vedant >>>>>>>>> >>>>>>>>> >>>>>>>>> > - Pre-instrumentation passes >>>>>>>>> > >>>>>>>>> > Pre-instrumentation optimization has been critical for reducing >>>>>>>>> the overhead of PGO for the PS4 games we tested (as expected). However, in >>>>>>>>> our measurements (and we are glad to provide more info) the main benefit >>>>>>>>> was inlining (also as expected). A simple pass of inlining at threshold 100 >>>>>>>>> appeared to give all the benefits. Even inlining at threshold 0 gave almost >>>>>>>>> all the benefits. For example, the passes initially proposed in >>>>>>>>> http://reviews.llvm.org/D15828did not improve over just inlining >>>>>>>>> with threshold 100. >>>>>>>>> > >>>>>>>>> > (due to PR27299 we also need to add simplifycfg after inlining >>>>>>>>> to clean up, but this doesn't affect the instrumentation overhead in our >>>>>>>>> measurements) >>>>>>>>> > >>>>>>>>> > Bottom line: for our use cases, inlining does all the work, but >>>>>>>>> we're not opposed to having more passes, which might be beneficial for >>>>>>>>> non-game workloads (which is most code). >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > Yes, Rong is re-collecting performance data before submitting >>>>>>>>> the patch. >>>>>>>>> > >>>>>>>>> > - Warnings >>>>>>>>> > >>>>>>>>> > We identified 3 classes of issues which manifest as spammy >>>>>>>>> warnings when applying profile data with IRPGO (these affect FEPGO also I >>>>>>>>> believe, but we looked in depth at IRPGO): >>>>>>>>> > >>>>>>>>> > 1. The main concerning one is that getPGOFuncName mangles the >>>>>>>>> filename into the counter name. This causes us to get >>>>>>>>> instrprof_error::unknown_function when the pgo-use build is done in a >>>>>>>>> different build directory from the training build (which is a reasonable >>>>>>>>> thing to support). In this situation, PGO data is useless for all `static` >>>>>>>>> functions (and as a byproduct results in a huge volume of warnings). >>>>>>>>> > >>>>>>>>> > This can be enhanced with an user option to override the >>>>>>>>> behavior. Can you help filing a tracking bug? >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > 2. In different TU's, pre-instr inlining might make different >>>>>>>>> inlining decisions (for example, different functions may be available for >>>>>>>>> inlining), causing hash mismatch errors (instrprof_error::hash_mismatch). >>>>>>>>> In building a large game, we only saw 8 instance of this, so it is not as >>>>>>>>> severe as 1, but would be good to fix. >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > Rong has a patch addressing that -- will submit after cleanup >>>>>>>>> pass change is done. >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > 3. A .cpp file may be compiled and put into an archive, but then >>>>>>>>> not selected by the linker and will therefore not result in a counter in >>>>>>>>> the profraw. When compiling this file with pgo-use, >>>>>>>>> instrprof_error::unknown_function will result and a warning will be emitted. >>>>>>>>> > >>>>>>>>> > yes -- this is a common problem to other compilers as well. >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > Case 1 can be fixed using a function hash or other unique >>>>>>>>> identifier instead of a file path. David, in D20195 you mentioned that Rong >>>>>>>>> was working on a patch that would fix 2; we are looking forward to that. >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > Right. >>>>>>>>> > >>>>>>>>> > For 3, I unfortunately do not know of any solution. I don't >>>>>>>>> think there is a way for us to make this warning reliable in the face of >>>>>>>>> this circumstance. So my conclusion is that >>>>>>>>> instrprof_error::unknown_function at least must be defaulted to off >>>>>>>>> unfortunately. >>>>>>>>> > >>>>>>>>> > yes, this can be annoying. If the warnings can be buffered, then >>>>>>>>> the compiler can check if this is due to missing profile for the whole file >>>>>>>>> and can reduce the warnings into one single warning (source file has no >>>>>>>>> profile data). Making it off by default sounds fine to me too if it is too >>>>>>>>> noisy. >>>>>>>>> > >>>>>>>>> > thanks, >>>>>>>>> > >>>>>>>>> > David >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > -- Sean Silva >>>>>>>>> >>>>>>>> >>>>>> >>>>> >>>> >>> >> >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160623/d0aabec5/attachment-0001.html>