search for: pgo

Displaying 20 results from an estimated 602 matches for "pgo".

2019 Sep 12
4
PGO is ineffective for Rust - but why?
...sa Johnson <tejohnson at google.com> wrote: > I just have a couple suggestions off the top of my head: > - have you tried using the new pass manager > (-fexperimental-new-pass-manager)? That has access to additional analysis > info during inlining and is able to make more precise PGO based inline > decisions. > (although note the above shouldn't make the difference between no performance and a typical PGO performance boost) Another thing I just thought of - are you using -ffunction-sections and -fdata-sections? These will allow for PGO based function layout in the l...
2019 Sep 12
6
PGO is ineffective for Rust - but why?
Hi everyone, As part of my work for Mozilla's Low Level Tools team I've implemented PGO in the Rust compiler. The feature has been available since Rust 1.37 [1]. However, so far we have not seen any actual performance gains from enabling PGO for Rust code. Performance even seems to drop 1-3% with PGO enabled. I wonder why that is, and I'm hoping that someone here might have experience de...
2019 Sep 16
2
PGO is ineffective for Rust - but why?
...of this: I
> confirmed that `rustc` indeed uses `-ffunction-sections` and
> `-fdata-sections` on all platforms except for macOS. When trying out
> different linkers for a small test case [1], however, I found that
> there were rather large differences in execution time:
>
> ld (no PGO) = 172 ms
> ld (PGO) = 196 ms
>
> gold (no PGO) = 182 ms
> gold (PGO) = 141 ms
>
> lld (no PGO) = 193 ms
> lld (PGO) = 171 ms
>
> So `gold` and `lld` both profit from PGO quite a bit, while `ld`
> linked programs are slower with PGO. I then noticed that branch
> wei...
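The per-linker timings quoted above are easier to compare as relative changes. This is a minimal sketch: the numbers are copied from the post, and `relative_change` and `TIMINGS_MS` are hypothetical names, not part of any tool mentioned in the thread.

```python
# Execution times in milliseconds, copied from the post above:
# (without PGO, with PGO) for each linker on the same small test case.
TIMINGS_MS = {
    "ld": (172, 196),
    "gold": (182, 141),
    "lld": (193, 171),
}


def relative_change(before: float, after: float) -> float:
    """Percent change from `before` to `after`; negative means faster."""
    return (after - before) / before * 100.0


if __name__ == "__main__":
    for linker, (no_pgo, pgo) in TIMINGS_MS.items():
        print(f"{linker:>4}: {relative_change(no_pgo, pgo):+.1f}%")
```

With these numbers, the `ld`-linked binary comes out about 14.0% slower with PGO, while the `gold` and `lld` builds come out about 22.5% and 11.4% faster.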
2019 Sep 24
3
PGO is ineffective for Rust - but why?
To give a little update here:
- I've been investigating further and found an issue [1] with the Cargo build tool that most Rust projects use. This issue prevents all projects using Cargo from properly using PGO because it causes symbol names to differ between the generate and use phases. With this issue fixed, the number of "No profile data available for function" warnings goes down from 92369 to 1167 for the Firefox codebase.
- I also found that the potential GNU ld bug mentioned above...
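Why differing symbol names wipe out the profile can be shown with a toy lookup, assuming as a simplification that profile records are matched to functions by symbol name. The post does not say what differed between the two phases, so the `::h…` suffix, the function names, and both helpers below are hypothetical.

```python
from typing import Dict, List


def symbol_name(function: str, build_hash: str) -> str:
    """Hypothetical mangling scheme: a path plus a build-dependent hash suffix."""
    return f"{function}::h{build_hash}"


def functions_without_profile(
    profile: Dict[str, int], functions: List[str], build_hash: str
) -> List[str]:
    """Functions whose use-phase symbol name has no entry in the profile."""
    return [f for f in functions if symbol_name(f, build_hash) not in profile]


functions = ["parser::parse", "parser::tokenize", "main"]

# Generate phase: counters are recorded under the names produced by that build.
profile = {symbol_name(f, "1a2b"): 100 for f in functions}

# Use phase with a different build hash: every lookup misses.
missing = functions_without_profile(profile, functions, "9f8e")
# Use phase with identical symbol names: every lookup hits.
found_all = functions_without_profile(profile, functions, "1a2b") == []

print(len(missing), found_all)  # 3 True
print(f"{(1 - 1167 / 92369) * 100:.1f}% fewer warnings after the fix")
```

Applied to the Firefox numbers from the post, going from 92369 to 1167 warnings is a reduction of about 98.7%.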
2019 Sep 17
2
PGO is ineffective for Rust - but why?
...the same test program [1] compiled with Clang 8 does not have any problems with GNU ld: The `__llvm_prf_data` section is the same size for all three linkers. It must be something specific to the Rust compiler that's going wrong here. [1] https://github.com/michaelwoerister/rust-pgo-test-programs/tree/master/cpp_branch_weights On Tue, Sep 17, 2019 at 3:26 PM Michael Woerister <mwoerister at mozilla.com> wrote: > > > Can you clarify if performance difference is caused by using different linkers at instrumentation build? > > Yes, good observatio...
2016 May 07
2
About Clang llvm PGO
Thanks for testing out LLVM PGO and evaluating the performance. We are currently still more focused on infrastructure improvements, which are the foundation for performance improvements. We are making great progress in this direction, but there are still some key missing pieces, such as profile data in the inliner, etc. We are working on...
2015 May 27
4
[LLVMdev] Capabilities of Clang's PGO (e.g. improving code density)
Hello - I'm an engineer in Microsoft Office looking into possible advantages of using PGO for our Android applications. We at Microsoft have deep experience with Visual C++'s Profile Guided Optimization (https://msdn.microsoft.com/en-us/library/e7k32f4k.aspx) and often see 10% or more reduction in the size of application code loaded after using PGO for key...
2020 Jun 02
2
Improve hot cold splitting to aggressively outline small blocks
...ll look into it. Best regards, Ruijie Ruijie Fang Email: ruijief at princeton.edu On Tue, Jun 2, 2020 at 12:48 PM Tobias Hieta <tobias at plexapp.com> wrote: > Hello Ruijie, > > One other workload that would be interesting to test might be clang > itself. Building clang with PGO information is a common trick for improving > compiler performance and it's well supported in the build system. > > Thanks for working on this. > > Tobias. > > On Tue, Jun 2, 2020, 18:16 Ruijie Fang via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > >>...
2018 Feb 06
2
Current PGO status
Hello David, thanks for the detailed response! Do you have any tests that you use to measure PGO effectiveness? I have tested clang version 6.0 with the same sample that Jie Chen used in 2016, and actually both frontend-based and IR-based PGO make the code run slower; see the average times:
clang++ -O3: 3.15 sec
clang++ -O3 and -fprofile-instr-use: 3.160 sec
clang++ -O3 and -fprofile-use: 3.180...
2015 May 27
3
[LLVMdev] Capabilities of Clang's PGO (e.g. improving code density)
Thanks! CIL [LeeHu] for a few comments… From: Xinliang David Li [mailto:xinliangli at gmail.com] Sent: Wednesday, May 27, 2015 9:29 AM To: Lee Hunt Cc: llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev] Capabilities of Clang's PGO (e.g. improving code density) On Tue, May 26, 2015 at 8:47 PM, Lee Hunt <leehu at exchange.microsoft.com> wrote: Hello – I’m an engineer in Microsoft Office looking into possible advantages of using PGO for our Android applications. We a...
2018 Feb 05
3
Current PGO status
Hello David! I have recently started getting acquainted with PGO in LLVM/clang and found your e-mail thread: http://lists.llvm.org/pipermail/llvm-dev/2016-May/099395.html . There you posted a nice list of optimizations that use profiling and of those which could use it but don't. However, that thread is about 2 years old. Could yo...
2018 Feb 05
0
Current PGO status
On Sun, Feb 4, 2018 at 9:59 PM, Victor Leschuk <vleschuk at accesssoftek.com> wrote: > Hello David! > > I have recently started getting acquainted with PGO in LLVM/clang and found > your e-mail thread: > http://lists.llvm.org/pipermail/llvm-dev/2016-May/099395.html . There you > posted a nice list of optimizations that use profiling and of those > which could use it but don't. However, that thread is about 2...
2018 Jan 29
2
Using PGO and -O3
Hello all, Clang's PGO documentation recommends using PGO with -O2 (for example: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization). The question is: is there any particular reason why -O2 is used in the examples? Are there any factors which can cause problems when using PGO with -O3? Tha...
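For reference, the generate / run / merge / use sequence from the Clang Users Manual section linked above takes the optimization level as an ordinary flag, so an -O3 experiment only changes that flag. This sketch only prints the commands and runs nothing; `pgo_commands`, `code.cc`, and `code` are illustrative names, and using the same level in both builds is a choice made here, not a documented requirement.

```python
from typing import List


def pgo_commands(source: str, binary: str, opt_level: str = "-O2") -> List[str]:
    """Shell commands for Clang's front-end instrumentation PGO workflow.

    The optimization level is a parameter so that -O2 and -O3 builds can
    be produced the same way and compared.
    """
    profdata = f"{binary}.profdata"
    return [
        # 1. Build an instrumented binary.
        f"clang++ {opt_level} -fprofile-instr-generate {source} -o {binary}",
        # 2. Run a representative workload; %p puts the process ID in each file name.
        f'LLVM_PROFILE_FILE="{binary}-%p.profraw" ./{binary}',
        # 3. Merge the raw profiles into one indexed profile.
        f"llvm-profdata merge -output={profdata} {binary}-*.profraw",
        # 4. Rebuild with the profile applied.
        f"clang++ {opt_level} -fprofile-instr-use={profdata} {source} -o {binary}",
    ]


if __name__ == "__main__":
    for level in ("-O2", "-O3"):
        print(f"# {level}")
        print("\n".join(pgo_commands("code.cc", "code", level)))
```

For IR-level instrumentation, Clang's `-fprofile-generate` / `-fprofile-use` flags take the place of the `-fprofile-instr-*` pair.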
2015 May 27
2
[LLVMdev] FW: Capabilities of Clang's PGO (e.g. improving code density)
David, Yes, that is very helpful. Thanks! --randy From: Xinliang David Li [mailto:xinliangli at gmail.com] Sent: Wednesday, May 27, 2015 12:53 PM To: Randy Chapman Cc: Lee Hunt; llvmdev at cs.uiuc.edu Subject: Re: FW: [LLVMdev] Capabilities of Clang's PGO (e.g. improving code density) On Wed, May 27, 2015 at 12:40 PM, Randy Chapman <randyc at microsoft.com> wrote: Hi David! Thanks again for your help! I was wondering if you could clarify one thing for me? I find mention of “hot arc” optimization (-f...
2015 Aug 11
4
RFC: PGO Late instrumentation for LLVM
One aspect of this that I have not seen discussed is that middle-end instrumentation enables PGO optimizations for front-ends other than Clang. While I agree that FE instrumentation could be improved, it still requires every FE to implement essentially the same common functionality. Having PGO instrumentation generated in the middle-end allows every FE to automatically take advantage of P...
2019 Mar 30
2
Minimal PGO for ORC JIT
...ng to execute next. So with the help of multiple JIT background threads, those functions get compiled before they are referenced. This will help reduce JIT compilation latencies on multi-core machines. This order must be collected during profiling so that we can use it in the JIT. I'm new to PGO and don't know much about its internal details. As I'm proposing this project for GSoC'19, I would like to learn how PGO is structured; it will help me design something similar for the JIT and write a proposal. I searched, but little information is available about the internals. Are there any references to...
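The scheduling idea described above (record first-execution order during profiling, then compile ahead in that order on background threads) can be sketched in Python. The proposal does not spell out a design, so everything here is an assumption: `compile_function` merely returns a string where a real JIT would generate code, and the class and function names are hypothetical, not ORC APIs.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Dict, Iterable, List


def first_execution_order(trace: Iterable[str]) -> List[str]:
    """Profiling step: the order in which functions were first executed."""
    seen = set()
    order = []
    for name in trace:
        if name not in seen:
            seen.add(name)
            order.append(name)
    return order


def compile_function(name: str) -> str:
    """Hypothetical stand-in for a real JIT compile job."""
    return f"<machine code for {name}>"


class SpeculativeCompiler:
    """Compiles functions in background threads, in profiled order."""

    def __init__(self, order: List[str], workers: int = 4) -> None:
        self._pool = ThreadPoolExecutor(max_workers=workers)
        # The executor's work queue is FIFO, so functions that were executed
        # earliest during profiling start compiling first.
        self._jobs: Dict[str, Future] = {
            name: self._pool.submit(compile_function, name) for name in order
        }

    def lookup(self, name: str) -> str:
        job = self._jobs.get(name)
        if job is None:
            # Not seen during profiling: compile on demand, as without a profile.
            return compile_function(name)
        # Blocks only if the background compile has not finished yet.
        return job.result()

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)


if __name__ == "__main__":
    trace = ["main", "parse", "lex", "parse", "eval", "lex", "print"]
    order = first_execution_order(trace)
    print(order)  # ['main', 'parse', 'lex', 'eval', 'print']
    jit = SpeculativeCompiler(order)
    print(jit.lookup("eval"))  # <machine code for eval>
    print(jit.lookup("unprofiled"))  # <machine code for unprofiled>
    jit.shutdown()
```

Keeping an on-demand fallback means a stale or incomplete profile only costs latency, never correctness.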
2018 Feb 07
2
Current PGO status
...oncrete). Maybe we could investigate it together? Just tell me where to start? On 02/07/2018 02:11 AM, Xinliang David Li wrote: > Victor, thanks for the experiment. > > My suspicion is it is due to the remaining issues with block layout -- > especially with loop rotation (with PGO). Another problem is that tail > dup is not happening after loop rotation which can limit the > effectiveness of loop rotation. > > I tried the internal option -mllvm -force-precise-rotation-cost and > there is about 10% speedup with -fprofile-use. This option turns on > more prec...
2018 Feb 06
0
Current PGO status
Victor, thanks for the experiment. My suspicion is that it is due to the remaining issues with block layout -- especially with loop rotation (with PGO). Another problem is that tail dup is not happening after loop rotation, which can limit the effectiveness of loop rotation. I tried the internal option -mllvm -force-precise-rotation-cost and there is about a 10% speedup with -fprofile-use. This option turns on a more precise cost model when computing...
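Profile-guided block layout is easiest to picture with the classic greedy chaining approach of Pettis and Hansen. This is a swapped-in textbook technique for illustration, not LLVM's MachineBlockPlacement algorithm, and it does not model loop rotation or tail duplication, which are the specific issues suspected above. The block names and counts in the example are made up.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def chain_blocks(blocks: List[str], edge_counts: Dict[Edge, int]) -> List[str]:
    """Greedy profile-guided block chaining in the style of Pettis and Hansen.

    blocks[0] is the entry block. Edges are visited from hottest to coldest;
    two chains are joined when the edge runs from the tail of one chain to
    the head of the other, which makes the hot successor the fall-through.
    """
    entry = blocks[0]
    chain_of: Dict[str, List[str]] = {b: [b] for b in blocks}
    for (src, dst), _count in sorted(edge_counts.items(), key=lambda kv: -kv[1]):
        src_chain, dst_chain = chain_of[src], chain_of[dst]
        if (
            src_chain is dst_chain  # joining would form a cycle
            or src_chain[-1] != src  # src is not at the end of its chain
            or dst_chain[0] != dst  # dst is not at the start of its chain
            or dst == entry  # never place anything before the entry block
        ):
            continue
        src_chain.extend(dst_chain)
        for block in dst_chain:
            chain_of[block] = src_chain
    # Emit each chain once, in the order its first block appears in `blocks`.
    layout: List[str] = []
    for block in blocks:
        if block not in layout:
            layout.extend(chain_of[block])
    return layout


if __name__ == "__main__":
    # Source order puts the rarely taken block first; the profile says otherwise.
    blocks = ["entry", "cold", "hot", "exit"]
    counts = {
        ("entry", "cold"): 100,
        ("entry", "hot"): 900,
        ("cold", "exit"): 100,
        ("hot", "exit"): 900,
    }
    print(chain_blocks(blocks, counts))  # ['entry', 'hot', 'exit', 'cold']
```

The hot path becomes straight-line fall-through code and the cold block moves out of the way, which is the effect a precise layout cost model is trying to achieve.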
2016 Aug 17
5
AutoFDO sample profiles v. SelectInst,
...2:15 PM, Xinliang David Li via llvm-dev < llvm-dev at lists.llvm.org> wrote: > +dehao. > > There are two potential problems: > > 1) the branch gets eliminated in the binary that is being profiled, so > there is no profile data > This seems like a fundamental problem for PGO. Maybe it is also responsible for this bug: https://llvm.org/bugs/show_bug.cgi?id=27359 ? Should we limit select optimizations in IR for a PGO-training build? Or should there be a 'select smasher' pass later in the pipeline that turns selects into branches for a PGO-trainin...
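The "select smasher" idea raised above amounts to two steps: keep the condition as a real branch in the training build so its outcomes get counted, then use those counts in the optimized build to choose between a branch and a select. The sketch below is a toy model under that reading; the 0.9 bias threshold and all names are hypothetical, and it does not describe how any LLVM pass decides.

```python
from collections import Counter
from typing import Callable, Iterable


def train(condition: Callable[[int], bool], inputs: Iterable[int]) -> Counter:
    """Training run in which the select's condition is kept as a branch,
    so every outcome is counted instead of disappearing into a select."""
    counts: Counter = Counter()
    for x in inputs:
        counts[condition(x)] += 1
    return counts


def prefer_branch(counts: Counter, bias_threshold: float = 0.9) -> bool:
    """Use-phase decision: keep a real branch only when it is strongly biased.

    A well-predicted branch is cheap, while a select needs both arms
    computed; with no profile data at all the select is left alone.
    """
    total = counts[True] + counts[False]
    if total == 0:
        return False
    p_true = counts[True] / total
    return max(p_true, 1.0 - p_true) >= bias_threshold


def is_positive(x: int) -> bool:
    return x > 0


if __name__ == "__main__":
    biased = train(is_positive, [5] * 98 + [-1] * 2)
    balanced = train(is_positive, [5, -1] * 50)
    print(prefer_branch(biased), prefer_branch(balanced))  # True False
```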