Displaying 20 results from an estimated 608 matches for "pgo".
2019 Sep 12
4
PGO is ineffective for Rust - but why?
...sa Johnson <tejohnson at google.com> wrote:
> I just have a couple suggestions off the top of my head:
> - have you tried using the new pass manager
> (-fexperimental-new-pass-manager)? That has access to additional analysis
> info during inlining and is able to make more precise PGO based inline
> decisions.
>
(although note that the above shouldn't make the difference between no
performance gain and a typical PGO performance boost)
Another thing I just thought of - are you using -ffunction-sections and
-fdata-sections? These will allow for PGO based function layout in the
l...
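Because `-ffunction-sections` puts each function into its own section, the linker is free to reorder functions using profile data. Below is a minimal sketch of the simplest such policy, hottest-first ordering. It assumes that per-function entry counts are already available and that sections follow an assumed `.text.<name>` naming scheme. The function names and counts are hypothetical, and production linkers and tools typically use more elaborate, call-graph-aware ordering than a plain sort.

```python
from typing import Dict, List


def hot_first_order(entry_counts: Dict[str, int]) -> List[str]:
    """Return section names ordered hottest-first.

    Assumes each function lives in its own `.text.<name>` section, which is
    what -ffunction-sections makes possible. Ties are broken by name so the
    resulting layout is deterministic.
    """
    ranked = sorted(entry_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f".text.{name}" for name, _count in ranked]


# Hypothetical profile: function name -> number of calls observed in training.
profile = {"parse": 12_000, "init": 1, "error_path": 0, "lex": 50_000}
print(hot_first_order(profile))
# ['.text.lex', '.text.parse', '.text.init', '.text.error_path']
```

Grouping hot code this way packs frequently executed functions into fewer pages and cache lines, which is the kind of benefit that PGO-based function layout aims for.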
2019 Sep 12
6
PGO is ineffective for Rust - but why?
Hi everyone,
As part of my work for Mozilla's Low Level Tools team I've
implemented PGO in the Rust compiler. The feature has been
available since Rust 1.37 [1]. However, so far we have not
seen any actual performance gains from enabling PGO for
Rust code. Performance even seems to drop 1-3% with PGO
enabled. I wonder why that is and I'm hoping that someone
here might have experience de...
2019 Sep 16
2
PGO is ineffective for Rust - but why?
...of this: I
> confirmed that `rustc` indeed uses `-ffunction-sections` and
> `-fdata-sections` on all platforms except for macOS. When trying out
> different linkers for a small test case [1], however, I found that
> there were rather large differences in execution time:
>
> ld (no PGO) = 172 ms
> ld (PGO) = 196 ms
>
> gold (no PGO) = 182 ms
> gold (PGO) = 141 ms
>
> lld (no PGO) = 193 ms
> lld (PGO) = 171 ms
>
> So `gold` and `lld` both profit from PGO quite a bit, while `ld`
> linked programs are slower with PGO. I then noticed that branch
> wei...
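Expressing the quoted timings as relative changes makes the gap between linkers easier to see. This small sketch only reuses the numbers above. It says nothing about run-to-run variance, which the snippet does not report.

```python
# Execution times in ms quoted above for the small Rust test case:
# linker -> (without PGO, with PGO).
timings_ms = {
    "ld": (172, 196),
    "gold": (182, 141),
    "lld": (193, 171),
}


def relative_change(before: float, after: float) -> float:
    """Percent change from `before` to `after`; negative means faster."""
    return (after - before) / before * 100.0


for linker, (no_pgo, pgo) in timings_ms.items():
    print(f"{linker:>4}: {relative_change(no_pgo, pgo):+.1f}%")
# Prints:
#   ld: +14.0%
# gold: -22.5%
#  lld: -11.4%
```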
2019 Sep 24
3
PGO is ineffective for Rust - but why?
To give a little update here:
- I've been further investigating and found an issue [1] with the
Cargo build tool that most Rust projects use. This issue prevents all
projects using Cargo from properly using PGO because it causes symbol
names to be different between the generate and the use phases. With
this issue fixed, the number of "No profile data available for
function" warnings goes down from 92369 to 1167 for the Firefox
codebase.
- I also found that the potential GNU ld bug mentioned above...
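The Cargo issue matters because profile data written in the generate phase is matched to functions in the use phase by name. In this toy model the match uses the name alone, whereas the real LLVM profile format also records other information per function. When names change between the two builds, every lookup misses, which is what a flood of "No profile data available for function" warnings reflects. Below is a toy sketch with hypothetical symbol names, followed by the reduction in warnings implied by the Firefox numbers above.

```python
from typing import Dict, List, Tuple

# Toy model: the generate build writes counters keyed by symbol name.
# These names and hash suffixes are hypothetical.
profile: Dict[str, int] = {
    "crate::main::h0001": 1,
    "crate::parse::h0001": 9_000,
}


def lookup(symbols: List[str], prof: Dict[str, int]) -> Tuple[Dict[str, int], List[str]]:
    """Split use-phase symbols into those with profile data and those without."""
    found = {s: prof[s] for s in symbols if s in prof}
    missing = [s for s in symbols if s not in prof]
    return found, missing


# If symbol names differ between phases, every lookup misses.
use_phase = ["crate::main::h0002", "crate::parse::h0002"]
found, missing = lookup(use_phase, profile)
for sym in missing:
    print(f"No profile data available for function {sym}")

# Warning counts reported for the Firefox codebase before and after the fix.
before, after = 92_369, 1_167
print(f"{(before - after) / before:.1%} fewer warnings")  # 98.7% fewer warnings
```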
2019 Sep 17
2
PGO is ineffective for Rust - but why?
...a C version of the same test program [1] compiled with
Clang 8 does not have any problems with GNU ld: The `__llvm_prf_data`
section is the same size for all three linkers. It must be something
specific to the Rust compiler that's going wrong here.
[1] https://github.com/michaelwoerister/rust-pgo-test-programs/tree/master/cpp_branch_weights
On Tue, Sep 17, 2019 at 3:26 PM Michael Woerister
<mwoerister at mozilla.com> wrote:
>
> > Can you clarify whether the performance difference is caused by using different linkers for the instrumentation build?
>
> Yes, good observation! Whether t...
2016 May 07
2
About Clang llvm PGO
Thanks for testing out LLVM PGO and evaluating the performance.
We are currently still more focused on infrastructure improvement which is
the foundation for performance improvement. We are making great progress
in this direction, but there are still some key missing pieces such as
profile data in the inliner, etc. We are working on...
2015 May 27
4
[LLVMdev] Capabilities of Clang's PGO (e.g. improving code density)
Hello -
I'm an engineer in Microsoft Office looking into possible advantages of using PGO for our Android applications.
We at Microsoft have deep experience with Visual C++'s Profile Guided Optimization (https://msdn.microsoft.com/en-us/library/e7k32f4k.aspx) and often see 10% or more reduction in the size of application code loaded after using PGO for key scenarios (e.g. appl...
2018 Feb 06
2
Current PGO status
Hello David, thanks for detailed response!
Do you have any tests that you use to measure the PGO effectiveness? I
have tested clang version 6.0 with the same sample that Jie Chen used in
2016, and both frontend-based and IR-based PGO actually make the code run
slower; see the average times:
clang++ -O3: 3.15 sec
clang++ -O3 and -fprofile-instr-use: 3.160 sec
clang++ -O3 and -fprofile-use: 3.180...
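Measured against plain `-O3`, those averages differ by well under 1%. This quick calculation uses only the numbers quoted above. Because the snippet gives no variance, it cannot show whether differences this small are significant.

```python
BASELINE_SECS = 3.15  # clang++ -O3, average time quoted above

# Average times quoted above for the two PGO variants.
runs = {
    "-fprofile-instr-use": 3.160,
    "-fprofile-use": 3.180,
}


def rel_slowdown(secs: float, baseline: float = BASELINE_SECS) -> float:
    """Fractional slowdown relative to the baseline; positive means slower."""
    return (secs - baseline) / baseline


for flag, secs in runs.items():
    print(f"{flag}: {rel_slowdown(secs):+.2%}")
# -fprofile-instr-use: +0.32%
# -fprofile-use: +0.95%
```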
2020 Jun 02
2
Improve hot cold splitting to aggressively outline small blocks
...ll look into
it.
Best regards,
Ruijie
Ruijie Fang
Email: ruijief at princeton.edu
On Tue, Jun 2, 2020 at 12:48 PM Tobias Hieta <tobias at plexapp.com> wrote:
> Hello Ruijie,
>
> One other workload that would be interesting to test might be clang
> itself. Building clang with PGO information is a common trick for improving
> compiler performance and it's well supported in the build system.
>
> Thanks for working on this.
>
> Tobias.
>
> On Tue, Jun 2, 2020, 18:16 Ruijie Fang via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>>...
2015 May 27
3
[LLVMdev] Capabilities of Clang's PGO (e.g. improving code density)
Thanks! CIL [LeeHu] for a few comments…
From: Xinliang David Li [mailto:xinliangli at gmail.com]
Sent: Wednesday, May 27, 2015 9:29 AM
To: Lee Hunt
Cc: llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] Capabilities of Clang's PGO (e.g. improving code density)
On Tue, May 26, 2015 at 8:47 PM, Lee Hunt <leehu at exchange.microsoft.com> wrote:
Hello –
I’m an engineer in Microsoft Office looking into possible advantages of using PGO for our Android applications.
We a...
2018 Feb 05
0
Current PGO status
On Sun, Feb 4, 2018 at 9:59 PM, Victor Leschuk <vleschuk at accesssoftek.com>
wrote:
> Hello David!
>
> I have recently started getting acquainted with PGO in LLVM/clang and found
> your e-mail thread:
> http://lists.llvm.org/pipermail/llvm-dev/2016-May/099395.html . Here you
> posted a nice list of optimizations that use profiling and of those
> which could use it but don't. However, that thread is about 2 years
> old. Could you p...
2018 Feb 05
3
Current PGO status
Hello David!
I have recently started getting acquainted with PGO in LLVM/clang and found
your e-mail thread:
http://lists.llvm.org/pipermail/llvm-dev/2016-May/099395.html . Here you
posted a nice list of optimizations that use profiling and of those
which could use it but don't. However, that thread is about 2 years
old. Could you please kindly let me know...
2018 Jan 29
2
Using PGO and -O3
Hello all,
clang-related PGO documentation recommends using PGO with -O2 (for
example:
https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization).
The question is: is there any reason why exactly -O2 is used in
examples? Are there any factors which can cause problems when using PGO
with -O3?
Thanks in advance f...
2019 Mar 30
2
Minimal PGO for ORC JIT
...ng to execute next. So with
the help of multiple JIT background threads, those functions get
compiled before they are referenced. This will help in reducing JIT
compilation latencies on multi-core machines. This order must be
collected during profiling so that we can use it in the JIT.
I'm new to PGO and don't know the internal details well. As I'm
proposing this project for GSoC'19, I would like to learn how PGO is
structured; it will help me design something similar for the JIT and write a proposal. I
searched, but there is little information available about the internals. Are there any
references to...
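The proposed scheme records the order in which functions are first executed during a profiling run, then compiles functions on background threads in that order. It can be sketched in plain Python. This is not the ORC JIT API. `profiled`, `compile_function` and `precompile` are hypothetical names, and `compile_function` merely stands in for real code generation.

```python
import functools
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List

# Order in which functions were first entered during the profiling run.
first_call_order: List[str] = []


def profiled(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Record the first time each function is entered (single-threaded profiling)."""
    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        if fn.__name__ not in first_call_order:
            first_call_order.append(fn.__name__)
        return fn(*args, **kwargs)
    return wrapper


@profiled
def load_config() -> None: ...


@profiled
def parse() -> None:
    load_config()


@profiled
def run() -> None:
    parse()


def compile_function(name: str) -> str:
    """Hypothetical stand-in for compiling one function in the JIT."""
    return f"<machine code for {name}>"


def precompile(order: List[str], workers: int = 2) -> Dict[str, str]:
    """Submit compile jobs in profiled order on background threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {name: pool.submit(compile_function, name) for name in order}
        return {name: fut.result() for name, fut in futures.items()}


run()  # profiling run
print(first_call_order)  # ['run', 'parse', 'load_config']
compiled = precompile(first_call_order)
```

The pool takes jobs from a FIFO queue, so compilation starts in profiled order. With more than one worker, jobs can still finish out of order.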
2015 May 27
2
[LLVMdev] FW: Capabilities of Clang's PGO (e.g. improving code density)
David,
Yes, that is very helpful. Thanks!
--randy
From: Xinliang David Li [mailto:xinliangli at gmail.com]
Sent: Wednesday, May 27, 2015 12:53 PM
To: Randy Chapman
Cc: Lee Hunt; llvmdev at cs.uiuc.edu
Subject: Re: FW: [LLVMdev] Capabilities of Clang's PGO (e.g. improving code density)
On Wed, May 27, 2015 at 12:40 PM, Randy Chapman <randyc at microsoft.com> wrote:
Hi David!
Thanks again for your help! I was wondering if you could clarify one thing for me?
I find mention of “hot arc” optimization (-f...
2015 Aug 11
4
RFC: PGO Late instrumentation for LLVM
One aspect of this that I have not seen discussed is that middle-end
instrumentation enables PGO optimizations for front-ends other than Clang.
While I agree that FE instrumentation could be improved, it still requires
every FE to implement essentially the same common functionality. Having
PGO instrumentation generated in the middle-end allows every FE to
automatically take advantage of P...
2018 Feb 06
0
Current PGO status
Victor, thanks for the experiment.
My suspicion is it is due to the remaining issues with block layout --
especially with loop rotation (with PGO). Another problem is that tail dup
is not happening after loop rotation which can limit the effectiveness of
loop rotation.
I tried the internal option -mllvm -force-precise-rotation-cost and there
is about a 10% speedup with -fprofile-use. This option turns on a more precise
cost model when computing...
2018 Feb 07
2
Current PGO status
...to be concrete). Maybe we could investigate it together? Just
tell me where to start?
On 02/07/2018 02:11 AM, Xinliang David Li wrote:
> Victor, thanks for the experiment.
>
> My suspicion is it is due to the remaining issues with block layout --
> especially with loop rotation (with PGO). Another problem is that tail
> dup is not happening after loop rotation which can limit the
> effectiveness of loop rotation.
>
> I tried the internal option -mllvm -force-precise-rotation-cost and
> there is about a 10% speedup with -fprofile-use. This option turns on
> more prec...
2016 Aug 17
5
AutoFDO sample profiles v. SelectInst,
...2:15 PM, Xinliang David Li via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> +dehao.
>
> There are two potential problems:
>
> 1) the branch gets eliminated in the binary that is being profiled, so
> there is no profile data
>
This seems like a fundamental problem for PGO. Maybe it is also responsible
for this bug: https://llvm.org/bugs/show_bug.cgi?id=27359 ?
Should we limit select optimizations in IR for a PGO-training build? Or
should there be a 'select smasher' pass later in the pipeline that turns
selects into branches for a PGO-training build? (I don'...
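The first problem, together with the proposed "select smasher" workaround, can be illustrated with a toy instruction list. This is not LLVM IR or any real pass. `Inst`, `instrument` and `smash_selects` are hypothetical names. The model assumes that only conditional branches receive counters, matching the problem described above. It makes no claim about how current LLVM instrumentation treats selects.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Inst:
    op: str         # "br" (conditional branch), "select", or any other opcode
    cond: str = ""  # name of the condition value tested, if any


def instrument(body: List[Inst]) -> List[str]:
    """Toy instrumentation: one counter per conditional branch.

    A select has no control-flow edges in this model, so the condition it
    tests gets no counter and leaves no trace in the profile.
    """
    return [f"counter[{inst.cond}]" for inst in body if inst.op == "br"]


def smash_selects(body: List[Inst]) -> List[Inst]:
    """Hypothetical 'select smasher' for a training build: turn each select
    back into a branch so the instrumentation can observe its condition."""
    return [Inst("br", i.cond) if i.op == "select" else i for i in body]


optimized = [Inst("add"), Inst("select", "x_gt_0"), Inst("br", "loop_done")]
print(instrument(optimized))                 # ['counter[loop_done]']
print(instrument(smash_selects(optimized)))  # ['counter[x_gt_0]', 'counter[loop_done]']
```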