Displaying 20 results from an estimated 5000 matches similar to: "[LLVMdev] multithreaded performance disaster with -fprofile-instr-generate (contention on profile counters)"
2014 Apr 17
2
[LLVMdev] multithreaded performance disaster with -fprofile-instr-generate (contention on profile counters)
On Thu, Apr 17, 2014 at 6:10 PM, Yaron Keren <yaron.keren at gmail.com> wrote:
> If accuracy is not critical, incrementing the counters without any guards
> might be good enough.
>
No. Contention on the counters leads to 5x-10x slowdown. This is never
good enough.
--kcc
> Hot areas will still be hot and cold areas will not be affected.
>
> Yaron
>
>
>
>
2014 Apr 17
2
[LLVMdev] multithreaded performance disaster with -fprofile-instr-generate (contention on profile counters)
Chandler Carruth <chandlerc at google.com> writes:
> Having thought a bit about the best strategy to solve this, I think we should
> use a tradeoff of memory to reduce contention. I don't really like any of the
> other options as much, if we can get that one to work. Here is my specific
> suggestion:
>
> On Thu, Apr 17, 2014 at 5:21 AM, Kostya Serebryany <kcc at
2014 Apr 17
2
[LLVMdev] multithreaded performance disaster with -fprofile-instr-generate (contention on profile counters)
Good thinking, but why do you think runtime selection of shard count is
better than compile time selection? For single threaded apps, shard count
is always 1, so why pay the penalty of checking the thread ID each time
a function is entered?
For multi-threaded apps, I would expect MAX to be smaller than NUM_OF_CORES
to avoid excessive memory consumption, then you always end up with N ==
MAX. If MAX is
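The compile-time option raised in this excerpt can be sketched as follows. This is only an illustration, not code from the thread: ShardedCounter, count_with_threads, the 64-byte cache-line size and the hash-based shard choice are all assumptions, and C++17 is assumed for `if constexpr`. With Shards == 1 the thread-ID lookup is compiled out entirely, which is the single-threaded case the excerpt is worried about.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical counter split into a compile-time number of shards.
template <std::size_t Shards>
class ShardedCounter {
  // Assumption: 64-byte cache lines; padding keeps shards from false sharing.
  struct alignas(64) Slot {
    std::atomic<std::uint64_t> value{0};
  };
  std::array<Slot, Shards> slots_;

  static std::size_t shard_index() {
    if constexpr (Shards == 1) {
      return 0;  // single shard: no thread-ID check at all
    } else {
      // Computed once per thread, then reused on every increment.
      thread_local const std::size_t idx =
          std::hash<std::thread::id>{}(std::this_thread::get_id()) % Shards;
      return idx;
    }
  }

 public:
  void increment() {
    slots_[shard_index()].value.fetch_add(1, std::memory_order_relaxed);
  }

  // Sum all shards; exact once the writer threads have been joined.
  std::uint64_t total() const {
    std::uint64_t sum = 0;
    for (const Slot& s : slots_) sum += s.value.load(std::memory_order_relaxed);
    return sum;
  }
};

// Hypothetical driver: run `threads` writers and return the summed count.
template <std::size_t Shards>
std::uint64_t count_with_threads(int threads, int increments_per_thread) {
  ShardedCounter<Shards> counter;
  std::vector<std::thread> pool;
  for (int i = 0; i < threads; ++i)
    pool.emplace_back([&counter, increments_per_thread] {
      for (int j = 0; j < increments_per_thread; ++j) counter.increment();
    });
  for (auto& t : pool) t.join();
  return counter.total();
}
```

Two threads may still hash to the same shard, so this reduces contention rather than eliminating it; the memory cost grows linearly with Shards, which is the tradeoff under discussion.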
2014 Apr 18
4
[LLVMdev] multithreaded performance disaster with -fprofile-instr-generate (contention on profile counters)
On Apr 17, 2014, at 2:04 PM, Chandler Carruth <chandlerc at google.com> wrote:
> On Thu, Apr 17, 2014 at 1:27 PM, Justin Bogner <mail at justinbogner.com> wrote:
> Chandler Carruth <chandlerc at google.com> writes:
> > if (thread-ID != main's thread-ID && shard_count < std::min(MAX, NUMBER_OF_CORES)) {
> > shard_count = std::min(MAX,
2014 Apr 18
2
[LLVMdev] multithreaded performance disaster with -fprofile-instr-generate (contention on profile counters)
On Fri, Apr 18, 2014 at 11:13 AM, Dmitry Vyukov <dvyukov at google.com> wrote:
> Hi,
>
> This is a long thread, so I will combine several comments into a single email.
>
>
> >> - 8-bit per-thread counters, dumping into central counters on overflow.
> >The overflow will happen very quickly with an 8-bit counter.
>
> Yes, but it reduces contention by 256x (a thread
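The 8-bit per-thread counter idea quoted above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation proposed in the thread: g_central, t_local, count_event, flush_local and run_threads are hypothetical names. Each thread touches the shared counter only once per 256 events, which is where the 256x reduction in contended writes comes from.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical central counter shared by all threads.
std::atomic<std::uint64_t> g_central{0};

// Hypothetical per-thread 8-bit counter; starts at 0 in every thread.
thread_local std::uint8_t t_local = 0;

// Count one event. The local counter wraps to 0 after 256 increments,
// and only then is the shared counter updated.
inline void count_event() {
  if (++t_local == 0)
    g_central.fetch_add(256, std::memory_order_relaxed);
}

// Dump whatever is left in the local counter, e.g. before thread exit.
inline void flush_local() {
  g_central.fetch_add(t_local, std::memory_order_relaxed);
  t_local = 0;
}

// Hypothetical driver: each thread counts its events, then flushes.
std::uint64_t run_threads(int threads, int events_per_thread) {
  g_central.store(0);
  std::vector<std::thread> pool;
  for (int i = 0; i < threads; ++i)
    pool.emplace_back([events_per_thread] {
      for (int j = 0; j < events_per_thread; ++j) count_event();
      flush_local();
    });
  for (auto& t : pool) t.join();
  return g_central.load();
}
```

Without the final flush, up to 255 events per thread would be lost, so a real runtime would need a hook at thread exit or at profile dump time.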
2014 Apr 18
2
[LLVMdev] multithreaded performance disaster with -fprofile-instr-generate (contention on profile counters)
On Fri, Apr 18, 2014 at 11:44 AM, Dmitry Vyukov <dvyukov at google.com> wrote:
> On Fri, Apr 18, 2014 at 11:32 AM, Dmitry Vyukov <dvyukov at google.com>wrote:
>
>> On Fri, Apr 18, 2014 at 11:13 AM, Dmitry Vyukov <dvyukov at google.com>wrote:
>>
>>> Hi,
>>>
>>> This is a long thread, so I will combine several comments into a single
2014 Apr 18
4
[LLVMdev] multithreaded performance disaster with -fprofile-instr-generate (contention on profile counters)
On Fri, Apr 18, 2014 at 12:13 AM, Dmitry Vyukov <dvyukov at google.com> wrote:
> Hi,
>
> This is a long thread, so I will combine several comments into a single email.
>
>
> >> - 8-bit per-thread counters, dumping into central counters on overflow.
> >The overflow will happen very quickly with an 8-bit counter.
>
> Yes, but it reduces contention by 256x (a thread
2014 Apr 23
4
[LLVMdev] multithreaded performance disaster with -fprofile-instr-generate (contention on profile counters)
On Apr 23, 2014, at 7:31 AM, Kostya Serebryany <kcc at google.com> wrote:
> I've run one proprietary benchmark that reflects a large portion of Google's server-side code.
> -fprofile-instr-generate leads to 14x slowdown due to counter contention. That's serious.
> Admittedly, there is a single hot function that accounts for half of that slowdown,
> but even if
2014 Apr 25
2
[LLVMdev] multithreaded performance disaster with -fprofile-instr-generate (contention on profile counters)
On Apr 24, 2014, at 1:33 AM, Dmitry Vyukov <dvyukov at google.com> wrote:
>>
>> I can see that the behavior of our current instrumentation is going to be a
>> problem for the kinds of applications that you’re looking at. If you can
>> find a way to get the overhead down without losing accuracy
>
> What are your requirements for accuracy?
> Current
2016 Mar 07
3
Profile-based inlining status
Hello,
I'm learning how LLVM performs PGO (profile-guided optimizations) by using
the instrumentation-based profile build (-fprofile-instr-generate and
-fprofile-instr-use).
However, I found no difference in inlining behavior with and without PGO
for a few SPEC benchmarks when checking the emitted optimization
reports (-Rpass=inline -Rpass-missed=inline -Rpass-analysis=inline).
2014 Apr 17
2
[LLVMdev] multithreaded performance disaster with -fprofile-instr-generate (contention on profile counters)
On Thu, Apr 17, 2014 at 8:37 PM, Jonathan Roelofs <jonathan at codesourcery.com
> wrote:
> How about per-thread if the counter is hot enough?
>
Err. How do you know if the counter is hot w/o first profiling the app?
2016 Mar 11
3
RFC: Pass to prune redundant profiling instrumentation
On Thu, Mar 10, 2016 at 8:33 PM, Sean Silva via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
>
>
> On Thu, Mar 10, 2016 at 7:21 PM, Vedant Kumar via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> Hi,
>>
>> I'd like to add a new pass to LLVM which removes redundant profile counter
>> updates. The goal is to speed up code coverage
2015 Aug 08
3
RFC: PGO Late instrumentation for LLVM
Instrumentation based Profile Guided Optimization (PGO) is a compiler
technique that leverages important program runtime information, such as
precise edge counts and frequent value information, to make frequently
executed code run faster. It's proven to be one of the most effective ways
to improve program performance.
An important design point of PGO is to decide where to place the
2016 Mar 11
2
RFC: Pass to prune redundant profiling instrumentation
On Thu, Mar 10, 2016 at 10:13 PM, Sean Silva <chisophugis at gmail.com> wrote:
>
>
> On Thu, Mar 10, 2016 at 9:42 PM, Xinliang David Li <xinliangli at gmail.com>
> wrote:
>
>>
>>
>> On Thu, Mar 10, 2016 at 8:33 PM, Sean Silva via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>>
>>>
>>> On Thu, Mar 10,
2014 Feb 21
12
[LLVMdev] asan coverage
>
>
>
> We may need some additional info.
What kind of additional info?
> I haven't put a ton of thought into
> this, but I'm hoping we can either (a) use debug info as is or add some
> extra (valid) debug info to support this, or (b) add an extra
> debug-info-like section to instrumented binaries with the information we
> need.
>
I'd try this data
2017 Oct 26
2
Profiling data structure
On Wed, Oct 25, 2017 at 09:13:54AM -0700, Xinliang David Li wrote:
> On Wed, Oct 25, 2017 at 12:26 AM, Roger Pau Monné via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> > Hello,
> >
> > I've been working on implementing some basic functionality in order to
> > use the llvm profiling functionality inside of a kernel (the Xen
> > hypervisor).
2015 Aug 08
2
RFC: PGO Late instrumentation for LLVM
Accidentally sent to uiuc server.
On Fri, Aug 7, 2015 at 10:49 PM, Sean Silva <chisophugis at gmail.com> wrote:
> Can you compare your results with another approach: simply do not
> instrument the top 1% hottest functions (by function entry count)? If this
> simple approach provides most of the benefits (my measurements on one
> codebase I tested show that it would eliminate
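The selection step behind that suggestion (pick the hottest functions by entry count so they can be left uninstrumented) might look roughly like this. It is only a sketch: FunctionCount and hottest_functions are hypothetical names, the counts are assumed to come from an earlier profiling run, and nothing here reflects how LLVM itself would implement it.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record: function name and its entry count from a prior run.
using FunctionCount = std::pair<std::string, std::uint64_t>;

// Return the names of the hottest `percent` percent of functions,
// ordered from hottest to coldest. The cutoff is rounded down.
std::vector<std::string> hottest_functions(std::vector<FunctionCount> counts,
                                           unsigned percent) {
  const std::size_t keep = counts.size() * percent / 100;
  // Only the first `keep` entries need to be in sorted order.
  std::partial_sort(counts.begin(), counts.begin() + keep, counts.end(),
                    [](const FunctionCount& a, const FunctionCount& b) {
                      return a.second > b.second;
                    });
  std::vector<std::string> names;
  names.reserve(keep);
  for (std::size_t i = 0; i < keep; ++i) names.push_back(counts[i].first);
  return names;
}
```

The returned list would then serve as a "do not instrument" set for the next instrumented build.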
2015 Aug 10
3
RFC: PGO Late instrumentation for LLVM
On Sat, Aug 8, 2015 at 6:31 AM, Xinliang David Li <davidxl at google.com>
wrote:
> On Fri, Aug 7, 2015 at 10:56 PM, Sean Silva <chisophugis at gmail.com> wrote:
> > Accidentally sent to uiuc server.
> >
> >
> > On Fri, Aug 7, 2015 at 10:49 PM, Sean Silva <chisophugis at gmail.com>
> wrote:
> >>
> >> Can you compare your results
2015 Apr 09
2
[LLVMdev] code coverage instrumentation
Hi
Not sure if this is a clang- or llvm-related question, so I'm sending it to both mailing lists.
Anyway, I have a few questions regarding the size and execution time of instrumented code:
We are trying to run code coverage on memory-limited hardware and are investigating both approaches (generating gcov output using -coverage, and LLVM's own way using the -fprofile-instr-generate -fcoverage-mapping clang flags)
2016 May 24
0
The state of IRPGO (3 remaining work items)
Zooming into the command-line option bike-shed:
> On 2016-May-24, at 15:41, Vedant Kumar via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> At its core I don't think -fprofile-instr-generate *implies* FE-based instrumentation. So, I'd like to see the driver do this (on all platforms):
>
> * -fprofile-instr-generate: IR instrumentation
> *