thr3ads.net - similar to: "[LLVMdev] multithreaded performance disaster with -fprofile-instr-generate (contention on profile counters)"

Displaying 20 results from an estimated 5000 matches similar to: "[LLVMdev] multithreaded performance disaster with -fprofile-instr-generate (contention on profile counters)"

[LLVMdev] multithreaded performance disaster with -fprofile-instr-generate (contention on profile counters)

2014 Apr 17

On Thu, Apr 17, 2014 at 6:10 PM, Yaron Keren <yaron.keren at gmail.com> wrote:
> If accuracy is not critical, incrementing the counters without any guards
> might be good enough.
No. Contention on the counters leads to 5x-10x slowdown. This is never good enough.
--kcc
> Hot areas will still be hot and cold areas will not be affected.
>
> Yaron

[LLVMdev] multithreaded performance disaster with -fprofile-instr-generate (contention on profile counters)

2014 Apr 17

Chandler Carruth <chandlerc at google.com> writes:
> Having thought a bit about the best strategy to solve this, I think we should
> use a tradeoff of memory to reduce contention. I don't really like any of the
> other options as much, if we can get that one to work. Here is my specific
> suggestion:
>
> On Thu, Apr 17, 2014 at 5:21 AM, Kostya Serebryany <kcc at

[LLVMdev] multithreaded performance disaster with -fprofile-instr-generate (contention on profile counters)

2014 Apr 17

Good thinking, but why do you think runtime selection of shard count is better than compile-time selection? For single-threaded apps, the shard count is always 1, so why pay the penalty of checking the thread ID each time a function is entered? For multi-threaded apps, I would expect MAX to be smaller than NUM_OF_CORES to avoid excessive memory consumption, so you always end up with N == MAX. If MAX is

[LLVMdev] multithreaded performance disaster with -fprofile-instr-generate (contention on profile counters)

2014 Apr 18

On Apr 17, 2014, at 2:04 PM, Chandler Carruth <chandlerc at google.com> wrote:
> On Thu, Apr 17, 2014 at 1:27 PM, Justin Bogner <mail at justinbogner.com> wrote:
> Chandler Carruth <chandlerc at google.com> writes:
> > if (thread-ID != main's thread-ID && shard_count < std::min(MAX, NUMBER_OF_CORES)) {
> > shard_count = std::min(MAX,

[LLVMdev] multithreaded performance disaster with -fprofile-instr-generate (contention on profile counters)

2014 Apr 18

On Fri, Apr 18, 2014 at 11:13 AM, Dmitry Vyukov <dvyukov at google.com> wrote:
> Hi,
>
> This is long thread, so I will combine several comments into single email.
>
> >> - 8-bit per-thread counters, dumping into central counters on overflow.
> >The overflow will happen very quickly with 8bit counter.
>
> Yes, but it reduces contention by 256x (a thread
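The idea quoted above (an 8-bit per-thread counter that dumps into a central counter on overflow) can be sketched roughly as follows. This is only an illustration under stated assumptions, not LLVM's profile runtime: `central_counter`, `LocalCounter`, and `run_worker` are hypothetical names, and a real implementation would keep one such pair per instrumented counter.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical shared counter; stands in for one central profile counter.
std::atomic<uint64_t> central_counter{0};

// Hypothetical per-thread counter. Each thread would own its own instance
// (for example via thread_local), so increments touch no shared cache line.
struct LocalCounter {
    uint8_t value = 0;

    void increment() {
        // uint8_t wraps from 255 to 0, so the shared counter is updated
        // only once per 256 local increments.
        if (++value == 0) {
            central_counter.fetch_add(256, std::memory_order_relaxed);
        }
    }

    // Publish the remainder, e.g. at thread exit or before the profile is written.
    void flush() {
        central_counter.fetch_add(value, std::memory_order_relaxed);
        value = 0;
    }
};

// Body a worker thread might run: count n events locally, then flush.
void run_worker(uint64_t n) {
    LocalCounter local;
    for (uint64_t i = 0; i < n; ++i) {
        local.increment();
    }
    local.flush();
}
```

In this sketch the atomic add happens at most once per 256 events plus one flush, which is where the quoted "256x" reduction in contention comes from; the cost is extra per-thread memory and the need for a final flush so no counts are lost.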

[LLVMdev] multithreaded performance disaster with -fprofile-instr-generate (contention on profile counters)

2014 Apr 18

On Fri, Apr 18, 2014 at 11:44 AM, Dmitry Vyukov <dvyukov at google.com> wrote:
> On Fri, Apr 18, 2014 at 11:32 AM, Dmitry Vyukov <dvyukov at google.com> wrote:
>
>> On Fri, Apr 18, 2014 at 11:13 AM, Dmitry Vyukov <dvyukov at google.com> wrote:
>>
>>> Hi,
>>>
>>> This is long thread, so I will combine several comments into single

[LLVMdev] multithreaded performance disaster with -fprofile-instr-generate (contention on profile counters)

2014 Apr 18

On Fri, Apr 18, 2014 at 12:13 AM, Dmitry Vyukov <dvyukov at google.com> wrote:
> Hi,
>
> This is long thread, so I will combine several comments into single email.
>
> >> - 8-bit per-thread counters, dumping into central counters on overflow.
> >The overflow will happen very quickly with 8bit counter.
>
> Yes, but it reduces contention by 256x (a thread

[LLVMdev] multithreaded performance disaster with -fprofile-instr-generate (contention on profile counters)

2014 Apr 23

On Apr 23, 2014, at 7:31 AM, Kostya Serebryany <kcc at google.com> wrote:
> I've run one proprietary benchmark that reflects a large portion of the google's server side code.
> -fprofile-instr-generate leads to 14x slowdown due to counter contention. That's serious.
> Admittedly, there is a single hot function that accounts for half of that slowdown,
> but even if

[LLVMdev] multithreaded performance disaster with -fprofile-instr-generate (contention on profile counters)

2014 Apr 25

On Apr 24, 2014, at 1:33 AM, Dmitry Vyukov <dvyukov at google.com> wrote:
>> I can see that the behavior of our current instrumentation is going to be a
>> problem for the kinds of applications that you’re looking at. If you can
>> find a way to get the overhead down without losing accuracy
>
> What are your requirements for accuracy?
> Current

Profile-based inlining status

2016 Mar 07

Hello, I'm learning how LLVM performs PGO (profile-guided optimization) using the instrumentation-based profile build (-fprofile-instr-generate and -fprofile-instr-use). However, I found no difference in inlining behavior with and without PGO for a few SPEC benchmarks when checking the emitted optimization reports (-Rpass=inline -Rpass-missed=inline -Rpass-analysis=inline).

[LLVMdev] multithreaded performance disaster with -fprofile-instr-generate (contention on profile counters)

2014 Apr 17

On Thu, Apr 17, 2014 at 8:37 PM, Jonathan Roelofs <jonathan at codesourcery.com> wrote:
> How about per-thread if the counter is hot enough?
Err. How do you know if the counter is hot w/o first profiling the app?

RFC: Pass to prune redundant profiling instrumentation

2016 Mar 11

On Thu, Mar 10, 2016 at 8:33 PM, Sean Silva via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> On Thu, Mar 10, 2016 at 7:21 PM, Vedant Kumar via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
>> Hi,
>>
>> I'd like to add a new pass to LLVM which removes redundant profile counter
>> updates. The goal is to speed up code coverage

RFC: PGO Late instrumentation for LLVM

2015 Aug 08

Instrumentation based Profile Guided Optimization (PGO) is a compiler technique that leverages important program runtime information, such as precise edge counts and frequent value information, to make frequently executed code run faster. It's proven to be one of the most effective ways to improve program performance. An important design point of PGO is to decide where to place the

RFC: Pass to prune redundant profiling instrumentation

2016 Mar 11

On Thu, Mar 10, 2016 at 10:13 PM, Sean Silva <chisophugis at gmail.com> wrote:
> On Thu, Mar 10, 2016 at 9:42 PM, Xinliang David Li <xinliangli at gmail.com> wrote:
>
>> On Thu, Mar 10, 2016 at 8:33 PM, Sean Silva via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>
>>> On Thu, Mar 10,

[LLVMdev] asan coverage

2014 Feb 21

> We may need some additional info.
What kind of additional info?
> I haven't put a ton of thought into
> this, but I'm hoping we can either (a) use debug info as is or add some
> extra (valid) debug info to support this, or (b) add an extra
> debug-info-like section to instrumented binaries with the information we
> need.
I'd try this data

Profiling data structure

2017 Oct 26

On Wed, Oct 25, 2017 at 09:13:54AM -0700, Xinliang David Li wrote:
> On Wed, Oct 25, 2017 at 12:26 AM, Roger Pau Monné via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> > Hello,
> >
> > I've been working on implementing some basic functionality in order to
> > use the llvm profiling functionality inside of a kernel (the Xen
> > hypervisor).

RFC: PGO Late instrumentation for LLVM

2015 Aug 08

Accidentally sent to uiuc server.
On Fri, Aug 7, 2015 at 10:49 PM, Sean Silva <chisophugis at gmail.com> wrote:
> Can you compare your results with another approach: simply do not
> instrument the top 1% hottest functions (by function entry count)? If this
> simple approach provides most of the benefits (my measurements on one
> codebase I tested show that it would eliminate

RFC: PGO Late instrumentation for LLVM

2015 Aug 10

On Sat, Aug 8, 2015 at 6:31 AM, Xinliang David Li <davidxl at google.com> wrote:
> On Fri, Aug 7, 2015 at 10:56 PM, Sean Silva <chisophugis at gmail.com> wrote:
> > Accidentally sent to uiuc server.
> >
> > On Fri, Aug 7, 2015 at 10:49 PM, Sean Silva <chisophugis at gmail.com> wrote:
> >>
> >> Can you compare your results

[LLVMdev] code coverage instrumentation

2015 Apr 09

Hi, I'm not sure whether this is a Clang or LLVM question, so I'm sending it to both mailing lists. Anyway, I have a few questions regarding the size and execution time of instrumented code. We are trying to run code coverage on memory-limited hardware and are investigating both approaches (generating gcov output using -coverage, and LLVM's own way using the -fprofile-instr-generate -fcoverage-mapping clang flags)

The state of IRPGO (3 remaining work items)

2016 May 24

Zooming into the command-line option bike-shed:
> On 2016-May-24, at 15:41, Vedant Kumar via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> At its core I don't think -fprofile-instr-generate *implies* FE-based instrumentation. So, I'd like to see the driver do this (on all platforms):
>
> * -fprofile-instr-generate: IR instrumentation
> *
