thr3ads.net - search: "shard

Displaying 6 results from an estimated 6 matches for "shard_count".

[LLVMdev] multithreaded performance disaster with -fprofile-instr-generate (contention on profile counters)

2014 Apr 17

...hat if you never need multiple shards (single threaded) you pay
> essentially zero cost. I would have a global number of shards that changes
> rarely, and re-compute it on entry to each function with something along the
> lines of:
>
> if (thread-ID != main's thread-ID && shard_count == 1) {
> shard_count = std::min(MAX, std::max(NUMBER_OF_THREADS, NUMBER_OF_CORES));
> // if shard_count changed with this, we can also call a library routine here
> that does the work of allocating the actual extra shards.
> }

Is it possible to hook on something more clever than f...
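The quoted pseudocode can be sketched in compilable C++ roughly as follows. This is only an illustration of the idea in the snippet, not code from the thread or from LLVM: the names `kMaxShards`, `target_shard_count`, and `maybe_grow_shards` are hypothetical, the cap of 64 is an arbitrary stand-in for MAX, and the thread and core counts are passed in as parameters because the snippet does not say how they would be obtained.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>

// Hypothetical stand-in for MAX; the snippet only says it is a fixed cap.
constexpr std::size_t kMaxShards = 64;

// Global shard count. It starts at 1 so a single-threaded program
// never pays for extra shards.
std::atomic<std::size_t> shard_count{1};

// The formula from the snippet:
//   std::min(MAX, std::max(NUMBER_OF_THREADS, NUMBER_OF_CORES))
std::size_t target_shard_count(std::size_t num_threads, std::size_t num_cores) {
    return std::min(kMaxShards, std::max(num_threads, num_cores));
}

// Intended to run on function entry. Returns true only for the one caller
// that actually raised the shard count from 1.
bool maybe_grow_shards(std::thread::id self, std::thread::id main_id,
                       std::size_t num_threads, std::size_t num_cores) {
    if (self != main_id && shard_count.load(std::memory_order_relaxed) == 1) {
        const std::size_t wanted = target_shard_count(num_threads, num_cores);
        std::size_t expected = 1;
        // compare_exchange ensures that only one thread wins the transition.
        if (wanted > 1 && shard_count.compare_exchange_strong(expected, wanted)) {
            assert(wanted <= kMaxShards);  // the cap always holds
            // A real runtime would allocate the extra shards here.
            return true;
        }
    }
    return false;
}
```

The compare-and-swap is an addition of this sketch: the snippet assigns `shard_count` directly, which would let several threads race to perform the allocation step.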

[LLVMdev] multithreaded performance disaster with -fprofile-instr-generate (contention on profile counters)

2014 Apr 18

On Apr 17, 2014, at 2:04 PM, Chandler Carruth <chandlerc at google.com> wrote:
> On Thu, Apr 17, 2014 at 1:27 PM, Justin Bogner <mail at justinbogner.com> wrote:
> Chandler Carruth <chandlerc at google.com> writes:
> > if (thread-ID != main's thread-ID && shard_count < std::min(MAX, NUMBER_OF_CORES)) {
> > shard_count = std::min(MAX, std::max(NUMBER_OF_THREADS, NUMBER_OF_CORES));
> > // if shard_count changed with this, we can also call a library routine here
> > that does the work of allocating the actual extra shards.
> > }
>...

[LLVMdev] multithreaded performance disaster with -fprofile-instr-generate (contention on profile counters)

2014 Apr 18

...reduce costs.
> The baseline memory consumption for systems (and amount of RAM!) is
> O(NCORES), not O(1). In some read-mostly cases it's possible to achieve
> O(1) memory consumption, and that's great. But if it's not the case here,
> let it be so.
>
>
>
> > shard_count = std::min(MAX, std::max(NUMBER_OF_THREADS, NUMBER_OF_CORES))
>
> Threads do not produce contention, it's cores that produce contention.
> The formula must be: shard_count = k*NCORES
> And if you want less memory in single-threaded case, then: shard_count =
> min(k*NCORES, c*NTH...
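To make the `shard_count = k*NCORES` formula concrete, here is a minimal sketch of a sharded counter sized that way. It rests on several assumptions that are not in the thread: the class `ShardedCounter`, the helper `shards_for_cores`, the 64-byte padding, and picking a shard by hashing the thread ID are all illustrative choices. A per-core scheme would need a platform-specific way to query the current CPU, which is left out here. The truncated second formula is not reproduced.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <thread>

// One counter per shard, padded to an assumed 64-byte cache line so that
// neighbouring shards do not share a line. (Over-aligned new[] is only
// guaranteed to honour alignas from C++17 on.)
struct alignas(64) PaddedCounter {
    std::atomic<std::uint64_t> value{0};
};

// shard_count = k * NCORES, as proposed in the snippet.
std::size_t shards_for_cores(std::size_t k, std::size_t ncores) {
    return k * ncores;
}

class ShardedCounter {
public:
    explicit ShardedCounter(std::size_t shards)
        : shards_(shards), counters_(new PaddedCounter[shards]) {
        assert(shards > 0);
    }

    // Hypothetical shard choice: hash the calling thread's ID.
    void increment() {
        const std::size_t i =
            std::hash<std::thread::id>{}(std::this_thread::get_id()) % shards_;
        counters_[i].value.fetch_add(1, std::memory_order_relaxed);
    }

    // Sum over all shards; only exact once writers have stopped.
    std::uint64_t total() const {
        std::uint64_t sum = 0;
        for (std::size_t i = 0; i < shards_; ++i)
            sum += counters_[i].value.load(std::memory_order_relaxed);
        return sum;
    }

    std::size_t shards() const { return shards_; }

private:
    std::size_t shards_;
    std::unique_ptr<PaddedCounter[]> counters_;
};
```

Memory use grows linearly with the shard count, which is the O(NCORES) cost the snippet argues is acceptable.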

[LLVMdev] multithreaded performance disaster with -fprofile-instr-generate (contention on profile counters)

2014 Apr 17

Hi, The current design of -fprofile-instr-generate has the same fundamental flaw as the old gcc's gcov instrumentation: it has contention on counters. A trivial synthetic test case was described here: http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-October/066116.html For the problem to appear we need to have a hot function that is simultaneously executed by multiple threads -- then we will

[LLVMdev] multithreaded performance disaster with -fprofile-instr-generate (contention on profile counters)

2014 Apr 18

...systems (and amount of RAM!) is
>>> O(NCORES), not O(1). In some read-mostly cases it's possible to achieve
>>> O(1) memory consumption, and that's great. But if it's not the case here,
>>> let it be so.
>>>
>>>
>>>
>>> > shard_count = std::min(MAX, std::max(NUMBER_OF_THREADS,
>>> NUMBER_OF_CORES))
>>>
>>> Threads do not produce contention, it's cores that produce contention.
>>> The formula must be: shard_count = k*NCORES
>>> And if you want less memory in single-threaded case,...

[LLVMdev] multithreaded performance disaster with -fprofile-instr-generate (contention on profile counters)

2014 Apr 17

...hat if you never need multiple shards (single threaded) you pay
> essentially zero cost. I would have a global number of shards that changes
> rarely, and re-compute it on entry to each function with something along
> the lines of:
>
> if (thread-ID != main's thread-ID && shard_count == 1) {
> shard_count = std::min(MAX, std::max(NUMBER_OF_THREADS,
> NUMBER_OF_CORES));
> // if shard_count changed with this, we can also call a library routine
> here that does the work of allocating the actual extra shards.
> }
>
> MAX is a fixed cap so even on systems wi...
