Displaying 6 results from an estimated 6 matches for "shard_count".
2014 Apr 17
2
[LLVMdev] multithreaded performance disaster with -fprofile-instr-generate (contention on profile counters)
...hat if you never need multiple shards (single threaded) you pay
> essentially zero cost. I would have a global number of shards that changes
> rarely, and re-compute it on entry to each function with something along the
> lines of:
>
> if (thread-ID != main's thread-ID && shard_count == 1) {
>   shard_count = std::min(MAX, std::max(NUMBER_OF_THREADS, NUMBER_OF_CORES));
>   // if shard_count changed with this, we can also call a library routine here
>   // that does the work of allocating the actual extra shards.
> }
Is it possible to hook on something more clever than f...
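
The check quoted in this message can be sketched as a small pure function. This is only an illustration under stated assumptions: kMaxShards stands in for the MAX mentioned in the thread (its value of 64 is invented here), and the function name and parameters are hypothetical, not part of any LLVM runtime.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical cap on the number of shards (the thread calls it MAX).
// The value 64 is an assumption made for this sketch.
constexpr std::size_t kMaxShards = 64;

// Returns the shard count to use after the check described in the quoted
// message. All parameters are illustrative; a real runtime would have to
// obtain the thread identity, thread count, and core count itself.
std::size_t recompute_shard_count(std::size_t shard_count,
                                  bool on_main_thread,
                                  std::size_t num_threads,
                                  std::size_t num_cores) {
    // A single-threaded program never leaves the one-shard path, and a
    // count that has already grown is left unchanged.
    if (on_main_thread || shard_count != 1) {
        return shard_count;
    }
    // First call from a non-main thread: grow to max(threads, cores),
    // bounded by the cap.
    return std::min(kMaxShards, std::max(num_threads, num_cores));
}
```

In this sketch the caller would compare the returned value with the old one and, if it changed, allocate the extra shards, as the quoted comment suggests.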
2014 Apr 18
4
[LLVMdev] multithreaded performance disaster with -fprofile-instr-generate (contention on profile counters)
On Apr 17, 2014, at 2:04 PM, Chandler Carruth <chandlerc at google.com> wrote:
> On Thu, Apr 17, 2014 at 1:27 PM, Justin Bogner <mail at justinbogner.com> wrote:
> Chandler Carruth <chandlerc at google.com> writes:
> > if (thread-ID != main's thread-ID && shard_count < std::min(MAX, NUMBER_OF_CORES)) {
> >   shard_count = std::min(MAX, std::max(NUMBER_OF_THREADS, NUMBER_OF_CORES));
> >   // if shard_count changed with this, we can also call a library routine here
> >   // that does the work of allocating the actual extra shards.
> > }
>...
2014 Apr 18
2
[LLVMdev] multithreaded performance disaster with -fprofile-instr-generate (contention on profile counters)
...reduce costs.
> The baseline memory consumption for systems (and amount of RAM!) is
> O(NCORES), not O(1). In some read-mostly cases it's possible to achieve
> O(1) memory consumption, and that's great. But if it's not the case here,
> let it be so.
>
>
>
> > shard_count = std::min(MAX, std::max(NUMBER_OF_THREADS, NUMBER_OF_CORES))
>
> Threads do not produce contention, it's cores that produce contention.
> The formula must be: shard_count = k*NCORES
> And if you want less memory in single-threaded case, then: shard_count =
> min(k*NCORES, c*NTH...
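
The core-based rule proposed in this message, shard_count = k*NCORES, could look roughly like the following. The multiplier k is a tuning constant for which the thread gives no value, and both function names are hypothetical. The second formula is truncated in the snippet above, so it is not reproduced here.

```cpp
#include <cstddef>
#include <thread>

// shard_count = k * NCORES, as proposed in the quoted message.
// k is a hypothetical tuning multiplier; a core count of 0 is treated as 1.
std::size_t shards_for_cores(std::size_t k, std::size_t ncores) {
    return k * (ncores == 0 ? 1 : ncores);
}

// Convenience wrapper for the current machine. Note that
// std::thread::hardware_concurrency() may return 0 when the value is unknown.
std::size_t shards_for_this_machine(std::size_t k) {
    return shards_for_cores(k, std::thread::hardware_concurrency());
}
```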
2014 Apr 17
9
[LLVMdev] multithreaded performance disaster with -fprofile-instr-generate (contention on profile counters)
Hi,
The current design of -fprofile-instr-generate has the same fundamental flaw
as the old gcc's gcov instrumentation: it has contention on counters.
A trivial synthetic test case was described here:
http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-October/066116.html
For the problem to appear we need to have a hot function that is simultaneously
executed by multiple threads -- then we will
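
For readers unfamiliar with the remedy discussed in this thread, here is a minimal sketch of a sharded counter, written under several assumptions: the 64-byte cache-line size, the class name, and the choice to pick a shard by hashing the thread ID are all illustrative and are not taken from the LLVM implementation.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// One counter per shard, padded so that each sits on its own cache line.
// The 64-byte line size is an assumption.
struct alignas(64) PaddedCounter {
    std::atomic<std::uint64_t> value{0};
};

class ShardedCounter {
public:
    // A request for zero shards falls back to a single shard.
    explicit ShardedCounter(std::size_t shards) : shards_(shards ? shards : 1) {}

    // Each thread updates the shard selected by hashing its thread ID
    // (an illustrative policy), so threads mapped to different shards
    // do not write to the same cache line.
    void increment() {
        std::size_t idx =
            std::hash<std::thread::id>{}(std::this_thread::get_id()) % shards_.size();
        shards_[idx].value.fetch_add(1, std::memory_order_relaxed);
    }

    // The logical counter value is the sum over all shards.
    std::uint64_t total() const {
        std::uint64_t sum = 0;
        for (const auto& s : shards_) {
            sum += s.value.load(std::memory_order_relaxed);
        }
        return sum;
    }

private:
    std::vector<PaddedCounter> shards_;
};
```

Hashing by thread ID is only one possible policy; one reply in this thread argues that contention comes from cores rather than threads, which would favor selecting the shard per core instead.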
2014 Apr 18
2
[LLVMdev] multithreaded performance disaster with -fprofile-instr-generate (contention on profile counters)
...systems (and amount of RAM!) is
>>> O(NCORES), not O(1). In some read-mostly cases it's possible to achieve
>>> O(1) memory consumption, and that's great. But if it's not the case here,
>>> let it be so.
>>>
>>>
>>>
>>> > shard_count = std::min(MAX, std::max(NUMBER_OF_THREADS, NUMBER_OF_CORES))
>>>
>>> Threads do not produce contention, it's cores that produce contention.
>>> The formula must be: shard_count = k*NCORES
>>> And if you want less memory in single-threaded case,...
2014 Apr 17
2
[LLVMdev] multithreaded performance disaster with -fprofile-instr-generate (contention on profile counters)
...hat if you never need multiple shards (single threaded) you pay
> essentially zero cost. I would have a global number of shards that changes
> rarely, and re-compute it on entry to each function with something along
> the lines of:
>
> if (thread-ID != main's thread-ID && shard_count == 1) {
>   shard_count = std::min(MAX, std::max(NUMBER_OF_THREADS, NUMBER_OF_CORES));
>   // if shard_count changed with this, we can also call a library routine
>   // here that does the work of allocating the actual extra shards.
> }
>
> MAX is a fixed cap so even on systems wi...