Displaying 2 results from an estimated 2 matches for "num_of_cores".
2014 Apr 17
[LLVMdev] multithreaded performance disaster with -fprofile-instr-generate (contention on profile counters)
Good thinking, but why do you think runtime selection of the shard count is
better than compile-time selection? For single-threaded apps, the shard count
is always 1, so why pay the penalty of checking the thread id each time a
function is entered?
For multi-threaded apps, I would expect MAX to be smaller than NUM_OF_CORES
to avoid excessive memory consumption, in which case you always end up with
N == MAX. If MAX is larger than NUM_OF_CORES, then for large MT apps the
number of threads tends to be larger than NUM_OF_CORES, so you also end up
with N == MAX. For rare cases, the shard count may switch between MAX and
NUM_OF_CORES, bu...
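
To make the compile-time option concrete, here is a minimal C++17 sketch of a counter whose shard count is a template parameter. It is not the scheme proposed in the thread: `ShardedCounter`, the 64-byte padding, and the thread-id hash are all assumptions made up for illustration. The point it shows is that with a shard count of 1 the thread-id lookup can be compiled away entirely.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>

// Hypothetical sharded counter; the shard count is fixed at compile time.
template <std::size_t Shards>
class ShardedCounter {
  static_assert(Shards >= 1, "need at least one shard");

  // Give each shard its own cache line (64 bytes is an assumption).
  struct alignas(64) Slot {
    std::atomic<std::uint64_t> value{0};
  };
  std::array<Slot, Shards> slots_{};

  static std::size_t shard_index() {
    if constexpr (Shards == 1) {
      return 0;  // single shard: no thread-id check on entry
    } else {
      // Hash the thread id once per thread and cache the result.
      thread_local const std::size_t idx =
          std::hash<std::thread::id>{}(std::this_thread::get_id()) % Shards;
      return idx;
    }
  }

 public:
  // Bump the calling thread's shard.
  void increment() {
    slots_[shard_index()].value.fetch_add(1, std::memory_order_relaxed);
  }

  // Sum all shards; intended to be called when the counts are dumped.
  std::uint64_t total() const {
    std::uint64_t sum = 0;
    for (const Slot& s : slots_) {
      sum += s.value.load(std::memory_order_relaxed);
    }
    return sum;
  }
};
```

A single-threaded build would instantiate `ShardedCounter<1>`, while a multi-threaded build would pick a larger fixed value, trading memory for less contention.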
2014 Apr 17
[LLVMdev] multithreaded performance disaster with -fprofile-instr-generate (contention on profile counters)
Hi,
The current design of -fprofile-instr-generate has the same fundamental flaw
as the old gcc's gcov instrumentation: it has contention on counters.
A trivial synthetic test case was described here:
http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-October/066116.html
For the problem to appear we need to have a hot function that is
simultaneously executed by multiple threads -- then we will
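
The access pattern described here can be sketched in a few lines of C++. This is only an illustration, not the code the instrumentation emits: `shared_counter`, `hot_function`, and `run_contended` are hypothetical names, and the counter is a `std::atomic` purely so that the example is a well-defined C++ program. Every thread writes to the same memory location on every call, so the cache line holding the counter keeps moving between cores.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for the single profile counter of a hot function.
std::atomic<std::uint64_t> shared_counter{0};

// Hypothetical hot function: each entry bumps the same shared counter.
void hot_function() {
  shared_counter.fetch_add(1, std::memory_order_relaxed);
}

// Run hot_function from several threads at once and return the final count.
std::uint64_t run_contended(unsigned threads, std::uint64_t calls_per_thread) {
  shared_counter.store(0, std::memory_order_relaxed);
  std::vector<std::thread> pool;
  for (unsigned t = 0; t < threads; ++t) {
    pool.emplace_back([calls_per_thread] {
      for (std::uint64_t i = 0; i < calls_per_thread; ++i) {
        hot_function();
      }
    });
  }
  for (std::thread& th : pool) {
    th.join();
  }
  return shared_counter.load(std::memory_order_relaxed);
}
```

Timing `run_contended` with 1 thread and then with several threads, keeping the total number of calls equal, is one way to observe the effect on a given machine.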