Duncan P. N. Exon Smith
2014-Apr-17 17:58 UTC
[LLVMdev] multithreaded performance disaster with -fprofile-instr-generate (contention on profile counters)
On 2014-Apr-17, at 10:38, Xinliang David Li <xinliangli at gmail.com> wrote:

> Another idea is to use stack local counters per function -- synced up with global counters on entry and exit. the problem with it is for deeply recursive calls, stack pressure can be too high.

I think they'd need to be synced with global counters before function calls as well, since any function call can call "exit()".
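To make the proposal concrete, here is a minimal C++ sketch of stack-local counters that are synced to the global counters before a call and again on function exit. It is hand-written illustration, not what -fprofile-instr-generate emits: the names `g_counters`, `kNumCounters`, `flush`, and `sum_then_report` are hypothetical, and the use of relaxed atomic adds for the sync is an assumption.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical global profile counters for one instrumented function:
// index 0 = function entry, index 1 = loop body.
constexpr int kNumCounters = 2;
std::atomic<std::uint64_t> g_counters[kNumCounters];

// Add the pending local counts to the global counters and reset them.
// Assumption: the sync uses relaxed atomic adds.
void flush(std::uint64_t (&local)[kNumCounters]) {
    for (int i = 0; i < kNumCounters; ++i) {
        if (local[i] != 0) {
            g_counters[i].fetch_add(local[i], std::memory_order_relaxed);
            local[i] = 0;
        }
    }
}

// Hand-instrumented example function (hypothetical).
std::uint64_t sum_then_report(std::uint64_t n, void (*report)(std::uint64_t)) {
    std::uint64_t local[kNumCounters] = {};  // stack-local counters
    ++local[0];                              // function entry
    std::uint64_t total = 0;
    for (std::uint64_t i = 0; i < n; ++i) {
        ++local[1];                          // loop body, no shared memory touched
        total += i;
    }
    flush(local);   // sync before the call: the callee might call exit()
    report(total);
    flush(local);   // sync on function exit (nothing is pending in this case)
    return total;
}
```

The loop touches only the stack, so the shared counters are written once per sync point rather than once per iteration. A call inside a hot loop would force a sync on every iteration, which is where this scheme loses most of its benefit.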
Xinliang David Li
2014-Apr-17 18:09 UTC
[LLVMdev] multithreaded performance disaster with -fprofile-instr-generate (contention on profile counters)
On Thu, Apr 17, 2014 at 10:58 AM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:

> On 2014-Apr-17, at 10:38, Xinliang David Li <xinliangli at gmail.com> wrote:
>
> > Another idea is to use stack local counters per function -- synced up with global counters on entry and exit. the problem with it is for deeply recursive calls, stack pressure can be too high.
>
> I think they'd need to be synced with global counters before function calls as well, since any function call can call "exit()".

Right, but it might be better to handle this in other ways. For instance, a stack of the counters for each frame could be maintained and flushed in a batch at exit. Or we could simply ignore the problem in the case of program exit.

David
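One way to picture the "stack of counters" alternative is the following C++ sketch. It assumes a per-thread list of the live frames' local counters and a batch flush that walks that list. All names (`FrameCounters`, `t_frames`, `flush_live_frames`) are hypothetical, and how the batch flush would be hooked into program exit is deliberately left out.

```cpp
#include <atomic>
#include <cstdint>
#include <vector>

constexpr int kNumCounters = 2;
std::atomic<std::uint64_t> g_counters[kNumCounters];  // hypothetical globals

struct FrameCounters;
// Live instrumented frames of the current thread, innermost last.
thread_local std::vector<FrameCounters*> t_frames;

// One object per instrumented function activation (hypothetical).
struct FrameCounters {
    std::uint64_t local[kNumCounters] = {};

    FrameCounters() { t_frames.push_back(this); }   // register on entry
    ~FrameCounters() {                              // normal function exit
        flush();
        t_frames.pop_back();
    }
    FrameCounters(const FrameCounters&) = delete;
    FrameCounters& operator=(const FrameCounters&) = delete;

    // Add pending counts to the globals and reset them.
    void flush() {
        for (int i = 0; i < kNumCounters; ++i) {
            if (local[i] != 0) {
                g_counters[i].fetch_add(local[i], std::memory_order_relaxed);
                local[i] = 0;
            }
        }
    }
};

// Batch flush of every live frame on this thread. Assumption: a runtime
// would invoke this just before the program exits; that wiring is not shown.
void flush_live_frames() {
    for (FrameCounters* frame : t_frames) {
        frame->flush();
    }
}
```

With this shape, no sync is needed before ordinary calls; the cost moves to the registration on entry and exit, and the list only covers frames of the thread that runs the flush.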
Bob Wilson
2014-Apr-17 18:22 UTC
[LLVMdev] multithreaded performance disaster with -fprofile-instr-generate (contention on profile counters)
On Apr 17, 2014, at 11:09 AM, Xinliang David Li <xinliangli at gmail.com> wrote:

> On Thu, Apr 17, 2014 at 10:58 AM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
>
> > On 2014-Apr-17, at 10:38, Xinliang David Li <xinliangli at gmail.com> wrote:
> >
> > > Another idea is to use stack local counters per function -- synced up with global counters on entry and exit. the problem with it is for deeply recursive calls, stack pressure can be too high.
> >
> > I think they'd need to be synced with global counters before function calls as well, since any function call can call "exit()".
>
> right -- but it might be better to handle this in other ways. For instance a stack of counters for each frames is maintained. At exit, they are flushed in a batch. Or simply ignore it in case of program exit .

It seems to me like we’re going to have a hard time getting good multithreaded performance without significant impact on the single-threaded behavior. We might need to add an option to choose between those. There’s a lot of room for improvement in the performance with the current instrumentation, so maybe we can find a way to make things incrementally better in a way that helps both, but avoiding the multithreaded cache conflicts seems like it’s going to be expensive in other ways.