On 04/20/2016 02:58 PM, Renato Golin via llvm-dev wrote:
> Hi Derek,
>
> I'm not an expert in any of these topics, but I'm excited that you
> guys are doing it. It seems like a missing piece that needs to be
> filled.
>
> Some comments inline...
>
> On 17 April 2016 at 22:46, Derek Bruening via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>> We would prefer to trade off accuracy and build a less-accurate tool
>> below our overhead ceiling than to build a high-accuracy but slow tool.
>
> I agree with this strategy.
>
> As a first approach, make it as fast as you can, then later introduce
> more probes, maybe via some slider flag (like -ON) to consciously
> trade speed for accuracy.
>
>> Studying instruction cache behavior with compiler instrumentation
>> can be challenging, however, so we plan to at least initially focus
>> on data performance.
>
> I'm interested in how you're going to do this without kernel profiling
> probes, like perf.
>
> Or is the point here introducing syscalls in the right places instead
> of randomly profiling? Wouldn't that bias your results?
>
>> Many of our planned tools target specific performance issues with data
>> accesses. They employ the technique of *shadow memory* to store metadata
>> about application data references, using the compiler to instrument loads
>> and stores with code to update the shadow memory.
>
> Is it just counting the number of reads/writes? Or are you going to
> add how many of those accesses were hit by a cache miss?
>
>> *Cache fragmentation*: this tool gathers data structure field hotness
>> information, looking for data layout optimization opportunities by grouping
>> hot fields together to avoid data cache fragmentation. Future enhancements
>> may add field affinity information if it can be computed with low enough
>> overhead.
>
> Would also be good to have temporal information, so that you can
> correlate data accesses that occur, for example, inside the same loop /
> basic block, or in sequence in the common CFG flow. This could lead to
> changes in allocation patterns (heap, BSS).
>
>> *Working set measurement*: this tool measures the data working set size of
>> an application at each snapshot during execution. It can help to understand
>> phased behavior as well as providing basic direction for further effort by
>> the developer: e.g., knowing whether the working set is close to fitting in
>> current L3 caches or is many times larger can help determine where to spend
>> effort.
>
> This is interesting, but most useful when your dataset changes size
> over different runs. This is similar to running the program under perf
> for different workloads, and I'm not sure how you're going to get that
> in a single run. It also comes with the additional problem that cache
> sizes are not always advertised, so you might need an additional tool
> to guess the sizes based on increasing the size of data blocks and
> finding steps in the data access graph.
>
>> *Dead store detection*: this tool identifies dead stores (write-after-write
>> patterns with no intervening read) as well as redundant stores (writes of
>> the same value already in memory). Xref the DeadSpy paper from CGO 2012.
>
> This should probably be spotted by the compiler, so I guess it's a
> tool for compiler developers to spot missed optimisation opportunities
> in the back-end.

Not when the dead store happens in an external DSO where the compiler
can't detect it (the same applies for single references).

>> *Single-reference*: this tool identifies data cache lines brought in but
>> only read once. These could be candidates for non-temporal loads.
>
> That's nice and should be simple enough to get a report in the end.
> This also seems to be a hint to compiler developers rather than users.
>
> I think you guys have a nice set of tools to develop and I'm looking
> forward to working with them.
>
> cheers,
> --renato
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
On 20 April 2016 at 13:18, Yury Gribov <y.gribov at samsung.com> wrote:
> Not when dead store happens in an external DSO where compiler can't detect
> it (same applies for single references).

Do you mean the ones between the DSO and the instrumented code?
Because if it's just in the DSO itself, then the compiler could have
spotted it, too, when compiling the DSO.

I mean, of course there are cases like interprocedural dead stores
(calling a function that changes A while changing A right after), but
that, again, could be found at compilation time, given enough inlining
depth or IP analysis.

Also, if this is not something the compiler can fix, what is the point
of detecting dead stores? For all the non-trivial cases the compiler
can't spot, most will probably arise from special situations where the
compiler is changing the code to expose the issue, and thus the user
has little control over how to fix the underlying problem.

cheers,
--renato
On Wed, Apr 20, 2016 at 1:42 PM, Renato Golin via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> On 20 April 2016 at 13:18, Yury Gribov <y.gribov at samsung.com> wrote:
>> Not when dead store happens in an external DSO where compiler can't detect
>> it (same applies for single references).
>
> Do you mean the ones between the DSO and the instrumented code?
> Because if it's just in the DSO itself, then the compiler could have
> spotted it, too, when compiling the DSO.
>
> I mean, of course there are cases like interprocedural dead stores
> (call a function that changes A while changing A right after), but
> that again, could be found at compilation time, given enough inlining
> depth or IP analysis.

When I read the description, I assumed it would (mostly) be used to
detect those inter-procedural dead stores that the compiler can't see
(without LTO, at least).

The "external DSO" case also exists, but unless the DSO is also
instrumented, you'd get lots of false negatives (which aren't "a big
problem" with the sanitizers, but of course we want to minimize them;
you'd do that by also instrumenting the DSO).

> Also, if this is not something the compiler can fix, what is the point
> of detecting dead-stores? For all the non-trivial cases the compiler
> can't spot, most will probably arise from special situations where the
> compiler is changing the code to expose the issue, and thus, the user
> has little control over how to fix the underlying problem.

Same as with the other sanitizers: the compiler can't fix it, but you
(the programmer) can! :-)

I don't think the dead stores would mostly come from the compiler
changing code around. I think they'd most likely come from the other
example you mentioned, where you call a function which writes
somewhere, and then you write over it, with no intervening read. If
this happens a lot with a given function, maybe you want to write to
some parts of the structure conditionally.
Derek, Qin: Since this is mostly being researched as it is being
implemented (and in the public repo), how do you plan to coordinate
with the rest of the community? (Current status, what's "left" to get
a "useful" implementation, etc.)

About the working set tool: How are you thinking about doing the
snapshots? How do you plan to sync the several threads? Spawning an
external process/"thread" (kind of like LSan), or internally?

About the tools in general: Do you expect any of the currently planned
ones to be intrusive and require the user to change their code before
they can use the tool with good results?

Thank you,

 Filipe
On Wed, Apr 20, 2016 at 8:42 AM, Renato Golin <renato.golin at linaro.org> wrote:
> Also, if this is not something the compiler can fix, what is the point
> of detecting dead-stores? For all the non-trivial cases the compiler
> can't spot, most will probably arise from special situations where the
> compiler is changing the code to expose the issue, and thus, the user
> has little control over how to fix the underlying problem.

The case studies in the DeadSpy paper show how they, the users, fixed
the most frequent cases of dead stores in SPEC CPU (yes, in some cases
working around the compiler, but not always): "DeadSpy: A Tool to
Pinpoint Program Inefficiencies"
<https://www.researchgate.net/publication/241623127_DeadSpy_A_tool_to_pinpoint_program_inefficiencies>
by Chabbi et al., from CGO 2012.