TL;DR: We plan to build a suite of compiler-based dynamic instrumentation tools for analyzing targeted performance problems. These tools will all live under a new "EfficiencySanitizer" (or "esan") sanitizer umbrella, as they will share significant portions of their implementations.

===================
Motivation
===================

Our goal is to build a suite of dynamic instrumentation tools for analyzing particular performance problems that are difficult to evaluate using other profiling methods. Modern hardware performance counters provide insight into where time is spent and when micro-architectural events such as cache misses are occurring, but they are of limited effectiveness for contextual analysis: it is not easy to answer *why* a cache miss occurred.

Examples of tools that we have planned include: identifying wasted or redundant computation, identifying cache fragmentation, and measuring working sets. See more details on these below.

===================
Approach
===================

We believe that tools with overhead beyond about 5x are simply too heavyweight to easily apply to large, industrial-sized applications running real-world workloads. Our goal is for our tools to gather useful information with overhead less than 5x, and ideally closer to 3x, to facilitate deployment. We would prefer to trade off accuracy and build a less-accurate tool below our overhead ceiling rather than build a high-accuracy but slow tool. We hope to hit a sweet spot of tools that gather trace-based contextual information not feasible with pure sampling yet are still practical to deploy.

In a similar vein, we would prefer a targeted tool that analyzes one particular aspect of performance with low overhead over a more general tool that can answer more questions but has high overhead.

Dynamic binary instrumentation is one option for these types of tools, but typically compiler-based instrumentation provides better performance, and we intend to focus only on analyzing applications for which source code is available. Studying instruction cache behavior with compiler instrumentation can be challenging, however, so we plan to at least initially focus on data performance.

Many of our planned tools target specific performance issues with data accesses. They employ the technique of *shadow memory* to store metadata about application data references, using the compiler to instrument loads and stores with code to update the shadow memory. A companion runtime library intercepts libc calls if necessary to update shadow memory on non-application data references. The runtime library also intercepts heap allocations and other key events in order to perform its analyses. This is all very similar to how existing sanitizers such as AddressSanitizer, ThreadSanitizer, MemorySanitizer, etc. operate today.

===================
Example Tools
===================

We have several initial tools that we plan to build. These are not necessarily novel ideas on their own: some of these have already been explored in academia. The idea is to create practical, low-overhead, robust, and publicly available versions of these tools.

*Cache fragmentation*: this tool gathers data structure field hotness information, looking for data layout optimization opportunities by grouping hot fields together to avoid data cache fragmentation. Future enhancements may add field affinity information if it can be computed with low enough overhead.
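To make the field hotness idea a bit more concrete, here is a rough sketch of the kind of per-field bookkeeping the instrumentation could perform; the example struct, the counter array, and the helper name are purely illustrative assumptions, not the planned implementation:

    // Illustrative only: conceptual view of compiler-inserted field counting.
    #include <cstdint>

    struct Packet {            // hypothetical application struct
      uint64_t hot_id;         // touched on every packet
      char     cold_note[48];  // rarely touched
      uint64_t hot_len;        // touched on every packet
    };

    // Hypothetical per-field access counters, one slot per field.
    static uint64_t g_packet_field_counts[3];

    // What an instrumented load of Packet::hot_len would conceptually do:
    inline uint64_t instrumented_load_hot_len(const Packet *p) {
      ++g_packet_field_counts[2];  // bump hotness metadata for field #2
      return p->hot_len;           // the original application load
    }

A report could then flag structs whose frequently accessed fields (hot_id and hot_len here) are separated by cold bytes, suggesting that a layout change would pack the hot data into fewer cache lines.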
*Working set measurement*: this tool measures the data working set size of an application at each snapshot during execution. It can help to understand phased behavior as well as provide basic direction for further effort by the developer: e.g., knowing whether the working set is close to fitting in current L3 caches or is many times larger can help determine where to spend effort.

*Dead store detection*: this tool identifies dead stores (write-after-write patterns with no intervening read) as well as redundant stores (writes of the same value already in memory); a rough sketch of one possible shadow-based detection scheme appears at the end of this message. Xref the Deadspy paper from CGO 2012.

*Single-reference*: this tool identifies data cache lines brought in but only read once. These could be candidates for non-temporal loads.

===================
EfficiencySanitizer
===================

We are proposing the name EfficiencySanitizer, or "esan" for short, to refer to this suite of dynamic instrumentation tools for improving program efficiency. As we have a number of different tools that share quite a bit of their implementation, we plan to consider them sub-tools under the EfficiencySanitizer umbrella, rather than adding a whole bunch of separate instrumentation and runtime library components.

While these tools are not addressing correctness issues like other sanitizers, they will be sharing a lot of the existing sanitizer runtime library support code. Furthermore, users are already familiar with the sanitizer brand, and it seems better to extend that concept rather than add some new term.
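As a rough appendix to the dead store tool described above, here is a hedged sketch of one way a shadow state machine could flag write-after-write with no intervening read; the hash-map shadow, the state encoding, and the function names are assumptions for illustration only (a real tool would use a direct-mapped shadow region, and redundant-store detection would additionally compare the stored value):

    // Illustrative only: per-address "last access" state for dead store detection.
    #include <cstdint>
    #include <unordered_map>

    enum class LastAccess : uint8_t { Unknown, Read, Write };

    struct ShadowCell {
      LastAccess last = LastAccess::Unknown;
      uint64_t   dead_stores = 0;  // writes overwritten before ever being read
    };

    static std::unordered_map<uintptr_t, ShadowCell> g_shadow;  // sketch only

    inline void on_read(uintptr_t addr) { g_shadow[addr].last = LastAccess::Read; }

    inline void on_write(uintptr_t addr) {
      ShadowCell &cell = g_shadow[addr];
      if (cell.last == LastAccess::Write)  // prior store was never read: dead store
        ++cell.dead_stores;
      cell.last = LastAccess::Write;
    }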
This sounds interesting. I've got a couple of questions about the cache fragmentation tool and the working set measurement tool.

On 4/17/2016 4:46 PM, Derek Bruening via llvm-dev wrote:
> *Cache fragmentation*: this tool gathers data structure field hotness information, looking for data layout optimization opportunities by grouping hot fields together to avoid data cache fragmentation. Future enhancements may add field affinity information if it can be computed with low enough overhead.

I can vaguely imagine how this data would be acquired, but I'm more interested in what analysis is provided by the tool, and how this information would be presented to a user. Would it be a flat list of classes, sorted by number of accesses, with each field annotated by number of accesses? Or is there some other kind of presentation planned? Maybe some kind of weighting for classes with frequent cache misses?

> *Working set measurement*: this tool measures the data working set size of an application at each snapshot during execution. It can help to understand phased behavior as well as provide basic direction for further effort by the developer: e.g., knowing whether the working set is close to fitting in current L3 caches or is many times larger can help determine where to spend effort.

I think my questions here are basically the reverse of my prior questions. I can imagine the presentation (a graph with time on the X axis, working set measurement on the Y axis, with some markers highlighting key execution points). I'm not sure how the data collection works though, or even really what is being measured. Are you planning on counting the number of data bytes / data cache lines used during each time period? For the purposes of this tool, when is data brought into the working set and when is data evicted from the working set?

--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
On Mon, Apr 18, 2016 at 1:36 PM, Craig, Ben via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> I can vaguely imagine how this data would be acquired, but I'm more interested in what analysis is provided by the tool, and how this information would be presented to a user. Would it be a flat list of classes, sorted by number of accesses, with each field annotated by number of accesses? Or is there some other kind of presentation planned? Maybe some kind of weighting for classes with frequent cache misses?

The sorting/filtering metric will include the disparity between fields: hot fields interleaved with cold fields are what it's looking for, with a total access count high enough to matter. Yes, it would present to the user the field layout with annotations for access count, as you suggest.

> I think my questions here are basically the reverse of my prior questions. I can imagine the presentation (a graph with time on the X axis, working set measurement on the Y axis, with some markers highlighting key execution points). I'm not sure how the data collection works though, or even really what is being measured. Are you planning on counting the number of data bytes / data cache lines used during each time period? For the purposes of this tool, when is data brought into the working set and when is data evicted from the working set?

The tool records which data cache lines were touched at least once during a snapshot (basically just setting a shadow memory bit for each load/store). The metadata is cleared after each snapshot is recorded so that the next snapshot starts with a blank slate. Snapshots can be combined via logical OR as the execution time grows, to adaptively handle varying total execution time.
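For illustration, here is a hedged sketch of that snapshot scheme; the bitmap size, the 64-byte line granularity, and the function names are assumptions made for this example rather than the actual esan design:

    // Illustrative only: one shadow bit per 64-byte cache line in a fixed range.
    #include <algorithm>
    #include <cstdint>
    #include <vector>

    constexpr uintptr_t kLineShift = 6;                    // 64-byte cache lines
    static std::vector<uint8_t> g_line_touched(1 << 20);   // example-sized bitmap

    inline void on_memory_access(uintptr_t addr) {
      uintptr_t line = addr >> kLineShift;
      size_t byte_index = (line >> 3) % g_line_touched.size();  // wrap for the sketch
      g_line_touched[byte_index] |= 1u << (line & 7);           // mark line touched
    }

    // At each snapshot: count touched lines, then clear for the next interval.
    // (Adjacent snapshots could instead be OR-ed together to grow the interval.)
    uint64_t take_snapshot() {
      uint64_t lines = 0;
      for (uint8_t byte : g_line_touched)
        lines += __builtin_popcount(byte);
      std::fill(g_line_touched.begin(), g_line_touched.end(), 0);
      return lines;  // working set for this interval, in cache lines
    }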
Hi Derek,

Thanks for proposing this. It seems like it might be an interesting tool for us too. But this proposal seems a bit hand-wavy, and I think it's missing some crucial info before we start heading this way.

On Sun, Apr 17, 2016 at 10:46 PM, Derek Bruening via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> TL;DR: We plan to build a suite of compiler-based dynamic instrumentation tools for analyzing targeted performance problems. These tools will all live under a new "EfficiencySanitizer" (or "esan") sanitizer umbrella, as they will share significant portions of their implementations.

I will bike-shed the name as much as anyone else, but I'll stay away for now :-)

> We believe that tools with overhead beyond about 5x are simply too heavyweight to easily apply to large, industrial-sized applications running real-world workloads. Our goal is for our tools to gather useful information with overhead less than 5x, and ideally closer to 3x, to facilitate deployment. [...]

This is also very important for us, and there are sanitizers which are "harder to sell" because of the overhead.

> We have several initial tools that we plan to build. These are not necessarily novel ideas on their own: some of these have already been explored in academia. The idea is to create practical, low-overhead, robust, and publicly available versions of these tools.
> [...]

Do you have any estimates on the memory overhead (both memory usage (+shadow) and code size) you expect? As well as estimates of the possible slowdown?

At least for the tools you are currently starting to implement, it would be nice to have some estimates and plans on what is going to happen.

I would actually like to see a small RFC about each tool and what the plan (overhead/slowdown, pseudo-code for instrumentation, UX, ...) is before starting to commit. I don't expect the plan to be very detailed, nor for everything to be pinned down, of course. This seems to be a bit at a research stage, and I'm totally ok with it. But I would rather it not be as opaque as it is now. The way the tools will report problems/data/etc. is important, for example.

> We are proposing the name EfficiencySanitizer, or "esan" for short, to refer to this suite of dynamic instrumentation tools for improving program efficiency. As we have a number of different tools that share quite a bit of their implementation we plan to consider them sub-tools under the EfficiencySanitizer umbrella, rather than adding a whole bunch of separate instrumentation and runtime library components.

How much code is expected to be shared? Most? Similar to what the sanitizers already share?

Do we expect the shadow memory mapping to be (mostly) the same among all esan's tools?

Do you already have an idea of a few tools + the type of code that would be shared (for example: "read/write instrumentation is mostly the same among these tools", or "generating reports for these tools is mostly the same", or something similar)?

> While these tools are not addressing correctness issues like other sanitizers, they will be sharing a lot of the existing sanitizer runtime library support code. Furthermore, users are already familiar with the sanitizer brand, and it seems better to extend that concept rather than add some new term.

Not the email to bike-shed the name, but I don't like "Efficiency" that much here :-)

Thank you for working on this,

 Filipe
On Tue, Apr 19, 2016 at 1:18 PM, Filipe Cabecinhas <filcab at gmail.com> wrote:
> Thanks for proposing this. It seems like it might be an interesting tool for us too. But this proposal seems a bit hand-wavy, and I think it's missing some crucial info before we start heading this way.
>
> At least for the tools you are currently starting to implement, it would be nice to have some estimates and plans on what is going to happen.
>
> I would actually like to see a small RFC about each tool and what the plan (overhead/slowdown, pseudo-code for instrumentation, UX, ...) is before starting to commit. I don't expect the plan to be very detailed, nor for everything to be pinned down, of course. This seems to be a bit at a research stage, and I'm totally ok with it. But I would rather it not be as opaque as it is now. The way the tools will report problems/data/etc. is important, for example.

The main response here is that these tools are in an exploratory stage and that what we are proposing is not a fixed list of tools but a framework within which we can build prototypes and refine them. We do not know which ideas will work out and which ones will be abandoned. In some cases, answers will not all be known until a prototype is in place to see how it behaves on large applications: we might need to add additional shadow metadata or additional instrumentation to produce the right amount of actionable output. What we want to avoid is having to build prototypes out-of-tree until they are proven and polished and only then trying to commit large patches upstream.

> Do you have any estimates on the memory overhead (both memory usage (+shadow) and code size) you expect? As well as estimates of the possible slowdown?

The shadow memory scaledowns for the tools we are considering include 64:1, 4:1, and 1:1. This is all within the ranges of existing sanitizers. Each tool will have inlined instrumentation to update shadow memory on either every memory access or, in some cases, just heap accesses; I do not have a number for code size expansion. The slowdown ranges were in the original email: 2x-5x.

> How much code is expected to be shared? Most? Similar to what the sanitizers already share?

More than what the existing sanitizers share.

> Do we expect the shadow memory mapping to be (mostly) the same among all esan's tools?

Most will use a single mapping and single algorithm, but the scaledown will vary as mentioned (see the sketch at the end of this message).

> Do you already have an idea of a few tools + the type of code that would be shared (for example: "read/write instrumentation is mostly the same among these tools", or "generating reports for these tools is mostly the same", or something similar)?

Instrumentation will be very similar (minor differences in shadow scaling and shadow metadata format): identical intrinsic handling, identical slowpath entries, identical libc interception, and identical shadow management. Reports are not fully explored.

> Not the email to bike-shed the name, but I don't like "Efficiency" that much here :-)

There were name discussions prior to the RFC and the conclusion was that there is just not going to be a single first-place choice for everyone.
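To illustrate what a single direct-mapped shadow translation with a per-tool scaledown (the 64:1, 4:1, and 1:1 ratios mentioned above) could look like, here is a hedged sketch; the shadow offset constant is a placeholder, not the actual esan memory layout:

    // Illustrative only: direct-mapped application-to-shadow address translation.
    #include <cstdint>

    // Placeholder offset; real sanitizers choose this per platform so the
    // shadow region does not collide with the application's address space.
    constexpr uintptr_t kShadowOffset = 0x0000100000000000ULL;

    // scale_shift = 0 gives a 1:1 shadow, 2 gives 4:1, and 6 gives 64:1.
    inline uintptr_t app_to_shadow(uintptr_t app_addr, unsigned scale_shift) {
      return (app_addr >> scale_shift) + kShadowOffset;
    }

    // Example: a 64:1 tool keeps one shadow byte per 64 application bytes
    // (one cache line), e.g. a "touched" bit plus a few spare bits.
    inline uint8_t *shadow_byte_for_line(uintptr_t app_addr) {
      return reinterpret_cast<uint8_t *>(app_to_shadow(app_addr, /*scale_shift=*/6));
    }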
On Tue, Apr 19, 2016 at 10:18 AM Filipe Cabecinhas via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> I will bike-shed the name as much as anyone else, but I'll stay away for now :-)
>
> Not the email to bike-shed the name, but I don't like "Efficiency" that much here :-)

Heh, so I really do like "efficiency", much more than performance or quickness. But I really do think that we shouldn't try to bikeshed the name much. IMO, while it's great to throw out ideas around better names, and certainly to point out any serious *problems* with a name, past that I'll be happy with whatever the folks actually working on this pick.

I think we should all focus on the relevant technical discussion, float any name ideas that we have, and let Derek and Qin make the final call on the name.

-Chandler
Interesting idea! I understand how the bookkeeping in the tool is similar to some of the sanitizers, but I am wondering whether that is really the best developer's work-flow for such a tool.

I could imagine that some of the opportunities discovered by the tool could be optimized automatically by the compiler (e.g. temporal loads, sw prefetching, partitioning the heap), so feeding this information back to the compiler could be highly useful. I am wondering whether the PGO model is closer to what we want at the end. The problem can also be thought of as a natural extension of PGO. Besides instrumenting branches and indirect calls, it adds instrumentation for loads and stores.

We have internally been discussing ways to use PGO for optimization diagnostics (a continuation of Tyler's work, see http://blog.llvm.org/2014/11/loop-vectorization-diagnostics-and.html). The idea is to help the developer focus in on opportunities in hot code. It seems that the diagnostics provided by your tools could be emitted directly by the analyses in LLVM during the profile-use phase.

Adam

> On Apr 17, 2016, at 2:46 PM, Derek Bruening via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> [...]
Some of this data might be interesting for profile guidance. Are there any plans there?

-- Sean Silva

On Sun, Apr 17, 2016 at 2:46 PM, Derek Bruening via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> [...]
Hi Derek,

I'm not an expert in any of these topics, but I'm excited that you guys are doing it. It seems like a missing piece that needs to be filled.

Some comments inline...

On 17 April 2016 at 22:46, Derek Bruening via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> We would prefer to trade off accuracy and build a less-accurate tool below our overhead ceiling rather than build a high-accuracy but slow tool.

I agree with this strategy.

As a first approach, make it the fastest you can, then later introduce more probes, maybe via some slider flag (like -ON), to consciously trade speed for accuracy.

> Studying instruction cache behavior with compiler instrumentation can be challenging, however, so we plan to at least initially focus on data performance.

I'm interested in how you're going to do this without kernel profiling probes, like perf. Or is the point here introducing syscalls in the right places instead of random profiling? Wouldn't that bias your results?

> Many of our planned tools target specific performance issues with data accesses. They employ the technique of *shadow memory* to store metadata about application data references, using the compiler to instrument loads and stores with code to update the shadow memory.

Is it just counting the number of reads/writes? Or are you going to add how many of those accesses were hit by a cache miss?

> *Cache fragmentation*: this tool gathers data structure field hotness information, looking for data layout optimization opportunities by grouping hot fields together to avoid data cache fragmentation. Future enhancements may add field affinity information if it can be computed with low enough overhead.

It would also be good to have temporal information, so that you can correlate data accesses that occur, for example, inside the same loop / basic block, or in sequence in the common CFG flow. This could lead to changes in allocation patterns (heap, BSS).

> *Working set measurement*: this tool measures the data working set size of an application at each snapshot during execution. It can help to understand phased behavior as well as provide basic direction for further effort by the developer: e.g., knowing whether the working set is close to fitting in current L3 caches or is many times larger can help determine where to spend effort.

This is interesting, but most useful when your dataset changes size over different runs. This is similar to running the program under perf for different workloads, and I'm not sure how you're going to get that in a single run. It also comes with the additional problem that cache sizes are not always advertised, so you might need an additional tool to guess the sizes based on increasing the size of data blocks and finding steps on the data access graph.

> *Dead store detection*: this tool identifies dead stores (write-after-write patterns with no intervening read) as well as redundant stores (writes of the same value already in memory). Xref the Deadspy paper from CGO 2012.

This should probably be spotted by the compiler, so I guess it's a tool for compiler developers to spot missed optimisation opportunities in the back-end.

> *Single-reference*: this tool identifies data cache lines brought in but only read once. These could be candidates for non-temporal loads.

That's nice and should be simple enough to get a report in the end. This also seems to be a hint to compiler developers rather than users.

I think you guys have a nice set of tools to develop and I'm looking forward to working with them.

cheers,
--renato
On 04/20/2016 02:58 PM, Renato Golin via llvm-dev wrote:
>> *Dead store detection*: this tool identifies dead stores (write-after-write patterns with no intervening read) as well as redundant stores (writes of the same value already in memory). Xref the Deadspy paper from CGO 2012.
>
> This should probably be spotted by the compiler, so I guess it's a tool for compiler developers to spot missed optimisation opportunities in the back-end.

Not when the dead store happens in an external DSO where the compiler can't detect it (the same applies for single references).
On Wed, Apr 20, 2016 at 2:14 AM, Adam Nemet <anemet at apple.com> wrote:
> Interesting idea! I understand how the bookkeeping in the tool is similar to some of the sanitizers, but I am wondering whether that is really the best developer's work-flow for such a tool.
>
> I could imagine that some of the opportunities discovered by the tool could be optimized automatically by the compiler (e.g. temporal loads, sw prefetching, partitioning the heap), so feeding this information back to the compiler could be highly useful. I am wondering whether the PGO model is closer to what we want at the end. The problem can also be thought of as a natural extension of PGO. Besides instrumenting branches and indirect calls, it adds instrumentation for loads and stores.

It would be great to automatically apply the results of the tools, but we do not think that this is straightforward for enough cases up front. For the cache fragmentation tool, automatically applying data structure field reordering (or splitting or peeling) generally requires whole-program compilation, which is not always available and currently does not scale up to the size of applications we would like to target. The working set tool is not a candidate for automated action. Acting on dead stores typically requires programmer analysis to confirm that there is not some non-executed path where the store is not actually dead.

We would like to start with a standalone sanitizer usage model providing developer feedback. How about if we start with that, and in the future we can revisit whether some subset of the results can be acted on in an automated fashion?
On Tue, Apr 19, 2016 at 11:19 PM, Sean Silva via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Some of this data might be interesting for profile guidance. Are there any plans there?

Esan instrumentation is geared toward application-level tuning by developers -- the data collected here are not quite 'actionable' by the compiler directly. For instance, struct field reordering needs whole-program analysis and can be very tricky to do for C++ code with complex inheritance (e.g., the best base class field order may depend on the context of the inheritance). Fancy struct layout changes such as peeling, splitting/outlining, field inlining, etc. also require very good address escape analysis.

Dead store detection can be used indirectly by the compiler -- the compiler certainly can not use the information to prove statically that the stores are dead, but the compiler developer can use this tool to find the cases and figure out missing optimizations in the compiler.

Working set size data is also not quite usable by the compiler.

Esan's design tradeoffs are also not the same as PGO's. The former allows more overhead and is less restricted -- it can go deeper.

David
On Tue, Apr 19, 2016 at 11:14 PM, Adam Nemet via llvm-dev < llvm-dev at lists.llvm.org> wrote:> Interesting idea! I understand how the bookkeeping in the tool is similar > to some of the sanitizers but I am wondering whether that is really the > best developer’s work-flow for such a tool. > > I could imagine that some of the opportunities discovered by the tool > could be optimized automatically by the compiler (e.g. temporal loads, sw > prefetching, partitioning the heap) so feeding this information back to the > compiler could be highly useful. I am wondering whether the PGO model is > closer to what we want at the end. The problem can also be thought of as a > natural extension of PGO. Besides instrumenting branches and indirect > calls, it adds instrumentation for loads and stores. > > We have internally been discussing ways to use PGO for optimization > diagnostics (a continuation of Tyler’s work, see > http://blog.llvm.org/2014/11/loop-vectorization-diagnostics-and.html). > The idea is to help the developer to focus in on opportunities in hot code. > It seems that the diagnostics provided by your tools could be emitted > directly by the analyses in LLVM during the profile-use phase. >You are right that some of the information are already available with PGO + static analysis -- one example is field affinity data. Instruction working set data is also 'roughly' available. However I think Esan can potentially do a better job, easier to use and be a centralized place for getting such information. David> > Adam > > On Apr 17, 2016, at 2:46 PM, Derek Bruening via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > > TL;DR: We plan to build a suite of compiler-based dynamic instrumentation > tools for analyzing targeted performance problems. These tools will all > live under a new "EfficiencySanitizer" (or "esan") sanitizer umbrella, as > they will share significant portions of their implementations. > > ===================> Motivation > ===================> > Our goal is to build a suite of dynamic instrumentation tools for > analyzing particular performance problems that are difficult to evaluate > using other profiling methods. Modern hardware performance counters > provide insight into where time is spent and when micro-architectural > events such as cache misses are occurring, but they are of limited > effectiveness for contextual analysis: it is not easy to answer *why* a > cache miss occurred. > > Examples of tools that we have planned include: identifying wasted or > redundant computation, identifying cache fragmentation, and measuring > working sets. See more details on these below. > > ===================> Approach > ===================> > We believe that tools with overhead beyond about 5x are simply too > heavyweight to easily apply to large, industrial-sized applications running > real-world workloads. Our goal is for our tools to gather useful > information with overhead less than 5x, and ideally closer to 3x, to > facilitate deployment. We would prefer to trade off accuracy and build a > less-accurate tool below our overhead ceiling than to build a high-accuracy > but slow tool. We hope to hit a sweet spot of tools that gather > trace-based contextual information not feasible with pure sampling yet are > still practical to deploy. > > In a similar vein, we would prefer a targeted tool that analyzes one > particular aspect of performance with low overhead than a more general tool > that can answer more questions but has high overhead. 
On Wed, Apr 20, 2016 at 7:58 AM, Renato Golin <renato.golin at linaro.org> wrote:

> On 17 April 2016 at 22:46, Derek Bruening via llvm-dev wrote:
> > Studying instruction cache behavior with compiler instrumentation can be
> > challenging, however, so we plan to at least initially focus on data
> > performance.
>
> I'm interested in how you're going to do this without kernel profiling
> probes, like perf.
>
> Or is the point here introducing syscalls in the right places instead
> of randomly profiled?  Wouldn't that bias your results?

I'm not sure I understand the question: are you asking whether not gathering
data on time spent in the kernel is an issue?  Or are you asking how to
measure aspects of performance without using sampling or hardware performance
counters?

> > Many of our planned tools target specific performance issues with data
> > accesses.  They employ the technique of *shadow memory* to store metadata
> > about application data references, using the compiler to instrument loads
> > and stores with code to update the shadow memory.
>
> Is it just counting the number of reads/writes?  Or are you going to
> add how many of those accesses were hit by a cache miss?

It varies by tool.  The brief descriptions in the original email hopefully
shed some light; we are also sending separate RFCs for each tool (the working
set RFC was already sent).  The cache fragmentation tool is basically just
counting, yes (a rough illustrative sketch of this kind of counting appears
after this message).  There is no cache miss information here: we are not
using hardware performance counters, nor are we running a software cache
simulation.  We are measuring particular aspects of application behavior that
tend to affect performance, often abstracted away from the precise
microarchitecture you are running on.

> > *Cache fragmentation*: this tool gather data structure field hotness
> > information, looking for data layout optimization opportunities by
> > grouping hot fields together to avoid data cache fragmentation.  Future
> > enhancements may add field affinity information if it can be computed
> > with low enough overhead.
>
> Would be also good to have temporal information, so that you can
> correlate data access that occurs, for example, inside the same loop /
> basic block, or in sequence in the common CFG flow.  This could lead to
> change in allocation patterns (heap, BSS).

Agreed, we have thought about adding temporal information, though it would
cost more and we have not fleshed out the details.

> > *Working set measurement*: this tool measures the data working set size
> > of an application at each snapshot during execution.  It can help to
> > understand phased behavior as well as providing basic direction for
> > further effort by the developer: e.g., knowing whether the working set is
> > close to fitting in current L3 caches or is many times larger can help
> > determine where to spend effort.
>
> This is interesting, but most useful when your dataset changes size
> over different runs.  This is similar to running the program under perf
> for different workloads, and I'm not sure how you're going to get that
> in a single run.  It also comes with the additional problem that cache
> sizes are not always advertised, so you might have an additional tool
> to guess the sizes based on increasing the size of data blocks and
> finding steps on the data access graph.

This tool is relatively agnostic of the precise details of the caches beyond
having its granularity based on the cache line size it assumes (64 bytes,
which can be parameterized).
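To illustrate the per-cache-line granularity Derek describes, here is a rough
sketch of the working-set idea under the stated 64-byte-line assumption.  The
names, and the use of a hash set in place of a real shadow mapping, are purely
illustrative rather than the actual tool:

  // Rough sketch of the working-set idea: remember each 64-byte line the
  // application touches, then at each snapshot report and reset.  The names
  // and the use of a hash set in place of a real shadow mapping are purely
  // illustrative.
  #include <cstdint>
  #include <cstdio>
  #include <unordered_set>

  static const unsigned kLineBits = 6;  // 64-byte lines; could be parameterized
  static std::unordered_set<uintptr_t> touched_lines;

  static inline void on_access(const void *addr) {
    touched_lines.insert(reinterpret_cast<uintptr_t>(addr) >> kLineBits);
  }

  // Report the working set observed since the previous snapshot.
  static void snapshot(const char *label) {
    std::printf("%s: %zu lines (~%zu KB)\n", label, touched_lines.size(),
                touched_lines.size() * 64 / 1024);
    touched_lines.clear();
  }

  int main() {
    static int big[1 << 16];               // 256 KB of data
    for (int i = 0; i < (1 << 16); ++i) {  // phase 1: touch all of it
      on_access(&big[i]);
      big[i] = i;
    }
    snapshot("phase 1");
    for (int i = 0; i < 1024; ++i) {       // phase 2: touch a small subset
      on_access(&big[i]);
      big[i] += 1;
    }
    snapshot("phase 2");
    return 0;
  }

At each snapshot the tool would report how many distinct lines were touched
since the previous snapshot (here roughly 256 KB for the first phase and 4 KB
for the second), which gives the kind of phased-behavior signal discussed
above.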
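Earlier in the same reply, Derek notes that the cache fragmentation tool is
"basically just counting".  The following hypothetical sketch shows one way
per-field access counting could look at source level; the struct, the field
ids, and the counter names are all invented for illustration, and a real tool
would record this via compiler instrumentation and shadow memory rather than
hand-inserted calls:

  // Hypothetical per-field hotness counting for the cache-fragmentation idea:
  // count accesses to each field so that cold fields interleaved with hot
  // ones show up as wasted cache-line space.  The struct, field ids, and
  // counters are invented; a real tool would instrument field accesses in IR
  // and keep the counts in shadow memory.
  #include <cstdint>
  #include <cstdio>

  struct Node {
    int hot_key;         // touched every iteration
    char cold_blob[48];  // rarely touched, but shares cache lines with hot_key
    int hot_value;       // touched every iteration
  };

  enum FieldId { kHotKey, kColdBlob, kHotValue, kNumFields };
  static uint64_t field_counter[kNumFields];

  static inline void count_field(FieldId f) { field_counter[f]++; }

  int main() {
    static Node nodes[1024] = {};
    long sum = 0;
    for (int i = 0; i < 1024; ++i) {
      count_field(kHotKey);    // what instrumentation would record per access
      count_field(kHotValue);
      sum += nodes[i].hot_key + nodes[i].hot_value;
    }
    count_field(kColdBlob);    // the cold field is touched only once
    sum += nodes[0].cold_blob[0];
    std::printf("hot_key=%llu cold_blob=%llu hot_value=%llu (sum=%ld)\n",
                (unsigned long long)field_counter[kHotKey],
                (unsigned long long)field_counter[kColdBlob],
                (unsigned long long)field_counter[kHotValue], sum);
    return 0;
  }

A large imbalance between counters for fields that share cache lines (hot_key
and hot_value versus cold_blob here) is the kind of layout opportunity such a
tool is meant to surface.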
----- Original Message -----
> From: "Derek Bruening via llvm-dev" <llvm-dev at lists.llvm.org>
> To: llvm-dev at lists.llvm.org
> Cc: efficiency-sanitizer at google.com
> Sent: Sunday, April 17, 2016 4:46:29 PM
> Subject: [llvm-dev] RFC: EfficiencySanitizer
Will this technology allow us to pinpoint specific accesses that generally
have high latency (i.e., generally are cache misses)?  This information is
useful for programmers, and is also useful as an input to loop unrolling,
instruction scheduling, and the like on out-of-order cores.

 -Hal

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
> Will this technology allow us to pinpoint specific accesses that generally
> have high latency (i.e., generally are cache misses)?  This information is
> useful for programmers, and is also useful as an input to loop unrolling,
> instruction scheduling, and the like on out-of-order cores.

Won't hardware performance counters tell you which accesses are delinquent
accesses?  The tools here are trying to provide more information and,
hopefully, some insight into the performance problem.  For example, if we can
tell that the working set of an application is slightly bigger than the L3
cache, developers would be able to take the right action to improve
performance.  We welcome any suggestions about how the information can be
used, or what information should be collected.

Qin