thr3ads.net - llvm dev - [llvm-dev] RFC: Comprehensive Static Instrumentation [Jun 2016]

Craig, Ben via llvm-dev

2016-Jun-17 13:29 UTC

[llvm-dev] RFC: Comprehensive Static Instrumentation

On 6/16/2016 2:48 PM, Mehdi Amini via llvm-dev wrote:
>
>> On Jun 16, 2016, at 9:01 AM, TB Schardl via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>>
>> The CSI framework inserts instrumentation hooks at salient locations 
>> throughout the compiled code of a program-under-test, such as 
>> function entry and exit points, basic-block entry and exit points, 
>> before and after each memory operation, etc.  Tool writers can 
>> instrument a program-under-test simply by first writing a library 
>> that defines the semantics of relevant hooks
>> and then statically linking their compiled library with the 
>> program-under-test.
>>
>> At first glance, this brute-force method of inserting hooks at every 
>> salient location in the program-under-test seems to be replete with 
>> overheads.  CSI overcomes these overheads through the use of 
>> link-time-optimization (LTO), which is now readily available in most 
>> major compilers, including GCC and LLVM.  Using LTO, instrumentation 
>> hooks that are not used by a particular tool can be elided, allowing 
>> the overheads of these hooks to be avoided when the
>
> I don't understand this flow: the front-end emits all the possible 
> instrumentation but the useless calls to the runtime will be removed 
> during the link?
> It means that the final binary is specialized for a given tool right? 
> What is the advantage of generating this useless instrumentation in 
> the first place then? I'm missing a piece here...

Suppose I want to build a production build, and one build for each of
ASAN, MSAN, UBSAN, and TSAN.

With the current approach, I need to compile my source five different 
times, and link five different times.

With the CSI approach (assuming it was the backing technology behind the 
sanitizers), I need to compile twice (once for production, once for 
instrumentation), then LTO-link five times.  I can reuse my .o files 
across the sanitizer types.

It's possible that the math doesn't really work out in practice if the 
cost of the LTO-link dwarfs the compile times.
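Ben's trade-off can be written as a rough cost model. Assume, for illustration only, that every compile of the source costs $C$ (treating the instrumented compile as no more expensive than the production one), an ordinary link costs $L$, and an LTO link costs $L_{\mathrm{LTO}}$; these symbols are introduced here and do not come from the thread.

```latex
\begin{aligned}
T_{\text{current}} &= 5C + 5L \\
T_{\text{CSI}}     &= 2C + 5L_{\mathrm{LTO}} \\
T_{\text{CSI}} < T_{\text{current}}
  &\iff 5\,(L_{\mathrm{LTO}} - L) < 3C
  \iff L_{\mathrm{LTO}} - L < \tfrac{3}{5}\,C
\end{aligned}
```

Under these assumptions, if one compile of the source takes 10 minutes, the CSI flow comes out ahead only while each LTO link costs less than 6 minutes more than an ordinary link.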

-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux
Foundation Collaborative Project

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160617/dc484003/attachment.html>

Qin Zhao via llvm-dev

2016-Jun-17 15:17 UTC

[llvm-dev] RFC: Comprehensive Static Instrumentation

On Fri, Jun 17, 2016 at 9:29 AM, Craig, Ben via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> On 6/16/2016 2:48 PM, Mehdi Amini via llvm-dev wrote:
>
>
> On Jun 16, 2016, at 9:01 AM, TB Schardl via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> The CSI framework inserts instrumentation hooks at salient locations
> throughout the compiled code of a program-under-test, such as function
> entry and exit points, basic-block entry and exit points, before and after
> each memory operation, etc.  Tool writers can instrument a
> program-under-test simply by first writing a library that defines the
> semantics of relevant hooks
> and then statically linking their compiled library with the
> program-under-test.
>
> At first glance, this brute-force method of inserting hooks at every
> salient location in the program-under-test seems to be replete with
> overheads.  CSI overcomes these overheads through the use of
> link-time-optimization (LTO), which is now readily available in most major
> compilers, including GCC and LLVM.  Using LTO, instrumentation hooks that
> are not used by a particular tool can be elided, allowing the overheads of
> these hooks to be avoided when the
>
>
> I don't understand this flow: the front-end emits all the possible
> instrumentation but the useless calls to the runtime will be removed during
> the link?
> It means that the final binary is specialized for a given tool right? What
> is the advantage of generating this useless instrumentation in the first
> place then? I'm missing a piece here...
>
> Suppose I want to build a production build, and one build for each of
> ASAN, MSAN, UBSAN, and TSAN.
>
> With the current approach, I need to compile my source five different
> times, and link five different times.
>
> With the CSI approach (assuming it was the backing technology behind the
> sanitizers), I need to compile twice (once for production, once for
> instrumentation), then LTO-link five times.  I can reuse my .o files across
> the sanitizer types.
>
> It's possible that the math doesn't really work out in practice if the
> cost of the LTO-link dwarfs the compile times.
>
Other than the build time, we should also consider the performance of the
produced binary, which might be more important.
I have a hard time believing that the LTO-link-optimized (CSI version 1)
binary could beat a binary built with the original ASan IR instrumentation.
With IR instrumentation, the binary can benefit from both problem-specific
domain knowledge and comprehensive compiler optimizations,
e.g., inlining small code without a context switch, skipping redundant
load/store instrumentation, and more aggressive optimization because the
compiler sees everything.
I am not sure CSI could do any of these.
IMHO, CSI might be good for fast prototyping in research work, but may fall
short when we are really serious about performance.
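Qin's point about skipping redundant load/store instrumentation can be illustrated with a toy, basic-block-local filter. This is a hypothetical sketch, not ASan's actual algorithm: the `Inst` type is an invented stand-in for IR, it assumes only a call can change whether an address is still valid, and it treats only an exact repeat of the same access kind to the same symbolic address as redundant.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Inst:
    """One instruction in a toy basic block (hypothetical IR, not LLVM's)."""
    op: str         # "load", "store", or "call"
    addr: str = ""  # symbolic address operand, e.g. "p" or "p+4"


def instrument_block(block: List[Inst]) -> List[Tuple[str, Inst]]:
    """Return the (check, instruction) pairs a tool would still need to insert.

    A repeated access of the same kind to the same address is skipped while
    no call intervenes; a call conservatively forgets everything checked so far.
    """
    checked: Set[Tuple[str, str]] = set()
    checks: List[Tuple[str, Inst]] = []
    for inst in block:
        if inst.op == "call":
            checked.clear()  # the callee might free or remap memory
            continue
        key = (inst.op, inst.addr)
        if key in checked:
            continue  # redundant: already checked earlier in this block
        checked.add(key)
        checks.append((f"check_{inst.op}", inst))
    return checks


block = [Inst("load", "p"), Inst("load", "p"), Inst("store", "p"),
         Inst("call"), Inst("load", "p")]
print([name for name, _ in instrument_block(block)])
# prints ['check_load', 'check_store', 'check_load']
```

Keying on the access kind keeps the sketch conservative: a store that follows a load of the same address is still checked, and the call forces the final load to be checked again.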

>
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>

TB Schardl via llvm-dev

2016-Jun-17 18:27 UTC

[llvm-dev] RFC: Comprehensive Static Instrumentation

Hey Ben,

Thank you for your comments.  I've put my response inline.

Cheers,
TB

On Fri, Jun 17, 2016 at 6:29 AM, Craig, Ben via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> On 6/16/2016 2:48 PM, Mehdi Amini via llvm-dev wrote:
>
>
> On Jun 16, 2016, at 9:01 AM, TB Schardl via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> The CSI framework inserts instrumentation hooks at salient locations
> throughout the compiled code of a program-under-test, such as function
> entry and exit points, basic-block entry and exit points, before and after
> each memory operation, etc.  Tool writers can instrument a
> program-under-test simply by first writing a library that defines the
> semantics of relevant hooks
> and then statically linking their compiled library with the
> program-under-test.
>
> At first glance, this brute-force method of inserting hooks at every
> salient location in the program-under-test seems to be replete with
> overheads.  CSI overcomes these overheads through the use of
> link-time-optimization (LTO), which is now readily available in most major
> compilers, including GCC and LLVM.  Using LTO, instrumentation hooks that
> are not used by a particular tool can be elided, allowing the overheads of
> these hooks to be avoided when the
>
>
> I don't understand this flow: the front-end emits all the possible
> instrumentation but the useless calls to the runtime will be removed during
> the link?
> It means that the final binary is specialized for a given tool right? What
> is the advantage of generating this useless instrumentation in the first
> place then? I'm missing a piece here...
>
> Suppose I want to build a production build, and one build for each of
> ASAN, MSAN, UBSAN, and TSAN.
>
> With the current approach, I need to compile my source five different
> times, and link five different times.
>
> With the CSI approach (assuming it was the backing technology behind the
> sanitizers), I need to compile twice (once for production, once for
> instrumentation), then LTO-link five times.  I can reuse my .o files across
> the sanitizer types.
>
This reduction in the number of compile operations needed, and in the
number of intermediate object/bitcode files produced, is indeed an advantage
of the CSI approach.

As an aside, we've been experimenting with linking CSI-instrumented
bitcodes against the "null tool," which implements every instrumentation
hook as a nop, and comparing the performance of those binaries against
production binaries.  Our preliminary tests have shown some promising
results.  For generating main executables, using LTO to link
CSI-instrumented bitcodes with the null tool produces executables that are
as fast as the production executables.  For generating dynamic libraries,
however, using LTO to link the CSI-instrumented bitcode of a dynamic
library with the null tool seems to produce a binary that is slower than
production.  (The Apache HTTP server benchmark we've tried runs roughly 30%
slower when using such null-tool-instrumented dynamic libraries.)  These
results suggest that using LTO to link CSI-instrumented bitcodes with the
null tool is almost, but not quite, able to produce binaries with
production performance, which would allow tool users to only compile their
sources once.
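The flow described here (instrument every salient location once, then let a tool library decide which hooks matter) can be modeled in a few lines. This is a toy Python model under stated assumptions: the hook names are hypothetical rather than CSI's real interface, the instrumented program is just a list of hook call sites, and `link` stands in for LTO inlining an empty hook body and deleting the call. It shows why linking against the null tool can, in the best case, leave no instrumentation behind.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical hook names; CSI's real hooks differ in name and signature.
HOOKS = ("func_entry", "func_exit", "before_load", "after_load")

CallSite = Tuple[str, object]  # (hook name, argument passed to the hook)
Tool = Dict[str, Callable[[object], None]]


def _nop(arg: object) -> None:
    """Empty hook body shared by every hook a tool leaves undefined."""


def make_tool(**overrides: Callable[[object], None]) -> Tool:
    """Build a tool: every hook defaults to a no-op unless overridden."""
    unknown = set(overrides) - set(HOOKS)
    if unknown:
        raise ValueError(f"unknown hooks: {sorted(unknown)}")
    tool: Tool = {hook: _nop for hook in HOOKS}
    tool.update(overrides)
    return tool


def link(program: List[CallSite], tool: Tool) -> List[CallSite]:
    """Model of LTO: call sites whose hook is the empty body are elided."""
    return [(hook, arg) for hook, arg in program if tool[hook] is not _nop]


def run(linked: List[CallSite], tool: Tool) -> None:
    """Execute the hook calls that survived linking."""
    for hook, arg in linked:
        tool[hook](arg)


# An "instrumented" program: every salient location got a hook call site.
PROGRAM: List[CallSite] = [
    ("func_entry", "main"),
    ("before_load", 0x10),
    ("after_load", 0x10),
    ("func_exit", "main"),
]

null_tool = make_tool()  # implements every hook as a nop
loads: List[object] = []
load_counter = make_tool(before_load=loads.append)

print(len(link(PROGRAM, null_tool)))  # 0: nothing left to run
run(link(PROGRAM, load_counter), load_counter)
print(loads)  # [16]
```

The model only captures the elision step; it says nothing about why the null-tool dynamic libraries in the Apache benchmark ran slower than production.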

>
> It's possible that the math doesn't really work out in practice if the
> cost of the LTO-link dwarfs the compile times.
>

Mehdi Amini via llvm-dev

2016-Jun-17 18:42 UTC

[llvm-dev] RFC: Comprehensive Static Instrumentation

> On Jun 17, 2016, at 11:27 AM, TB Schardl via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> Hey Ben,
> 
> Thank you for your comments.  I've put my response inline.
> 
> Cheers,
> TB
> 
> On Fri, Jun 17, 2016 at 6:29 AM, Craig, Ben via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> On 6/16/2016 2:48 PM, Mehdi Amini via llvm-dev wrote:
>> 
>>> On Jun 16, 2016, at 9:01 AM, TB Schardl via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>> 
>>> The CSI framework inserts instrumentation hooks at salient locations
>>> throughout the compiled code of a program-under-test, such as function
>>> entry and exit points, basic-block entry and exit points, before and after
>>> each memory operation, etc.  Tool writers can instrument a
>>> program-under-test simply by first writing a library that defines the
>>> semantics of relevant hooks
>>> and then statically linking their compiled library with the
>>> program-under-test.
>>>
>>> At first glance, this brute-force method of inserting hooks at every
>>> salient location in the program-under-test seems to be replete with
>>> overheads.  CSI overcomes these overheads through the use of
>>> link-time-optimization (LTO), which is now readily available in most major
>>> compilers, including GCC and LLVM.  Using LTO, instrumentation hooks that
>>> are not used by a particular tool can be elided, allowing the overheads of
>>> these hooks to be avoided when the
>> 
>> I don't understand this flow: the front-end emits all the possible
>> instrumentation but the useless calls to the runtime will be removed during
>> the link?
>> It means that the final binary is specialized for a given tool right? What
>> is the advantage of generating this useless instrumentation in the first
>> place then? I'm missing a piece here...
>> 
> Suppose I want to build a production build, and one build for each of
> ASAN, MSAN, UBSAN, and TSAN.
> 
> With the current approach, I need to compile my source five different
> times, and link five different times.
> 
> With the CSI approach (assuming it was the backing technology behind the
> sanitizers), I need to compile twice (once for production, once for
> instrumentation), then LTO-link five times.  I can reuse my .o files across
> the sanitizer types.
> 
> This reduction in the number of compile operations needed, and in the
> number of intermediate object/bitcode files produced, is indeed an advantage
> of the CSI approach.
It is a very artificial advantage: what are you saving? Temporary disk space?

> As an aside, we've been experimenting with linking CSI-instrumented
> bitcodes against the "null tool," which implements every instrumentation
> hook as a nop, and comparing the performance of those binaries against
> production binaries.  Our preliminary tests have shown some promising
> results.  For generating main executables, using LTO to link
> CSI-instrumented bitcodes with the null tool produces executables that are
> as fast as the production executables.  For generating dynamic libraries,
> however, using LTO to link the CSI-instrumented bitcode of a dynamic
> library with the null tool seems to produce a binary that is slower than
> production.  (The Apache HTTP server benchmark we've tried runs roughly 30%
> slower when using such null-tool-instrumented dynamic libraries.)  These
> results suggest that using LTO to link CSI-instrumented bitcodes with the
> null tool is almost, but not quite, able to produce binaries with
> production performance
These results suggest that “adding instrumentation has a cost,” nothing more,
and they are not related to LTO at all.
If you provided the runtime to the compiler directly during the compile phase,
you would get the same results.

> , which would allow tool users to only compile their sources once.
LTO means basically “compiles during the link”. You won’t save much.

I haven’t seen a single compelling argument to *tie* CSI to LTO in this thread
until now.


— 
Mehdi




>
> It's possible that the math doesn't really work out in practice if the
> cost of the LTO-link dwarfs the compile times.
