TB Schardl via llvm-dev
2016-Jun-20 23:00 UTC
[llvm-dev] RFC: Comprehensive Static Instrumentation
Hey David,

Thank you for your feedback. I'm trying to understand the model you sketched in order to compare it to CSI's current approach, but the details of your proposal are still fuzzy to me. In particular, although the model you described would avoid using LTO to elide unused hooks, it seems more complicated for both tool writers and tool users. Please clarify your model and shed some light on the questions below.

1) What does the "CSI dev tool" you describe look like? In particular, how does the tool writer produce and maintain an export list of non-stub hook definitions for her tool? Maintaining an export list manually would seem to be an error-prone hassle. One could imagine generating the export list automatically with a custom compile-time pass that the tool writer runs, but maintaining consistency between a tool and its export list remains a concern for both the tool writer and the tool user.

2) To clarify, in your scheme, the tool writer produces an export list as well as an object/bitcode for the tool. The tool user compiles the program-under-test using the export list, and then incorporates the tool object/bitcode at link time. Is this what you have in mind?

3) Do I understand correctly that your model only gets rid of the need for LTO to elide unused instrumentation hooks? In particular, are other optimizations on the instrumentation still contingent on LTO (or ThinLTO)? One nice feature of the current design is that tool writers can use properties passed to hooks to elide instrumentation conditionally based on common static analysis. It looks like optimizations based on properties are still possible in the model you propose, but as with the current design, they would still rely on LTO or ThinLTO.

4) In your model, if the tool user wants to analyze his program with several different tools, then he must recompile his program from source once for each tool. Is this correct? The argument in this thread seems to be that recompiling the program from source is no worse than using LTO because LTO incurs high overhead. (From the results I've found online, http://llvm.org/devmtg/2015-04/slides/ThinLTO_EuroLLVM2015.pdf, however, ThinLTO seems to have much lower overhead than LTO. Are those results still accurate?)

5) What happens when the program-under-test is built from multiple source files? It seems that all source files must be compiled using the same export list. Although this complexity could be managed by the program's build system, managing it there or by some other means still seems like a burden on the tool user.

6) What happens when the program-under-test uses third-party libraries? If the tool user does not have access to the sources of those libraries, then I don't see how they can properly elide hooks using an export list. At best, it would seem that the library writer would distribute an object/bitcode for the library with all hooks already in place, which is exactly what happens with CSI's existing design.

Thanks again for your feedback.

Cheers,
TB

On Sun, Jun 19, 2016 at 6:45 PM, Xinliang David Li <xinliangli at gmail.com> wrote:> It is great that CSI does not depend on lto for correctness, but I think > it should remove the dependency completely. Being able to work with lto > should be something that is supported, but lto is not required. > > Such a design will greatly increase CSI usability not only for tool > users (app developers), but also for CSI tool developers.
> > My understanding is that it is not far to get there. Here is the CSI use > model I am thinking: > > 1) a CSI dev tool can be provided to tool developers to produce a) hook > library in native object format b) hook lib in bitcode format c) export > list of non stub hook defs > > 2) the compiler implementation of CSI lowering will check the export list > and decide whether to lower a hook call to noop or not > > 3) in thinlto mode, the bitcode can be readin for inlining. This works for > lto too > > 4) compiler driver can hide all the above from the users > > > In short, making it work in default mode makes it easier to deploy and can > get you a long way > > Thanks, > > David > > > On Sunday, June 19, 2016, TB Schardl <neboat at mit.edu> wrote: > >> Hey Peter and David, >> >> Thank you for your comments. >> >> As mentioned elsewhere, the current design of CSI does not rely on LTO >> for correctness. The tool-instrumented executable will run correctly even >> if the linker performs no optimization. In particular, unused >> instrumentation hooks are implemented by default as nop functions, which >> just return immediately. CSI is a system that *can use* LTO to improve >> tool performance; it does not *require *LTO to function. >> >> One of our considerations when developing CSI version 1 was design >> simplicity. As such, CSI version 1 essentially consists of three >> components: >> 1) A compile-time pass that the tool user runs to insert instrumentation >> hooks. >> 2) A null-tool library that provides default nop implementations for each >> instrumentation hook. When a tool writer implements a tool using the CSI >> API, the tool writer's implemented hooks override the corresponding default >> implementations. >> 3) A runtime library that implements certain powerful features of CSI, >> including contiguous sets of ID's for hooks. >> >> We've been thinking about how CSI might work with -mlink-bitcode-file. >> From our admittedly limited understanding of that feature, it seems that a >> design that uses -mlink-bitcode-file would still require something like the >> first and third components of the existing design. Additional complexity >> might be needed to get CSI to work with -mlink-bitcode-file, but these two >> components seem to be core to CSI, regardless of whether >> -mlink-bitcode-file is used. (Eliminating the null-tool library amounts to >> eliminating a pretty simple 39-line C file, which at first blush doesn't >> look like a big win in design complexity). >> >> CSI focuses on making it easy for tool writers to create many >> dynamic-analysis tools. CSI can leverage standard compiler optimizations >> to improve tool performance, if the tool user employs mechanisms such as >> LTO or thinLTO, but LTO itself is not mandatory. It might be worthwhile to >> explore other approaches with different trade-offs, such as >> -mlink-bitcode-file, but the existing design doesn't preclude these >> approaches down the road, and they will be able to share the same >> infrastructure. Unless the other approaches are dramatically simpler, the >> existing design seems like a good place to start. 
>> >> Cheers, >> TB >> >> On Fri, Jun 17, 2016 at 4:25 PM, Peter Collingbourne via llvm-dev < >> llvm-dev at lists.llvm.org> wrote: >> >>> >>> >>> On Thu, Jun 16, 2016 at 10:16 PM, Xinliang David Li via llvm-dev < >>> llvm-dev at lists.llvm.org> wrote: >>> >>>> >>>> >>>> On Thu, Jun 16, 2016 at 3:27 PM, Mehdi Amini via llvm-dev < >>>> llvm-dev at lists.llvm.org> wrote: >>>> >>>>> Hi TB, >>>>> >>>>> Thanks for you answer. >>>>> >>>>> On Jun 16, 2016, at 2:50 PM, TB Schardl <neboat at mit.edu> wrote: >>>>> >>>>> Hey Mehdi, >>>>> >>>>> Thank you for your comments. I've CC'd the CSI mailing list with your >>>>> comments and put my responses inline. Please let me know any other >>>>> questions you have. >>>>> >>>>> Cheers, >>>>> TB >>>>> >>>>> On Thu, Jun 16, 2016 at 3:48 PM, Mehdi Amini <mehdi.amini at apple.com> >>>>> wrote: >>>>> >>>>>> >>>>>> On Jun 16, 2016, at 9:01 AM, TB Schardl via llvm-dev < >>>>>> llvm-dev at lists.llvm.org> wrote: >>>>>> >>>>>> Hey LLVM-dev, >>>>>> >>>>>> We propose to build the CSI framework to provide a comprehensive >>>>>> suite of compiler-inserted instrumentation hooks that dynamic-analysis >>>>>> tools can use to observe and investigate program runtime behavior. >>>>>> Traditionally, tools based on compiler instrumentation would each >>>>>> separately modify the compiler to insert their own instrumentation. In >>>>>> contrast, CSI inserts a standard collection of instrumentation hooks into >>>>>> the program-under-test. Each CSI-tool is implemented as a library that >>>>>> defines relevant hooks, and the remaining hooks are "nulled" out and elided >>>>>> during link-time optimization (LTO), resulting in instrumented runtimes on >>>>>> par with custom instrumentation. CSI allows many compiler-based tools to >>>>>> be written as simple libraries without modifying the compiler, greatly >>>>>> lowering the bar for >>>>>> developing dynamic-analysis tools. >>>>>> >>>>>> ===============>>>>>> Motivation >>>>>> ===============>>>>>> >>>>>> Key to understanding and improving the behavior of any system is >>>>>> visibility -- the ability to know what is going on inside the system. >>>>>> Various dynamic-analysis tools, such as race detectors, memory checkers, >>>>>> cache simulators, call-graph generators, code-coverage analyzers, and >>>>>> performance profilers, rely on compiler instrumentation to gain visibility >>>>>> into the program behaviors during execution. With this approach, the tool >>>>>> writer modifies the compiler to insert instrumentation code into the >>>>>> program-under-test so that it can execute behind the scene while the >>>>>> program-under-test runs. This approach, however, means that the >>>>>> development of new tools requires compiler work, which many potential tool >>>>>> writers are ill equipped to do, and thus raises the bar for building new >>>>>> and innovative tools. >>>>>> >>>>>> The goal of the CSI framework is to provide comprehensive static >>>>>> instrumentation through the compiler, in order to simplify the task of >>>>>> building efficient and effective platform-independent tools. The CSI >>>>>> framework allows the tool writer to easily develop analysis tools that >>>>>> require >>>>>> compiler instrumentation without needing to understand the compiler >>>>>> internals or modifying the compiler, which greatly lowers the bar for >>>>>> developing dynamic-analysis tools. 
>>>>>> >>>>>> ===============>>>>>> Approach >>>>>> ===============>>>>>> >>>>>> The CSI framework inserts instrumentation hooks at salient locations >>>>>> throughout the compiled code of a program-under-test, such as function >>>>>> entry and exit points, basic-block entry and exit points, before and after >>>>>> each memory operation, etc. Tool writers can instrument a >>>>>> program-under-test simply by first writing a library that defines the >>>>>> semantics of relevant hooks >>>>>> and then statically linking their compiled library with the >>>>>> program-under-test. >>>>>> >>>>>> At first glance, this brute-force method of inserting hooks at every >>>>>> salient location in the program-under-test seems to be replete with >>>>>> overheads. CSI overcomes these overheads through the use of >>>>>> link-time-optimization (LTO), which is now readily available in most major >>>>>> compilers, including GCC and LLVM. Using LTO, instrumentation hooks that >>>>>> are not used by a particular tool can be elided, allowing the overheads of >>>>>> these hooks to be avoided when the >>>>>> >>>>>> >>>>>> I don't understand this flow: the front-end emits all the possible >>>>>> instrumentation but the useless calls to the runtime will be removed during >>>>>> the link? >>>>>> It means that the final binary is specialized for a given tool right? >>>>>> What is the advantage of generating this useless instrumentation in the >>>>>> first place then? I'm missing a piece here... >>>>>> >>>>> >>>>> Here's the idea. When a tool user, who has a program they want to >>>>> instrument, compiles their program source into an object/bitcode, he can >>>>> turn on the CSI compile-time pass to insert instrumentation hooks (function >>>>> calls to instrumentation routines) throughout the IR of the program. >>>>> Separately, a tool writer implements a particular tool by writing a library >>>>> that defines the subset of instrumentation hooks she cares about. At link >>>>> time, the object/bitcode of the program source is linked with the object >>>>> file/bitcode of the tool, resulting in a tool-instrumented executable. >>>>> When LTO is used at link time, unused instrumentation is elided, and >>>>> additional optimizations can run on the instrumented program. (I'm happy >>>>> to send you a nice picture that we have of this flow, if the mailing list >>>>> doesn't mind.) >>>>> >>>>> >>>>> Ok this is roughly what I had in mind. >>>>> >>>>> I still believe it is not great to rely on LTO, and better, it is not >>>>> needed to achieve this result. >>>>> >>>>> For instance, I don't see why the "library" that defines the subset of >>>>> instrumentation hooks used by this tool can't be fed during a regular >>>>> compile, and the useless hook be eliminated at this point. >>>>> Implementation detail, but in practice, instead of feeding the library >>>>> itself, the "framework" that allows to generate the library for the tool >>>>> writer can output a "configuration file" along side the library, and this >>>>> configuration file is what is fed to the compiler and tells the >>>>> instrumentation pass which of the hooks to generate. It sounds more >>>>> efficient to me, and remove the dependency on LTO. >>>>> I imagine there is a possible drawback that I'm missing right now... >>>>> >>>> >>>> >>>> I agree that the tool does not need to depend on full LTO. What is >>>> needed is essentially an option or configuration such that the compiler can >>>> find the bit code file(s) for the hooks during compilation time. 
It is >>>> pretty much similar to how math function inlining can be done ... >>>> >>> >>> I agree, and I would strongly prefer that the design worked like this >>> rather than relying on LTO. >>> >>> The flag for loading bitcode already exists, and is called >>> -mlink-bitcode-file. Projects such as libclc already use it, I believe. >>> >>> What might be useful is if CSI improved the infrastructure around >>> -mlink-bitcode-file to make it more convenient to produce compatible >>> bitcode files. libclc for example relies on a post-processing pass to >>> change symbol linkage, and I think that can be avoided by changing symbol >>> linkages as they are imported from the bitcode file. >>> >>> Peter >>> >>> David >>>> >>>> >>>> >>>> >>>>> >>>>> >>>>> >>>>> The final binary is specialized to a given tool. One advantage of >>>>> CSI, however, is that a single set of instrumentation covers the needs of a >>>>> wide variety of tools, since different tools provide different >>>>> implementations of the same hooks. The specialization of a binary to a >>>>> given tool happens at link time. >>>>> >>>>> >>>>>> >>>>>> instrumented program-under-test is run. Furthermore, LTO can >>>>>> optimize a tool's instrumentation within a program using traditional >>>>>> compiler optimizations. Our initial study indicates that the use of LTO >>>>>> does not unduly slow down the build time >>>>>> >>>>>> >>>>>> This is a false claim: LTO has a very large overhead, and especially >>>>>> is not parallel, so the more core you have the more the difference will be. >>>>>> We frequently observes builds that are 3 times slower. Moreover, LTO is not >>>>>> incremental friendly and during debug (which is very relevant with >>>>>> sanitizer) rebuilding involves waiting for the full link to occur again. >>>>>> >>>>>> >>>>> Can you please point us towards some projects where LTO incurs a 3x >>>>> slowdown? We're interested in the overhead of LTO on build times, and >>>>> although we've found LTO to incur more overhead on parallel build times >>>>> than serial build times, as you mentioned, the overheads we've measured on >>>>> serial or parallel builds have been less than 40% (which we saw when >>>>> building the Apache HTTP server). >>>>> >>>>> >>>>> I expect this to be reproducible on most non-trivial C/C++ programs. >>>>> But taking clang as an example, just running `ninja clang` on OS X a >>>>> not-so-recent 12-cores machine takes 970s with LTO and 252s without (and I >>>>> believe this is without debug info...). >>>>> Running just `ninja` to build all of llvm/clang here would take *a >>>>> lot* longer with LTO, and not so much without. >>>>> >>>>> The LTO builds without assert >>>>> >>>>> Best, >>>>> >>>>> -- >>>>> Mehdi >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> We've also designed CSI such that it does not depend on LTO for >>>>> correctness; the program and tool will work correctly with ordinary ld. Of >>>>> course, the downside of not using LTO is that instrumentation is not >>>>> optimized, and in particular, unused instrumentation will incur overhead. >>>>> >>>>> >>>>>> >>>>>> -- >>>>>> Mehdi >>>>>> >>>>>> , and the LTO can indeed optimize away unused hooks. One of our >>>>>> experiments with Apache HTTP server shows that, compiling with CSI and >>>>>> linking with the "null" CSI-tool (which consists solely of empty hooks) >>>>>> slows down the build time of the Apache HTTP server by less than 40%, and >>>>>> the resulting tool-instrumented executable is as fast as the original >>>>>> uninstrumented code. 
>>>>>> >>>>>> >>>>>> ===============>>>>>> CSI version 1 >>>>>> ===============>>>>>> >>>>>> The initial version of CSI supports a basic set of hooks that covers >>>>>> the following categories of program objects: functions, function exits >>>>>> (specifically, returns), basic blocks, call sites, loads, and stores. We >>>>>> prioritized instrumenting these IR objects based on the need of seven >>>>>> example CSI tools, including a race detector, a cache-reuse analyzer, and a >>>>>> code-coverage analyzer. We plan to evolve the CSI API over time to be more >>>>>> comprehensive, and we have designed the CSI API to be extensible, allowing >>>>>> new instrumentation to be added as needs grow. We chose to initially >>>>>> implement a minimal "core" set of hooks, because we felt it was best to add >>>>>> new instrumentation on an as-needed basis in order to keep the interface >>>>>> simple. >>>>>> >>>>>> There are three salient features about the design of CSI. First, CSI >>>>>> assigns each instrumented program object a unique integer identifier within >>>>>> one of the (currently) six program-object categories. Within each >>>>>> category, the ID's are consecutively numbered from 0 up to the number of >>>>>> such objects minus 1. The contiguous assignment of the ID's allows the >>>>>> tool writer to easily keep track of IR objects in the program and iterate >>>>>> through all objects in a category (whether the object is encountered during >>>>>> execution or not). Second, CSI provides a platform-independent means to >>>>>> relate a given program object to locations in the source code. >>>>>> Specifically, CSI provides "front-end-data (FED)" tables, which provide >>>>>> file name and source lines for each program object given the object's ID. >>>>>> Third, each CSI hook takes in as a parameter a "property": a 64-bit >>>>>> unsigned integer that CSI uses to export the results of compiler analyses >>>>>> and other information known at compile time. The use of properties allow >>>>>> the tool to rely on compiler analyses to optimize instrumentation and >>>>>> decrease overhead. In particular, since the value of a property is known >>>>>> at compile time, LTO can constant-fold the conditional test around a >>>>>> property to elide unnecessary instrumentation. >>>>>> >>>>>> ===============>>>>>> Future plan >>>>>> ===============>>>>>> >>>>>> We plan to expand CSI in future versions by instrumenting additional >>>>>> program objects, such as atomic instructions, floating-point instructions, >>>>>> and exceptions. We are also planning to provide additional static >>>>>> information to tool writers, both through information encoded in the >>>>>> properties passed to hooks and by other means. In particular, we are also >>>>>> looking at mechanisms to present tool writers with more complex static >>>>>> information, such as how different program objects relate to each other, >>>>>> e.g., which basic blocks belong to a given function. 
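[To make the hook/override mechanism discussed above concrete, here is a minimal sketch of a null-tool-style default hook and a tool-defined hook. The hook names, signatures, and the weak-symbol trick are illustrative assumptions, not necessarily the exact CSI API.]

/* csi_hooks_sketch.c -- illustration only; hook names, signatures, and the
 * weak-symbol mechanism are assumptions, not the actual CSI interface. */
#include <stdint.h>
#include <stdio.h>

/* Null-tool-style default: an empty hook.  Marking it weak lets a tool's own
 * definition of the same symbol win at link time; once the body is visible
 * to the optimizer (e.g., under LTO), calls to the empty version vanish. */
__attribute__((weak)) void __csi_bb_entry(uint64_t bb_id, uint64_t prop) {
    (void)bb_id; (void)prop;  /* deliberately a nop */
}

/* Tool side: a strong definition overrides the null default, so only the
 * hooks the tool actually implements do any work at run time. */
void __csi_func_entry(uint64_t func_id, uint64_t prop) {
    (void)prop;
    printf("entering function %llu\n", (unsigned long long)func_id);
}

[With this arrangement an ordinary ld link already produces a correct, if unoptimized, tool-instrumented binary; LTO or a compile-time bitcode link is only needed to inline the hooks and delete the empty ones.]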
Evgenii Stepanov via llvm-dev
2016-Jun-20 23:27 UTC
[llvm-dev] RFC: Comprehensive Static Instrumentation
I like this non-LTO use model a lot. In my opinion, it does not conflict with the LTO model - the only difference is when the implementation of the CSI hooks is injected into the build: compile time vs. link time. I don't see any problem with injecting it both at compile and link time - this would take care of CSI hooks in third-party libraries.

I also don't understand the need for the export list. As I see it, the IR library of the tool is loaded at compilation time, internalized, and merged with the IR for the module being compiled. If any CSI hooks remain undefined at this point, they are replaced with empty functions. The IR tool library should not have any global state, of course - it is the equivalent of a C++ header file. Exactly the same happens at LTO link time, in case the link brought in new CSI hooks that were unresolved at compilation time.

ThinLTO uses a different build model - it's not compile+link, but compile+merge+compile+link or something like that AFAIK. This requires build-system changes, and I imagine the extra work would not be worth the benefits for many smaller projects. I'd love it if CSI worked without any kind of LTO with reasonable performance (btw, do you have the numbers for the slowdown with an empty tool without LTO?).

On Mon, Jun 20, 2016 at 4:00 PM, TB Schardl via llvm-dev <llvm-dev at lists.llvm.org> wrote:> Hey David, > > Thank you for your feedback. I'm trying to understand the model you > sketched in order to compare it to CSI's current approach, but the details > of your proposal are still fuzzy to me. In particular, although the model > you described would avoid using LTO to elide unused hooks, it seems more > complicated for both tool writers and tool users to use. Please clarify > your model and shed some light on the questions below. > > 1) What does the "CSI dev tool" you describe look like? In particular, how > does the tool writer produce and maintain an export list of non-stub hook > definitions for her tool? Maintaining an export list manually would seem to > be an error-prone hassle. One could imagine generating the export list > automatically with a custom compile-time pass that the tool-writer runs, but > maintaining consistency between a tool and its export list remains a concern > for both the tool writer and the tool user. > > 2) To clarify, in your scheme, the tool writer produces an export list as > well as an object/bitcode for the tool. The tool user compiles the > program-under-test using the export list, and then incorporates the tool > object/bitcode at link time. Is this what you have in mind? > > 3) Do I understand correctly that your model only gets rid of the need for > LTO to elide unused instrumentation hooks? In particular, are other > optimizations on the instrumentation still contingent on LTO (or ThinLTO)? > One nice feature of the current design is that tool writers can use > properties passed to hooks to elide instrumentation conditionally based on > common static analysis. It looks like optimizations based on properties are > still possible in the model you propose, but as with the current design, > they would still rely on LTO or ThinLTO. > > 4) In your model, if the tool user wants to analyze his program with several > different tools, then he must recompile his program from source once for > each tool. Is this correct? The argument in this thread seems to be that > recompiling the program from source is no worse than using LTO because LTO > incurs high overhead.
(From the results I've found online > http://llvm.org/devmtg/2015-04/slides/ThinLTO_EuroLLVM2015.pdf, however, > ThinLTO seems to be much lower overhead than LTO. Are these results still > accurate?) > > 5) What happens when the program-under-test is built from multiple source > files? It seems that all source files must be compiled using the same > export list, which is a burden on the tool user. Although this complexity > could be managed by the program's build system, managing this complexity > through the build system or by some other means still seems like a burden on > the tool user. > > 6) What happens when the program-under-test uses third-party libraries? If > the tool user does not have access to the sources of those libraries, then I > don't see how they can properly elide hooks using an export list. At best, > it would seem that the library writer would distribute a object/bitcode for > the library with all hooks in place already, which is exactly what happens > with CSI's the existing design. > > Thanks again for your feedback. > > Cheers, > TB > > On Sun, Jun 19, 2016 at 6:45 PM, Xinliang David Li <xinliangli at gmail.com> > wrote: >> >> It is great that CSI does not depend on lto for correctness, but I think >> it should remove the dependency completely. Being able to work with lto >> should be something that is supported, but lto is not required. >> >> Such a design will greatly increase CSI usability not only for tool >> users(app developers), but also for CSI tool developers. >> >> My understanding is that it is not far to get there. Here is the CSI use >> model I am thinking: >> >> 1) a CSI dev tool can be provided to tool developers to produce a) hook >> library in native object format b) hook lib in bitcode format c) export list >> of non stub hook defs >> >> 2) the compiler implementation of CSI lowering will check the export list >> and decide whether to lower a hook call to noop or not >> >> 3) in thinlto mode, the bitcode can be readin for inlining. This works for >> lto too >> >> 4) compiler driver can hide all the above from the users >> >> >> In short, making it work in default mode makes it easier to deploy and can >> get you a long way >> >> Thanks, >> >> David >> >> >> On Sunday, June 19, 2016, TB Schardl <neboat at mit.edu> wrote: >>> >>> Hey Peter and David, >>> >>> Thank you for your comments. >>> >>> As mentioned elsewhere, the current design of CSI does not rely on LTO >>> for correctness. The tool-instrumented executable will run correctly even >>> if the linker performs no optimization. In particular, unused >>> instrumentation hooks are implemented by default as nop functions, which >>> just return immediately. CSI is a system that can use LTO to improve tool >>> performance; it does not require LTO to function. >>> >>> One of our considerations when developing CSI version 1 was design >>> simplicity. As such, CSI version 1 essentially consists of three >>> components: >>> 1) A compile-time pass that the tool user runs to insert instrumentation >>> hooks. >>> 2) A null-tool library that provides default nop implementations for each >>> instrumentation hook. When a tool writer implements a tool using the CSI >>> API, the tool writer's implemented hooks override the corresponding default >>> implementations. >>> 3) A runtime library that implements certain powerful features of CSI, >>> including contiguous sets of ID's for hooks. >>> >>> We've been thinking about how CSI might work with -mlink-bitcode-file. 
>>> From our admittedly limited understanding of that feature, it seems that a >>> design that uses -mlink-bitcode-file would still require something like the >>> first and third components of the existing design. Additional complexity >>> might be needed to get CSI to work with -mlink-bitcode-file, but these two >>> components seem to be core to CSI, regardless of whether -mlink-bitcode-file >>> is used. (Eliminating the null-tool library amounts to eliminating a pretty >>> simple 39-line C file, which at first blush doesn't look like a big win in >>> design complexity). >>> >>> CSI focuses on making it easy for tool writers to create many >>> dynamic-analysis tools. CSI can leverage standard compiler optimizations to >>> improve tool performance, if the tool user employs mechanisms such as LTO or >>> thinLTO, but LTO itself is not mandatory. It might be worthwhile to explore >>> other approaches with different trade-offs, such as -mlink-bitcode-file, but >>> the existing design doesn't preclude these approaches down the road, and >>> they will be able to share the same infrastructure. Unless the other >>> approaches are dramatically simpler, the existing design seems like a good >>> place to start. >>> >>> Cheers, >>> TB >>> >>> On Fri, Jun 17, 2016 at 4:25 PM, Peter Collingbourne via llvm-dev >>> <llvm-dev at lists.llvm.org> wrote: >>>> >>>> >>>> >>>> On Thu, Jun 16, 2016 at 10:16 PM, Xinliang David Li via llvm-dev >>>> <llvm-dev at lists.llvm.org> wrote: >>>>> >>>>> >>>>> >>>>> On Thu, Jun 16, 2016 at 3:27 PM, Mehdi Amini via llvm-dev >>>>> <llvm-dev at lists.llvm.org> wrote: >>>>>> >>>>>> Hi TB, >>>>>> >>>>>> Thanks for you answer. >>>>>> >>>>>> On Jun 16, 2016, at 2:50 PM, TB Schardl <neboat at mit.edu> wrote: >>>>>> >>>>>> Hey Mehdi, >>>>>> >>>>>> Thank you for your comments. I've CC'd the CSI mailing list with your >>>>>> comments and put my responses inline. Please let me know any other >>>>>> questions you have. >>>>>> >>>>>> Cheers, >>>>>> TB >>>>>> >>>>>> On Thu, Jun 16, 2016 at 3:48 PM, Mehdi Amini <mehdi.amini at apple.com> >>>>>> wrote: >>>>>>> >>>>>>> >>>>>>> On Jun 16, 2016, at 9:01 AM, TB Schardl via llvm-dev >>>>>>> <llvm-dev at lists.llvm.org> wrote: >>>>>>> >>>>>>> Hey LLVM-dev, >>>>>>> >>>>>>> We propose to build the CSI framework to provide a comprehensive >>>>>>> suite of compiler-inserted instrumentation hooks that dynamic-analysis tools >>>>>>> can use to observe and investigate program runtime behavior. Traditionally, >>>>>>> tools based on compiler instrumentation would each separately modify the >>>>>>> compiler to insert their own instrumentation. In contrast, CSI inserts a >>>>>>> standard collection of instrumentation hooks into the program-under-test. >>>>>>> Each CSI-tool is implemented as a library that defines relevant hooks, and >>>>>>> the remaining hooks are "nulled" out and elided during link-time >>>>>>> optimization (LTO), resulting in instrumented runtimes on par with custom >>>>>>> instrumentation. CSI allows many compiler-based tools to be written as >>>>>>> simple libraries without modifying the compiler, greatly lowering the bar >>>>>>> for >>>>>>> developing dynamic-analysis tools. >>>>>>> >>>>>>> ===============>>>>>>> Motivation >>>>>>> ===============>>>>>>> >>>>>>> Key to understanding and improving the behavior of any system is >>>>>>> visibility -- the ability to know what is going on inside the system. 
>>>>>>> Various dynamic-analysis tools, such as race detectors, memory checkers, >>>>>>> cache simulators, call-graph generators, code-coverage analyzers, and >>>>>>> performance profilers, rely on compiler instrumentation to gain visibility >>>>>>> into the program behaviors during execution. With this approach, the tool >>>>>>> writer modifies the compiler to insert instrumentation code into the >>>>>>> program-under-test so that it can execute behind the scene while the >>>>>>> program-under-test runs. This approach, however, means that the development >>>>>>> of new tools requires compiler work, which many potential tool writers are >>>>>>> ill equipped to do, and thus raises the bar for building new and innovative >>>>>>> tools. >>>>>>> >>>>>>> The goal of the CSI framework is to provide comprehensive static >>>>>>> instrumentation through the compiler, in order to simplify the task of >>>>>>> building efficient and effective platform-independent tools. The CSI >>>>>>> framework allows the tool writer to easily develop analysis tools that >>>>>>> require >>>>>>> compiler instrumentation without needing to understand the compiler >>>>>>> internals or modifying the compiler, which greatly lowers the bar for >>>>>>> developing dynamic-analysis tools. >>>>>>> >>>>>>> ===============>>>>>>> Approach >>>>>>> ===============>>>>>>> >>>>>>> The CSI framework inserts instrumentation hooks at salient locations >>>>>>> throughout the compiled code of a program-under-test, such as function entry >>>>>>> and exit points, basic-block entry and exit points, before and after each >>>>>>> memory operation, etc. Tool writers can instrument a program-under-test >>>>>>> simply by first writing a library that defines the semantics of relevant >>>>>>> hooks >>>>>>> and then statically linking their compiled library with the >>>>>>> program-under-test. >>>>>>> >>>>>>> At first glance, this brute-force method of inserting hooks at every >>>>>>> salient location in the program-under-test seems to be replete with >>>>>>> overheads. CSI overcomes these overheads through the use of >>>>>>> link-time-optimization (LTO), which is now readily available in most major >>>>>>> compilers, including GCC and LLVM. Using LTO, instrumentation hooks that >>>>>>> are not used by a particular tool can be elided, allowing the overheads of >>>>>>> these hooks to be avoided when the >>>>>>> >>>>>>> >>>>>>> I don't understand this flow: the front-end emits all the possible >>>>>>> instrumentation but the useless calls to the runtime will be removed during >>>>>>> the link? >>>>>>> It means that the final binary is specialized for a given tool right? >>>>>>> What is the advantage of generating this useless instrumentation in the >>>>>>> first place then? I'm missing a piece here... >>>>>> >>>>>> >>>>>> Here's the idea. When a tool user, who has a program they want to >>>>>> instrument, compiles their program source into an object/bitcode, he can >>>>>> turn on the CSI compile-time pass to insert instrumentation hooks (function >>>>>> calls to instrumentation routines) throughout the IR of the program. >>>>>> Separately, a tool writer implements a particular tool by writing a library >>>>>> that defines the subset of instrumentation hooks she cares about. At link >>>>>> time, the object/bitcode of the program source is linked with the object >>>>>> file/bitcode of the tool, resulting in a tool-instrumented executable. 
When >>>>>> LTO is used at link time, unused instrumentation is elided, and additional >>>>>> optimizations can run on the instrumented program. (I'm happy to send you a >>>>>> nice picture that we have of this flow, if the mailing list doesn't mind.) >>>>>> >>>>>> >>>>>> Ok this is roughly what I had in mind. >>>>>> >>>>>> I still believe it is not great to rely on LTO, and better, it is not >>>>>> needed to achieve this result. >>>>>> >>>>>> For instance, I don't see why the "library" that defines the subset of >>>>>> instrumentation hooks used by this tool can't be fed during a regular >>>>>> compile, and the useless hook be eliminated at this point. >>>>>> Implementation detail, but in practice, instead of feeding the library >>>>>> itself, the "framework" that allows to generate the library for the tool >>>>>> writer can output a "configuration file" along side the library, and this >>>>>> configuration file is what is fed to the compiler and tells the >>>>>> instrumentation pass which of the hooks to generate. It sounds more >>>>>> efficient to me, and remove the dependency on LTO. >>>>>> I imagine there is a possible drawback that I'm missing right now... >>>>> >>>>> >>>>> >>>>> I agree that the tool does not need to depend on full LTO. What is >>>>> needed is essentially an option or configuration such that the compiler can >>>>> find the bit code file(s) for the hooks during compilation time. It is >>>>> pretty much similar to how math function inlining can be done ... >>>> >>>> >>>> I agree, and I would strongly prefer that the design worked like this >>>> rather than relying on LTO. >>>> >>>> The flag for loading bitcode already exists, and is called >>>> -mlink-bitcode-file. Projects such as libclc already use it, I believe. >>>> >>>> What might be useful is if CSI improved the infrastructure around >>>> -mlink-bitcode-file to make it more convenient to produce compatible bitcode >>>> files. libclc for example relies on a post-processing pass to change symbol >>>> linkage, and I think that can be avoided by changing symbol linkages as they >>>> are imported from the bitcode file. >>>> >>>> Peter >>>> >>>>> David >>>>> >>>>> >>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> The final binary is specialized to a given tool. One advantage of >>>>>> CSI, however, is that a single set of instrumentation covers the needs of a >>>>>> wide variety of tools, since different tools provide different >>>>>> implementations of the same hooks. The specialization of a binary to a >>>>>> given tool happens at link time. >>>>>> >>>>>>> >>>>>>> >>>>>>> instrumented program-under-test is run. Furthermore, LTO can >>>>>>> optimize a tool's instrumentation within a program using traditional >>>>>>> compiler optimizations. Our initial study indicates that the use of LTO >>>>>>> does not unduly slow down the build time >>>>>>> >>>>>>> >>>>>>> This is a false claim: LTO has a very large overhead, and especially >>>>>>> is not parallel, so the more core you have the more the difference will be. >>>>>>> We frequently observes builds that are 3 times slower. Moreover, LTO is not >>>>>>> incremental friendly and during debug (which is very relevant with >>>>>>> sanitizer) rebuilding involves waiting for the full link to occur again. >>>>>>> >>>>>> >>>>>> Can you please point us towards some projects where LTO incurs a 3x >>>>>> slowdown? 
We're interested in the overhead of LTO on build times, and >>>>>> although we've found LTO to incur more overhead on parallel build times than >>>>>> serial build times, as you mentioned, the overheads we've measured on serial >>>>>> or parallel builds have been less than 40% (which we saw when building the >>>>>> Apache HTTP server). >>>>>> >>>>>> >>>>>> I expect this to be reproducible on most non-trivial C/C++ programs. >>>>>> But taking clang as an example, just running `ninja clang` on OS X a >>>>>> not-so-recent 12-cores machine takes 970s with LTO and 252s without (and I >>>>>> believe this is without debug info...). >>>>>> Running just `ninja` to build all of llvm/clang here would take *a >>>>>> lot* longer with LTO, and not so much without. >>>>>> >>>>>> The LTO builds without assert >>>>>> >>>>>> Best, >>>>>> >>>>>> -- >>>>>> Mehdi >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> We've also designed CSI such that it does not depend on LTO for >>>>>> correctness; the program and tool will work correctly with ordinary ld. Of >>>>>> course, the downside of not using LTO is that instrumentation is not >>>>>> optimized, and in particular, unused instrumentation will incur overhead. >>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Mehdi >>>>>>> >>>>>>> , and the LTO can indeed optimize away unused hooks. One of our >>>>>>> experiments with Apache HTTP server shows that, compiling with CSI and >>>>>>> linking with the "null" CSI-tool (which consists solely of empty hooks) >>>>>>> slows down the build time of the Apache HTTP server by less than 40%, and >>>>>>> the resulting tool-instrumented executable is as fast as the original >>>>>>> uninstrumented code. >>>>>>> >>>>>>> >>>>>>> ===============>>>>>>> CSI version 1 >>>>>>> ===============>>>>>>> >>>>>>> The initial version of CSI supports a basic set of hooks that covers >>>>>>> the following categories of program objects: functions, function exits >>>>>>> (specifically, returns), basic blocks, call sites, loads, and stores. We >>>>>>> prioritized instrumenting these IR objects based on the need of seven >>>>>>> example CSI tools, including a race detector, a cache-reuse analyzer, and a >>>>>>> code-coverage analyzer. We plan to evolve the CSI API over time to be more >>>>>>> comprehensive, and we have designed the CSI API to be extensible, allowing >>>>>>> new instrumentation to be added as needs grow. We chose to initially >>>>>>> implement a minimal "core" set of hooks, because we felt it was best to add >>>>>>> new instrumentation on an as-needed basis in order to keep the interface >>>>>>> simple. >>>>>>> >>>>>>> There are three salient features about the design of CSI. First, CSI >>>>>>> assigns each instrumented program object a unique integer identifier within >>>>>>> one of the (currently) six program-object categories. Within each category, >>>>>>> the ID's are consecutively numbered from 0 up to the number of such objects >>>>>>> minus 1. The contiguous assignment of the ID's allows the tool writer to >>>>>>> easily keep track of IR objects in the program and iterate through all >>>>>>> objects in a category (whether the object is encountered during execution or >>>>>>> not). Second, CSI provides a platform-independent means to relate a given >>>>>>> program object to locations in the source code. Specifically, CSI provides >>>>>>> "front-end-data (FED)" tables, which provide file name and source lines for >>>>>>> each program object given the object's ID. 
Third, each CSI hook takes in as >>>>>>> a parameter a "property": a 64-bit unsigned integer that CSI uses to export >>>>>>> the results of compiler analyses and other information known at compile >>>>>>> time. The use of properties allow the tool to rely on compiler analyses to >>>>>>> optimize instrumentation and decrease overhead. In particular, since the >>>>>>> value of a property is known at compile time, LTO can constant-fold the >>>>>>> conditional test around a property to elide unnecessary instrumentation. >>>>>>> >>>>>>> ===============>>>>>>> Future plan >>>>>>> ===============>>>>>>> >>>>>>> We plan to expand CSI in future versions by instrumenting additional >>>>>>> program objects, such as atomic instructions, floating-point instructions, >>>>>>> and exceptions. We are also planning to provide additional static >>>>>>> information to tool writers, both through information encoded in the >>>>>>> properties passed to hooks and by other means. In particular, we are also >>>>>>> looking at mechanisms to present tool writers with more complex static >>>>>>> information, such as how different program objects relate to each other, >>>>>>> e.g., which basic blocks belong to a given function. >>>>>>> >>>>>>> _______________________________________________ >>>>>>> LLVM Developers mailing list >>>>>>> llvm-dev at lists.llvm.org >>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>>>>> >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> LLVM Developers mailing list >>>>>> llvm-dev at lists.llvm.org >>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> LLVM Developers mailing list >>>>> llvm-dev at lists.llvm.org >>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>>>> >>>> >>>> >>>> >>>> -- >>>> -- >>>> Peter >>>> >>>> _______________________________________________ >>>> LLVM Developers mailing list >>>> llvm-dev at lists.llvm.org >>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>>> >>> > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >
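[To illustrate the property mechanism quoted above, and why it benefits from the hook body being visible to the optimizer, here is a small sketch. The hook signature, the property bit, and its meaning are hypothetical; only the general shape (a 64-bit property word tested inside the hook) comes from the RFC.]

/* property_sketch.c -- hypothetical hook and property bit, for illustration.
 * The RFC only says each hook receives a 64-bit "property" word of
 * compile-time facts; the specific bit below is an invented example. */
#include <stdint.h>
#include <stdio.h>

/* Invented property bit: "the compiler proved this load reads constant data". */
#define CSI_PROP_LOAD_IS_CONSTANT (1ULL << 0)

void __csi_before_load(uint64_t load_id, const void *addr, uint32_t num_bytes,
                       uint64_t prop) {
    /* The compiler passes prop as a literal constant at each call site.
     * If this body is visible to the optimizer (LTO, ThinLTO, or a
     * compile-time bitcode link), the test below constant-folds and the
     * whole call disappears for loads the tool does not care about. */
    if (prop & CSI_PROP_LOAD_IS_CONSTANT)
        return;
    printf("load %llu: %p (%u bytes)\n",
           (unsigned long long)load_id, (void *)addr, (unsigned)num_bytes);
}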
Xinliang David Li via llvm-dev
2016-Jun-21 00:16 UTC
[llvm-dev] RFC: Comprehensive Static Instrumentation
On Mon, Jun 20, 2016 at 4:00 PM, TB Schardl <neboat at mit.edu> wrote:> Hey David, > > Thank you for your feedback. I'm trying to understand the model you > sketched in order to compare it to CSI's current approach, but the details > of your proposal are still fuzzy to me. In particular, although the model > you described would avoid using LTO to elide unused hooks, it seems more > complicated for both tool writers and tool users to use. Please clarify > your model and shed some light on the questions below. > > 1) What does the "CSI dev tool" you describe look like? In particular, > how does the tool writer produce and maintain an export list of non-stub > hook definitions for her tool? Maintaining an export list manually would > seem to be an error-prone hassle. One could imagine generating the export > list automatically with a custom compile-time pass that the tool-writer > runs, but maintaining consistency between a tool and its export list > remains a concern for both the tool writer and the tool user. >

I agree that maintaining consistency between the list and the library is important. That is why I think a CSI tool needs to be more structured. If all tool developers follow a pre-defined code structure, it is probably not too complicated to provide a tool that handles everything (building/testing/packaging) for them. The use model I imagine is:

1) Initialization: csi_tool init <project_path>. After this step, the basic directory layout plus stub hook definitions are populated.

2) Development: the tool developer then adds their definitions in the pre-created directory.

3) Build and packaging: csi_tool build_install <project_path> <tool_install_path>. This generates the things I mentioned: native libraries, bitcode libraries, and the export list.

4) Release the tool.

The use model of the tool is simple: clang -fcsi-instrument=<tool_install_path> -O2 .... Some CSI tools may also be packaged with the compiler, in which case the use model is: clang -fcsi=<tool_name> -O2 ....

> > 2) To clarify, in your scheme, the tool writer produces an export list as > well as an object/bitcode for the tool. The tool user compiles the > program-under-test using the export list, and then incorporates the tool > object/bitcode at link time. Is this what you have in mind? >

Yes, but it is mostly hidden from the developers and users. See the examples above.

> > 3) Do I understand correctly that your model only gets rid of the need for > LTO to elide unused instrumentation hooks? In particular, are other > optimizations on the instrumentation still contingent on LTO (or ThinLTO)? >

My suggestion is to make it work by default, but that does not prevent it from being usable with LTO. It is an orthogonal issue.

> One nice feature of the current design is that tool writers can use > properties passed to hooks to elide instrumentation conditionally based on > common static analysis. It looks like optimizations based on properties > are still possible in the model you propose, but as with the current > design, they would still rely on LTO or ThinLTO. >

I have not looked in detail at how properties work, but I would guess they should work in all modes. It is just that the amount of static information available (and passable to the hooks) differs.

> > 4) In your model, if the tool user wants to analyze his program with > several different tools, then he must recompile his program from source > once for each tool. Is this correct?
The argument in this thread seems to > be that recompiling the program from source is no worse than using LTO > because LTO incurs high overhead. (From the results I've found online > http://llvm.org/devmtg/2015-04/slides/ThinLTO_EuroLLVM2015.pdf, >

The two use models are compatible. The instrumentation and lowering are two different passes. Users who want to experiment with sharing IR .o files across different tools can still produce bitcode files and use LTO mode (in which case the export list is optional).

> however, ThinLTO seems to be much lower overhead than LTO. Are these > results still accurate?) >

Those numbers were from prototypes. Incremental compilation is now supported for ThinLTO (thanks to Mehdi), so it is much better now.

> > 5) What happens when the program-under-test is built from multiple source > files? It seems that all source files must be compiled using the same > export list, which is a burden on the tool user. Although this complexity > could be managed by the program's build system, managing this complexity > through the build system or by some other means still seems like a burden > on the tool user. >

See the examples above -- for the tool user, it is a matter of adding one option: -fcsi-instrument=<...>. There should be no additional burden.

> > 6) What happens when the program-under-test uses third-party libraries? > If the tool user does not have access to the sources of those libraries, > then I don't see how they can properly elide hooks using an export list. > At best, it would seem that the library writer would distribute a > object/bitcode for the library with all hooks in place already, which is > exactly what happens with CSI's the existing design. >

Since lowering happens later, third-party libraries can be distributed as bitcode with all hooks in place. Eliding unused hooks is simply delayed in this case (the export list can still be used when not in whole-program mode).

thanks, David

> > Thanks again for your feedback. > > Cheers, > TB > > On Sun, Jun 19, 2016 at 6:45 PM, Xinliang David Li <xinliangli at gmail.com> > wrote: >> >> It is great that CSI does not depend on lto for correctness, but I think >> it should remove the dependency completely. Being able to work with lto >> should be something that is supported, but lto is not required. >> >> Such a design will greatly increase CSI usability not only for tool >> users(app developers), but also for CSI tool developers. >> >> My understanding is that it is not far to get there. Here is the CSI use >> model I am thinking: >> >> 1) a CSI dev tool can be provided to tool developers to produce a) hook >> library in native object format b) hook lib in bitcode format c) export >> list of non stub hook defs >> >> 2) the compiler implementation of CSI lowering will check the export list >> and decide whether to lower a hook call to noop or not >> >> 3) in thinlto mode, the bitcode can be readin for inlining. This works >> for lto too >> >> 4) compiler driver can hide all the above from the users >> >> >> In short, making it work in default mode makes it easier to deploy and >> can get you a long way >> >> Thanks, >> >> David >> >> >> On Sunday, June 19, 2016, TB Schardl <neboat at mit.edu> wrote: >> >>> Hey Peter and David, >>> >>> Thank you for your comments. >>> >>> As mentioned elsewhere, the current design of CSI does not rely on LTO >>> for correctness. The tool-instrumented executable will run correctly even >>> if the linker performs no optimization.
In particular, unused >>> instrumentation hooks are implemented by default as nop functions, which >>> just return immediately. CSI is a system that *can use* LTO to improve >>> tool performance; it does not *require *LTO to function. >>> >>> One of our considerations when developing CSI version 1 was design >>> simplicity. As such, CSI version 1 essentially consists of three >>> components: >>> 1) A compile-time pass that the tool user runs to insert instrumentation >>> hooks. >>> 2) A null-tool library that provides default nop implementations for >>> each instrumentation hook. When a tool writer implements a tool using the >>> CSI API, the tool writer's implemented hooks override the corresponding >>> default implementations. >>> 3) A runtime library that implements certain powerful features of CSI, >>> including contiguous sets of ID's for hooks. >>> >>> We've been thinking about how CSI might work with -mlink-bitcode-file. >>> From our admittedly limited understanding of that feature, it seems that a >>> design that uses -mlink-bitcode-file would still require something like the >>> first and third components of the existing design. Additional complexity >>> might be needed to get CSI to work with -mlink-bitcode-file, but these two >>> components seem to be core to CSI, regardless of whether >>> -mlink-bitcode-file is used. (Eliminating the null-tool library amounts to >>> eliminating a pretty simple 39-line C file, which at first blush doesn't >>> look like a big win in design complexity). >>> >>> CSI focuses on making it easy for tool writers to create many >>> dynamic-analysis tools. CSI can leverage standard compiler optimizations >>> to improve tool performance, if the tool user employs mechanisms such as >>> LTO or thinLTO, but LTO itself is not mandatory. It might be worthwhile to >>> explore other approaches with different trade-offs, such as >>> -mlink-bitcode-file, but the existing design doesn't preclude these >>> approaches down the road, and they will be able to share the same >>> infrastructure. Unless the other approaches are dramatically simpler, the >>> existing design seems like a good place to start. >>> >>> Cheers, >>> TB >>> >>> On Fri, Jun 17, 2016 at 4:25 PM, Peter Collingbourne via llvm-dev < >>> llvm-dev at lists.llvm.org> wrote: >>> >>>> >>>> >>>> On Thu, Jun 16, 2016 at 10:16 PM, Xinliang David Li via llvm-dev < >>>> llvm-dev at lists.llvm.org> wrote: >>>> >>>>> >>>>> >>>>> On Thu, Jun 16, 2016 at 3:27 PM, Mehdi Amini via llvm-dev < >>>>> llvm-dev at lists.llvm.org> wrote: >>>>> >>>>>> Hi TB, >>>>>> >>>>>> Thanks for you answer. >>>>>> >>>>>> On Jun 16, 2016, at 2:50 PM, TB Schardl <neboat at mit.edu> wrote: >>>>>> >>>>>> Hey Mehdi, >>>>>> >>>>>> Thank you for your comments. I've CC'd the CSI mailing list with >>>>>> your comments and put my responses inline. Please let me know any other >>>>>> questions you have. >>>>>> >>>>>> Cheers, >>>>>> TB >>>>>> >>>>>> On Thu, Jun 16, 2016 at 3:48 PM, Mehdi Amini <mehdi.amini at apple.com> >>>>>> wrote: >>>>>> >>>>>>> >>>>>>> On Jun 16, 2016, at 9:01 AM, TB Schardl via llvm-dev < >>>>>>> llvm-dev at lists.llvm.org> wrote: >>>>>>> >>>>>>> Hey LLVM-dev, >>>>>>> >>>>>>> We propose to build the CSI framework to provide a comprehensive >>>>>>> suite of compiler-inserted instrumentation hooks that dynamic-analysis >>>>>>> tools can use to observe and investigate program runtime behavior. 
>>>>>>> Traditionally, tools based on compiler instrumentation would each >>>>>>> separately modify the compiler to insert their own instrumentation. In >>>>>>> contrast, CSI inserts a standard collection of instrumentation hooks into >>>>>>> the program-under-test. Each CSI-tool is implemented as a library that >>>>>>> defines relevant hooks, and the remaining hooks are "nulled" out and elided >>>>>>> during link-time optimization (LTO), resulting in instrumented runtimes on >>>>>>> par with custom instrumentation. CSI allows many compiler-based tools to >>>>>>> be written as simple libraries without modifying the compiler, greatly >>>>>>> lowering the bar for >>>>>>> developing dynamic-analysis tools. >>>>>>> >>>>>>> ===============>>>>>>> Motivation >>>>>>> ===============>>>>>>> >>>>>>> Key to understanding and improving the behavior of any system is >>>>>>> visibility -- the ability to know what is going on inside the system. >>>>>>> Various dynamic-analysis tools, such as race detectors, memory checkers, >>>>>>> cache simulators, call-graph generators, code-coverage analyzers, and >>>>>>> performance profilers, rely on compiler instrumentation to gain visibility >>>>>>> into the program behaviors during execution. With this approach, the tool >>>>>>> writer modifies the compiler to insert instrumentation code into the >>>>>>> program-under-test so that it can execute behind the scene while the >>>>>>> program-under-test runs. This approach, however, means that the >>>>>>> development of new tools requires compiler work, which many potential tool >>>>>>> writers are ill equipped to do, and thus raises the bar for building new >>>>>>> and innovative tools. >>>>>>> >>>>>>> The goal of the CSI framework is to provide comprehensive static >>>>>>> instrumentation through the compiler, in order to simplify the task of >>>>>>> building efficient and effective platform-independent tools. The CSI >>>>>>> framework allows the tool writer to easily develop analysis tools that >>>>>>> require >>>>>>> compiler instrumentation without needing to understand the compiler >>>>>>> internals or modifying the compiler, which greatly lowers the bar for >>>>>>> developing dynamic-analysis tools. >>>>>>> >>>>>>> ===============>>>>>>> Approach >>>>>>> ===============>>>>>>> >>>>>>> The CSI framework inserts instrumentation hooks at salient locations >>>>>>> throughout the compiled code of a program-under-test, such as function >>>>>>> entry and exit points, basic-block entry and exit points, before and after >>>>>>> each memory operation, etc. Tool writers can instrument a >>>>>>> program-under-test simply by first writing a library that defines the >>>>>>> semantics of relevant hooks >>>>>>> and then statically linking their compiled library with the >>>>>>> program-under-test. >>>>>>> >>>>>>> At first glance, this brute-force method of inserting hooks at every >>>>>>> salient location in the program-under-test seems to be replete with >>>>>>> overheads. CSI overcomes these overheads through the use of >>>>>>> link-time-optimization (LTO), which is now readily available in most major >>>>>>> compilers, including GCC and LLVM. Using LTO, instrumentation hooks that >>>>>>> are not used by a particular tool can be elided, allowing the overheads of >>>>>>> these hooks to be avoided when the >>>>>>> >>>>>>> >>>>>>> I don't understand this flow: the front-end emits all the possible >>>>>>> instrumentation but the useless calls to the runtime will be removed during >>>>>>> the link? 
>>>>>>> It means that the final binary is specialized for a given tool >>>>>>> right? What is the advantage of generating this useless instrumentation in >>>>>>> the first place then? I'm missing a piece here... >>>>>>> >>>>>> >>>>>> Here's the idea. When a tool user, who has a program they want to >>>>>> instrument, compiles their program source into an object/bitcode, he can >>>>>> turn on the CSI compile-time pass to insert instrumentation hooks (function >>>>>> calls to instrumentation routines) throughout the IR of the program. >>>>>> Separately, a tool writer implements a particular tool by writing a library >>>>>> that defines the subset of instrumentation hooks she cares about. At link >>>>>> time, the object/bitcode of the program source is linked with the object >>>>>> file/bitcode of the tool, resulting in a tool-instrumented executable. >>>>>> When LTO is used at link time, unused instrumentation is elided, and >>>>>> additional optimizations can run on the instrumented program. (I'm happy >>>>>> to send you a nice picture that we have of this flow, if the mailing list >>>>>> doesn't mind.) >>>>>> >>>>>> >>>>>> Ok this is roughly what I had in mind. >>>>>> >>>>>> I still believe it is not great to rely on LTO, and better, it is not >>>>>> needed to achieve this result. >>>>>> >>>>>> For instance, I don't see why the "library" that defines the subset >>>>>> of instrumentation hooks used by this tool can't be fed during a regular >>>>>> compile, and the useless hook be eliminated at this point. >>>>>> Implementation detail, but in practice, instead of feeding the >>>>>> library itself, the "framework" that allows to generate the library for the >>>>>> tool writer can output a "configuration file" along side the library, and >>>>>> this configuration file is what is fed to the compiler and tells the >>>>>> instrumentation pass which of the hooks to generate. It sounds more >>>>>> efficient to me, and remove the dependency on LTO. >>>>>> I imagine there is a possible drawback that I'm missing right now... >>>>>> >>>>> >>>>> >>>>> I agree that the tool does not need to depend on full LTO. What is >>>>> needed is essentially an option or configuration such that the compiler can >>>>> find the bit code file(s) for the hooks during compilation time. It is >>>>> pretty much similar to how math function inlining can be done ... >>>>> >>>> >>>> I agree, and I would strongly prefer that the design worked like this >>>> rather than relying on LTO. >>>> >>>> The flag for loading bitcode already exists, and is called >>>> -mlink-bitcode-file. Projects such as libclc already use it, I believe. >>>> >>>> What might be useful is if CSI improved the infrastructure around >>>> -mlink-bitcode-file to make it more convenient to produce compatible >>>> bitcode files. libclc for example relies on a post-processing pass to >>>> change symbol linkage, and I think that can be avoided by changing symbol >>>> linkages as they are imported from the bitcode file. >>>> >>>> Peter >>>> >>>> David >>>>> >>>>> >>>>> >>>>> >>>>>> >>>>>> >>>>>> >>>>>> The final binary is specialized to a given tool. One advantage of >>>>>> CSI, however, is that a single set of instrumentation covers the needs of a >>>>>> wide variety of tools, since different tools provide different >>>>>> implementations of the same hooks. The specialization of a binary to a >>>>>> given tool happens at link time. >>>>>> >>>>>> >>>>>>> >>>>>>> instrumented program-under-test is run. 
Furthermore, LTO can >>>>>>> optimize a tool's instrumentation within a program using traditional >>>>>>> compiler optimizations. Our initial study indicates that the use of LTO >>>>>>> does not unduly slow down the build time >>>>>>> >>>>>>> >>>>>>> This is a false claim: LTO has a very large overhead, and especially >>>>>>> is not parallel, so the more core you have the more the difference will be. >>>>>>> We frequently observes builds that are 3 times slower. Moreover, LTO is not >>>>>>> incremental friendly and during debug (which is very relevant with >>>>>>> sanitizer) rebuilding involves waiting for the full link to occur again. >>>>>>> >>>>>>> >>>>>> Can you please point us towards some projects where LTO incurs a 3x >>>>>> slowdown? We're interested in the overhead of LTO on build times, and >>>>>> although we've found LTO to incur more overhead on parallel build times >>>>>> than serial build times, as you mentioned, the overheads we've measured on >>>>>> serial or parallel builds have been less than 40% (which we saw when >>>>>> building the Apache HTTP server). >>>>>> >>>>>> >>>>>> I expect this to be reproducible on most non-trivial C/C++ programs. >>>>>> But taking clang as an example, just running `ninja clang` on OS X a >>>>>> not-so-recent 12-cores machine takes 970s with LTO and 252s without (and I >>>>>> believe this is without debug info...). >>>>>> Running just `ninja` to build all of llvm/clang here would take *a >>>>>> lot* longer with LTO, and not so much without. >>>>>> >>>>>> The LTO builds without assert >>>>>> >>>>>> Best, >>>>>> >>>>>> -- >>>>>> Mehdi >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> We've also designed CSI such that it does not depend on LTO for >>>>>> correctness; the program and tool will work correctly with ordinary ld. Of >>>>>> course, the downside of not using LTO is that instrumentation is not >>>>>> optimized, and in particular, unused instrumentation will incur overhead. >>>>>> >>>>>> >>>>>>> >>>>>>> -- >>>>>>> Mehdi >>>>>>> >>>>>>> , and the LTO can indeed optimize away unused hooks. One of our >>>>>>> experiments with Apache HTTP server shows that, compiling with CSI and >>>>>>> linking with the "null" CSI-tool (which consists solely of empty hooks) >>>>>>> slows down the build time of the Apache HTTP server by less than 40%, and >>>>>>> the resulting tool-instrumented executable is as fast as the original >>>>>>> uninstrumented code. >>>>>>> >>>>>>> >>>>>>> ===============>>>>>>> CSI version 1 >>>>>>> ===============>>>>>>> >>>>>>> The initial version of CSI supports a basic set of hooks that covers >>>>>>> the following categories of program objects: functions, function exits >>>>>>> (specifically, returns), basic blocks, call sites, loads, and stores. We >>>>>>> prioritized instrumenting these IR objects based on the need of seven >>>>>>> example CSI tools, including a race detector, a cache-reuse analyzer, and a >>>>>>> code-coverage analyzer. We plan to evolve the CSI API over time to be more >>>>>>> comprehensive, and we have designed the CSI API to be extensible, allowing >>>>>>> new instrumentation to be added as needs grow. We chose to initially >>>>>>> implement a minimal "core" set of hooks, because we felt it was best to add >>>>>>> new instrumentation on an as-needed basis in order to keep the interface >>>>>>> simple. >>>>>>> >>>>>>> There are three salient features about the design of CSI. 
First, >>>>>>> CSI assigns each instrumented program object a unique integer identifier >>>>>>> within one of the (currently) six program-object categories. Within each >>>>>>> category, the ID's are consecutively numbered from 0 up to the number of >>>>>>> such objects minus 1. The contiguous assignment of the ID's allows the >>>>>>> tool writer to easily keep track of IR objects in the program and iterate >>>>>>> through all objects in a category (whether the object is encountered during >>>>>>> execution or not). Second, CSI provides a platform-independent means to >>>>>>> relate a given program object to locations in the source code. >>>>>>> Specifically, CSI provides "front-end-data (FED)" tables, which provide >>>>>>> file name and source lines for each program object given the object's ID. >>>>>>> Third, each CSI hook takes in as a parameter a "property": a 64-bit >>>>>>> unsigned integer that CSI uses to export the results of compiler analyses >>>>>>> and other information known at compile time. The use of properties allows >>>>>>> the tool to rely on compiler analyses to optimize instrumentation and >>>>>>> decrease overhead. In particular, since the value of a property is known >>>>>>> at compile time, LTO can constant-fold the conditional test around a >>>>>>> property to elide unnecessary instrumentation. >>>>>>> >>>>>>> =============== >>>>>>> Future plan >>>>>>> =============== >>>>>>> >>>>>>> We plan to expand CSI in future versions by instrumenting additional >>>>>>> program objects, such as atomic instructions, floating-point instructions, >>>>>>> and exceptions. We are also planning to provide additional static >>>>>>> information to tool writers, both through information encoded in the >>>>>>> properties passed to hooks and by other means. In particular, we are also >>>>>>> looking at mechanisms to present tool writers with more complex static >>>>>>> information, such as how different program objects relate to each other, >>>>>>> e.g., which basic blocks belong to a given function.
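As an aside on the property mechanism described in the quoted proposal above, the sketch below shows how a tool's hook might branch on a property bit and why a compile-time-constant property lets the optimizer remove the instrumentation entirely once the hook is inlined. The hook name, signature, and bit assignment are illustrative assumptions, not the actual CSI API or property encoding.

```cpp
// Illustrative sketch of a property-aware hook; the name, signature, and bit
// layout are assumptions, not the real CSI interface.
#include <cstdint>

using csi_id_t = int64_t;

// Hypothetical property bit: the compiler proved this load uninteresting to
// this particular tool (e.g., it reads a compile-time constant).
constexpr uint64_t kPropSkippableLoad = 1ull << 0;

static uint64_t g_interesting_loads = 0; // toy per-tool state

extern "C" void __csi_before_load(csi_id_t /*load_id*/, const void * /*addr*/,
                                  int /*num_bytes*/, uint64_t prop) {
  // The property passed at each instrumented load is a compile-time constant,
  // so once this hook is inlined (at compile time or under LTO/ThinLTO) the
  // branch constant-folds away and skippable loads cost nothing at runtime.
  if (prop & kPropSkippableLoad)
    return;
  ++g_interesting_loads;
}
```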
Xinliang David Li via llvm-dev
2016-Jun-21 00:18 UTC
[llvm-dev] RFC: Comprehensive Static Instrumentation
On Mon, Jun 20, 2016 at 4:27 PM, Evgenii Stepanov <eugenis at google.com> wrote:
> I like this non-LTO use model a lot. In my opinion, it does not conflict with the LTO model - the only difference is when the implementation of CSI hooks is injected into the build - compile time vs link time. I don't see any problem with injecting it both at compile and link time - this would take care of CSI hooks in third-party libraries.
>
> I also don't understand the need for the export list. As I see it, the IR library of the tool is loaded at compilation time, internalized, and merged with the IR for the module being compiled. If any CSI hooks remain undefined at this point, they are replaced with empty functions.

Yes, the IR library itself 'serves' as the export list in this case. An export list is needed only when the IR library itself is not used.

David

> The IR tool library should not have any global state, of course - it is the equivalent of a C++ header file. Exactly the same happens at LTO link time, in case the link brought in new CSI hooks that were unresolved at compilation time.
>
> ThinLTO uses a different build model - it's not compile+link, but compile+merge+compile+link or something like that AFAIK. This requires build-system changes, and I imagine the extra work would not be worth the benefits for many smaller projects. I'd love it if CSI worked without any kind of LTO with reasonable performance (btw, do you have the numbers for the slowdown with an empty tool without LTO?).
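To make the compile-time injection model concrete, here is a minimal sketch of what the "IR library of the tool" might contain. The hook names and signatures are illustrative assumptions rather than the official CSI API; the point is only that hooks the tool defines do real work, while hooks it leaves as empty bodies are exactly the "empty functions" that inlining removes once the tool bitcode is merged into each module, with no LTO required.

```cpp
// tool.cpp -- hypothetical call-counting tool, compiled once to tool.bc and
// merged into every translation unit at compile time (or linked under LTO).
// Hook names and signatures are illustrative, not the official CSI API.
#include <cstdint>
#include <cstdio>

using csi_id_t = int64_t;

static uint64_t g_calls = 0;

// A hook this tool cares about: count every function entry.
extern "C" void __csi_func_entry(csi_id_t func_id, uint64_t prop) {
  ++g_calls;
}

// Hooks this tool ignores: empty bodies, i.e., the "empty functions" above.
// After the tool IR is merged into the module being compiled, the inliner
// reduces calls to these hooks to nothing.
extern "C" void __csi_func_exit(csi_id_t func_id, uint64_t prop) {}
extern "C" void __csi_bb_entry(csi_id_t bb_id, uint64_t prop) {}

// Report the count when the program exits.
namespace {
struct Reporter {
  ~Reporter() {
    std::fprintf(stderr, "function entries: %llu\n",
                 (unsigned long long)g_calls);
  }
} g_reporter;
} // namespace
```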
TB Schardl via llvm-dev
2016-Jun-23 02:36 UTC
[llvm-dev] RFC: Comprehensive Static Instrumentation
Hey David and Evgenii,

Thank you for your responses; they clarify a lot. We like the idea of allowing tool users to optimize instrumentation without needing LTO, and based on our understanding of your model, we think that the existing CSI system can support this usage model with minimal changes.

To describe the modification, I've attached a diagram of the compilation flow for CSI's existing design, assuming the tool-user enables LTO. (Gray rectangles delineate the concerns of the tool-user, the tool-writer, and the CSI-provided libraries. The tan shapes denote source, bitcode, or executable units for the program under test. The blue shapes denote units for the tool. Components of the CSI system itself are colored orange.)

Conceptually, supporting the usage model of optimizing instrumentation at compile time, a model we'll call CSI:CTO, involves a simple change. The simplest option seems to be to have the tool-user link in the tool bitcode (specifically, box O in the diagram) when compiling each source unit, just as Evgenii described, and then to run optimizations after the CSI pass. One problem with this approach is that it introduces multiple copies of the tool-writer's hook definitions, which causes duplicate-symbol errors at static link time. Based on some experiments with -mlink-bitcode-file and Evgenii's suggestion, it seems that we can resolve these duplicate-symbol issues by setting the linkage of the CSI tool symbols to "linkonce_odr."

Here is a sketch of the modifications to the existing compilation flow needed to support CSI:CTO (see the code sketch below for steps 1 and 2):

1) The tool-user passes the null-default tool bitcode to the CSI pass, e.g., by issuing "clang -fcsi=<tool>.bc ..." to compile each source unit. (In the diagram, this flow corresponds to new edges from box O to boxes C and G.)
2) In addition to inserting instrumentation hooks throughout the IR of the translation unit being compiled, the CSI pass also ensures that the symbols of <tool>.bc have "linkonce_odr" linkage.
3) Optimization passes run after the CSI pass to optimize the instrumentation, which includes eliding null hooks.
4) The tool bitcode no longer has to be linked in at link time.

We believe that CSI:CTO achieves the compile-time-optimization effect you describe and is a simple addition to the existing CSI system. Is there any functionality in the scheme you propose that CSI:CTO seems to miss? Of course, we might have overlooked some implementation issues in this design sketch, but we plan to explore this scheme further and add support for CSI:CTO on top of the existing system.

One issue with CSI:CTO is the risk that the tool-user incorporates the wrong tool by mistake when compiling some unit of his program. One advantage of using LTO/ThinLTO is that it avoids many potential instances of this issue by incorporating the tool bitcode at only one point in the compilation flow of the program-under-test. Nevertheless, the option of incorporating the tool and optimizing the instrumentation at compile time seems appealing. We can investigate techniques to avoid the inclusion of different tools in different units down the road.

Cheers,
TB
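A rough sketch of what steps 1 and 2 might look like inside the compiler follows. This is not CSI's implementation: the helper name linkCSITool, its placement inside the CSI pass, and the treatment of globals are assumptions; only parseIRFile, Linker::linkModules, and linkonce_odr linkage are existing LLVM facilities.

```cpp
// Sketch under the assumptions above: merge the tool bitcode into the module
// being compiled and mark the tool's definitions linkonce_odr so that the
// copies pulled into different translation units merge at static link time.
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/IRReader/IRReader.h"
#include "llvm/Linker/Linker.h"
#include "llvm/Support/SourceMgr.h"

using namespace llvm;

static bool linkCSITool(Module &M, StringRef ToolBitcodePath) {
  SMDiagnostic Err;
  std::unique_ptr<Module> Tool =
      parseIRFile(ToolBitcodePath, Err, M.getContext());
  if (!Tool)
    return false; // Could not read the tool bitcode.

  // Step 2: give every definition the tool provides linkonce_odr linkage
  // before merging, so duplicates across object files are deduplicated.
  for (Function &F : *Tool)
    if (!F.isDeclaration())
      F.setLinkage(GlobalValue::LinkOnceODRLinkage);
  for (GlobalVariable &GV : Tool->globals())
    if (GV.hasInitializer())
      GV.setLinkage(GlobalValue::LinkOnceODRLinkage);

  // Merge the tool into the module under compilation; the post-CSI
  // optimization pipeline (step 3) can then inline hooks and elide null ones.
  return !Linker::linkModules(M, std::move(Tool));
}
```

In step 1 the driver (whether via the hypothetical -fcsi=<tool>.bc flag above or something like today's -mlink-bitcode-file plumbing) would arrange for this to run before the post-CSI optimizations. One subtlety Evgenii already noted applies here: the tool IR should carry no mutable global state, since linkonce_odr merging assumes every copy of a definition is identical and interchangeable.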
Mehdi Amini via llvm-dev
2016-Jun-23 03:56 UTC
[llvm-dev] RFC: Comprehensive Static Instrumentation
> On Jun 20, 2016, at 7:27 PM, Evgenii Stepanov via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> I like this non-LTO use model a lot. In my opinion, it does not
> conflict with the LTO model - the only difference is when the
> implementation of the CSI hooks is injected into the build: compile time
> vs. link time. I don't see any problem with injecting it both at
> compile and link time - that would take care of CSI hooks in third-party
> libraries.
>
> I also don't understand the need for the export list. As I see it, the
> IR library of the tool is loaded at compilation time, internalized, and
> merged with the IR for the module being compiled. If any CSI hooks
> remain undefined at this point, they are replaced with empty
> functions. The IR tool library should not have any global state, of
> course - it is the equivalent of a C++ header file. Exactly the same
> happens at LTO link time, in case the link brought in new CSI hooks
> that were unresolved at compilation time.
>
> ThinLTO uses a different build model - it's not compile+link, but
> compile+merge+compile+link or something like that AFAIK. This requires
> build system changes,

No, ThinLTO does not require any build system change: the extra stages are hidden in the linker plugin implementation, so it appears exactly like regular LTO.

— Mehdi

> and I imagine the extra work would not be worth
> the benefits for many smaller projects. I'd love it if CSI worked without
> any kind of LTO with reasonable performance (btw, do you have the
> numbers for the slowdown with an empty tool without LTO?).
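For readers who want to see the mechanics spelled out, the following is a minimal, self-contained C sketch of the compile-time injection model described above: the tool's IR library defines only the hooks it cares about and delegates state to its runtime, any hook left undefined is given an empty body, and ordinary -O2 optimization then inlines or removes the calls without any LTO. All identifiers here (csi_before_load, csi_after_load, csi_id_t, tool_record_load) are hypothetical placeholders, not the actual CSI API.

    /* Sketch of the compile-time hook-injection model; names are
     * illustrative placeholders, not the real CSI interface.        */
    #include <stdint.h>
    #include <stdio.h>

    typedef int64_t csi_id_t;

    /* --- Tool runtime (normally a separately linked native library). */
    static uint64_t load_count = 0;
    void tool_record_load(csi_id_t id) { (void)id; load_count++; }

    /* --- Tool IR library, merged and internalized at compile time.
     * It defines only the hooks the tool cares about and keeps no
     * global state of its own; the state lives in the runtime above. */
    static inline void csi_before_load(csi_id_t id, const void *addr,
                                       uint64_t prop) {
      (void)addr; (void)prop;
      tool_record_load(id);
    }

    /* --- A hook the tool did not define.  Under the proposed scheme
     * the instrumentation pass (or the merge step) supplies an empty
     * body, so the call folds away under ordinary optimization.      */
    static inline void csi_after_load(csi_id_t id, const void *addr,
                                      uint64_t prop) {
      (void)id; (void)addr; (void)prop;
    }

    /* --- Program-under-test, with hooks inserted around each load by
     * the compile-time instrumentation pass.                         */
    int read_first(const int *array, csi_id_t site_id) {
      csi_before_load(site_id, &array[0], /*prop=*/0);
      int value = array[0];
      csi_after_load(site_id, &array[0], /*prop=*/0);
      return value;
    }

    int main(void) {
      int data[4] = {7, 8, 9, 10};
      int v = read_first(data, /*site_id=*/0);
      printf("value=%d loads_observed=%llu\n", v,
             (unsigned long long)load_count);
      return 0;
    }

Under Mehdi's point about ThinLTO, the same merging and inlining could equally happen in a ThinLTO backend through cross-module importing, with no change to how the user invokes the build.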
Mehdi Amini via llvm-dev
2016-Jun-23 04:05 UTC
[llvm-dev] RFC: Comprehensive Static Instrumentation
> On Jun 20, 2016, at 7:00 PM, TB Schardl via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> [...]
>
> (From the results I've found online http://llvm.org/devmtg/2015-04/slides/ThinLTO_EuroLLVM2015.pdf, however, ThinLTO seems to be much lower overhead than LTO. Are these results still accurate?)

We just published some updated results (including the new incremental mode):
http://blog.llvm.org/2016/06/thinlto-scalable-and-incremental-lto.html

— Mehdi