Ammar Ben Khadra via llvm-dev
2020-Jun-28 12:39 UTC
[llvm-dev] Introducing the binary-level coverage analysis tool bcov
Hi Fangrui,

Many thanks for providing such detailed feedback! Please find my comments inlined below.

- Ammar

On Sun, Jun 28, 2020 at 5:59 AM Fangrui Song <maskray at google.com> wrote:

> On 2020-06-26, Ammar Ben Khadra via llvm-dev wrote:
> > ## TL;DR
> >
> > We introduce bcov, an open-source binary-level coverage analysis tool [1].
> > The details are discussed in our paper [2], which is accepted to
> > ESEC/FSE'20. bcov statically instruments x86-64 ELF binaries without
> > compiler support. It features several techniques that allow it to achieve
> > high performance, transparency, and flexibility.
> >
> > For example, running "make check-llvm-codegen" with an instrumented
> > version of llc introduces a mean performance overhead of about 32%. The
> > overhead for smaller binaries can be as low as 2%. bcov accurately reports
> > code coverage with a mean F-score of 99.86%. All of that without
> > introducing any test regressions.
> >
> > ## Design overview
> >
> > Given an ELF module as input, bcov will first analyze it to choose
> > appropriate probe locations and estimate the size of the program segments
> > required for patching. This analysis depends on the instrumentation policy
> > chosen by the user. Then, bcov patches the input module by adding two
> > program segments; one segment contains trampoline code and the other keeps
> > coverage data.
> >
> > bcov uses trampolines to instrument the binary, where it inserts probes in
> > targeted locations to track basic block coverage. Each probe consists of a
> > detour that diverts control flow to a designated trampoline. The
> > trampoline updates coverage data using a single pc-relative mov
> > instruction (e.g., mov BYTE PTR [rip+0xadd88],1), executes relocated
> > instructions, and then restores control flow to its original state.
> >
> > To dump coverage data, the user can inject our small runtime library,
> > libbcov-rt, using the LD_PRELOAD mechanism. This library scans the memory
> > mappings of the current process to dump every data segment that starts
> > with a specific magic number. This means that coverage data will be dumped
> > separately for each ELF module. Dumping data takes place at process
> > shutdown or upon receiving a user signal. The latter feature enables
> > online coverage tracking.
> >
> > ## Key features
> >
> > bcov integrates several techniques, including:
> >
> > - Probe pruning. We instrument only what is necessary for tracking code
> > coverage. To this end, we bring Agrawal's probe pruning technique [3] to
> > binary-level instrumentation.
> >
> > - Precise CFG analysis. Imprecision in the recovered CFG can have several
> > effects, which include causing the instrumented binary to crash. To
> > improve CFG precision, we propose a novel jump table analysis and
> > implement a variant of an available non-return analysis [4].
> >
> > - Static instrumentation. We implement several techniques, namely (1)
> > optimized probe selection, (2) jump table instrumentation, and (3)
> > systematic detour hoisting.
> >
> > The result is that our approach can work transparently, and with low
> > overhead, on large and well-tested C and C++ programs.
> >
> > ## Relation to the compiler
> >
> > Code coverage analysis at the binary level, as implemented in bcov, can
> > offer several benefits: first, it is highly flexible. A user can analyze
> > the coverage of a particular module only. Even better, coverage tracking
> > can be limited to specific functions, e.g., the ones affected by recent
> > changes. Second, it is largely independent of the compiler and build
> > system used. We observed consistent results after experimenting with four
> > different versions of gcc and clang and three different build types,
> > namely, debug, release, and LTO. Finally, the primitives we use to divert
> > control-flow and update coverage are simple.
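As a rough illustration of the dump mechanism described above (libbcov-rt scanning the process's memory mappings for data segments that begin with a magic number), a sketch in C might look like the following. The function name, the magic value, and the return-a-count behavior are illustrative only; bcov's actual runtime and on-disk format are not shown here, and a real dumper would write each matching segment to a coverage file.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: walk /proc/self/maps and count writable private
 * mappings whose first bytes equal a given magic number. A real runtime
 * would dump each matching segment instead of merely counting it. */
static size_t bcov_count_magic_segments(uint64_t magic) {
    FILE *maps = fopen("/proc/self/maps", "r");
    if (!maps)
        return 0;
    char line[512];
    size_t found = 0;
    while (fgets(line, sizeof line, maps)) {
        unsigned long start, end;
        char perms[8];
        /* each line looks like: 55d0a000-55d0b000 rw-p ... */
        if (sscanf(line, "%lx-%lx %7s", &start, &end, perms) != 3)
            continue;
        /* candidate data segments are readable, writable, and private */
        if (perms[0] != 'r' || perms[1] != 'w' || perms[3] != 'p')
            continue;
        if (end - start >= sizeof magic &&
            memcmp((const void *)start, &magic, sizeof magic) == 0)
            found++;
    }
    fclose(maps);
    return found;
}
```

Because the scan keys only on a magic number at the start of a segment, it needs no per-module bookkeeping, which is what lets coverage be dumped separately for each instrumented ELF module.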
> > One can imagine (and hope) that extending bcov to system code, or other
> > native languages, will not require a lot of work.
> >
> > However, the viability of static binary-level coverage analysis
> > ultimately depends on the compiler toolchain that produces the binary.
> > Specifically, we can think of two areas where compilers might help our
> > tool by adding extra support, or at least by not emitting code that is
> > difficult to analyze:
> >
> > - Source-level coverage. The ability to know whether a particular basic
> > block was covered in the binary is quite interesting. However, developers
> > would ultimately want to map this coverage information to source-level
> > artifacts. Statement coverage can be supported already with the help of
> > DWARF debugging information. But bcov can report the coverage of
> > individual branch decisions at the binary level. Therefore, leveraging
> > this information to obtain source-level MC/DC seems to be a natural next
> > step. Ideally, we want to infer MC/DC using what we have already, namely,
> > binary-level coverage and DWARF information. However, adding support for
> > a custom mapping format similar to the one available in llvm-cov might be
> > necessary. In this regard, we appreciate your suggestions on the best way
> > to support MC/DC, if at all feasible. (feedback needed)
> >
> > - Function model. We assume that a function occupies a contiguous code
> > region. The function's main entry is expected to be at the beginning of
> > this region. Also, we consider each landing pad to be a function entry as
> > well. Based on this, we assume that the identified entries predominate
> > all basic blocks inside the function. In other words, a function cannot
> > reach an arbitrary basic block inside another function without first
> > visiting one of its entries. This assumption seems to hold pretty well in
> > the binaries we experimented with, but we are not sure to what extent it
> > generalizes to other compilers and ISAs.
> > (feedback needed)
> >
> > ## Potential future work
> >
> > Updating coverage using a single pc-relative mov instruction is
> > transparent in the sense that it does not clobber any general-purpose
> > register. It even preserves the CPU's eflags. This, in turn, has greatly
> > simplified the implementation of our prototype. That is, we did not have
> > to think about saving/restoring the CPU state. However, putting
> > mechanisms in place to allow eflags to be *safely* clobbered, e.g.,
> > liveness analysis, can open the door for several design alternatives
> > where a pc-relative mov (e.g., mov BYTE PTR [rip+0xadd88],1) can be
> > replaced with:
> >
> > - or BYTE PTR [rip+0xadd88],c: where c is a one-hot constant. This will
> > reduce the size of the required coverage data to 1/8 of the size
> > currently required by bcov.
> >
> > - add BYTE PTR [rip+0xadd88],1: an 8-bit counter per superblock (or a
> > selected set of basic blocks). This might be useful for fuzzing.
> >
> > - add DWORD PTR [rip+0xadd88],1: a 32-bit counter per superblock (or a
> > selected set of basic blocks). This can be useful for profiling.
> >
> > So there is significant potential here. However, we have not explored
> > these ideas any further, and we would highly appreciate any feedback
> > about their viability and potential added value in comparison to what is
> > already available in llvm-cov and sancov. (feedback needed)
> >
> > To conclude, we introduced bcov [1] and discussed a few areas where the
> > LLVM community can help us assess its potential, identify related work,
> > and plan its future direction. Many thanks in advance for your feedback.
> >
> > Kind regards,
> >
> > Ammar
> >
> > [1]: https://github.com/abenkhadra/bcov
> > [2]: https://arxiv.org/pdf/2004.14191.pdf
> > [3]: https://dl.acm.org/doi/10.1145/174675.175935
> > [4]: https://dl.acm.org/doi/10.1145/2931037.2931047
>
> Hi Ammar,
>
> Great work!
>
> I think the following statement in the paper is not true: "gcov relies on
> debug information which is less accurate in optimized builds." GCC's gcov
> implementation does not rely on debugging information. This can be
> verified with -fdump-debug: `gcc --coverage -fdump-debug -c a.c` dumps an
> empty a.c.*.debug file.

After a second look at https://gcc.gnu.org/onlinedocs/gcc/Gcov-Intro.html#Gcov-Intro, it seems that I misunderstood why it is recommended to disable optimizations in gcov. You are right. This should be fixed. Thanks!

> clang supports gcov instrumentation as well. clang --coverage does
> leverage debugging information, but the usage is very minimal - debugging
> information is just used to retrieve the start line of a function. I have
> studied and contributed a bit to clang's gcov implementation recently and
> thus I am fairly confident about this.
>
> > In comparison, llvm-cov features a custom mapping format embedded in
> > LLVM's intermediate representation (IR).
>
> I don't know whether a clarification is needed here. llvm-cov is a
> frontend tool which supports two formats: (1) coverage mapping (clang
> -fprofile-instr-generate -fcoverage-mapping) and (2) gcov. Using
> -fcoverage-mapping might be clearer in the context.

Actually, we are referring to the first usage, which is, as I understand, the most popular. A clarification can indeed help here.

> > Also, sancov is tightly coupled with LLVM sanitizers (e.g., ASan) which
> > add varying overhead. Extending bcov with additional feedback signals,
> > similar to sancov, is an interesting future work.
>
> SanitizerCoverage is a standalone instrumentation pass
> (llvm/lib/Transforms/Instrumentation/SanitizerCoverage.cpp)
> which is not coupled with asan. -fsanitize-coverage= can be used
> standalone, or together with asan, lsan, msan, ubsan, etc.
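For context on what a standalone SanitizerCoverage client looks like: clang's SanitizerCoverage documentation specifies guard callbacks that user code may define. The sketch below is a minimal, illustrative runtime for trace-pc-guard; the bookkeeping scheme (a plain hit array indexed by guard id) is an assumption of this sketch, not sancov's own .sancov writer.

```c
#include <stdint.h>

static uint32_t n_guards;          /* total guards seen across all modules */
static uint8_t  hit[1 << 20];      /* toy bookkeeping: 1 byte per guard id */

/* Called once per instrumented module at startup with that module's guard
 * array; assign each guard a unique non-zero id, as the docs suggest. */
void __sanitizer_cov_trace_pc_guard_init(uint32_t *start, uint32_t *stop) {
    for (uint32_t *g = start; g < stop; g++)
        if (*g == 0)
            *g = ++n_guards;
}

/* Called on every instrumented edge; record that this guard fired. */
void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
    uint32_t id = *guard;
    if (id != 0 && id < sizeof hit)
        hit[id] = 1;
}
```

Building with something like `clang -fsanitize-coverage=trace-pc-guard a.c runtime.c` would let the instrumented edges of a.c drive these callbacks; as the thread notes, this gives you the instrumentation but not sancov's own dumper.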
> Its overhead can be very small, especially if you use the recent
> inline-bool-flag https://reviews.llvm.org/D77244
> (`clang -fsanitize-coverage=inline-bool-flag a.c`) or inline-8bit-counters.
> You can choose 3 modes: edge (default), bb, and func.
> The flags are stored in a section __sancov_bools.
> There is no standalone dumper though...

I remember that the term "tightly coupled" was used by a sancov developer in a GitHub issue. I cannot find that thread now. This perhaps refers to the fact that (please correct me) casual users are not expected to use sancov as an independent pass. Instead, they should rely on the runtime already provided by one of the sanitizers. I shall revisit this issue to improve the wording. However, I think that the main point still holds; given this dependence on the sanitizer runtime and the variety of options offered by sancov, one cannot refer to a 'reference' sancov to report its instrumentation overhead. Nevertheless, the functionality offered by the sancov option 'trace-pc-guard' is perhaps closest to that in bcov. If we use this as the basis for measurement, then the performance overhead should be small. Can you please refer me to instructions on how I can use sancov without any sanitizer runtime?

A few random comments below.

> If I understand correctly, by leveraging the superblock dominator graph
> (from "Dominators, Super Blocks, and Program Coverage") and the any-node
> policy, the coverage result is binary: not-covered vs covered. I agree
> that in many cases this is sufficient. gcov and -fcoverage-mapping, on the
> other hand, can provide line execution counts, which are handy in other
> cases. This fact should probably be mentioned somewhere, "RQ2:
> Instrumentation overhead" or another place.
> Both -fcoverage-mapping and gcov provide more information, so they have
> larger overhead (I agree that gcov isn't a format optimized for file
> sizes - it uses uint32_t records everywhere, which can be quite expensive
> for representing smaller integers).

Yes, the reported coverage is binary. It indicates whether a basic block is covered or not based on the coverage of its superblock (a set of basic blocks with the same coverage information). Profiling represents an interesting use case, indeed. I noted in the 'Potential future work' section that updating coverage data using:

  add DWORD PTR [rip+0xadd88],1

instead of the current:

  mov BYTE PTR [rip+0xadd88],1

enables profiling using a 32-bit counter per superblock. However, a superblock can span multiple lines in the source code, which renders this counter difficult to relate to. Instead of grouping basic blocks based on superblocks, we need to think about mapping each source line to its corresponding basic blocks in the binary and then choosing the best basic block to instrument with such a counter. Expectedly, this will incur more overhead.

Also, note that "RQ2: Instrumentation overhead" is concerned with evaluating the overhead of bcov only. We did not attempt to directly compare this overhead to that of llvm-cov or gcov since, as you noted already, both tools track different artifacts, i.e., they are not directly comparable. However, we mentioned that llvm-cov can dump "320GB of coverage (and profiling) data" if online merging was not enabled. So there was an explicit emphasis on the role of profiling here. And that discussion was in the context of showing that the same online merging feature can be leveraged by bcov as well.

> bcov currently rewrites program headers but not the section header table,
> so for example, objdump -d a.any cannot disassemble the synthesized code.

The original section header table is preserved.
But adding two additional section headers to describe the patch segments is a nice-to-have feature. I developed a simple tool to disassemble the patch code, which was sufficient for my debugging needs. I'm planning to open-source this tool later. In the meantime, using gdb as described in the artifacts repository (https://github.com/abenkhadra/bcov-artifacts/blob/master/sample-binaries/README.md) might be the easiest way to inspect the patch code.

> bcov depends on capstone, which appears to be a pretty standard
> disassembly tool... It is based on a ~2013 slimmed-down version of LLVM MC
> and LLVM*Disassembler. The generated files (e.g.
> arch/X86/X86DisassemblerDecoder.c) get ad-hoc sporadic amendments from
> contributors over the years... It is a bit unfortunate for LLVM that MC
> and LLVM*Disassembler (Machine Code, including assembly/disassembly
> libraries) cannot be easily adapted by downstream users... (Well,
> downstream packages can use LLVM exported CMake files and use
> find_package(LLVM) and link against some LLVM* libraries, but this may
> pull in a large set of dependencies.)
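To make the update-instruction alternatives listed under "Potential future work" above concrete, here is a small C sketch of the three encodings as plain memory updates. In bcov itself each update would be a single patched instruction against the coverage data segment; the names and array sizes here are illustrative only, and the flags remarks in the comments refer to the underlying x86-64 instructions, not to the C code.

```c
#include <stdint.h>

enum { NPROBES = 64 };

static uint8_t  cov_byte[NPROBES];          /* current: mov BYTE PTR [rip+d],1 */
static uint8_t  cov_bit[(NPROBES + 7) / 8]; /* or BYTE PTR [rip+d],c (one-hot) */
static uint32_t cov_cnt[NPROBES];           /* add DWORD PTR [rip+d],1         */

/* mov: one byte per probe; the instruction preserves eflags */
static inline void probe_mov(unsigned i) { cov_byte[i] = 1; }

/* or with a one-hot constant: one bit per probe, 1/8 of the data,
 * but the instruction clobbers eflags (needs liveness analysis) */
static inline void probe_or(unsigned i) {
    cov_bit[i >> 3] |= (uint8_t)(1u << (i & 7));
}

/* add: a 32-bit counter per probe for profiling; also clobbers eflags */
static inline void probe_add(unsigned i) { cov_cnt[i] += 1; }
```

The trade-off is visible directly in the data declarations: the one-hot encoding shrinks the coverage segment eightfold, while the counter variant grows it fourfold but records execution frequencies instead of a covered/not-covered bit.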
Fangrui Song via llvm-dev
2020-Jul-08 00:17 UTC
[llvm-dev] Introducing the binary-level coverage analysis tool bcov
I noticed another tool in this area which was probably available before 1991: MIPS pixie.

From a document of IRIX, https://irix7.com/techpubs/007-2479-001.pdf (p. 90), I believe it works in a way very similar to bcov. Run pixie to generate the equivalent program containing basic-block-counting code:

  % pixie myprog
  ...

pixie takes myprog and writes an equivalent program, myprog.pixie, containing additional code that counts the execution of each basic block.

IRIX is a discontinued operating system. I can't find more information on this tool. Hope some MIPS folks on the list can provide more information.

On 2020-06-28, Ammar Ben Khadra wrote:
> [...]
> Nevertheless, the functionality offered by the sancov option
> 'trace-pc-guard' is perhaps closest to that in bcov. If we use this as the
> basis for measurement, then the performance overhead should be small. Can
> you please refer me to instructions on how I can use sancov without any
> sanitizer runtime?

Sadly it can't. trace-pc-guard is the only feature which can currently dump information to a .sancov file, which can then be inspected by the 'sancov' tool. The runtime must be provided by a sanitizer: -fsanitize={address,memory,thread,fuzzer,undefined,cfi}. There must be a runtime. So saying SanitizerCoverage is coupled with sanitizers is probably fine.

> [remainder of the quoted message trimmed]
Simon Atanasyan via llvm-dev
2020-Jul-08 14:25 UTC
[llvm-dev] Introducing the binary-level coverage analysis tool bcov
On Wed, Jul 8, 2020 at 3:17 AM Fangrui Song via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> I noticed another tool in this area which was probably available before
> 1991: MIPS pixie.
> [...]
> IRIX is a discontinued operating system. I can't find more information
> on this tool. Hope some MIPS folks on the list can provide more
> information.

As for me, I cannot provide any additional info. This tool is too ancient.

--
Simon Atanasyan
Ammar Ben Khadra via llvm-dev
2020-Jul-08 14:25 UTC
[llvm-dev] Introducing the binary-level coverage analysis tool bcov
Hi Fangrui,

The pixie tool is interesting indeed. However, the information about it is sparse, as you mentioned. Judging by the statement on p89: "Note that the execution time of an instrumented program is two-to-five times longer than a non-instrumented one." This large overhead leads me to believe that they implemented the straightforward approach of instrumenting every basic block with a counter. That is, they did not leverage any optimizations. As an aside, Ball and Larus have interesting work on optimal profiling (https://dl.acm.org/doi/10.5555/243846.243857). However, this research line seems to be discontinued.

On the other hand, bcov focuses on coverage analysis only but leverages several optimizations to reduce the instrumentation overhead. While coverage analysis is less general than profiling, it is more commonly used during software development. Additionally, I believe that the techniques proposed in our paper, and implemented in bcov, lay the groundwork for interesting future work that can go well beyond profiling.

- Ammar

On Wed, Jul 8, 2020 at 2:17 AM Fangrui Song <maskray at google.com> wrote:
> I noticed another tool in this area which was probably available before
> 1991: MIPS pixie.
> [...]
Finally, the primitives we use to > divert > >> >control-flow and update coverage are simple. One can imagine (and hope) > >> >that extending bcov to system code, or other native languages, will not > >> >require a lot of work. > >> > > >> >However, the viability of static binary-level coverage analysis > ultimately > >> >depends on the compiler toolchain that produces the binary. > Specifically, > >> >we can think of two areas where compilers might help our tool by adding > >> >extra support, or at least by not emitting code that is difficult to > >> >analyze: > >> > > >> > - Source-level coverage. The ability to know whether a particular > basic > >> >block was covered in the binary is quite interesting. However, > developers > >> >would ultimately want to map this coverage information to source-level > >> >artifacts. Statement coverage can be supported already with the help of > >> >dwarf debugging information. But bcov can report the coverage of > >> individual > >> >branch decisions at the binary level. Therefore, leveraging this > >> >information to obtain source-level MC/DC seems to be a natural next > step. > >> >Ideally, we want to infer MC/DC using what we have already, namely, > >> >binary-level coverage and dwarf information. However, adding support > to a > >> >custom mapping format similar to the one available in llvm-cov might be > >> >necessary. In this regard, we appreciate your suggestions on the best > way > >> >to support MC/DC, if at all feasible. (feedback needed) > >> > > >> > - Function model. We assume that a function occupies a continuous code > >> >region. The function's main entry is expected to be at the beginning of > >> >this region. Also, we consider each landing pad to be a function entry > as > >> >well. Based on this, we assume that the identified entries predominate > all > >> >basic blocks inside the function. 
In other words, a function can not > reach > >> >an arbitrary basic block inside another function without first visiting > >> one > >> >of its entries. This assumption seems to hold pretty well in the > binaries > >> >we experimented with, but we are not sure to what extent it > generalizes to > >> >other compilers and ISAs. (feedback needed) > >> > > >> > > >> >## Potential future work > >> > > >> >Updating coverage using a single pc-relative mov instruction is > >> transparent > >> >in the sense that it does not clobber any general-purpose register. It > >> even > >> >preserves the CPU's eflags. This, in turn, has greatly simplified the > >> >implementation of our prototype. That is, we did not have to think > about > >> >saving/restoring the CPU state. However, putting mechanisms in place to > >> >allow eflags to be *safely* clobbered, e.g., liveness analysis, can > open > >> >the door for several design alternatives where a pc-relative mov (e.g., > >> mov > >> >BYTE PTR [rip+0xadd88],1) can be replaced with: > >> > > >> > - or BYTE PTR [rip+0xadd88],c: where c is one-hot constant. This > will > >> >reduce the size of required coverage data to 1/8 of the size currently > >> >required by bcov. > >> > > >> > - add BYTE PTR [rip+0xadd88],1: 8-bit counter per superblock (or > >> selected > >> >set of basic blocks). This might be useful for fuzzing. > >> > > >> > - add DWORD PTR [rip+0xadd88],1: 32-bit counter per superblock (or > >> >selected set of basic blocks). This can be useful for profiling. > >> > > >> > > >> >So there is significant potential here. However, we have not explored > >> these > >> >ideas any further, and we would highly appreciate any feedback about > their > >> >viability and potential added value in comparison to what is already > >> >available in llvm-cov and sancov. 
(feedback needed) > >> > > >> >To conclude, we introduced bcov [1] and discussed a few areas where the > >> >LLVM community can help us assess its potential, identify related work, > >> and > >> >plan its future direction. Many thanks in advance for your feedback. > >> > > >> > > >> >Kind regards, > >> > > >> >Ammar > >> > > >> > [1]: https://github.com/abenkhadra/bcov > >> > [2]: https://arxiv.org/pdf/2004.14191.pdf > >> > [3]: https://dl.acm.org/doi/10.1145/174675.175935 > >> > [4]: https://dl.acm.org/doi/10.1145/2931037.2931047 > >> > >> Hi Ammar, > >> > >> Great work! > >> > >> I think the following statement in the paper is not true: "gcov relies > on > >> debug > >> information which is less accurate in optimized builds." GCC's gcov > >> implementation does not rely on debugging information. This can be > >> verified with > >> -fdump-debug: `gcc --coverage -fdump-debug -c a.c` dumps an empty > >> a.c.*.debug > >> file. > >> > >> > >After a second look at > >https://gcc.gnu.org/onlinedocs/gcc/Gcov-Intro.html#Gcov-Intro, it seems > >that I misunderstood why it is recommended to disable optimizations in > >gcov. You are right. This should be fixed. Thanks! > > > > > >> clang supports gcov instrumentation as well. clang --coverage does > leverage > >> debugging information, but the usage is very minimal - debugging > >> information is > >> just used to retrieve the start line of a function. I have studied and > >> contributed a bit to clang's gcov implementation recently and thus I am > >> fairly > >> confident about this. > >> > >> > In comparison, llvm-cov features a custom mapping format embedded in > >> > LLVM’s intermediate representation (IR). > >> > >> I don't know whether a clarification is needed here. llvm-cov is a > >> frontend tool > >> which supports two formats: (1) coverage mapping (clang > >> -fprofile-instr-generate > >> -fcoverage-mapping) and (2) gcov. Using -fcoverage-mapping might be > >> clearer in > >> the context. 
> >> > >> > >Actually, we are referring to the first usage, which is, as I understand, > >the most popular. A clarification can indeed help here. > > > > > >> > Also, sancov is tightly coupled with LLVM sanitizers (e.g., ASan) > which > >> add > >> > varying overhead. Extending bcov with additional feedback signals, > >> similar to > >> > sancov, is an interesting future work > >> > >> SanitizerCoverage is a standalone instrumentation pass > >> (llvm/lib/Transforms/Instrumentation/SanitizerCoverage.cpp) > >> which is not coupled with asan. -fsanitize-coverage= can be used > >> standalone, or > >> together with asan, lsan, msan, ubsan, etc. > >> > >> Its overhead can be very small, especially if you use the recent > >> inline-bool-flag > >> https://reviews.llvm.org/D77244 > >> (`clang -fsanitize-coverage=inline-bool-flag a.c`) or > inline-8bit-counters. > >> You can choose 3 modes: edge (default), bb, and func. > >> The flags are stored in a section __sancov_bools. > >> There is no standalone dumper though... > >> > >> > >> > >I remember that the term "tightly coupled" was used by a sancov developer > >in a Github issue. I can not find that thread now. This perhaps refers to > >the fact that (please correct me) casual users are not expected to use > >sancov as an independent pass. Instead, they should rely on the runtime > >already provided by one of the sanitizers. I shall revisit this issue to > >improve the wording. However, I think that the main point still holds; > >given this dependence on the sanitizer runtime and the variety of options > >offered by sancov, one can not refer to a 'reference' sancov to report > >its instrumentation overhead. Nevertheless, the functionality offered by > >the sancov option 'trace-pc-guard' is perhaps closest to that in bcov. If > >we use this as the basis for measurement, then the performance overhead > >should be small. Can you please refer me to instructions on how I can use > >sancov without any sanitizer runtime? 
> > Sadly it can't. trace-pc-guard is the only feature which can currently > dump information to a .sancov file, which can then be inspected by > the 'sancov' tool. > > The runtime must be provided by a sanitizer, > -fsanitize={address,memory,thread,fuzzer,undefined,cfi}. There must be a > runtime. So saying SanitizerCoverage is coupled with sanitizers is > probably fine. > > >A few random comments below. >> > >> > >> If I understand correctly, by leveraging superblock dominator graph > (from > >> "Dominators, Super Blocks, and Program Coverage") and the any-node > policy, > >> the > >> coverage result is binary: not-covered vs covered. I agree that in many > >> cases > >> this is sufficient. gcov and -fcoverage-mapping, on the other hand, can > >> provide > >> line execution counts, which are handy in other cases. This fact should > >> probably be > >> mentioned somewhere, "RQ2: Instrumentation overhead" or another place. > >> Both > >> -fcoverage-mapping and gcov provide more information, so they have > larger > >> overhead (I agree that gcov isn't a format optimized for file sizes - it > >> uses > >> uint32_t records everywhere which can be quite expensive representing > >> smaller > >> integers). > >> > >> > >> > >Yes, the reported coverage is binary. It indicates whether a basic block > is > >covered or not based on the coverage of its superblock (a set of basic > >blocks with the same coverage information). Profiling represents an > >interesting use case, indeed. I noted in the 'Potential future work' > >section that updating coverage data using: > > > >add DWORD PTR [rip+0xadd88],1 > > > >instead of the current: > > > >mov BYTE PTR [rip+0xadd88],1 > > > >enables profiling using a 32-bit counter per superblock. However, a > >superblock can span multiple lines in the source code, which renders this > >counter difficult to relate to. 
Instead of grouping basic blocks based on > >superblocks, we need to think about mapping each source line to its > >corresponding basic blocks in the binary and then choose the best BB to > >instrument with such a counter. As expected, this will incur more overhead. > > > >Also, note that "RQ2: Instrumentation overhead" is concerned with > >evaluating the overhead of bcov only. We did not attempt to directly > >compare this overhead to that of llvm-cov or gcov since, as you noted > >already, both tools track different artifacts, i.e., not directly > >comparable. However, we mentioned that llvm-cov can dump "320GB of > coverage > >(and profiling) data" if online merging was not enabled. So there was an > >explicit emphasis on the role of profiling here. And that discussion was > in > >the context of showing that the same online merging feature can be > >leveraged by bcov as well. > > > > > > > >> bcov currently rewrites program headers but not the section header > table, > >> so for > >> example, objdump -d a.any cannot disassemble the synthesized code. > >> > >> > >> > > > >The original section header table is preserved. But adding two additional > >section headers to describe the patch segments is a nice-to-have feature. > I > >developed a simple tool to disassemble the patch code, which was sufficient > >for my debugging needs. I'm planning to open source this tool later. In > the > >meantime, using gdb as described in the artifacts repository ( > > > https://github.com/abenkhadra/bcov-artifacts/blob/master/sample-binaries/README.md > ) > >might be the easiest way to inspect the patch code. > > > > > > > >> bcov depends on capstone, which appears to be a pretty standard > disassembly > >> tool... It is based on a ~2013 slimmed-down version of LLVM MC and > >> LLVM*Disassembler. The generated files (e.g. > >> arch/X86/X86DisassemblerDecoder.c) > >> get ad-hoc sporadic amendment from contributors over the years.. 
It is a > >> bit > >unfortunate for LLVM that MC and LLVM*Disassembler (Machine Code, > including > >> assembly/disassembly libraries) cannot be easily adapted by downstream > >> users..... (Well, downstream packages can use LLVM exported CMake files > >> and use > >> find_package(LLVM) and link against some LLVM* libraries but this may > pull > >> in a > >> large set of dependencies)