Hello all,

below is a proposal to include LLCov, a simple but helpful little tool based on LLVM/Clang, in the main LLVM code.

Tl;dr: It's a module pass that instruments basic blocks with calls to an external function, and it can be used for various things, including (live!) basic block coverage.

I'm looking forward to hearing opinions on this :)

Best,

Chris

=== Problem description ===

Code coverage has always been considered an important aspect of testing. Especially for automated testing (e.g. fuzzing), coverage is a requirement for success. Some recent fuzzing research is moving toward genetic algorithms, where coverage can be part of the fitness function.

However, applying all of this to a large codebase in a practical way is a complex endeavor. Popular code coverage tools like GCov are not really designed to report coverage while the program is still running. We want to make decisions based on coverage without terminating the program, mainly for performance reasons but, depending on the type of fuzzing, also because one would like to alter the mutation strategy mid-fuzzing. We therefore need coverage feedback live, as it happens.

Furthermore, we are often not interested in all of the coverage. Often, a particular portion of the code is targeted, and the rest (which is the majority) would only slow us down if instrumented.

=== Proposed solution ===

I propose to include LLCov in the main LLVM tree. LLCov is implemented as a module pass and allows selective instrumentation of code portions for basic block coverage measurement (or any other task that should be performed per basic block). It selects what to instrument using a combination of blacklist and whitelist, each of which can match on files, lines or functions. All of the instrumented code calls an arbitrary external function per basic block (that is, per control flow node). This external function can do whatever the tester wants it to do. The simplest task would be to output coverage information on stderr and have the fuzzer collect it there. It could also provide the information over a network socket, though.

=== Current status of the tool ===

The current LLCov code is maintained at https://github.com/choller/LLCov and consists of the main LLCov.cpp file, implementing the module pass, as well as two patches (one integrating the LLVM pass, the other patching the Clang frontend to support the necessary compiler flag and to link the runtime). Over time, the module pass itself required only minor adjustments (e.g. some includes changed), but rebasing the patches for the frontend typically required manual work.

=== Alternatives ===

One alternative would be to add an interface such that the changes required to integrate this and other passes (especially into the Clang frontend) can be made dynamically. I'm not sure if this is possible, though.

Another alternative would be to add this functionality to the GCov pass, but I am not sure if that is easily doable given the way GCov typically works.
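To make the "external function per basic block" idea from the proposed solution more concrete, here is a minimal sketch of what such a runtime hook could look like when it simply reports coverage on stderr. The function name `llcov_block_hit` and its (file, function, line) signature are assumptions made for this illustration; the message above does not state the actual symbol or arguments used by LLCov.

```cpp
#include <cstdio>
#include <set>
#include <string>
#include <tuple>

// Hypothetical runtime hook: the name and signature are assumptions made for
// this sketch and are not taken from the LLCov sources.
// Reports a basic block on stderr the first time it is reached and returns
// true if the block had not been seen before.
extern "C" bool llcov_block_hit(const char *file, const char *func,
                                unsigned line) {
  // Blocks already reported, keyed by (file, function, line).
  // Note: not thread-safe; a real runtime would need locking or atomics.
  static std::set<std::tuple<std::string, std::string, unsigned>> seen;

  const bool is_new = seen.emplace(file, func, line).second;
  if (is_new) {
    // One line per newly covered block, flushed immediately so that a
    // fuzzer reading stderr sees it while the target is still running.
    std::fprintf(stderr, "cov: %s:%s:%u\n", file, func, line);
    std::fflush(stderr);
  }
  return is_new;
}
```

A fuzzer driving the target could read these lines from the child's stderr and treat each new one as a signal that the current input reached new code. Sending the same records over a socket instead would only change the body of the `if`.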
Kostya Serebryany
2014-Jul-16 14:42 UTC
[LLVMdev] Proposal for the inclusion of LLCov code
Chris,

[I missed this message, thanks to glider for pointing me to it]

Do you want a general mechanism or just coverage?

The general mechanism could make sense for general things (just like gcc's -finstrument-functions, but at BB level), but it is less suitable for coverage for performance reasons: calling a function for every block is expensive.

If you want fuzzing with genetic algorithms, take a look here:
https://code.google.com/p/address-sanitizer/wiki/AsanCoverage
We are already doing this kind of thing with asan :)
And asan already supports blacklists.

--kcc

On Tue, Jun 17, 2014 at 12:54 AM, Christian Holler <choller at mozilla.com> wrote:
> [...]
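The performance concern raised in the reply (a function call on every execution of every block) can be illustrated with a small sketch. One common way to cut that cost is to keep a guard byte per block and take the call only on the first visit, so repeated executions pay just a load and a branch. All names below (`CoverageState`, `visit_block`, `on_new_block`) are hypothetical, and this is an illustration of the general idea only, not a description of how asan's coverage or LLCov is actually implemented.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical coverage state for this sketch: one guard byte per
// instrumented basic block, plus a counter of slow-path calls.
struct CoverageState {
  std::vector<std::uint8_t> guards;
  std::size_t callback_calls = 0;

  explicit CoverageState(std::size_t num_blocks) : guards(num_blocks, 0) {}
};

// Slow path: stands in for the external per-block function.
inline void on_new_block(CoverageState &state, std::size_t block_id) {
  (void)block_id;  // a real hook would record or report the block here
  ++state.callback_calls;
}

// What instrumentation at the top of block `block_id` would do in this
// scheme: a cheap inline check, with the call taken only on the first visit.
inline void visit_block(CoverageState &state, std::size_t block_id) {
  if (!state.guards[block_id]) {
    state.guards[block_id] = 1;
    on_new_block(state, block_id);
  }
}

// Number of distinct blocks reached so far.
inline std::size_t covered_blocks(const CoverageState &state) {
  std::size_t n = 0;
  for (std::uint8_t g : state.guards) n += (g != 0);
  return n;
}
```

With this scheme a hot loop that runs a block a thousand times triggers the callback once. The trade-off is that the hook no longer sees every execution, so it suits "was this block ever reached" coverage rather than hit counts or tracing.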