Hello,

I have a few questions about coverage.

Is there any user-facing documentation for clang's "-coverage" flag?

The coverage instrumentation seems to happen before asan, so if asan is also enabled, asan will instrument accesses to @__llvm_gcov_ctr. This is undesirable, so we'd like to skip these accesses. It looks like the GEPs around @__llvm_gcov_ctr have special metadata attached:

  %2 = getelementptr inbounds [4 x i64]* @__llvm_gcov_ctr, i64 0, i64 %1
  %3 = load i64* %2, align 8
  %4 = add i64 %3, 1
  store i64 %4, i64* %2, align 8
  ...
  !1 = metadata !{...; [ DW_TAG_compile_unit ] ... /home/kcc/tmp/cond.cc] [DW_LANG_C_plus_plus]

Can we rely on having this metadata attached to @__llvm_gcov_ctr? Should we attach some metadata to the actual accesses as well, or simply find the corresponding GEP?

Finally, does anyone have performance numbers for coverage? As of today it seems completely thread-hostile, since __llvm_gcov_ctr is not thread-local. A simple stress test shows that coverage slows the program down by about 50x!

  % cat ~/tmp/coverage_mt.cc
  #include <pthread.h>
  __thread int x;
  __attribute__((noinline))
  void foo() {
    x++;
  }

  void *Thread(void *) {
    for (int i = 0; i < 100000000; i++)
      foo();
    return 0;
  }

  int main() {
    static const int kNumThreads = 16;
    pthread_t t[kNumThreads];
    for (int i = 0; i < kNumThreads; i++)
      pthread_create(&t[i], 0, Thread, 0);
    for (int i = 0; i < kNumThreads; i++)
      pthread_join(t[i], 0);
    return 0;
  }

  % clang -O2 ~/tmp/coverage_mt.cc -lpthread ; time ./a.out
  TIME: real: 0.284; user: 3.560; system: 0.000
  % clang -O2 ~/tmp/coverage_mt.cc -lpthread -coverage ; time ./a.out
  TIME: real: 13.327; user: 174.510; system: 0.000

Are there any objections in principle to making __llvm_gcov_ctr thread-local, perhaps under a flag?

If anyone is curious, my intent is to enable running coverage and asan in one process.

Thanks,
--kcc
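A minimal sketch of the thread-local counter idea raised in the message above, written as ordinary C++ rather than as compiler instrumentation. It is not how clang's coverage runtime works. The names g_merged, LocalCounters, t_local and RunThreads are hypothetical, and the four-slot array only mirrors the [4 x i64] shape in the IR snippet. Each thread increments its own private counters and folds them into process-wide totals once, at thread exit, so the hot path never touches a shared cache line.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for a coverage counter array; not LLVM's real layout.
constexpr int kNumCounters = 4;
std::atomic<uint64_t> g_merged[kNumCounters];  // process-wide totals

// Per-thread counters: increments touch only thread-private memory.
struct LocalCounters {
  uint64_t ctr[kNumCounters] = {};
  // Fold this thread's counts into the global totals when the thread exits.
  ~LocalCounters() {
    for (int i = 0; i < kNumCounters; i++)
      g_merged[i].fetch_add(ctr[i], std::memory_order_relaxed);
  }
};
thread_local LocalCounters t_local;

// Plays the role of an instrumented function: one counter bump per call.
__attribute__((noinline)) void foo() { t_local.ctr[0]++; }

// Runs num_threads threads that each call foo() iters times, then returns
// the merged total for counter 0 (totals accumulate across calls).
uint64_t RunThreads(int num_threads, int iters) {
  std::vector<std::thread> threads;
  for (int t = 0; t < num_threads; t++)
    threads.emplace_back([iters] {
      for (int i = 0; i < iters; i++) foo();
    });
  for (auto &th : threads) th.join();  // thread-exit merges finish before join returns
  return g_merged[0].load();
}
```

The trade-off in this sketch is that counts become visible only after a thread exits, so a real implementation would also need a way to collect counters from threads that are still alive when the data is dumped.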
Another question is about the performance of coverage's at-exit actions (dumping coverage data to disk). I've built chromium's base_unittests with -fprofile-arcs -ftest-coverage, and the coverage at-exit hook takes 22 seconds, which is 44x more than I am willing to pay. Most of the time is spent here:

  #0 0x00007ffff3b034cd in msync () at ../sysdeps/unix/syscall-template.S:82
  #1 0x0000000003a8c818 in llvm_gcda_end_file ()
  #2 0x0000000003a8c914 in llvm_writeout_files ()
  #3 0x00007ffff2f5e901 in __run_exit_handlers

The test depends on ~700 source files, so the profiling library calls msync ~700 times. Full chromium depends on ~12000 source files, so we'll be dumping the coverage data for about 5 minutes this way.

I understand that we have to support the lcov/gcov format (broken in many ways), and this may be the reason for being slow. But I really need something much faster (and maybe simpler).

Is anyone planning any work on coverage in the next few months? If not, we'll probably cook up something simple and gcov-independent. Thoughts?

--kcc

On Thu, Oct 3, 2013 at 6:47 PM, Kostya Serebryany <kcc at google.com> wrote:
> Hello,
>
> I have a few questions about coverage.
> [...]
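The extrapolation in the message above can be checked with a few lines of arithmetic. This sketch assumes the dump time scales linearly with the number of source files (one msync per file), which is an assumption rather than a measurement. The function names are hypothetical.

```cpp
// Linear extrapolation of at-exit dump time. Assumes cost is proportional to
// the number of source files (one msync per file); this is an assumption.
double EstimateDumpSeconds(double measured_seconds, int measured_files,
                           int target_files) {
  return measured_seconds / measured_files * target_files;
}

// The acceptable budget implied by "N times more than I am willing to pay".
double ImpliedBudgetSeconds(double measured_seconds, double slowdown_factor) {
  return measured_seconds / slowdown_factor;
}
```

With the numbers from the message, EstimateDumpSeconds(22.0, 700, 12000) comes to roughly 377 seconds, which is in the same several-minute range as the figure quoted for full chromium, and ImpliedBudgetSeconds(22.0, 44.0) gives a budget of 0.5 seconds.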
The instrumentation that I have proposed (on cfe-dev) for PGO is also intended to provide the necessary info for code coverage. I have not yet measured the performance of the code that writes out the data, but it ought to be quite a bit faster than what we have now.

On Oct 4, 2013, at 1:40 AM, Kostya Serebryany <kcc at google.com> wrote:
> Another question is about the performance of coverage's at-exit actions (dumping coverage data on disk).
> [...]
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev