I've built chromium with " -fprofile-instr-generate -fsanitize=address" -- the performance looks good! The file format from r198638 is indeed rudimentary. Do you already know how the real output format will look like? Just to summarize what I think is important: - minimal size on disk, minimal amount of files - minimal i/o while writing to disk, no lockf or some such - separate process produces separate file(s) - fast merging of outputs from different runs of the same binary - only the coverage output files and the binary (+DSOs) are required to "symbolize" the coverage data (map the data to file names & line numbers) Ideally, symbolizing the cov data will use the debug info in the binary (i.e. llvm-symbolizer, addr2-line or atos), this is what we've done in AsanCov, but I don't see a clean way to make it work with counters... --kcc On Wed, Feb 19, 2014 at 2:08 PM, Kostya Serebryany <kcc at google.com> wrote:> Ah, that's -fprofile-instr-generate > > > On Wed, Feb 19, 2014 at 2:03 PM, Kostya Serebryany <kcc at google.com> wrote: > >> >> >> >> On Tue, Feb 18, 2014 at 10:15 PM, Bob Wilson <bob.wilson at apple.com>wrote: >> >>> Our instrumentation code is basically done now, so you can try it out >>> and compare the performance. Just build with -finstr-profile-generate. >>> >> >> Is this in trunk? >> clang-3.5: error: unknown argument: '-finstr-profile-generate' >> >> --kcc >> >> >>> You may want to hack the compiler-rt code in lib/profile/PGOProfiling.c >>> to disable writing the output, since the current implementation is pretty >>> naive. >>> >>> If it turns out as you suggest, and the different kinds of >>> instrumentation serve different purposes, then I think it would make sense >>> to do both. >>> >>> On Feb 18, 2014, at 3:13 AM, Kostya Serebryany <kcc at google.com> wrote: >>> >>> Regarding performance, I've made a simple coverage with counters and >>> compared it with AsanCoverage. >>> >>> AsanCoverage produces code like this: >>> mov 0xe86cce(%rip),%al >>> test %al,%al >>> je 48b4a0 # to call __sanitizer_cov >>> ... >>> callq 4715b0 <__sanitizer_cov> >>> >>> A simple counter-based thing (which just increments counters and does >>> nothing else useful) produces this: >>> incq 0xe719c6(%rip) >>> >>> The performance is more or less the same, although the issue with false >>> sharing still remains >>> (http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-October/066116.html) >>> >>> Do you have any more details about the planned clang coverage? >>> >>> Thanks, >>> >>> --kcc >>> >>> >>> On Tue, Feb 18, 2014 at 1:00 PM, Kostya Serebryany <kcc at google.com>wrote: >>> >>>> >>>> >>>> >>>> On Tue, Feb 18, 2014 at 12:20 AM, Bob Wilson <bob.wilson at apple.com>wrote: >>>> >>>>> >>>>> On Feb 17, 2014, at 5:13 AM, Kostya Serebryany <kcc at google.com> wrote: >>>>> > Then my question: will there be any objection if I disentangle >>>>> AsanCoverage from ASan and make it a separate LLVM phase with the proper >>>>> clang driver support? >>>>> > Or it will be an unwelcome competition with the planned clang >>>>> coverage? >>>>> >>>>> I don’t view it as a competition, but assuming that we both succeed in >>>>> our plans, LLVM would then end up with two very similar solutions for code >>>>> coverage. Does it really make sense to have two? >>>> >>>> >>>> It depends. If the two will indeed have the same functionality -- then >>>> no. >>>> My understanding about your plans is that the upcoming coverage will >>>> provide "counters" (== how many times a bb/edge was executed). >>>> AsanCoverage produces booleans (== 1, iff a function/bb was executed), >>>> which is less information, but faster. >>>> How much faster -- I can't tell w/o your performance numbers. >>>> For our early users the performance is critical and booleans are >>>> sufficient. >>>> >>>> If we end up needing both variants, we may want to keep them similar >>>> from user perspective, e.g. have flag combinations like these: >>>> -coverage=per-edge,counter >>>> -coverage=per-function,counter >>>> -coverage=per-block,counter >>>> -coverage=per-function,boolean >>>> -coverage=per-block,boolean >>>> >>>> --kcc >>>> >>> >>> >>> >> >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140219/6d6544d8/attachment.html>
Kostya Serebryany <kcc at google.com> writes:> I've built chromium with " -fprofile-instr-generate -fsanitize=address" > -- the performance looks good! > The file format from r198638 is indeed rudimentary. > Do you already know how the real output format will look like?We have an idea of what it will look like, but we're still working out some details.> Just to summarize what I think is important: > - minimal size on disk, minimal amount of files > - minimal i/o while writing to disk, no lockf or some such > - separate process produces separate file(s) > - fast merging of outputs from different runs of the same binary > - only the coverage output files and the binary (+DSOs) are required to > "symbolize" the coverage data (map the data to file names & line numbers)I think we agree on all of these goals. Specifically, we're going for: - A binary format, one file per program (rather than per-source or per-object). - Each run of the process generates data for that run only. - Merging outputs will be done by a separate tool - The data file and the binary should be sufficient for symbolizing. Additionally, we want fast lookup of data for a particular function, but that's less important for coverage than it is for PGO.> Ideally, symbolizing the cov data will use the debug info in the binary (i.e. > llvm-symbolizer, addr2-line or atos), > this is what we've done in AsanCov, but I don't see a clean way to make it > work with counters...We may need some additional info. I haven't put a ton of thought into this, but I'm hoping we can either (a) use debug info as is or add some extra (valid) debug info to support this, or (b) add an extra debug-info-like section to instrumented binaries with the information we need.
> > > > We may need some additional info.What kind of additional info?> I haven't put a ton of thought into > this, but I'm hoping we can either (a) use debug info as is or add some > extra (valid) debug info to support this, or (b) add an extra > debug-info-like section to instrumented binaries with the information we > need. >I'd try this data format (binary equivalent): /path/to/binary/or/dso1 num_counters1 pc1 counter1 pc2 counter2 pc3 counter3 ... /path/to/binary/or/dso2 num_counters2 pc1 counter1 pc2 counter2 pc3 counter3 ... I don't see a straightforward way to produce such data today because individual Instructions do not work as labels. But I think this can be supported in LLVM codegen. Here is a *raw* patch with comments, just to get the idea. Index: lib/CodeGen/CodeGenPGO.cpp ==================================================================--- lib/CodeGen/CodeGenPGO.cpp (revision 201843) +++ lib/CodeGen/CodeGenPGO.cpp (working copy) @@ -199,7 +199,8 @@ llvm::Type *Args[] = { Int8PtrTy, // const char *MangledName Int32Ty, // uint32_t NumCounters - Int64PtrTy // uint64_t *Counters + Int64PtrTy, // uint64_t *Counters + Int64PtrTy // uint64_t *PCs }; llvm::FunctionType *FTy llvm::FunctionType::get(PGOBuilder.getVoidTy(), Args, false); @@ -209,9 +210,10 @@ llvm::Constant *MangledName CGM.GetAddrOfConstantCString(CGM.getMangledName(GD), "__llvm_pgo_name"); MangledName = llvm::ConstantExpr::getBitCast(MangledName, Int8PtrTy); - PGOBuilder.CreateCall3(EmitFunc, MangledName, + PGOBuilder.CreateCall4(EmitFunc, MangledName, PGOBuilder.getInt32(NumRegionCounters), - PGOBuilder.CreateBitCast(RegionCounters, Int64PtrTy)); + PGOBuilder.CreateBitCast(RegionCounters, Int64PtrTy), + PGOBuilder.CreateBitCast(RegionPCs, Int64PtrTy)); } llvm::Function *CodeGenPGO::emitInitialization(CodeGenModule &CGM) { @@ -769,6 +771,13 @@ llvm::GlobalVariable::PrivateLinkage, llvm::Constant::getNullValue(CounterTy), "__llvm_pgo_ctr"); + + RegionPCs + new llvm::GlobalVariable(CGM.getModule(), CounterTy, false, + llvm::GlobalVariable::PrivateLinkage, + llvm::Constant::getNullValue(CounterTy), + "__llvm_pgo_pcs"); + } void CodeGenPGO::emitCounterIncrement(CGBuilderTy &Builder, unsigned Counter) { @@ -779,6 +788,21 @@ llvm::Value *Count = Builder.CreateLoad(Addr, "pgocount"); Count = Builder.CreateAdd(Count, Builder.getInt64(1)); Builder.CreateStore(Count, Addr); + // We should put the PC of the instruction that increments __llvm_pgo_ctr + // into __llvm_pgo_pcs, which will be passed to llvm_pgo_emit. + // This patch is wrong in many ways: + // * We pass the PC of the Function instead of the PC of the Instruction, + // because the latter doesn't work like this. We'll need to support + // Instructions as labels in LLVM codegen. + // * We actually store the PC on each increment, while we should initialize + // this array at link time (need to refactor this code a bit). + // + Builder.CreateStore( + Builder.CreatePointerCast( + cast<llvm::Instruction>(Count)->getParent()->getParent(), + Builder.getInt64Ty() // FIXME: use a better type + ), + Builder.CreateConstInBoundsGEP2_64(RegionPCs, 0, Counter)); } Index: lib/CodeGen/CodeGenPGO.h ==================================================================--- lib/CodeGen/CodeGenPGO.h (revision 201843) +++ lib/CodeGen/CodeGenPGO.h (working copy) @@ -59,6 +59,7 @@ unsigned NumRegionCounters; llvm::GlobalVariable *RegionCounters; + llvm::GlobalVariable *RegionPCs; llvm::DenseMap<const Stmt*, unsigned> *RegionCounterMap; llvm::DenseMap<const Stmt*, uint64_t> *StmtCountMap; std::vector<uint64_t> *RegionCounts; @@ -66,8 +67,9 @@ public: CodeGenPGO(CodeGenModule &CGM) - : CGM(CGM), NumRegionCounters(0), RegionCounters(0), RegionCounterMap(0), - StmtCountMap(0), RegionCounts(0), CurrentRegionCount(0) {} + : CGM(CGM), NumRegionCounters(0), RegionCounters(0), RegionPCs(0), + RegionCounterMap(0), StmtCountMap(0), RegionCounts(0), + CurrentRegionCount(0) {} ~CodeGenPGO() {} /// Whether or not we have PGO region data for the current function. This is -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140221/b01a08fa/attachment.html>
Reasonably Related Threads
- [LLVMdev] asan coverage
- llvm 5.0 release rc1 : ExecutionEngine fatal error on MCJIT::getFunctionAddress
- [ORC JIT] Exposing IndirectStubsManager from CompileOnDemandLayer.h
- [ORC JIT] Exposing IndirectStubsManager from CompileOnDemandLayer.h
- [ORC JIT] Exposing IndirectStubsManager from CompileOnDemandLayer.h