> > > > We may need some additional info.What kind of additional info?> I haven't put a ton of thought into > this, but I'm hoping we can either (a) use debug info as is or add some > extra (valid) debug info to support this, or (b) add an extra > debug-info-like section to instrumented binaries with the information we > need. >I'd try this data format (binary equivalent): /path/to/binary/or/dso1 num_counters1 pc1 counter1 pc2 counter2 pc3 counter3 ... /path/to/binary/or/dso2 num_counters2 pc1 counter1 pc2 counter2 pc3 counter3 ... I don't see a straightforward way to produce such data today because individual Instructions do not work as labels. But I think this can be supported in LLVM codegen. Here is a *raw* patch with comments, just to get the idea. Index: lib/CodeGen/CodeGenPGO.cpp ==================================================================--- lib/CodeGen/CodeGenPGO.cpp (revision 201843) +++ lib/CodeGen/CodeGenPGO.cpp (working copy) @@ -199,7 +199,8 @@ llvm::Type *Args[] = { Int8PtrTy, // const char *MangledName Int32Ty, // uint32_t NumCounters - Int64PtrTy // uint64_t *Counters + Int64PtrTy, // uint64_t *Counters + Int64PtrTy // uint64_t *PCs }; llvm::FunctionType *FTy llvm::FunctionType::get(PGOBuilder.getVoidTy(), Args, false); @@ -209,9 +210,10 @@ llvm::Constant *MangledName CGM.GetAddrOfConstantCString(CGM.getMangledName(GD), "__llvm_pgo_name"); MangledName = llvm::ConstantExpr::getBitCast(MangledName, Int8PtrTy); - PGOBuilder.CreateCall3(EmitFunc, MangledName, + PGOBuilder.CreateCall4(EmitFunc, MangledName, PGOBuilder.getInt32(NumRegionCounters), - PGOBuilder.CreateBitCast(RegionCounters, Int64PtrTy)); + PGOBuilder.CreateBitCast(RegionCounters, Int64PtrTy), + PGOBuilder.CreateBitCast(RegionPCs, Int64PtrTy)); } llvm::Function *CodeGenPGO::emitInitialization(CodeGenModule &CGM) { @@ -769,6 +771,13 @@ llvm::GlobalVariable::PrivateLinkage, llvm::Constant::getNullValue(CounterTy), "__llvm_pgo_ctr"); + + RegionPCs + new llvm::GlobalVariable(CGM.getModule(), CounterTy, false, + llvm::GlobalVariable::PrivateLinkage, + llvm::Constant::getNullValue(CounterTy), + "__llvm_pgo_pcs"); + } void CodeGenPGO::emitCounterIncrement(CGBuilderTy &Builder, unsigned Counter) { @@ -779,6 +788,21 @@ llvm::Value *Count = Builder.CreateLoad(Addr, "pgocount"); Count = Builder.CreateAdd(Count, Builder.getInt64(1)); Builder.CreateStore(Count, Addr); + // We should put the PC of the instruction that increments __llvm_pgo_ctr + // into __llvm_pgo_pcs, which will be passed to llvm_pgo_emit. + // This patch is wrong in many ways: + // * We pass the PC of the Function instead of the PC of the Instruction, + // because the latter doesn't work like this. We'll need to support + // Instructions as labels in LLVM codegen. + // * We actually store the PC on each increment, while we should initialize + // this array at link time (need to refactor this code a bit). + // + Builder.CreateStore( + Builder.CreatePointerCast( + cast<llvm::Instruction>(Count)->getParent()->getParent(), + Builder.getInt64Ty() // FIXME: use a better type + ), + Builder.CreateConstInBoundsGEP2_64(RegionPCs, 0, Counter)); } Index: lib/CodeGen/CodeGenPGO.h ==================================================================--- lib/CodeGen/CodeGenPGO.h (revision 201843) +++ lib/CodeGen/CodeGenPGO.h (working copy) @@ -59,6 +59,7 @@ unsigned NumRegionCounters; llvm::GlobalVariable *RegionCounters; + llvm::GlobalVariable *RegionPCs; llvm::DenseMap<const Stmt*, unsigned> *RegionCounterMap; llvm::DenseMap<const Stmt*, uint64_t> *StmtCountMap; std::vector<uint64_t> *RegionCounts; @@ -66,8 +67,9 @@ public: CodeGenPGO(CodeGenModule &CGM) - : CGM(CGM), NumRegionCounters(0), RegionCounters(0), RegionCounterMap(0), - StmtCountMap(0), RegionCounts(0), CurrentRegionCount(0) {} + : CGM(CGM), NumRegionCounters(0), RegionCounters(0), RegionPCs(0), + RegionCounterMap(0), StmtCountMap(0), RegionCounts(0), + CurrentRegionCount(0) {} ~CodeGenPGO() {} /// Whether or not we have PGO region data for the current function. This is -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140221/b01a08fa/attachment.html>
We’re not going to use debug info at all. We’re emitting the counters in the clang front-end. We just need to emit separate info to show how to map those counters to source locations. Mapping to PCs and then using debug info to get from the PCs to the source locations just makes things harder and loses information in the process. On Feb 21, 2014, at 2:57 AM, Kostya Serebryany <kcc at google.com> wrote:> > > We may need some additional info. > What kind of additional info? > > I haven't put a ton of thought into > this, but I'm hoping we can either (a) use debug info as is or add some > extra (valid) debug info to support this, or (b) add an extra > debug-info-like section to instrumented binaries with the information we > need. > > I'd try this data format (binary equivalent): > > /path/to/binary/or/dso1 num_counters1 > pc1 counter1 > pc2 counter2 > pc3 counter3 > ... > /path/to/binary/or/dso2 num_counters2 > pc1 counter1 > pc2 counter2 > pc3 counter3 > ... > > I don't see a straightforward way to produce such data today because individual Instructions do not work as labels. > But I think this can be supported in LLVM codegen. > Here is a *raw* patch with comments, just to get the idea. > > > Index: lib/CodeGen/CodeGenPGO.cpp > ==================================================================> --- lib/CodeGen/CodeGenPGO.cpp (revision 201843) > +++ lib/CodeGen/CodeGenPGO.cpp (working copy) > @@ -199,7 +199,8 @@ > llvm::Type *Args[] = { > Int8PtrTy, // const char *MangledName > Int32Ty, // uint32_t NumCounters > - Int64PtrTy // uint64_t *Counters > + Int64PtrTy, // uint64_t *Counters > + Int64PtrTy // uint64_t *PCs > }; > llvm::FunctionType *FTy > llvm::FunctionType::get(PGOBuilder.getVoidTy(), Args, false); > @@ -209,9 +210,10 @@ > llvm::Constant *MangledName > CGM.GetAddrOfConstantCString(CGM.getMangledName(GD), "__llvm_pgo_name"); > MangledName = llvm::ConstantExpr::getBitCast(MangledName, Int8PtrTy); > - PGOBuilder.CreateCall3(EmitFunc, MangledName, > + PGOBuilder.CreateCall4(EmitFunc, MangledName, > PGOBuilder.getInt32(NumRegionCounters), > - PGOBuilder.CreateBitCast(RegionCounters, Int64PtrTy)); > + PGOBuilder.CreateBitCast(RegionCounters, Int64PtrTy), > + PGOBuilder.CreateBitCast(RegionPCs, Int64PtrTy)); > } > > llvm::Function *CodeGenPGO::emitInitialization(CodeGenModule &CGM) { > @@ -769,6 +771,13 @@ > llvm::GlobalVariable::PrivateLinkage, > llvm::Constant::getNullValue(CounterTy), > "__llvm_pgo_ctr"); > + > + RegionPCs > + new llvm::GlobalVariable(CGM.getModule(), CounterTy, false, > + llvm::GlobalVariable::PrivateLinkage, > + llvm::Constant::getNullValue(CounterTy), > + "__llvm_pgo_pcs"); > + > } > > void CodeGenPGO::emitCounterIncrement(CGBuilderTy &Builder, unsigned Counter) { > @@ -779,6 +788,21 @@ > llvm::Value *Count = Builder.CreateLoad(Addr, "pgocount"); > Count = Builder.CreateAdd(Count, Builder.getInt64(1)); > Builder.CreateStore(Count, Addr); > + // We should put the PC of the instruction that increments __llvm_pgo_ctr > + // into __llvm_pgo_pcs, which will be passed to llvm_pgo_emit. > + // This patch is wrong in many ways: > + // * We pass the PC of the Function instead of the PC of the Instruction, > + // because the latter doesn't work like this. We'll need to support > + // Instructions as labels in LLVM codegen. > + // * We actually store the PC on each increment, while we should initialize > + // this array at link time (need to refactor this code a bit). > + // > + Builder.CreateStore( > + Builder.CreatePointerCast( > + cast<llvm::Instruction>(Count)->getParent()->getParent(), > + Builder.getInt64Ty() // FIXME: use a better type > + ), > + Builder.CreateConstInBoundsGEP2_64(RegionPCs, 0, Counter)); > } > > Index: lib/CodeGen/CodeGenPGO.h > ==================================================================> --- lib/CodeGen/CodeGenPGO.h (revision 201843) > +++ lib/CodeGen/CodeGenPGO.h (working copy) > @@ -59,6 +59,7 @@ > > unsigned NumRegionCounters; > llvm::GlobalVariable *RegionCounters; > + llvm::GlobalVariable *RegionPCs; > llvm::DenseMap<const Stmt*, unsigned> *RegionCounterMap; > llvm::DenseMap<const Stmt*, uint64_t> *StmtCountMap; > std::vector<uint64_t> *RegionCounts; > @@ -66,8 +67,9 @@ > > public: > CodeGenPGO(CodeGenModule &CGM) > - : CGM(CGM), NumRegionCounters(0), RegionCounters(0), RegionCounterMap(0), > - StmtCountMap(0), RegionCounts(0), CurrentRegionCount(0) {} > + : CGM(CGM), NumRegionCounters(0), RegionCounters(0), RegionPCs(0), > + RegionCounterMap(0), StmtCountMap(0), RegionCounts(0), > + CurrentRegionCount(0) {} > ~CodeGenPGO() {} > > /// Whether or not we have PGO region data for the current function. This is > > > > > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140221/2f12cf01/attachment.html>
I understand why you don't want to rely on debug info and instead produce your own section. We did this with our early version of llvm-based tsan and it was simpler to implement. But here is a data point to support my suggestion: chromium binary built with asan, coverage and -gline-tables-only is 1.6Gb. The same binary is 1.1Gb when stripped, so, the line tables require 500Mb. Separate line info for coverage will essentially double this amount. The size of binary is a serious concern for our users, please take it into consideration. Thanks! --kcc On Fri, Feb 21, 2014 at 8:28 PM, Bob Wilson <bob.wilson at apple.com> wrote:> We’re not going to use debug info at all. We’re emitting the counters in > the clang front-end. We just need to emit separate info to show how to map > those counters to source locations. Mapping to PCs and then using debug > info to get from the PCs to the source locations just makes things harder > and loses information in the process. > > On Feb 21, 2014, at 2:57 AM, Kostya Serebryany <kcc at google.com> wrote: > > >> >> We may need some additional info. > > What kind of additional info? > > >> I haven't put a ton of thought into >> this, but I'm hoping we can either (a) use debug info as is or add some >> extra (valid) debug info to support this, or (b) add an extra >> debug-info-like section to instrumented binaries with the information we >> need. >> > > I'd try this data format (binary equivalent): > > /path/to/binary/or/dso1 num_counters1 > pc1 counter1 > pc2 counter2 > pc3 counter3 > ... > /path/to/binary/or/dso2 num_counters2 > pc1 counter1 > pc2 counter2 > pc3 counter3 > ... > > I don't see a straightforward way to produce such data today because > individual Instructions do not work as labels. > But I think this can be supported in LLVM codegen. > Here is a *raw* patch with comments, just to get the idea. > > > Index: lib/CodeGen/CodeGenPGO.cpp > ==================================================================> --- lib/CodeGen/CodeGenPGO.cpp (revision 201843) > +++ lib/CodeGen/CodeGenPGO.cpp (working copy) > @@ -199,7 +199,8 @@ > llvm::Type *Args[] = { > Int8PtrTy, // const char *MangledName > Int32Ty, // uint32_t NumCounters > - Int64PtrTy // uint64_t *Counters > + Int64PtrTy, // uint64_t *Counters > + Int64PtrTy // uint64_t *PCs > }; > llvm::FunctionType *FTy > llvm::FunctionType::get(PGOBuilder.getVoidTy(), Args, false); > @@ -209,9 +210,10 @@ > llvm::Constant *MangledName > CGM.GetAddrOfConstantCString(CGM.getMangledName(GD), > "__llvm_pgo_name"); > MangledName = llvm::ConstantExpr::getBitCast(MangledName, Int8PtrTy); > - PGOBuilder.CreateCall3(EmitFunc, MangledName, > + PGOBuilder.CreateCall4(EmitFunc, MangledName, > PGOBuilder.getInt32(NumRegionCounters), > - PGOBuilder.CreateBitCast(RegionCounters, > Int64PtrTy)); > + PGOBuilder.CreateBitCast(RegionCounters, > Int64PtrTy), > + PGOBuilder.CreateBitCast(RegionPCs, Int64PtrTy)); > } > > llvm::Function *CodeGenPGO::emitInitialization(CodeGenModule &CGM) { > @@ -769,6 +771,13 @@ > llvm::GlobalVariable::PrivateLinkage, > llvm::Constant::getNullValue(CounterTy), > "__llvm_pgo_ctr"); > + > + RegionPCs > + new llvm::GlobalVariable(CGM.getModule(), CounterTy, false, > + llvm::GlobalVariable::PrivateLinkage, > + llvm::Constant::getNullValue(CounterTy), > + "__llvm_pgo_pcs"); > + > } > > void CodeGenPGO::emitCounterIncrement(CGBuilderTy &Builder, unsigned > Counter) { > @@ -779,6 +788,21 @@ > llvm::Value *Count = Builder.CreateLoad(Addr, "pgocount"); > Count = Builder.CreateAdd(Count, Builder.getInt64(1)); > Builder.CreateStore(Count, Addr); > + // We should put the PC of the instruction that increments > __llvm_pgo_ctr > + // into __llvm_pgo_pcs, which will be passed to llvm_pgo_emit. > + // This patch is wrong in many ways: > + // * We pass the PC of the Function instead of the PC of the > Instruction, > + // because the latter doesn't work like this. We'll need to support > + // Instructions as labels in LLVM codegen. > + // * We actually store the PC on each increment, while we should > initialize > + // this array at link time (need to refactor this code a bit). > + // > + Builder.CreateStore( > + Builder.CreatePointerCast( > + cast<llvm::Instruction>(Count)->getParent()->getParent(), > + Builder.getInt64Ty() // FIXME: use a better type > + ), > + Builder.CreateConstInBoundsGEP2_64(RegionPCs, 0, Counter)); > } > > Index: lib/CodeGen/CodeGenPGO.h > ==================================================================> --- lib/CodeGen/CodeGenPGO.h (revision 201843) > +++ lib/CodeGen/CodeGenPGO.h (working copy) > @@ -59,6 +59,7 @@ > > unsigned NumRegionCounters; > llvm::GlobalVariable *RegionCounters; > + llvm::GlobalVariable *RegionPCs; > llvm::DenseMap<const Stmt*, unsigned> *RegionCounterMap; > llvm::DenseMap<const Stmt*, uint64_t> *StmtCountMap; > std::vector<uint64_t> *RegionCounts; > @@ -66,8 +67,9 @@ > > public: > CodeGenPGO(CodeGenModule &CGM) > - : CGM(CGM), NumRegionCounters(0), RegionCounters(0), > RegionCounterMap(0), > - StmtCountMap(0), RegionCounts(0), CurrentRegionCount(0) {} > + : CGM(CGM), NumRegionCounters(0), RegionCounters(0), RegionPCs(0), > + RegionCounterMap(0), StmtCountMap(0), RegionCounts(0), > + CurrentRegionCount(0) {} > ~CodeGenPGO() {} > > /// Whether or not we have PGO region data for the current function. > This is > > > > > > > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140222/0a5ed0f2/attachment.html>
Why is the binary size a concern for coverage testing? On Feb 21, 2014, at 8:43 PM, Kostya Serebryany <kcc at google.com> wrote:> I understand why you don't want to rely on debug info and instead produce your own section. > We did this with our early version of llvm-based tsan and it was simpler to implement. > But here is a data point to support my suggestion: > chromium binary built with asan, coverage and -gline-tables-only is 1.6Gb. > The same binary is 1.1Gb when stripped, so, the line tables require 500Mb. > Separate line info for coverage will essentially double this amount. > The size of binary is a serious concern for our users, please take it into consideration. > > Thanks! > --kcc > > > > On Fri, Feb 21, 2014 at 8:28 PM, Bob Wilson <bob.wilson at apple.com> wrote: > We’re not going to use debug info at all. We’re emitting the counters in the clang front-end. We just need to emit separate info to show how to map those counters to source locations. Mapping to PCs and then using debug info to get from the PCs to the source locations just makes things harder and loses information in the process. > > On Feb 21, 2014, at 2:57 AM, Kostya Serebryany <kcc at google.com> wrote: > >> >> >> We may need some additional info. >> What kind of additional info? >> >> I haven't put a ton of thought into >> this, but I'm hoping we can either (a) use debug info as is or add some >> extra (valid) debug info to support this, or (b) add an extra >> debug-info-like section to instrumented binaries with the information we >> need. >> >> I'd try this data format (binary equivalent): >> >> /path/to/binary/or/dso1 num_counters1 >> pc1 counter1 >> pc2 counter2 >> pc3 counter3 >> ... >> /path/to/binary/or/dso2 num_counters2 >> pc1 counter1 >> pc2 counter2 >> pc3 counter3 >> ... >> >> I don't see a straightforward way to produce such data today because individual Instructions do not work as labels. >> But I think this can be supported in LLVM codegen. >> Here is a *raw* patch with comments, just to get the idea. >> >> >> Index: lib/CodeGen/CodeGenPGO.cpp >> ==================================================================>> --- lib/CodeGen/CodeGenPGO.cpp (revision 201843) >> +++ lib/CodeGen/CodeGenPGO.cpp (working copy) >> @@ -199,7 +199,8 @@ >> llvm::Type *Args[] = { >> Int8PtrTy, // const char *MangledName >> Int32Ty, // uint32_t NumCounters >> - Int64PtrTy // uint64_t *Counters >> + Int64PtrTy, // uint64_t *Counters >> + Int64PtrTy // uint64_t *PCs >> }; >> llvm::FunctionType *FTy >> llvm::FunctionType::get(PGOBuilder.getVoidTy(), Args, false); >> @@ -209,9 +210,10 @@ >> llvm::Constant *MangledName >> CGM.GetAddrOfConstantCString(CGM.getMangledName(GD), "__llvm_pgo_name"); >> MangledName = llvm::ConstantExpr::getBitCast(MangledName, Int8PtrTy); >> - PGOBuilder.CreateCall3(EmitFunc, MangledName, >> + PGOBuilder.CreateCall4(EmitFunc, MangledName, >> PGOBuilder.getInt32(NumRegionCounters), >> - PGOBuilder.CreateBitCast(RegionCounters, Int64PtrTy)); >> + PGOBuilder.CreateBitCast(RegionCounters, Int64PtrTy), >> + PGOBuilder.CreateBitCast(RegionPCs, Int64PtrTy)); >> } >> >> llvm::Function *CodeGenPGO::emitInitialization(CodeGenModule &CGM) { >> @@ -769,6 +771,13 @@ >> llvm::GlobalVariable::PrivateLinkage, >> llvm::Constant::getNullValue(CounterTy), >> "__llvm_pgo_ctr"); >> + >> + RegionPCs >> + new llvm::GlobalVariable(CGM.getModule(), CounterTy, false, >> + llvm::GlobalVariable::PrivateLinkage, >> + llvm::Constant::getNullValue(CounterTy), >> + "__llvm_pgo_pcs"); >> + >> } >> >> void CodeGenPGO::emitCounterIncrement(CGBuilderTy &Builder, unsigned Counter) { >> @@ -779,6 +788,21 @@ >> llvm::Value *Count = Builder.CreateLoad(Addr, "pgocount"); >> Count = Builder.CreateAdd(Count, Builder.getInt64(1)); >> Builder.CreateStore(Count, Addr); >> + // We should put the PC of the instruction that increments __llvm_pgo_ctr >> + // into __llvm_pgo_pcs, which will be passed to llvm_pgo_emit. >> + // This patch is wrong in many ways: >> + // * We pass the PC of the Function instead of the PC of the Instruction, >> + // because the latter doesn't work like this. We'll need to support >> + // Instructions as labels in LLVM codegen. >> + // * We actually store the PC on each increment, while we should initialize >> + // this array at link time (need to refactor this code a bit). >> + // >> + Builder.CreateStore( >> + Builder.CreatePointerCast( >> + cast<llvm::Instruction>(Count)->getParent()->getParent(), >> + Builder.getInt64Ty() // FIXME: use a better type >> + ), >> + Builder.CreateConstInBoundsGEP2_64(RegionPCs, 0, Counter)); >> } >> >> Index: lib/CodeGen/CodeGenPGO.h >> ==================================================================>> --- lib/CodeGen/CodeGenPGO.h (revision 201843) >> +++ lib/CodeGen/CodeGenPGO.h (working copy) >> @@ -59,6 +59,7 @@ >> >> unsigned NumRegionCounters; >> llvm::GlobalVariable *RegionCounters; >> + llvm::GlobalVariable *RegionPCs; >> llvm::DenseMap<const Stmt*, unsigned> *RegionCounterMap; >> llvm::DenseMap<const Stmt*, uint64_t> *StmtCountMap; >> std::vector<uint64_t> *RegionCounts; >> @@ -66,8 +67,9 @@ >> >> public: >> CodeGenPGO(CodeGenModule &CGM) >> - : CGM(CGM), NumRegionCounters(0), RegionCounters(0), RegionCounterMap(0), >> - StmtCountMap(0), RegionCounts(0), CurrentRegionCount(0) {} >> + : CGM(CGM), NumRegionCounters(0), RegionCounters(0), RegionPCs(0), >> + RegionCounterMap(0), StmtCountMap(0), RegionCounts(0), >> + CurrentRegionCount(0) {} >> ~CodeGenPGO() {} >> >> /// Whether or not we have PGO region data for the current function. This is >> >> >> >> >> >> > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140221/4af3903a/attachment.html>
Our users combine asan and coverage testing and they do it on thousands machines. (An older blog post about using asan: http://blog.chromium.org/2012/04/fuzzing-for-security.html) The binaries need to be shipped to virtual machines, where they will be run. The VMs are *very* short of disk and the network bandwidth has a cost too. We may be able to ship stripped binaries to those machine but this will complicate the logic immensely. Besides, zip-ed binaries are stored for several revisions every day and the storage also costs money. Just to give you the taste ( https://commondatastorage.googleapis.com/chromium-browser-asan/index.html): asan-symbolized-linux-release-252010.zip 2014-02-19 14:34:24 406.35MB asan-symbolized-linux-release-252017.zip 2014-02-19 18:22:54 406.41MB asan-symbolized-linux-release-252025.zip 2014-02-19 21:35:49 406.35MB asan-symbolized-linux-release-252031.zip 2014-02-20 00:44:25 406.35MB asan-symbolized-linux-release-252160.zip 2014-02-20 06:30:16 406.34MB asan-symbolized-linux-release-252185.zip 2014-02-20 09:21:47 408.52MB asan-symbolized-linux-release-252188.zip 2014-02-20 12:20:05 408.52MB asan-symbolized-linux-release-252194.zip 2014-02-20 15:01:05 408.52MB asan-symbolized-linux-release-252218.zip 2014-02-20 18:00:42 408.54MB asan-symbolized-linux-release-252265.zip 2014-02-20 21:00:03 408.65MB asan-symbolized-linux-release-252272.zip 2014-02-21 00:00:40 408.66MB --kcc On Sat, Feb 22, 2014 at 8:58 AM, Bob Wilson <bob.wilson at apple.com> wrote:> Why is the binary size a concern for coverage testing? > > On Feb 21, 2014, at 8:43 PM, Kostya Serebryany <kcc at google.com> wrote: > > I understand why you don't want to rely on debug info and instead produce > your own section. > We did this with our early version of llvm-based tsan and it was simpler > to implement. > But here is a data point to support my suggestion: > chromium binary built with asan, coverage and -gline-tables-only is 1.6Gb. > The same binary is 1.1Gb when stripped, so, the line tables require 500Mb. > Separate line info for coverage will essentially double this amount. > The size of binary is a serious concern for our users, please take it into > consideration. > > Thanks! > --kcc > > > > On Fri, Feb 21, 2014 at 8:28 PM, Bob Wilson <bob.wilson at apple.com> wrote: > >> We’re not going to use debug info at all. We’re emitting the counters in >> the clang front-end. We just need to emit separate info to show how to map >> those counters to source locations. Mapping to PCs and then using debug >> info to get from the PCs to the source locations just makes things harder >> and loses information in the process. >> >> On Feb 21, 2014, at 2:57 AM, Kostya Serebryany <kcc at google.com> wrote: >> >> >>> >>> We may need some additional info. >> >> What kind of additional info? >> >> >>> I haven't put a ton of thought into >>> this, but I'm hoping we can either (a) use debug info as is or add some >>> extra (valid) debug info to support this, or (b) add an extra >>> debug-info-like section to instrumented binaries with the information we >>> need. >>> >> >> I'd try this data format (binary equivalent): >> >> /path/to/binary/or/dso1 num_counters1 >> pc1 counter1 >> pc2 counter2 >> pc3 counter3 >> ... >> /path/to/binary/or/dso2 num_counters2 >> pc1 counter1 >> pc2 counter2 >> pc3 counter3 >> ... >> >> I don't see a straightforward way to produce such data today because >> individual Instructions do not work as labels. >> But I think this can be supported in LLVM codegen. >> Here is a *raw* patch with comments, just to get the idea. >> >> >> Index: lib/CodeGen/CodeGenPGO.cpp >> ==================================================================>> --- lib/CodeGen/CodeGenPGO.cpp (revision 201843) >> +++ lib/CodeGen/CodeGenPGO.cpp (working copy) >> @@ -199,7 +199,8 @@ >> llvm::Type *Args[] = { >> Int8PtrTy, // const char *MangledName >> Int32Ty, // uint32_t NumCounters >> - Int64PtrTy // uint64_t *Counters >> + Int64PtrTy, // uint64_t *Counters >> + Int64PtrTy // uint64_t *PCs >> }; >> llvm::FunctionType *FTy >> llvm::FunctionType::get(PGOBuilder.getVoidTy(), Args, false); >> @@ -209,9 +210,10 @@ >> llvm::Constant *MangledName >> CGM.GetAddrOfConstantCString(CGM.getMangledName(GD), >> "__llvm_pgo_name"); >> MangledName = llvm::ConstantExpr::getBitCast(MangledName, Int8PtrTy); >> - PGOBuilder.CreateCall3(EmitFunc, MangledName, >> + PGOBuilder.CreateCall4(EmitFunc, MangledName, >> PGOBuilder.getInt32(NumRegionCounters), >> - PGOBuilder.CreateBitCast(RegionCounters, >> Int64PtrTy)); >> + PGOBuilder.CreateBitCast(RegionCounters, >> Int64PtrTy), >> + PGOBuilder.CreateBitCast(RegionPCs, >> Int64PtrTy)); >> } >> >> llvm::Function *CodeGenPGO::emitInitialization(CodeGenModule &CGM) { >> @@ -769,6 +771,13 @@ >> llvm::GlobalVariable::PrivateLinkage, >> llvm::Constant::getNullValue(CounterTy), >> "__llvm_pgo_ctr"); >> + >> + RegionPCs >> + new llvm::GlobalVariable(CGM.getModule(), CounterTy, false, >> + llvm::GlobalVariable::PrivateLinkage, >> + llvm::Constant::getNullValue(CounterTy), >> + "__llvm_pgo_pcs"); >> + >> } >> >> void CodeGenPGO::emitCounterIncrement(CGBuilderTy &Builder, unsigned >> Counter) { >> @@ -779,6 +788,21 @@ >> llvm::Value *Count = Builder.CreateLoad(Addr, "pgocount"); >> Count = Builder.CreateAdd(Count, Builder.getInt64(1)); >> Builder.CreateStore(Count, Addr); >> + // We should put the PC of the instruction that increments >> __llvm_pgo_ctr >> + // into __llvm_pgo_pcs, which will be passed to llvm_pgo_emit. >> + // This patch is wrong in many ways: >> + // * We pass the PC of the Function instead of the PC of the >> Instruction, >> + // because the latter doesn't work like this. We'll need to support >> + // Instructions as labels in LLVM codegen. >> + // * We actually store the PC on each increment, while we should >> initialize >> + // this array at link time (need to refactor this code a bit). >> + // >> + Builder.CreateStore( >> + Builder.CreatePointerCast( >> + cast<llvm::Instruction>(Count)->getParent()->getParent(), >> + Builder.getInt64Ty() // FIXME: use a better type >> + ), >> + Builder.CreateConstInBoundsGEP2_64(RegionPCs, 0, Counter)); >> } >> >> Index: lib/CodeGen/CodeGenPGO.h >> ==================================================================>> --- lib/CodeGen/CodeGenPGO.h (revision 201843) >> +++ lib/CodeGen/CodeGenPGO.h (working copy) >> @@ -59,6 +59,7 @@ >> >> unsigned NumRegionCounters; >> llvm::GlobalVariable *RegionCounters; >> + llvm::GlobalVariable *RegionPCs; >> llvm::DenseMap<const Stmt*, unsigned> *RegionCounterMap; >> llvm::DenseMap<const Stmt*, uint64_t> *StmtCountMap; >> std::vector<uint64_t> *RegionCounts; >> @@ -66,8 +67,9 @@ >> >> public: >> CodeGenPGO(CodeGenModule &CGM) >> - : CGM(CGM), NumRegionCounters(0), RegionCounters(0), >> RegionCounterMap(0), >> - StmtCountMap(0), RegionCounts(0), CurrentRegionCount(0) {} >> + : CGM(CGM), NumRegionCounters(0), RegionCounters(0), RegionPCs(0), >> + RegionCounterMap(0), StmtCountMap(0), RegionCounts(0), >> + CurrentRegionCount(0) {} >> ~CodeGenPGO() {} >> >> /// Whether or not we have PGO region data for the current function. >> This is >> >> >> >> >> >> >> >> > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140222/68bb07f8/attachment.html>
On Mar 28, 2014, at 1:33 AM, Kostya Serebryany <kcc at google.com> wrote:> Some more data on code size. > > I've build CPU2006/483.xalancbmk with > a) -O2 -fsanitize=address -m64 -gline-tables-only -mllvm -asan-coverage=1 > b) -O2 -fsanitize=address -m64 -gline-tables-only -fprofile-instr-generate > > The first is 27Mb and the second is 48Mb. > > The extra size comes from __llvm_prf_* sections. > You may be able to make these sections more compact, but you will not make them tiny. > > The instrumentation code generated by -asan-coverage=1 is less efficient than -fprofile-instr-generate > in several ways (slower, fatter, provides less data). > But it does not add any extra sections to the binary and wins in the overall binary size. > Ideally, I'd like to see such options for -fprofile-instr-generate as well. > > —kccIt might make sense to move at least some of the counters into the .bss section so they don’t take up space in the executable. We’re also seeing that the instrumentation bloats the code size much more than expected and we’re still investigating to see why that is the case.> > > > On Sat, Feb 22, 2014 at 9:13 AM, Kostya Serebryany <kcc at google.com> wrote: > Our users combine asan and coverage testing and they do it on thousands machines. > (An older blog post about using asan: http://blog.chromium.org/2012/04/fuzzing-for-security.html) > The binaries need to be shipped to virtual machines, where they will be run. > The VMs are *very* short of disk and the network bandwidth has a cost too. > We may be able to ship stripped binaries to those machine but this will complicate the logic immensely. > > Besides, zip-ed binaries are stored for several revisions every day and the storage also costs money. > Just to give you the taste (https://commondatastorage.googleapis.com/chromium-browser-asan/index.html): > asan-symbolized-linux-release-252010.zip 2014-02-19 14:34:24 406.35MB > asan-symbolized-linux-release-252017.zip 2014-02-19 18:22:54 406.41MB > asan-symbolized-linux-release-252025.zip 2014-02-19 21:35:49 406.35MB > asan-symbolized-linux-release-252031.zip 2014-02-20 00:44:25 406.35MB > asan-symbolized-linux-release-252160.zip 2014-02-20 06:30:16 406.34MB > asan-symbolized-linux-release-252185.zip 2014-02-20 09:21:47 408.52MB > asan-symbolized-linux-release-252188.zip 2014-02-20 12:20:05 408.52MB > asan-symbolized-linux-release-252194.zip 2014-02-20 15:01:05 408.52MB > asan-symbolized-linux-release-252218.zip 2014-02-20 18:00:42 408.54MB > asan-symbolized-linux-release-252265.zip 2014-02-20 21:00:03 408.65MB > asan-symbolized-linux-release-252272.zip 2014-02-21 00:00:40 408.66MB > > --kcc > > > On Sat, Feb 22, 2014 at 8:58 AM, Bob Wilson <bob.wilson at apple.com> wrote: > Why is the binary size a concern for coverage testing? > > On Feb 21, 2014, at 8:43 PM, Kostya Serebryany <kcc at google.com> wrote: > >> I understand why you don't want to rely on debug info and instead produce your own section. >> We did this with our early version of llvm-based tsan and it was simpler to implement. >> But here is a data point to support my suggestion: >> chromium binary built with asan, coverage and -gline-tables-only is 1.6Gb. >> The same binary is 1.1Gb when stripped, so, the line tables require 500Mb. >> Separate line info for coverage will essentially double this amount. >> The size of binary is a serious concern for our users, please take it into consideration. >> >> Thanks! >> --kcc >> >> >> >> On Fri, Feb 21, 2014 at 8:28 PM, Bob Wilson <bob.wilson at apple.com> wrote: >> We’re not going to use debug info at all. We’re emitting the counters in the clang front-end. We just need to emit separate info to show how to map those counters to source locations. Mapping to PCs and then using debug info to get from the PCs to the source locations just makes things harder and loses information in the process. >> >> On Feb 21, 2014, at 2:57 AM, Kostya Serebryany <kcc at google.com> wrote: >> >>> >>> >>> We may need some additional info. >>> What kind of additional info? >>> >>> I haven't put a ton of thought into >>> this, but I'm hoping we can either (a) use debug info as is or add some >>> extra (valid) debug info to support this, or (b) add an extra >>> debug-info-like section to instrumented binaries with the information we >>> need. >>> >>> I'd try this data format (binary equivalent): >>> >>> /path/to/binary/or/dso1 num_counters1 >>> pc1 counter1 >>> pc2 counter2 >>> pc3 counter3 >>> ... >>> /path/to/binary/or/dso2 num_counters2 >>> pc1 counter1 >>> pc2 counter2 >>> pc3 counter3 >>> ... >>> >>> I don't see a straightforward way to produce such data today because individual Instructions do not work as labels. >>> But I think this can be supported in LLVM codegen. >>> Here is a *raw* patch with comments, just to get the idea. >>> >>> >>> Index: lib/CodeGen/CodeGenPGO.cpp >>> ==================================================================>>> --- lib/CodeGen/CodeGenPGO.cpp (revision 201843) >>> +++ lib/CodeGen/CodeGenPGO.cpp (working copy) >>> @@ -199,7 +199,8 @@ >>> llvm::Type *Args[] = { >>> Int8PtrTy, // const char *MangledName >>> Int32Ty, // uint32_t NumCounters >>> - Int64PtrTy // uint64_t *Counters >>> + Int64PtrTy, // uint64_t *Counters >>> + Int64PtrTy // uint64_t *PCs >>> }; >>> llvm::FunctionType *FTy >>> llvm::FunctionType::get(PGOBuilder.getVoidTy(), Args, false); >>> @@ -209,9 +210,10 @@ >>> llvm::Constant *MangledName >>> CGM.GetAddrOfConstantCString(CGM.getMangledName(GD), "__llvm_pgo_name"); >>> MangledName = llvm::ConstantExpr::getBitCast(MangledName, Int8PtrTy); >>> - PGOBuilder.CreateCall3(EmitFunc, MangledName, >>> + PGOBuilder.CreateCall4(EmitFunc, MangledName, >>> PGOBuilder.getInt32(NumRegionCounters), >>> - PGOBuilder.CreateBitCast(RegionCounters, Int64PtrTy)); >>> + PGOBuilder.CreateBitCast(RegionCounters, Int64PtrTy), >>> + PGOBuilder.CreateBitCast(RegionPCs, Int64PtrTy)); >>> } >>> >>> llvm::Function *CodeGenPGO::emitInitialization(CodeGenModule &CGM) { >>> @@ -769,6 +771,13 @@ >>> llvm::GlobalVariable::PrivateLinkage, >>> llvm::Constant::getNullValue(CounterTy), >>> "__llvm_pgo_ctr"); >>> + >>> + RegionPCs >>> + new llvm::GlobalVariable(CGM.getModule(), CounterTy, false, >>> + llvm::GlobalVariable::PrivateLinkage, >>> + llvm::Constant::getNullValue(CounterTy), >>> + "__llvm_pgo_pcs"); >>> + >>> } >>> >>> void CodeGenPGO::emitCounterIncrement(CGBuilderTy &Builder, unsigned Counter) { >>> @@ -779,6 +788,21 @@ >>> llvm::Value *Count = Builder.CreateLoad(Addr, "pgocount"); >>> Count = Builder.CreateAdd(Count, Builder.getInt64(1)); >>> Builder.CreateStore(Count, Addr); >>> + // We should put the PC of the instruction that increments __llvm_pgo_ctr >>> + // into __llvm_pgo_pcs, which will be passed to llvm_pgo_emit. >>> + // This patch is wrong in many ways: >>> + // * We pass the PC of the Function instead of the PC of the Instruction, >>> + // because the latter doesn't work like this. We'll need to support >>> + // Instructions as labels in LLVM codegen. >>> + // * We actually store the PC on each increment, while we should initialize >>> + // this array at link time (need to refactor this code a bit). >>> + // >>> + Builder.CreateStore( >>> + Builder.CreatePointerCast( >>> + cast<llvm::Instruction>(Count)->getParent()->getParent(), >>> + Builder.getInt64Ty() // FIXME: use a better type >>> + ), >>> + Builder.CreateConstInBoundsGEP2_64(RegionPCs, 0, Counter)); >>> } >>> >>> Index: lib/CodeGen/CodeGenPGO.h >>> ==================================================================>>> --- lib/CodeGen/CodeGenPGO.h (revision 201843) >>> +++ lib/CodeGen/CodeGenPGO.h (working copy) >>> @@ -59,6 +59,7 @@ >>> >>> unsigned NumRegionCounters; >>> llvm::GlobalVariable *RegionCounters; >>> + llvm::GlobalVariable *RegionPCs; >>> llvm::DenseMap<const Stmt*, unsigned> *RegionCounterMap; >>> llvm::DenseMap<const Stmt*, uint64_t> *StmtCountMap; >>> std::vector<uint64_t> *RegionCounts; >>> @@ -66,8 +67,9 @@ >>> >>> public: >>> CodeGenPGO(CodeGenModule &CGM) >>> - : CGM(CGM), NumRegionCounters(0), RegionCounters(0), RegionCounterMap(0), >>> - StmtCountMap(0), RegionCounts(0), CurrentRegionCount(0) {} >>> + : CGM(CGM), NumRegionCounters(0), RegionCounters(0), RegionPCs(0), >>> + RegionCounterMap(0), StmtCountMap(0), RegionCounts(0), >>> + CurrentRegionCount(0) {} >>> ~CodeGenPGO() {} >>> >>> /// Whether or not we have PGO region data for the current function. This is >>> >>> >>> >>> >>> >>> >> >> > > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140328/25c0025f/attachment.html>
On 2014 Mar 28, at 14:59, Bob Wilson <bob.wilson at apple.com> wrote:> > On Mar 28, 2014, at 1:33 AM, Kostya Serebryany <kcc at google.com> wrote: > >> Some more data on code size. >> >> I've build CPU2006/483.xalancbmk with >> a) -O2 -fsanitize=address -m64 -gline-tables-only -mllvm -asan-coverage=1 >> b) -O2 -fsanitize=address -m64 -gline-tables-only -fprofile-instr-generate >> >> The first is 27Mb and the second is 48Mb. >> >> The extra size comes from __llvm_prf_* sections. >> You may be able to make these sections more compact, but you will not make them tiny. >> >> The instrumentation code generated by -asan-coverage=1 is less efficient than -fprofile-instr-generate >> in several ways (slower, fatter, provides less data). >> But it does not add any extra sections to the binary and wins in the overall binary size. >> Ideally, I'd like to see such options for -fprofile-instr-generate as well. >> >> —kcc > > It might make sense to move at least some of the counters into the .bss section so they don’t take up space in the executable. > > We’re also seeing that the instrumentation bloats the code size much more than expected and we’re still investigating to see why that is the case.The __llvm_prf_cnts section is likely the largest. It's zero-initialized, so it's a good candidate for .bss or similar. The counters are currently in their own section to make write out easy (just one big array), but we could change that. Or, is there linker magic that can make a special section behave like the .bss?
On Fri, Mar 28, 2014 at 3:18 PM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:> > On 2014 Mar 28, at 14:59, Bob Wilson <bob.wilson at apple.com> wrote: > >> >> On Mar 28, 2014, at 1:33 AM, Kostya Serebryany <kcc at google.com> wrote: >> >>> Some more data on code size. >>> >>> I've build CPU2006/483.xalancbmk with >>> a) -O2 -fsanitize=address -m64 -gline-tables-only -mllvm -asan-coverage=1 >>> b) -O2 -fsanitize=address -m64 -gline-tables-only -fprofile-instr-generate >>> >>> The first is 27Mb and the second is 48Mb. >>> >>> The extra size comes from __llvm_prf_* sections. >>> You may be able to make these sections more compact, but you will not make them tiny. >>> >>> The instrumentation code generated by -asan-coverage=1 is less efficient than -fprofile-instr-generate >>> in several ways (slower, fatter, provides less data). >>> But it does not add any extra sections to the binary and wins in the overall binary size. >>> Ideally, I'd like to see such options for -fprofile-instr-generate as well. >>> >>> --kcc >> >> It might make sense to move at least some of the counters into the .bss section so they don't take up space in the executable. >> >> We're also seeing that the instrumentation bloats the code size much more than expected and we're still investigating to see why that is the case. > > The __llvm_prf_cnts section is likely the largest. It's zero-initialized, > so it's a good candidate for .bss or similar. The counters are currently in > their own section to make write out easy (just one big array), but we could > change that. Or, is there linker magic that can make a special section > behave like the .bss?Possibly. The zerofill stuff is a bit weird, but you should be able to specify a large enough block to zerofill and a concrete section. A separate section with the S_ZEROFILL attribute would probably work to get it all initialized and just switch sections and not use the zerofill directive. -eric> _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
On 2014 Mar 28, at 15:47, Eric Christopher <echristo at gmail.com> wrote:> On Fri, Mar 28, 2014 at 3:18 PM, Duncan P. N. Exon Smith > <dexonsmith at apple.com> wrote: >> >> On 2014 Mar 28, at 14:59, Bob Wilson <bob.wilson at apple.com> wrote: >> >>> >>> On Mar 28, 2014, at 1:33 AM, Kostya Serebryany <kcc at google.com> wrote: >>> >>>> Some more data on code size. >>>> >>>> I've build CPU2006/483.xalancbmk with >>>> a) -O2 -fsanitize=address -m64 -gline-tables-only -mllvm -asan-coverage=1 >>>> b) -O2 -fsanitize=address -m64 -gline-tables-only -fprofile-instr-generate >>>> >>>> The first is 27Mb and the second is 48Mb. >>>> >>>> The extra size comes from __llvm_prf_* sections. >>>> You may be able to make these sections more compact, but you will not make them tiny. >>>> >>>> The instrumentation code generated by -asan-coverage=1 is less efficient than -fprofile-instr-generate >>>> in several ways (slower, fatter, provides less data). >>>> But it does not add any extra sections to the binary and wins in the overall binary size. >>>> Ideally, I'd like to see such options for -fprofile-instr-generate as well. >>>> >>>> --kcc >>> >>> It might make sense to move at least some of the counters into the .bss section so they don't take up space in the executable. >>> >>> We're also seeing that the instrumentation bloats the code size much more than expected and we're still investigating to see why that is the case. >> >> The __llvm_prf_cnts section is likely the largest. It's zero-initialized, >> so it's a good candidate for .bss or similar. The counters are currently in >> their own section to make write out easy (just one big array), but we could >> change that. Or, is there linker magic that can make a special section >> behave like the .bss? > > Possibly. The zerofill stuff is a bit weird, but you should be able to > specify a large enough block to zerofill and a concrete section. A > separate section with the S_ZEROFILL attribute would probably work to > get it all initialized and just switch sections and not use the > zerofill directive. > > -ericHeh, I'm a little lost here. Where can we specify this? I had a look in MCSectionMachO.cpp, and S_ZEROFILL isn't accessible from LLVM IR. Should we add logic somewhere to recognize these sections? Will that actually reduce the executable size? (I tried hacking it in but that didn't seem to save disk space.) Also, this doesn't solve ELF. Can we do similar things there? For clarity, there are three __llvm_prf_* sections. Without having seen Kostya's data, I'm speculating that __llvm_prf_cnts is the largest section. - __llvm_prf_data is 32B per function, and has pointers into the other two sections. This section is necessary to avoid static initialization (implemented on Darwin, not quite on ELF). - __llvm_prf_names is the mangled name of every function. It should be on the same order of magnitude as __llvm_prf_data. This section is convenient for writing out the profile, since the names are effectively placed in one big char array whose bounds are known at link time. - __llvm_prf_cnts is 8B per region counter. Each function has one at entry and roughly one per CFG-relevant AST node (for, if, etc.). This section is convenient for writing out the profile, since the counters are effectively placed in one big array whose bounds are known at link time. However, I don't think the data in this section needs to be explicitly stored in the executable if we can somehow make it act like .bss (or the like). In the latter two cases, it's possible to avoid the special sections if there are good reasons, but it will add runtime overhead.
On Sat, Mar 29, 2014 at 4:02 AM, Duncan P. N. Exon Smith < dexonsmith at apple.com> wrote:> > On 2014 Mar 28, at 15:47, Eric Christopher <echristo at gmail.com> wrote: > > > On Fri, Mar 28, 2014 at 3:18 PM, Duncan P. N. Exon Smith > > <dexonsmith at apple.com> wrote: > >> > >> On 2014 Mar 28, at 14:59, Bob Wilson <bob.wilson at apple.com> wrote: > >> > >>> > >>> On Mar 28, 2014, at 1:33 AM, Kostya Serebryany <kcc at google.com> wrote: > >>> > >>>> Some more data on code size. > >>>> > >>>> I've build CPU2006/483.xalancbmk with > >>>> a) -O2 -fsanitize=address -m64 -gline-tables-only -mllvm > -asan-coverage=1 > >>>> b) -O2 -fsanitize=address -m64 -gline-tables-only > -fprofile-instr-generate > >>>> > >>>> The first is 27Mb and the second is 48Mb. > >>>> > >>>> The extra size comes from __llvm_prf_* sections. > >>>> You may be able to make these sections more compact, but you will not > make them tiny. > >>>> > >>>> The instrumentation code generated by -asan-coverage=1 is less > efficient than -fprofile-instr-generate > >>>> in several ways (slower, fatter, provides less data). > >>>> But it does not add any extra sections to the binary and wins in the > overall binary size. > >>>> Ideally, I'd like to see such options for -fprofile-instr-generate as > well. > >>>> > >>>> --kcc > >>> > >>> It might make sense to move at least some of the counters into the > .bss section so they don't take up space in the executable. > >>> > >>> We're also seeing that the instrumentation bloats the code size much > more than expected and we're still investigating to see why that is the > case. > >> > >> The __llvm_prf_cnts section is likely the largest. It's > zero-initialized, > >> so it's a good candidate for .bss or similar. The counters are > currently in > >> their own section to make write out easy (just one big array), but we > could > >> change that. Or, is there linker magic that can make a special section > >> behave like the .bss? > > > > Possibly. The zerofill stuff is a bit weird, but you should be able to > > specify a large enough block to zerofill and a concrete section. A > > separate section with the S_ZEROFILL attribute would probably work to > > get it all initialized and just switch sections and not use the > > zerofill directive. > > > > -eric > > Heh, I'm a little lost here. Where can we specify this? I had a look > in MCSectionMachO.cpp, and S_ZEROFILL isn't accessible from LLVM IR. > Should we add logic somewhere to recognize these sections? Will that > actually reduce the executable size? (I tried hacking it in but that > didn't seem to save disk space.) > > Also, this doesn't solve ELF. Can we do similar things there? > > For clarity, there are three __llvm_prf_* sections. Without having > seen Kostya's data, I'm speculating that __llvm_prf_cnts is the largest > section. >This is what I see on 483.xalancbmk: 15 __llvm_prf_names 0036fcb0 00000000010abf40 00000000010abf40 00cabf40 2**5 16 __llvm_prf_data 001b77e0 000000000141bc00 000000000141bc00 0101bc00 2**5 32 __llvm_prf_cnts 00123468 0000000001af2fe0 0000000001af2fe0 014f2fe0 2**5 __llvm_prf_names is larger than the other two combined. 483.xalancbmk is C++ and the function names are long. The same is true for most of the code we care about. Can't we use function PCs instead of function names in this table (and retrieve the function names from debug info)?> > - __llvm_prf_data is 32B per function, and has pointers into the > other two sections. This section is necessary to avoid static > initialization (implemented on Darwin, not quite on ELF). > > - __llvm_prf_names is the mangled name of every function. It should > be on the same order of magnitude as __llvm_prf_data. This > section is convenient for writing out the profile, since the names > are effectively placed in one big char array whose bounds are known > at link time. > > - __llvm_prf_cnts is 8B per region counter. Each function has one > at entry and roughly one per CFG-relevant AST node (for, if, etc.). > This section is convenient for writing out the profile, since the > counters are effectively placed in one big array whose bounds are > known at link time. However, I don't think the data in this > section needs to be explicitly stored in the executable if we can > somehow make it act like .bss (or the like). >Why can't we simply create this buffer at startup? We solve similar task in asan, so it's definitely possible. --kcc> > In the latter two cases, it's possible to avoid the special sections if > there are good reasons, but it will add runtime overhead. >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140331/9b312d89/attachment.html>
On Mar 31, 2014, at 1:53 AM, Kostya Serebryany <kcc at google.com> wrote:> > This is what I see on 483.xalancbmk: > 15 __llvm_prf_names 0036fcb0 00000000010abf40 00000000010abf40 00cabf40 2**5 > 16 __llvm_prf_data 001b77e0 000000000141bc00 000000000141bc00 0101bc00 2**5 > 32 __llvm_prf_cnts 00123468 0000000001af2fe0 0000000001af2fe0 014f2fe0 2**5 > > __llvm_prf_names is larger than the other two combined. > 483.xalancbmk is C++ and the function names are long. The same is true for most of the code we care about. > Can't we use function PCs instead of function names in this table (and retrieve the function names from debug info)?That’s a bit surprising. We should check to make sure there isn’t anything obviously wrong.> > > > - __llvm_prf_data is 32B per function, and has pointers into the > other two sections. This section is necessary to avoid static > initialization (implemented on Darwin, not quite on ELF). > > - __llvm_prf_names is the mangled name of every function. It should > be on the same order of magnitude as __llvm_prf_data. This > section is convenient for writing out the profile, since the names > are effectively placed in one big char array whose bounds are known > at link time. > > - __llvm_prf_cnts is 8B per region counter. Each function has one > at entry and roughly one per CFG-relevant AST node (for, if, etc.). > This section is convenient for writing out the profile, since the > counters are effectively placed in one big array whose bounds are > known at link time. However, I don't think the data in this > section needs to be explicitly stored in the executable if we can > somehow make it act like .bss (or the like). > > Why can't we simply create this buffer at startup? > We solve similar task in asan, so it's definitely possible.We need this to work for environments where minimal runtime support is available (e.g., in kernel code). We used to call malloc() to create some data structures at runtime but that prevented us from using it from kernel code. I’m hoping the additional size will not be such a problem that we have to implement separate solutions for different environments. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140331/85927eab/attachment.html>
On Mar 31, 2014, at 10:45, Bob Wilson <bob.wilson at apple.com> wrote:> > On Mar 31, 2014, at 1:53 AM, Kostya Serebryany <kcc at google.com> wrote: >> >> This is what I see on 483.xalancbmk: >> 15 __llvm_prf_names 0036fcb0 00000000010abf40 00000000010abf40 00cabf40 2**5 >> 16 __llvm_prf_data 001b77e0 000000000141bc00 000000000141bc00 0101bc00 2**5 >> 32 __llvm_prf_cnts 00123468 0000000001af2fe0 0000000001af2fe0 014f2fe0 2**5 >> >> __llvm_prf_names is larger than the other two combined. >> 483.xalancbmk is C++ and the function names are long. The same is true for most of the code we care about. >> Can't we use function PCs instead of function names in this table (and retrieve the function names from debug info)? > > That’s a bit surprising. We should check to make sure there isn’t anything obviously wrong.I'm surprised too, but it makes some sense. - Most functions in the implementation details of the STL have long enough mangled names (and little enough control flow) that the names would dominate the size overhead. - Many class member functions will have this property, too, like getters, setters, and typical constructors, copy/move operators, and destructors. There's likely a way to emit these names more compactly using a tree (with links to parent nodes). Consider _ZN9Namespace5Class3fooEv, the mangled name for Namespace::Class::foo(). We could do something like this: %__llvm_profile_name = type {i8*, %__llvm_profile_name*} @__llvm_profile_str__ZN = linkonce constant [4 x i8] c"_ZN\00" @__llvm_profile_name__ZN = linkonce constant %__llvm_profile_name { i8* getelementptr ([4 x i8]* @__llvm_profile_str__ZN, i32 0, i32 0), %__llvm_profile_name* null } @__llvm_profile_str_9Namespace = linkonce constant [11 x i8] c"9Namespace\00" @__llvm_profile_name__ZN9Namespace = linkonce constant %__llvm_profile_name { i8* getelementptr ([11 x i8]* @__llvm_profile_str_9Namespace, i32 0, i32 0), %__llvm_profile_name* @__llvm_profile_name__ZN } @__llvm_profile_str_5Class = linkonce constant [7 x i8] c"5Class\00" @__llvm_profile_name__ZN9Namespace5Class = linkonce constant %__llvm_profile_name { i8* getelementptr ([7 x i8]* @__llvm_profile_str_5Class, i32 0, i32 0), %__llvm_profile_name* @__llvm_profile_name__ZN9Namespace } @__llvm_profile_str_3fooEv = linkonce constant [7 x i8] c"3fooEv\00" @__llvm_profile_name__ZN9Namespace5Class3fooEv = linkonce constant %__llvm_profile_name { i8* getelementptr ([7 x i8]* @__llvm_profile_str_3fooEv, i32 0, i32 0), %__llvm_profile_name* @__llvm_profile_name__ZN9Namespace5Class } Then _ZN9Namespace5Class3barEv (Namespace::Class::bar()) would be cheap. @__llvm_profile_str_3barEv = linkonce constant [7 x i8] c"3barEv\00" @__llvm_profile_name__ZN9Namespace5Class3barEv = linkonce constant %__llvm_profile_name { i8* getelementptr ([7 x i8]* @__llvm_profile_str_3barEv, i32 0, i32 0), %__llvm_profile_name* @__llvm_profile_name__ZN9Namespace5Class } These would merge cleanly even between object files generated from different translation units. Doing it like that requires understanding the mangling and wouldn't compact common prefixes in C. We could do something similar with a heuristic (split every 8 characters), or split on common prefixes per translation unit (and hope they line up between translation units).
My slides from EuroLLVM regarding coverage, in case you are interested... http://llvm.org/devmtg/2014-04/PDFs/LightningTalks/EuroLLVM'14%20--%20ASan%20%2B%20Coverage.pdf --kcc On Tue, Apr 1, 2014 at 2:13 AM, Duncan P. N. Exon Smith < dexonsmith at apple.com> wrote:> > On Mar 31, 2014, at 10:45, Bob Wilson <bob.wilson at apple.com> wrote: > > > > > On Mar 31, 2014, at 1:53 AM, Kostya Serebryany <kcc at google.com> wrote: > >> > >> This is what I see on 483.xalancbmk: > >> 15 __llvm_prf_names 0036fcb0 00000000010abf40 00000000010abf40 > 00cabf40 2**5 > >> 16 __llvm_prf_data 001b77e0 000000000141bc00 000000000141bc00 > 0101bc00 2**5 > >> 32 __llvm_prf_cnts 00123468 0000000001af2fe0 0000000001af2fe0 > 014f2fe0 2**5 > >> > >> __llvm_prf_names is larger than the other two combined. > >> 483.xalancbmk is C++ and the function names are long. The same is true > for most of the code we care about. > >> Can't we use function PCs instead of function names in this table (and > retrieve the function names from debug info)? > > > > That’s a bit surprising. We should check to make sure there isn’t > anything obviously wrong. > > I'm surprised too, but it makes some sense. > > - Most functions in the implementation details of the STL have long > enough mangled > names (and little enough control flow) that the names would dominate > the size > overhead. > > - Many class member functions will have this property, too, like > getters, setters, > and typical constructors, copy/move operators, and destructors. > > There's likely a way to emit these names more compactly using a tree (with > links > to parent nodes). > > Consider _ZN9Namespace5Class3fooEv, the mangled name for > Namespace::Class::foo(). > We could do something like this: > > %__llvm_profile_name = type {i8*, %__llvm_profile_name*} > > @__llvm_profile_str__ZN = linkonce constant [4 x i8] c"_ZN\00" > @__llvm_profile_name__ZN = linkonce constant %__llvm_profile_name { > i8* getelementptr ([4 x i8]* @__llvm_profile_str__ZN, i32 0, i32 0), > %__llvm_profile_name* null } > > @__llvm_profile_str_9Namespace = linkonce constant [11 x i8] > c"9Namespace\00" > @__llvm_profile_name__ZN9Namespace = linkonce constant > %__llvm_profile_name { > i8* getelementptr ([11 x i8]* @__llvm_profile_str_9Namespace, i32 0, > i32 0), > %__llvm_profile_name* @__llvm_profile_name__ZN } > > @__llvm_profile_str_5Class = linkonce constant [7 x i8] c"5Class\00" > @__llvm_profile_name__ZN9Namespace5Class = linkonce constant > %__llvm_profile_name { > i8* getelementptr ([7 x i8]* @__llvm_profile_str_5Class, i32 0, i32 > 0), > %__llvm_profile_name* @__llvm_profile_name__ZN9Namespace } > > @__llvm_profile_str_3fooEv = linkonce constant [7 x i8] c"3fooEv\00" > @__llvm_profile_name__ZN9Namespace5Class3fooEv = linkonce constant > %__llvm_profile_name { > i8* getelementptr ([7 x i8]* @__llvm_profile_str_3fooEv, i32 0, i32 > 0), > %__llvm_profile_name* @__llvm_profile_name__ZN9Namespace5Class } > > Then _ZN9Namespace5Class3barEv (Namespace::Class::bar()) would be cheap. > > @__llvm_profile_str_3barEv = linkonce constant [7 x i8] c"3barEv\00" > @__llvm_profile_name__ZN9Namespace5Class3barEv = linkonce constant > %__llvm_profile_name { > i8* getelementptr ([7 x i8]* @__llvm_profile_str_3barEv, i32 0, i32 > 0), > %__llvm_profile_name* @__llvm_profile_name__ZN9Namespace5Class } > > These would merge cleanly even between object files generated from > different > translation units. > > Doing it like that requires understanding the mangling and wouldn't > compact common > prefixes in C. We could do something similar with a heuristic (split > every 8 > characters), or split on common prefixes per translation unit (and hope > they line up > between translation units). > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140411/44f24198/attachment.html>