thr3ads.net - llvm dev - [LLVMdev] asan coverage [Feb 2014]

If this information is useful, please help other people find it:
Share via:

Kostya Serebryany

2014-Feb-21 10:57 UTC

[LLVMdev] asan coverage

>
>
>
> We may need some additional info.
What kind of additional info?

> I haven't put a ton of thought into
> this, but I'm hoping we can either (a) use debug info as is or add some
> extra (valid) debug info to support this, or (b) add an extra
> debug-info-like section to instrumented binaries with the information we
> need.
>
I'd try this data format (binary equivalent):

/path/to/binary/or/dso1 num_counters1
pc1 counter1
pc2 counter2
pc3 counter3
...
/path/to/binary/or/dso2 num_counters2
pc1 counter1
pc2 counter2
pc3 counter3
...

I don't see a straightforward way to produce such data today because
individual Instructions do not work as labels.
But I think this can be supported in LLVM codegen.
Here is a *raw* patch with comments, just to get the idea.


Index: lib/CodeGen/CodeGenPGO.cpp
==================================================================---
lib/CodeGen/CodeGenPGO.cpp  (revision 201843)
+++ lib/CodeGen/CodeGenPGO.cpp  (working copy)
@@ -199,7 +199,8 @@
   llvm::Type *Args[] = {
     Int8PtrTy,                       // const char *MangledName
     Int32Ty,                         // uint32_t NumCounters
-    Int64PtrTy                       // uint64_t *Counters
+    Int64PtrTy,                       // uint64_t *Counters
+    Int64PtrTy                       // uint64_t *PCs
   };
   llvm::FunctionType *FTy      llvm::FunctionType::get(PGOBuilder.getVoidTy(),
Args, false);
@@ -209,9 +210,10 @@
   llvm::Constant *MangledName     
CGM.GetAddrOfConstantCString(CGM.getMangledName(GD),
"__llvm_pgo_name");
   MangledName = llvm::ConstantExpr::getBitCast(MangledName, Int8PtrTy);
-  PGOBuilder.CreateCall3(EmitFunc, MangledName,
+  PGOBuilder.CreateCall4(EmitFunc, MangledName,
                          PGOBuilder.getInt32(NumRegionCounters),
-                         PGOBuilder.CreateBitCast(RegionCounters,
Int64PtrTy));
+                         PGOBuilder.CreateBitCast(RegionCounters,
Int64PtrTy),
+                         PGOBuilder.CreateBitCast(RegionPCs, Int64PtrTy));
 }

 llvm::Function *CodeGenPGO::emitInitialization(CodeGenModule &CGM) {
@@ -769,6 +771,13 @@
                              llvm::GlobalVariable::PrivateLinkage,
                              llvm::Constant::getNullValue(CounterTy),
                              "__llvm_pgo_ctr");
+
+  RegionPCs +    new llvm::GlobalVariable(CGM.getModule(), CounterTy, false,
+                             llvm::GlobalVariable::PrivateLinkage,
+                             llvm::Constant::getNullValue(CounterTy),
+                             "__llvm_pgo_pcs");
+
 }

 void CodeGenPGO::emitCounterIncrement(CGBuilderTy &Builder, unsigned
Counter) {
@@ -779,6 +788,21 @@
   llvm::Value *Count = Builder.CreateLoad(Addr, "pgocount");
   Count = Builder.CreateAdd(Count, Builder.getInt64(1));
   Builder.CreateStore(Count, Addr);
+  // We should put the PC of the instruction that increments __llvm_pgo_ctr
+  // into __llvm_pgo_pcs, which will be passed to llvm_pgo_emit.
+  // This patch is wrong in many ways:
+  //   * We pass the PC of the Function instead of the PC of the
Instruction,
+  //   because the latter doesn't work like this. We'll need to support
+  //   Instructions as labels in LLVM codegen.
+  //   * We actually store the PC on each increment, while we should
initialize
+  //   this array at link time (need to refactor this code a bit).
+  //
+  Builder.CreateStore(
+      Builder.CreatePointerCast(
+          cast<llvm::Instruction>(Count)->getParent()->getParent(),
+          Builder.getInt64Ty()  // FIXME: use a better type
+          ),
+      Builder.CreateConstInBoundsGEP2_64(RegionPCs, 0, Counter));
 }

Index: lib/CodeGen/CodeGenPGO.h
==================================================================---
lib/CodeGen/CodeGenPGO.h    (revision 201843)
+++ lib/CodeGen/CodeGenPGO.h    (working copy)
@@ -59,6 +59,7 @@

   unsigned NumRegionCounters;
   llvm::GlobalVariable *RegionCounters;
+  llvm::GlobalVariable *RegionPCs;
   llvm::DenseMap<const Stmt*, unsigned> *RegionCounterMap;
   llvm::DenseMap<const Stmt*, uint64_t> *StmtCountMap;
   std::vector<uint64_t> *RegionCounts;
@@ -66,8 +67,9 @@

 public:
   CodeGenPGO(CodeGenModule &CGM)
-    : CGM(CGM), NumRegionCounters(0), RegionCounters(0),
RegionCounterMap(0),
-      StmtCountMap(0), RegionCounts(0), CurrentRegionCount(0) {}
+      : CGM(CGM), NumRegionCounters(0), RegionCounters(0), RegionPCs(0),
+        RegionCounterMap(0), StmtCountMap(0), RegionCounts(0),
+        CurrentRegionCount(0) {}
   ~CodeGenPGO() {}

   /// Whether or not we have PGO region data for the current function.
This is
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140221/b01a08fa/attachment.html>

Bob Wilson

2014-Feb-21 16:28 UTC

head link

[LLVMdev] asan coverage

We’re not going to use debug info at all. We’re emitting the counters in the
clang front-end. We just need to emit separate info to show how to map those
counters to source locations. Mapping to PCs and then using debug info to get
from the PCs to the source locations just makes things harder and loses
information in the process.

On Feb 21, 2014, at 2:57 AM, Kostya Serebryany <kcc at google.com> wrote:
> 
> 
> We may need some additional info.
> What kind of additional info? 
>  
> I haven't put a ton of thought into
> this, but I'm hoping we can either (a) use debug info as is or add some
> extra (valid) debug info to support this, or (b) add an extra
> debug-info-like section to instrumented binaries with the information we
> need.
> 
> I'd try this data format (binary equivalent): 
> 
> /path/to/binary/or/dso1 num_counters1
> pc1 counter1
> pc2 counter2
> pc3 counter3
> ...
> /path/to/binary/or/dso2 num_counters2
> pc1 counter1
> pc2 counter2
> pc3 counter3
> ...
> 
> I don't see a straightforward way to produce such data today because
individual Instructions do not work as labels.
> But I think this can be supported in LLVM codegen. 
> Here is a *raw* patch with comments, just to get the idea.
> 
> 
> Index: lib/CodeGen/CodeGenPGO.cpp
> ==================================================================> ---
lib/CodeGen/CodeGenPGO.cpp  (revision 201843)
> +++ lib/CodeGen/CodeGenPGO.cpp  (working copy)
> @@ -199,7 +199,8 @@
>    llvm::Type *Args[] = {
>      Int8PtrTy,                       // const char *MangledName
>      Int32Ty,                         // uint32_t NumCounters
> -    Int64PtrTy                       // uint64_t *Counters
> +    Int64PtrTy,                       // uint64_t *Counters
> +    Int64PtrTy                       // uint64_t *PCs
>    };
>    llvm::FunctionType *FTy >     
llvm::FunctionType::get(PGOBuilder.getVoidTy(), Args, false);
> @@ -209,9 +210,10 @@
>    llvm::Constant *MangledName >     
CGM.GetAddrOfConstantCString(CGM.getMangledName(GD),
"__llvm_pgo_name");
>    MangledName = llvm::ConstantExpr::getBitCast(MangledName, Int8PtrTy);
> -  PGOBuilder.CreateCall3(EmitFunc, MangledName,
> +  PGOBuilder.CreateCall4(EmitFunc, MangledName,
>                           PGOBuilder.getInt32(NumRegionCounters),
> -                         PGOBuilder.CreateBitCast(RegionCounters,
Int64PtrTy));
> +                         PGOBuilder.CreateBitCast(RegionCounters,
Int64PtrTy),
> +                         PGOBuilder.CreateBitCast(RegionPCs, Int64PtrTy));
>  }
>  
>  llvm::Function *CodeGenPGO::emitInitialization(CodeGenModule &CGM) {
> @@ -769,6 +771,13 @@
>                               llvm::GlobalVariable::PrivateLinkage,
>                               llvm::Constant::getNullValue(CounterTy),
>                               "__llvm_pgo_ctr");
> +
> +  RegionPCs > +    new llvm::GlobalVariable(CGM.getModule(), CounterTy,
false,
> +                             llvm::GlobalVariable::PrivateLinkage,
> +                             llvm::Constant::getNullValue(CounterTy),
> +                             "__llvm_pgo_pcs");
> +
>  }
>  
>  void CodeGenPGO::emitCounterIncrement(CGBuilderTy &Builder, unsigned
Counter) {
> @@ -779,6 +788,21 @@
>    llvm::Value *Count = Builder.CreateLoad(Addr, "pgocount");
>    Count = Builder.CreateAdd(Count, Builder.getInt64(1));
>    Builder.CreateStore(Count, Addr);
> +  // We should put the PC of the instruction that increments
__llvm_pgo_ctr
> +  // into __llvm_pgo_pcs, which will be passed to llvm_pgo_emit.
> +  // This patch is wrong in many ways:
> +  //   * We pass the PC of the Function instead of the PC of the
Instruction,
> +  //   because the latter doesn't work like this. We'll need to
support
> +  //   Instructions as labels in LLVM codegen.
> +  //   * We actually store the PC on each increment, while we should
initialize
> +  //   this array at link time (need to refactor this code a bit).
> +  //
> +  Builder.CreateStore(
> +      Builder.CreatePointerCast(
> +         
cast<llvm::Instruction>(Count)->getParent()->getParent(),
> +          Builder.getInt64Ty()  // FIXME: use a better type
> +          ),
> +      Builder.CreateConstInBoundsGEP2_64(RegionPCs, 0, Counter));
>  }
>  
> Index: lib/CodeGen/CodeGenPGO.h
> ==================================================================> ---
lib/CodeGen/CodeGenPGO.h    (revision 201843)
> +++ lib/CodeGen/CodeGenPGO.h    (working copy)
> @@ -59,6 +59,7 @@
>  
>    unsigned NumRegionCounters;
>    llvm::GlobalVariable *RegionCounters;
> +  llvm::GlobalVariable *RegionPCs;
>    llvm::DenseMap<const Stmt*, unsigned> *RegionCounterMap;
>    llvm::DenseMap<const Stmt*, uint64_t> *StmtCountMap;
>    std::vector<uint64_t> *RegionCounts;
> @@ -66,8 +67,9 @@
>  
>  public:
>    CodeGenPGO(CodeGenModule &CGM)
> -    : CGM(CGM), NumRegionCounters(0), RegionCounters(0),
RegionCounterMap(0),
> -      StmtCountMap(0), RegionCounts(0), CurrentRegionCount(0) {}
> +      : CGM(CGM), NumRegionCounters(0), RegionCounters(0), RegionPCs(0),
> +        RegionCounterMap(0), StmtCountMap(0), RegionCounts(0),
> +        CurrentRegionCount(0) {}
>    ~CodeGenPGO() {}
>  
>    /// Whether or not we have PGO region data for the current function.
This is
> 
> 
> 
> 
>  
> 
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140221/2f12cf01/attachment.html>

Kostya Serebryany

2014-Feb-22 04:43 UTC

head link

[LLVMdev] asan coverage

I understand why you don't want to rely on debug info and instead produce
your own section.
We did this with our early version of llvm-based tsan and it was simpler to
implement.
But here is a data point to support my suggestion:
chromium binary built with asan, coverage and -gline-tables-only is 1.6Gb.
The same binary is 1.1Gb when stripped, so, the line tables require 500Mb.
Separate line info for coverage will essentially double this amount.
The size of binary is a serious concern for our users, please take it into
consideration.

Thanks!
--kcc



On Fri, Feb 21, 2014 at 8:28 PM, Bob Wilson <bob.wilson at apple.com>
wrote:
> We’re not going to use debug info at all. We’re emitting the counters in
> the clang front-end. We just need to emit separate info to show how to map
> those counters to source locations. Mapping to PCs and then using debug
> info to get from the PCs to the source locations just makes things harder
> and loses information in the process.
>
> On Feb 21, 2014, at 2:57 AM, Kostya Serebryany <kcc at google.com>
wrote:
>
>
>>
>> We may need some additional info.
>
> What kind of additional info?
>
>
>> I haven't put a ton of thought into
>> this, but I'm hoping we can either (a) use debug info as is or add
some
>> extra (valid) debug info to support this, or (b) add an extra
>> debug-info-like section to instrumented binaries with the information
we
>> need.
>>
>
> I'd try this data format (binary equivalent):
>
> /path/to/binary/or/dso1 num_counters1
> pc1 counter1
> pc2 counter2
> pc3 counter3
> ...
> /path/to/binary/or/dso2 num_counters2
> pc1 counter1
> pc2 counter2
> pc3 counter3
> ...
>
> I don't see a straightforward way to produce such data today because
> individual Instructions do not work as labels.
> But I think this can be supported in LLVM codegen.
> Here is a *raw* patch with comments, just to get the idea.
>
>
> Index: lib/CodeGen/CodeGenPGO.cpp
> ==================================================================> ---
lib/CodeGen/CodeGenPGO.cpp  (revision 201843)
> +++ lib/CodeGen/CodeGenPGO.cpp  (working copy)
> @@ -199,7 +199,8 @@
>    llvm::Type *Args[] = {
>      Int8PtrTy,                       // const char *MangledName
>      Int32Ty,                         // uint32_t NumCounters
> -    Int64PtrTy                       // uint64_t *Counters
> +    Int64PtrTy,                       // uint64_t *Counters
> +    Int64PtrTy                       // uint64_t *PCs
>    };
>    llvm::FunctionType *FTy >     
llvm::FunctionType::get(PGOBuilder.getVoidTy(), Args, false);
> @@ -209,9 +210,10 @@
>    llvm::Constant *MangledName >     
CGM.GetAddrOfConstantCString(CGM.getMangledName(GD),
> "__llvm_pgo_name");
>    MangledName = llvm::ConstantExpr::getBitCast(MangledName, Int8PtrTy);
> -  PGOBuilder.CreateCall3(EmitFunc, MangledName,
> +  PGOBuilder.CreateCall4(EmitFunc, MangledName,
>                           PGOBuilder.getInt32(NumRegionCounters),
> -                         PGOBuilder.CreateBitCast(RegionCounters,
> Int64PtrTy));
> +                         PGOBuilder.CreateBitCast(RegionCounters,
> Int64PtrTy),
> +                         PGOBuilder.CreateBitCast(RegionPCs, Int64PtrTy));
>  }
>
>  llvm::Function *CodeGenPGO::emitInitialization(CodeGenModule &CGM) {
> @@ -769,6 +771,13 @@
>                               llvm::GlobalVariable::PrivateLinkage,
>                               llvm::Constant::getNullValue(CounterTy),
>                               "__llvm_pgo_ctr");
> +
> +  RegionPCs > +    new llvm::GlobalVariable(CGM.getModule(), CounterTy,
false,
> +                             llvm::GlobalVariable::PrivateLinkage,
> +                             llvm::Constant::getNullValue(CounterTy),
> +                             "__llvm_pgo_pcs");
> +
>  }
>
>  void CodeGenPGO::emitCounterIncrement(CGBuilderTy &Builder, unsigned
> Counter) {
> @@ -779,6 +788,21 @@
>    llvm::Value *Count = Builder.CreateLoad(Addr, "pgocount");
>    Count = Builder.CreateAdd(Count, Builder.getInt64(1));
>    Builder.CreateStore(Count, Addr);
> +  // We should put the PC of the instruction that increments
> __llvm_pgo_ctr
> +  // into __llvm_pgo_pcs, which will be passed to llvm_pgo_emit.
> +  // This patch is wrong in many ways:
> +  //   * We pass the PC of the Function instead of the PC of the
> Instruction,
> +  //   because the latter doesn't work like this. We'll need to
support
> +  //   Instructions as labels in LLVM codegen.
> +  //   * We actually store the PC on each increment, while we should
> initialize
> +  //   this array at link time (need to refactor this code a bit).
> +  //
> +  Builder.CreateStore(
> +      Builder.CreatePointerCast(
> +         
cast<llvm::Instruction>(Count)->getParent()->getParent(),
> +          Builder.getInt64Ty()  // FIXME: use a better type
> +          ),
> +      Builder.CreateConstInBoundsGEP2_64(RegionPCs, 0, Counter));
>  }
>
> Index: lib/CodeGen/CodeGenPGO.h
> ==================================================================> ---
lib/CodeGen/CodeGenPGO.h    (revision 201843)
> +++ lib/CodeGen/CodeGenPGO.h    (working copy)
> @@ -59,6 +59,7 @@
>
>    unsigned NumRegionCounters;
>    llvm::GlobalVariable *RegionCounters;
> +  llvm::GlobalVariable *RegionPCs;
>    llvm::DenseMap<const Stmt*, unsigned> *RegionCounterMap;
>    llvm::DenseMap<const Stmt*, uint64_t> *StmtCountMap;
>    std::vector<uint64_t> *RegionCounts;
> @@ -66,8 +67,9 @@
>
>  public:
>    CodeGenPGO(CodeGenModule &CGM)
> -    : CGM(CGM), NumRegionCounters(0), RegionCounters(0),
> RegionCounterMap(0),
> -      StmtCountMap(0), RegionCounts(0), CurrentRegionCount(0) {}
> +      : CGM(CGM), NumRegionCounters(0), RegionCounters(0), RegionPCs(0),
> +        RegionCounterMap(0), StmtCountMap(0), RegionCounts(0),
> +        CurrentRegionCount(0) {}
>    ~CodeGenPGO() {}
>
>    /// Whether or not we have PGO region data for the current function.
> This is
>
>
>
>
>
>
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140222/0a5ed0f2/attachment.html>

Bob Wilson

2014-Feb-22 04:58 UTC

head link

[LLVMdev] asan coverage

Why is the binary size a concern for coverage testing?

On Feb 21, 2014, at 8:43 PM, Kostya Serebryany <kcc at google.com> wrote:
> I understand why you don't want to rely on debug info and instead
produce your own section.
> We did this with our early version of llvm-based tsan and it was simpler to
implement.
> But here is a data point to support my suggestion: 
> chromium binary built with asan, coverage and -gline-tables-only is 1.6Gb.
> The same binary is 1.1Gb when stripped, so, the line tables require 500Mb. 
> Separate line info for coverage will essentially double this amount. 
> The size of binary is a serious concern for our users, please take it into
consideration.
> 
> Thanks! 
> --kcc 
> 
> 
> 
> On Fri, Feb 21, 2014 at 8:28 PM, Bob Wilson <bob.wilson at apple.com>
wrote:
> We’re not going to use debug info at all. We’re emitting the counters in
the clang front-end. We just need to emit separate info to show how to map those
counters to source locations. Mapping to PCs and then using debug info to get
from the PCs to the source locations just makes things harder and loses
information in the process.
> 
> On Feb 21, 2014, at 2:57 AM, Kostya Serebryany <kcc at google.com>
wrote:
> 
>> 
>> 
>> We may need some additional info.
>> What kind of additional info? 
>>  
>> I haven't put a ton of thought into
>> this, but I'm hoping we can either (a) use debug info as is or add
some
>> extra (valid) debug info to support this, or (b) add an extra
>> debug-info-like section to instrumented binaries with the information
we
>> need.
>> 
>> I'd try this data format (binary equivalent): 
>> 
>> /path/to/binary/or/dso1 num_counters1
>> pc1 counter1
>> pc2 counter2
>> pc3 counter3
>> ...
>> /path/to/binary/or/dso2 num_counters2
>> pc1 counter1
>> pc2 counter2
>> pc3 counter3
>> ...
>> 
>> I don't see a straightforward way to produce such data today
because individual Instructions do not work as labels.
>> But I think this can be supported in LLVM codegen. 
>> Here is a *raw* patch with comments, just to get the idea.
>> 
>> 
>> Index: lib/CodeGen/CodeGenPGO.cpp
>>
==================================================================>> ---
lib/CodeGen/CodeGenPGO.cpp  (revision 201843)
>> +++ lib/CodeGen/CodeGenPGO.cpp  (working copy)
>> @@ -199,7 +199,8 @@
>>    llvm::Type *Args[] = {
>>      Int8PtrTy,                       // const char *MangledName
>>      Int32Ty,                         // uint32_t NumCounters
>> -    Int64PtrTy                       // uint64_t *Counters
>> +    Int64PtrTy,                       // uint64_t *Counters
>> +    Int64PtrTy                       // uint64_t *PCs
>>    };
>>    llvm::FunctionType *FTy >>     
llvm::FunctionType::get(PGOBuilder.getVoidTy(), Args, false);
>> @@ -209,9 +210,10 @@
>>    llvm::Constant *MangledName >>     
CGM.GetAddrOfConstantCString(CGM.getMangledName(GD),
"__llvm_pgo_name");
>>    MangledName = llvm::ConstantExpr::getBitCast(MangledName,
Int8PtrTy);
>> -  PGOBuilder.CreateCall3(EmitFunc, MangledName,
>> +  PGOBuilder.CreateCall4(EmitFunc, MangledName,
>>                           PGOBuilder.getInt32(NumRegionCounters),
>> -                         PGOBuilder.CreateBitCast(RegionCounters,
Int64PtrTy));
>> +                         PGOBuilder.CreateBitCast(RegionCounters,
Int64PtrTy),
>> +                         PGOBuilder.CreateBitCast(RegionPCs,
Int64PtrTy));
>>  }
>>  
>>  llvm::Function *CodeGenPGO::emitInitialization(CodeGenModule &CGM)
{
>> @@ -769,6 +771,13 @@
>>                               llvm::GlobalVariable::PrivateLinkage,
>>                               llvm::Constant::getNullValue(CounterTy),
>>                               "__llvm_pgo_ctr");
>> +
>> +  RegionPCs >> +    new llvm::GlobalVariable(CGM.getModule(),
CounterTy, false,
>> +                             llvm::GlobalVariable::PrivateLinkage,
>> +                             llvm::Constant::getNullValue(CounterTy),
>> +                             "__llvm_pgo_pcs");
>> +
>>  }
>>  
>>  void CodeGenPGO::emitCounterIncrement(CGBuilderTy &Builder,
unsigned Counter) {
>> @@ -779,6 +788,21 @@
>>    llvm::Value *Count = Builder.CreateLoad(Addr, "pgocount");
>>    Count = Builder.CreateAdd(Count, Builder.getInt64(1));
>>    Builder.CreateStore(Count, Addr);
>> +  // We should put the PC of the instruction that increments
__llvm_pgo_ctr
>> +  // into __llvm_pgo_pcs, which will be passed to llvm_pgo_emit.
>> +  // This patch is wrong in many ways:
>> +  //   * We pass the PC of the Function instead of the PC of the
Instruction,
>> +  //   because the latter doesn't work like this. We'll need
to support
>> +  //   Instructions as labels in LLVM codegen.
>> +  //   * We actually store the PC on each increment, while we should
initialize
>> +  //   this array at link time (need to refactor this code a bit).
>> +  //
>> +  Builder.CreateStore(
>> +      Builder.CreatePointerCast(
>> +         
cast<llvm::Instruction>(Count)->getParent()->getParent(),
>> +          Builder.getInt64Ty()  // FIXME: use a better type
>> +          ),
>> +      Builder.CreateConstInBoundsGEP2_64(RegionPCs, 0, Counter));
>>  }
>>  
>> Index: lib/CodeGen/CodeGenPGO.h
>>
==================================================================>> ---
lib/CodeGen/CodeGenPGO.h    (revision 201843)
>> +++ lib/CodeGen/CodeGenPGO.h    (working copy)
>> @@ -59,6 +59,7 @@
>>  
>>    unsigned NumRegionCounters;
>>    llvm::GlobalVariable *RegionCounters;
>> +  llvm::GlobalVariable *RegionPCs;
>>    llvm::DenseMap<const Stmt*, unsigned> *RegionCounterMap;
>>    llvm::DenseMap<const Stmt*, uint64_t> *StmtCountMap;
>>    std::vector<uint64_t> *RegionCounts;
>> @@ -66,8 +67,9 @@
>>  
>>  public:
>>    CodeGenPGO(CodeGenModule &CGM)
>> -    : CGM(CGM), NumRegionCounters(0), RegionCounters(0),
RegionCounterMap(0),
>> -      StmtCountMap(0), RegionCounts(0), CurrentRegionCount(0) {}
>> +      : CGM(CGM), NumRegionCounters(0), RegionCounters(0),
RegionPCs(0),
>> +        RegionCounterMap(0), StmtCountMap(0), RegionCounts(0),
>> +        CurrentRegionCount(0) {}
>>    ~CodeGenPGO() {}
>>  
>>    /// Whether or not we have PGO region data for the current function.
This is
>> 
>> 
>> 
>> 
>>  
>> 
> 
> 
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140221/4af3903a/attachment.html>

Kostya Serebryany

2014-Feb-22 05:13 UTC

head link

[LLVMdev] asan coverage

Our users combine asan and coverage testing and they do it on thousands
machines.
(An older blog post about using asan:
http://blog.chromium.org/2012/04/fuzzing-for-security.html)
The binaries need to be shipped to virtual machines, where they will be run.
The VMs are *very* short of disk and the network bandwidth has a cost too.
We may be able to ship stripped binaries to those machine but this will
complicate the logic immensely.

Besides, zip-ed binaries are stored for several revisions every day and the
storage also costs money.
Just to give you the taste (
https://commondatastorage.googleapis.com/chromium-browser-asan/index.html):
asan-symbolized-linux-release-252010.zip 2014-02-19 14:34:24 406.35MB
asan-symbolized-linux-release-252017.zip 2014-02-19 18:22:54 406.41MB
asan-symbolized-linux-release-252025.zip 2014-02-19 21:35:49 406.35MB
asan-symbolized-linux-release-252031.zip 2014-02-20 00:44:25 406.35MB
asan-symbolized-linux-release-252160.zip 2014-02-20 06:30:16 406.34MB
asan-symbolized-linux-release-252185.zip 2014-02-20 09:21:47 408.52MB
asan-symbolized-linux-release-252188.zip 2014-02-20 12:20:05 408.52MB
asan-symbolized-linux-release-252194.zip 2014-02-20 15:01:05 408.52MB
asan-symbolized-linux-release-252218.zip 2014-02-20 18:00:42 408.54MB
asan-symbolized-linux-release-252265.zip 2014-02-20 21:00:03 408.65MB
asan-symbolized-linux-release-252272.zip 2014-02-21 00:00:40 408.66MB

--kcc


On Sat, Feb 22, 2014 at 8:58 AM, Bob Wilson <bob.wilson at apple.com>
wrote:
> Why is the binary size a concern for coverage testing?
>
> On Feb 21, 2014, at 8:43 PM, Kostya Serebryany <kcc at google.com>
wrote:
>
> I understand why you don't want to rely on debug info and instead
produce
> your own section.
> We did this with our early version of llvm-based tsan and it was simpler
> to implement.
> But here is a data point to support my suggestion:
> chromium binary built with asan, coverage and -gline-tables-only is 1.6Gb.
> The same binary is 1.1Gb when stripped, so, the line tables require 500Mb.
> Separate line info for coverage will essentially double this amount.
> The size of binary is a serious concern for our users, please take it into
> consideration.
>
> Thanks!
> --kcc
>
>
>
> On Fri, Feb 21, 2014 at 8:28 PM, Bob Wilson <bob.wilson at apple.com>
wrote:
>
>> We’re not going to use debug info at all. We’re emitting the counters
in
>> the clang front-end. We just need to emit separate info to show how to
map
>> those counters to source locations. Mapping to PCs and then using debug
>> info to get from the PCs to the source locations just makes things
harder
>> and loses information in the process.
>>
>> On Feb 21, 2014, at 2:57 AM, Kostya Serebryany <kcc at
google.com> wrote:
>>
>>
>>>
>>> We may need some additional info.
>>
>> What kind of additional info?
>>
>>
>>> I haven't put a ton of thought into
>>> this, but I'm hoping we can either (a) use debug info as is or
add some
>>> extra (valid) debug info to support this, or (b) add an extra
>>> debug-info-like section to instrumented binaries with the
information we
>>> need.
>>>
>>
>> I'd try this data format (binary equivalent):
>>
>> /path/to/binary/or/dso1 num_counters1
>> pc1 counter1
>> pc2 counter2
>> pc3 counter3
>> ...
>> /path/to/binary/or/dso2 num_counters2
>> pc1 counter1
>> pc2 counter2
>> pc3 counter3
>> ...
>>
>> I don't see a straightforward way to produce such data today
because
>> individual Instructions do not work as labels.
>> But I think this can be supported in LLVM codegen.
>> Here is a *raw* patch with comments, just to get the idea.
>>
>>
>> Index: lib/CodeGen/CodeGenPGO.cpp
>>
==================================================================>> ---
lib/CodeGen/CodeGenPGO.cpp  (revision 201843)
>> +++ lib/CodeGen/CodeGenPGO.cpp  (working copy)
>> @@ -199,7 +199,8 @@
>>     llvm::Type *Args[] = {
>>      Int8PtrTy,                       // const char *MangledName
>>      Int32Ty,                         // uint32_t NumCounters
>> -    Int64PtrTy                       // uint64_t *Counters
>> +    Int64PtrTy,                       // uint64_t *Counters
>> +    Int64PtrTy                       // uint64_t *PCs
>>    };
>>    llvm::FunctionType *FTy >>     
llvm::FunctionType::get(PGOBuilder.getVoidTy(), Args, false);
>> @@ -209,9 +210,10 @@
>>    llvm::Constant *MangledName >>     
CGM.GetAddrOfConstantCString(CGM.getMangledName(GD),
>> "__llvm_pgo_name");
>>    MangledName = llvm::ConstantExpr::getBitCast(MangledName,
Int8PtrTy);
>> -  PGOBuilder.CreateCall3(EmitFunc, MangledName,
>> +  PGOBuilder.CreateCall4(EmitFunc, MangledName,
>>                           PGOBuilder.getInt32(NumRegionCounters),
>> -                         PGOBuilder.CreateBitCast(RegionCounters,
>> Int64PtrTy));
>> +                         PGOBuilder.CreateBitCast(RegionCounters,
>> Int64PtrTy),
>> +                         PGOBuilder.CreateBitCast(RegionPCs,
>> Int64PtrTy));
>>  }
>>
>>  llvm::Function *CodeGenPGO::emitInitialization(CodeGenModule &CGM)
{
>> @@ -769,6 +771,13 @@
>>                               llvm::GlobalVariable::PrivateLinkage,
>>                               llvm::Constant::getNullValue(CounterTy),
>>                               "__llvm_pgo_ctr");
>> +
>> +  RegionPCs >> +    new llvm::GlobalVariable(CGM.getModule(),
CounterTy, false,
>> +                             llvm::GlobalVariable::PrivateLinkage,
>> +                             llvm::Constant::getNullValue(CounterTy),
>> +                             "__llvm_pgo_pcs");
>> +
>>  }
>>
>>  void CodeGenPGO::emitCounterIncrement(CGBuilderTy &Builder,
unsigned
>> Counter) {
>> @@ -779,6 +788,21 @@
>>    llvm::Value *Count = Builder.CreateLoad(Addr, "pgocount");
>>    Count = Builder.CreateAdd(Count, Builder.getInt64(1));
>>    Builder.CreateStore(Count, Addr);
>> +  // We should put the PC of the instruction that increments
>> __llvm_pgo_ctr
>> +  // into __llvm_pgo_pcs, which will be passed to llvm_pgo_emit.
>> +  // This patch is wrong in many ways:
>> +  //   * We pass the PC of the Function instead of the PC of the
>> Instruction,
>> +  //   because the latter doesn't work like this. We'll need
to support
>> +  //   Instructions as labels in LLVM codegen.
>> +  //   * We actually store the PC on each increment, while we should
>> initialize
>> +  //   this array at link time (need to refactor this code a bit).
>> +  //
>> +  Builder.CreateStore(
>> +      Builder.CreatePointerCast(
>> +         
cast<llvm::Instruction>(Count)->getParent()->getParent(),
>> +          Builder.getInt64Ty()  // FIXME: use a better type
>> +          ),
>> +      Builder.CreateConstInBoundsGEP2_64(RegionPCs, 0, Counter));
>>  }
>>
>> Index: lib/CodeGen/CodeGenPGO.h
>>
==================================================================>> ---
lib/CodeGen/CodeGenPGO.h    (revision 201843)
>> +++ lib/CodeGen/CodeGenPGO.h    (working copy)
>> @@ -59,6 +59,7 @@
>>
>>    unsigned NumRegionCounters;
>>    llvm::GlobalVariable *RegionCounters;
>> +  llvm::GlobalVariable *RegionPCs;
>>    llvm::DenseMap<const Stmt*, unsigned> *RegionCounterMap;
>>    llvm::DenseMap<const Stmt*, uint64_t> *StmtCountMap;
>>    std::vector<uint64_t> *RegionCounts;
>> @@ -66,8 +67,9 @@
>>
>>  public:
>>    CodeGenPGO(CodeGenModule &CGM)
>> -    : CGM(CGM), NumRegionCounters(0), RegionCounters(0),
>> RegionCounterMap(0),
>> -      StmtCountMap(0), RegionCounts(0), CurrentRegionCount(0) {}
>> +      : CGM(CGM), NumRegionCounters(0), RegionCounters(0),
RegionPCs(0),
>> +        RegionCounterMap(0), StmtCountMap(0), RegionCounts(0),
>> +        CurrentRegionCount(0) {}
>>    ~CodeGenPGO() {}
>>
>>    /// Whether or not we have PGO region data for the current function.
>> This is
>>
>>
>>
>>
>>
>>
>>
>>
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140222/68bb07f8/attachment.html>

Bob Wilson

2014-Mar-28 21:59 UTC

head link

[LLVMdev] asan coverage

On Mar 28, 2014, at 1:33 AM, Kostya Serebryany <kcc at google.com> wrote:
> Some more data on code size. 
> 
> I've build CPU2006/483.xalancbmk with 
> a) -O2 -fsanitize=address -m64 -gline-tables-only -mllvm -asan-coverage=1
> b) -O2 -fsanitize=address -m64 -gline-tables-only -fprofile-instr-generate
> 
> The first is 27Mb and the second is 48Mb. 
> 
> The extra size comes from __llvm_prf_* sections. 
> You may be able to make these sections more compact, but you will not make
them tiny.
> 
> The instrumentation code generated by -asan-coverage=1 is less efficient
than -fprofile-instr-generate
> in several ways (slower, fatter, provides less data).
> But it does not add any extra sections to the binary and wins in the
overall binary size.
> Ideally, I'd like to see such options for -fprofile-instr-generate as
well.
> 
> —kcc 
It might make sense to move at least some of the counters into the .bss section
so they don’t take up space in the executable.

We’re also seeing that the instrumentation bloats the code size much more than
expected and we’re still investigating to see why that is the case.
> 
> 
> 
> On Sat, Feb 22, 2014 at 9:13 AM, Kostya Serebryany <kcc at
google.com> wrote:
> Our users combine asan and coverage testing and they do it on thousands
machines.
> (An older blog post about using asan:
http://blog.chromium.org/2012/04/fuzzing-for-security.html)
> The binaries need to be shipped to virtual machines, where they will be
run.
> The VMs are *very* short of disk and the network bandwidth has a cost too.
> We may be able to ship stripped binaries to those machine but this will
complicate the logic immensely.
> 
> Besides, zip-ed binaries are stored for several revisions every day and the
storage also costs money.
> Just to give you the taste
(https://commondatastorage.googleapis.com/chromium-browser-asan/index.html):
> asan-symbolized-linux-release-252010.zip 2014-02-19 14:34:24 406.35MB
> asan-symbolized-linux-release-252017.zip 2014-02-19 18:22:54 406.41MB
> asan-symbolized-linux-release-252025.zip 2014-02-19 21:35:49 406.35MB
> asan-symbolized-linux-release-252031.zip 2014-02-20 00:44:25 406.35MB
> asan-symbolized-linux-release-252160.zip 2014-02-20 06:30:16 406.34MB
> asan-symbolized-linux-release-252185.zip 2014-02-20 09:21:47 408.52MB
> asan-symbolized-linux-release-252188.zip 2014-02-20 12:20:05 408.52MB
> asan-symbolized-linux-release-252194.zip 2014-02-20 15:01:05 408.52MB
> asan-symbolized-linux-release-252218.zip 2014-02-20 18:00:42 408.54MB
> asan-symbolized-linux-release-252265.zip 2014-02-20 21:00:03 408.65MB
> asan-symbolized-linux-release-252272.zip 2014-02-21 00:00:40 408.66MB
> 
> --kcc 
> 
> 
> On Sat, Feb 22, 2014 at 8:58 AM, Bob Wilson <bob.wilson at apple.com>
wrote:
> Why is the binary size a concern for coverage testing?
> 
> On Feb 21, 2014, at 8:43 PM, Kostya Serebryany <kcc at google.com>
wrote:
> 
>> I understand why you don't want to rely on debug info and instead
produce your own section.
>> We did this with our early version of llvm-based tsan and it was
simpler to implement.
>> But here is a data point to support my suggestion: 
>> chromium binary built with asan, coverage and -gline-tables-only is
1.6Gb.
>> The same binary is 1.1Gb when stripped, so, the line tables require
500Mb.
>> Separate line info for coverage will essentially double this amount. 
>> The size of binary is a serious concern for our users, please take it
into consideration.
>> 
>> Thanks! 
>> --kcc 
>> 
>> 
>> 
>> On Fri, Feb 21, 2014 at 8:28 PM, Bob Wilson <bob.wilson at
apple.com> wrote:
>> We’re not going to use debug info at all. We’re emitting the counters
in the clang front-end. We just need to emit separate info to show how to map
those counters to source locations. Mapping to PCs and then using debug info to
get from the PCs to the source locations just makes things harder and loses
information in the process.
>> 
>> On Feb 21, 2014, at 2:57 AM, Kostya Serebryany <kcc at
google.com> wrote:
>> 
>>> 
>>> 
>>> We may need some additional info.
>>> What kind of additional info? 
>>>  
>>> I haven't put a ton of thought into
>>> this, but I'm hoping we can either (a) use debug info as is or
add some
>>> extra (valid) debug info to support this, or (b) add an extra
>>> debug-info-like section to instrumented binaries with the
information we
>>> need.
>>> 
>>> I'd try this data format (binary equivalent): 
>>> 
>>> /path/to/binary/or/dso1 num_counters1
>>> pc1 counter1
>>> pc2 counter2
>>> pc3 counter3
>>> ...
>>> /path/to/binary/or/dso2 num_counters2
>>> pc1 counter1
>>> pc2 counter2
>>> pc3 counter3
>>> ...
>>> 
>>> I don't see a straightforward way to produce such data today
because individual Instructions do not work as labels.
>>> But I think this can be supported in LLVM codegen. 
>>> Here is a *raw* patch with comments, just to get the idea.
>>> 
>>> 
>>> Index: lib/CodeGen/CodeGenPGO.cpp
>>>
==================================================================>>>
--- lib/CodeGen/CodeGenPGO.cpp  (revision 201843)
>>> +++ lib/CodeGen/CodeGenPGO.cpp  (working copy)
>>> @@ -199,7 +199,8 @@
>>>    llvm::Type *Args[] = {
>>>      Int8PtrTy,                       // const char *MangledName
>>>      Int32Ty,                         // uint32_t NumCounters
>>> -    Int64PtrTy                       // uint64_t *Counters
>>> +    Int64PtrTy,                       // uint64_t *Counters
>>> +    Int64PtrTy                       // uint64_t *PCs
>>>    };
>>>    llvm::FunctionType *FTy >>>     
llvm::FunctionType::get(PGOBuilder.getVoidTy(), Args, false);
>>> @@ -209,9 +210,10 @@
>>>    llvm::Constant *MangledName >>>     
CGM.GetAddrOfConstantCString(CGM.getMangledName(GD),
"__llvm_pgo_name");
>>>    MangledName = llvm::ConstantExpr::getBitCast(MangledName,
Int8PtrTy);
>>> -  PGOBuilder.CreateCall3(EmitFunc, MangledName,
>>> +  PGOBuilder.CreateCall4(EmitFunc, MangledName,
>>>                           PGOBuilder.getInt32(NumRegionCounters),
>>> -                         PGOBuilder.CreateBitCast(RegionCounters,
Int64PtrTy));
>>> +                         PGOBuilder.CreateBitCast(RegionCounters,
Int64PtrTy),
>>> +                         PGOBuilder.CreateBitCast(RegionPCs,
Int64PtrTy));
>>>  }
>>>  
>>>  llvm::Function *CodeGenPGO::emitInitialization(CodeGenModule
&CGM) {
>>> @@ -769,6 +771,13 @@
>>>                               llvm::GlobalVariable::PrivateLinkage,
>>>                              
llvm::Constant::getNullValue(CounterTy),
>>>                               "__llvm_pgo_ctr");
>>> +
>>> +  RegionPCs >>> +    new
llvm::GlobalVariable(CGM.getModule(), CounterTy, false,
>>> +                             llvm::GlobalVariable::PrivateLinkage,
>>> +                            
llvm::Constant::getNullValue(CounterTy),
>>> +                             "__llvm_pgo_pcs");
>>> +
>>>  }
>>>  
>>>  void CodeGenPGO::emitCounterIncrement(CGBuilderTy &Builder,
unsigned Counter) {
>>> @@ -779,6 +788,21 @@
>>>    llvm::Value *Count = Builder.CreateLoad(Addr,
"pgocount");
>>>    Count = Builder.CreateAdd(Count, Builder.getInt64(1));
>>>    Builder.CreateStore(Count, Addr);
>>> +  // We should put the PC of the instruction that increments
__llvm_pgo_ctr
>>> +  // into __llvm_pgo_pcs, which will be passed to llvm_pgo_emit.
>>> +  // This patch is wrong in many ways:
>>> +  //   * We pass the PC of the Function instead of the PC of the
Instruction,
>>> +  //   because the latter doesn't work like this. We'll
need to support
>>> +  //   Instructions as labels in LLVM codegen.
>>> +  //   * We actually store the PC on each increment, while we
should initialize
>>> +  //   this array at link time (need to refactor this code a bit).
>>> +  //
>>> +  Builder.CreateStore(
>>> +      Builder.CreatePointerCast(
>>> +         
cast<llvm::Instruction>(Count)->getParent()->getParent(),
>>> +          Builder.getInt64Ty()  // FIXME: use a better type
>>> +          ),
>>> +      Builder.CreateConstInBoundsGEP2_64(RegionPCs, 0, Counter));
>>>  }
>>>  
>>> Index: lib/CodeGen/CodeGenPGO.h
>>>
==================================================================>>>
--- lib/CodeGen/CodeGenPGO.h    (revision 201843)
>>> +++ lib/CodeGen/CodeGenPGO.h    (working copy)
>>> @@ -59,6 +59,7 @@
>>>  
>>>    unsigned NumRegionCounters;
>>>    llvm::GlobalVariable *RegionCounters;
>>> +  llvm::GlobalVariable *RegionPCs;
>>>    llvm::DenseMap<const Stmt*, unsigned> *RegionCounterMap;
>>>    llvm::DenseMap<const Stmt*, uint64_t> *StmtCountMap;
>>>    std::vector<uint64_t> *RegionCounts;
>>> @@ -66,8 +67,9 @@
>>>  
>>>  public:
>>>    CodeGenPGO(CodeGenModule &CGM)
>>> -    : CGM(CGM), NumRegionCounters(0), RegionCounters(0),
RegionCounterMap(0),
>>> -      StmtCountMap(0), RegionCounts(0), CurrentRegionCount(0) {}
>>> +      : CGM(CGM), NumRegionCounters(0), RegionCounters(0),
RegionPCs(0),
>>> +        RegionCounterMap(0), StmtCountMap(0), RegionCounts(0),
>>> +        CurrentRegionCount(0) {}
>>>    ~CodeGenPGO() {}
>>>  
>>>    /// Whether or not we have PGO region data for the current
function. This is
>>> 
>>> 
>>> 
>>> 
>>>  
>>> 
>> 
>> 
> 
> 
> 
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140328/25c0025f/attachment.html>

Duncan P. N. Exon Smith

2014-Mar-28 22:18 UTC

head link

[LLVMdev] asan coverage

On 2014 Mar 28, at 14:59, Bob Wilson <bob.wilson at apple.com> wrote:
> 
> On Mar 28, 2014, at 1:33 AM, Kostya Serebryany <kcc at google.com>
wrote:
> 
>> Some more data on code size. 
>> 
>> I've build CPU2006/483.xalancbmk with 
>> a) -O2 -fsanitize=address -m64 -gline-tables-only -mllvm
-asan-coverage=1
>> b) -O2 -fsanitize=address -m64 -gline-tables-only
-fprofile-instr-generate
>> 
>> The first is 27Mb and the second is 48Mb. 
>> 
>> The extra size comes from __llvm_prf_* sections. 
>> You may be able to make these sections more compact, but you will not
make them tiny.
>> 
>> The instrumentation code generated by -asan-coverage=1 is less
efficient than -fprofile-instr-generate
>> in several ways (slower, fatter, provides less data).
>> But it does not add any extra sections to the binary and wins in the
overall binary size.
>> Ideally, I'd like to see such options for -fprofile-instr-generate
as well.
>> 
>> —kcc 
> 
> It might make sense to move at least some of the counters into the .bss
section so they don’t take up space in the executable.
> 
> We’re also seeing that the instrumentation bloats the code size much more
than expected and we’re still investigating to see why that is the case.
The __llvm_prf_cnts section is likely the largest.  It's zero-initialized,
so it's a good candidate for .bss or similar.  The counters are currently in
their own section to make write out easy (just one big array), but we could
change that.  Or, is there linker magic that can make a special section
behave like the .bss?

Eric Christopher

2014-Mar-28 22:47 UTC

head link

[LLVMdev] asan coverage

On Fri, Mar 28, 2014 at 3:18 PM, Duncan P. N. Exon Smith
<dexonsmith at apple.com> wrote:>
> On 2014 Mar 28, at 14:59, Bob Wilson <bob.wilson at apple.com> wrote:
>
>>
>> On Mar 28, 2014, at 1:33 AM, Kostya Serebryany <kcc at
google.com> wrote:
>>
>>> Some more data on code size.
>>>
>>> I've build CPU2006/483.xalancbmk with
>>> a) -O2 -fsanitize=address -m64 -gline-tables-only -mllvm
-asan-coverage=1
>>> b) -O2 -fsanitize=address -m64 -gline-tables-only
-fprofile-instr-generate
>>>
>>> The first is 27Mb and the second is 48Mb.
>>>
>>> The extra size comes from __llvm_prf_* sections.
>>> You may be able to make these sections more compact, but you will
not make them tiny.
>>>
>>> The instrumentation code generated by -asan-coverage=1 is less
efficient than -fprofile-instr-generate
>>> in several ways (slower, fatter, provides less data).
>>> But it does not add any extra sections to the binary and wins in
the overall binary size.
>>> Ideally, I'd like to see such options for
-fprofile-instr-generate as well.
>>>
>>> --kcc
>>
>> It might make sense to move at least some of the counters into the .bss
section so they don't take up space in the executable.
>>
>> We're also seeing that the instrumentation bloats the code size
much more than expected and we're still investigating to see why that is the
case.
>
> The __llvm_prf_cnts section is likely the largest.  It's
zero-initialized,
> so it's a good candidate for .bss or similar.  The counters are
currently in
> their own section to make write out easy (just one big array), but we could
> change that.  Or, is there linker magic that can make a special section
> behave like the .bss?
Possibly. The zerofill stuff is a bit weird, but you should be able to
specify a large enough block to zerofill and a concrete section. A
separate section with the S_ZEROFILL attribute would probably work to
get it all initialized and just switch sections and not use the
zerofill directive.

-eric
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Duncan P. N. Exon Smith

2014-Mar-29 00:02 UTC

head link

[LLVMdev] asan coverage

On 2014 Mar 28, at 15:47, Eric Christopher <echristo at gmail.com> wrote:
> On Fri, Mar 28, 2014 at 3:18 PM, Duncan P. N. Exon Smith
> <dexonsmith at apple.com> wrote:
>> 
>> On 2014 Mar 28, at 14:59, Bob Wilson <bob.wilson at apple.com>
wrote:
>> 
>>> 
>>> On Mar 28, 2014, at 1:33 AM, Kostya Serebryany <kcc at
google.com> wrote:
>>> 
>>>> Some more data on code size.
>>>> 
>>>> I've build CPU2006/483.xalancbmk with
>>>> a) -O2 -fsanitize=address -m64 -gline-tables-only -mllvm
-asan-coverage=1
>>>> b) -O2 -fsanitize=address -m64 -gline-tables-only
-fprofile-instr-generate
>>>> 
>>>> The first is 27Mb and the second is 48Mb.
>>>> 
>>>> The extra size comes from __llvm_prf_* sections.
>>>> You may be able to make these sections more compact, but you
will not make them tiny.
>>>> 
>>>> The instrumentation code generated by -asan-coverage=1 is less
efficient than -fprofile-instr-generate
>>>> in several ways (slower, fatter, provides less data).
>>>> But it does not add any extra sections to the binary and wins
in the overall binary size.
>>>> Ideally, I'd like to see such options for
-fprofile-instr-generate as well.
>>>> 
>>>> --kcc
>>> 
>>> It might make sense to move at least some of the counters into the
.bss section so they don't take up space in the executable.
>>> 
>>> We're also seeing that the instrumentation bloats the code size
much more than expected and we're still investigating to see why that is the
case.
>> 
>> The __llvm_prf_cnts section is likely the largest.  It's
zero-initialized,
>> so it's a good candidate for .bss or similar.  The counters are
currently in
>> their own section to make write out easy (just one big array), but we
could
>> change that.  Or, is there linker magic that can make a special section
>> behave like the .bss?
> 
> Possibly. The zerofill stuff is a bit weird, but you should be able to
> specify a large enough block to zerofill and a concrete section. A
> separate section with the S_ZEROFILL attribute would probably work to
> get it all initialized and just switch sections and not use the
> zerofill directive.
> 
> -eric
Heh, I'm a little lost here.  Where can we specify this?  I had a look
in MCSectionMachO.cpp, and S_ZEROFILL isn't accessible from LLVM IR.
Should we add logic somewhere to recognize these sections?  Will that
actually reduce the executable size?  (I tried hacking it in but that
didn't seem to save disk space.)

Also, this doesn't solve ELF.  Can we do similar things there?

For clarity, there are three __llvm_prf_* sections.  Without having
seen Kostya's data, I'm speculating that __llvm_prf_cnts is the largest
section.

  - __llvm_prf_data is 32B per function, and has pointers into the
    other two sections.  This section is necessary to avoid static
    initialization (implemented on Darwin, not quite on ELF).

  - __llvm_prf_names is the mangled name of every function.  It should
    be on the same order of magnitude as __llvm_prf_data.  This
    section is convenient for writing out the profile, since the names
    are effectively placed in one big char array whose bounds are known
    at link time.

  - __llvm_prf_cnts is 8B per region counter.  Each function has one
    at entry and roughly one per CFG-relevant AST node (for, if, etc.).
    This section is convenient for writing out the profile, since the
    counters are effectively placed in one big array whose bounds are
    known at link time.  However, I don't think the data in this
    section needs to be explicitly stored in the executable if we can
    somehow make it act like .bss (or the like).

In the latter two cases, it's possible to avoid the special sections if
there are good reasons, but it will add runtime overhead.

Kostya Serebryany

2014-Mar-31 08:53 UTC

head link

[LLVMdev] asan coverage

On Sat, Mar 29, 2014 at 4:02 AM, Duncan P. N. Exon Smith <
dexonsmith at apple.com> wrote:
>
> On 2014 Mar 28, at 15:47, Eric Christopher <echristo at gmail.com>
wrote:
>
> > On Fri, Mar 28, 2014 at 3:18 PM, Duncan P. N. Exon Smith
> > <dexonsmith at apple.com> wrote:
> >>
> >> On 2014 Mar 28, at 14:59, Bob Wilson <bob.wilson at
apple.com> wrote:
> >>
> >>>
> >>> On Mar 28, 2014, at 1:33 AM, Kostya Serebryany <kcc at
google.com> wrote:
> >>>
> >>>> Some more data on code size.
> >>>>
> >>>> I've build CPU2006/483.xalancbmk with
> >>>> a) -O2 -fsanitize=address -m64 -gline-tables-only -mllvm
> -asan-coverage=1
> >>>> b) -O2 -fsanitize=address -m64 -gline-tables-only
> -fprofile-instr-generate
> >>>>
> >>>> The first is 27Mb and the second is 48Mb.
> >>>>
> >>>> The extra size comes from __llvm_prf_* sections.
> >>>> You may be able to make these sections more compact, but
you will not
> make them tiny.
> >>>>
> >>>> The instrumentation code generated by -asan-coverage=1 is
less
> efficient than -fprofile-instr-generate
> >>>> in several ways (slower, fatter, provides less data).
> >>>> But it does not add any extra sections to the binary and
wins in the
> overall binary size.
> >>>> Ideally, I'd like to see such options for
-fprofile-instr-generate as
> well.
> >>>>
> >>>> --kcc
> >>>
> >>> It might make sense to move at least some of the counters into
the
> .bss section so they don't take up space in the executable.
> >>>
> >>> We're also seeing that the instrumentation bloats the code
size much
> more than expected and we're still investigating to see why that is the
> case.
> >>
> >> The __llvm_prf_cnts section is likely the largest.  It's
> zero-initialized,
> >> so it's a good candidate for .bss or similar.  The counters
are
> currently in
> >> their own section to make write out easy (just one big array), but
we
> could
> >> change that.  Or, is there linker magic that can make a special
section
> >> behave like the .bss?
> >
> > Possibly. The zerofill stuff is a bit weird, but you should be able to
> > specify a large enough block to zerofill and a concrete section. A
> > separate section with the S_ZEROFILL attribute would probably work to
> > get it all initialized and just switch sections and not use the
> > zerofill directive.
> >
> > -eric
>
> Heh, I'm a little lost here.  Where can we specify this?  I had a look
> in MCSectionMachO.cpp, and S_ZEROFILL isn't accessible from LLVM IR.
> Should we add logic somewhere to recognize these sections?  Will that
> actually reduce the executable size?  (I tried hacking it in but that
> didn't seem to save disk space.)
>
> Also, this doesn't solve ELF.  Can we do similar things there?
>
> For clarity, there are three __llvm_prf_* sections.  Without having
> seen Kostya's data, I'm speculating that __llvm_prf_cnts is the
largest
> section.
>
This is what I see on 483.xalancbmk:
 15 __llvm_prf_names 0036fcb0  00000000010abf40  00000000010abf40  00cabf40
 2**5
 16 __llvm_prf_data 001b77e0  000000000141bc00  000000000141bc00  0101bc00
 2**5
 32 __llvm_prf_cnts 00123468  0000000001af2fe0  0000000001af2fe0  014f2fe0
 2**5

__llvm_prf_names is larger than the other two combined.
483.xalancbmk is C++ and the function names are long. The same is true for
most of the code we care about.
Can't we use function PCs instead of function names in this table (and
retrieve the function names from debug info)?


>
>   - __llvm_prf_data is 32B per function, and has pointers into the
>     other two sections.  This section is necessary to avoid static
>     initialization (implemented on Darwin, not quite on ELF).
>
>   - __llvm_prf_names is the mangled name of every function.  It should
>     be on the same order of magnitude as __llvm_prf_data.  This
>     section is convenient for writing out the profile, since the names
>     are effectively placed in one big char array whose bounds are known
>     at link time.
>
>   - __llvm_prf_cnts is 8B per region counter.  Each function has one
>     at entry and roughly one per CFG-relevant AST node (for, if, etc.).
>     This section is convenient for writing out the profile, since the
>     counters are effectively placed in one big array whose bounds are
>     known at link time.  However, I don't think the data in this
>     section needs to be explicitly stored in the executable if we can
>     somehow make it act like .bss (or the like).
>
Why can't we simply create this buffer at startup?
We solve similar task in asan, so it's definitely possible.

--kcc

>
> In the latter two cases, it's possible to avoid the special sections if
> there are good reasons, but it will add runtime overhead.
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140331/9b312d89/attachment.html>

Bob Wilson

2014-Mar-31 17:45 UTC

head link

[LLVMdev] asan coverage

On Mar 31, 2014, at 1:53 AM, Kostya Serebryany <kcc at google.com>
wrote:> 
> This is what I see on 483.xalancbmk:
>  15 __llvm_prf_names 0036fcb0  00000000010abf40  00000000010abf40  00cabf40
2**5
>  16 __llvm_prf_data 001b77e0  000000000141bc00  000000000141bc00  0101bc00 
2**5
>  32 __llvm_prf_cnts 00123468  0000000001af2fe0  0000000001af2fe0  014f2fe0 
2**5
> 
> __llvm_prf_names is larger than the other two combined. 
> 483.xalancbmk is C++ and the function names are long. The same is true for
most of the code we care about.
> Can't we use function PCs instead of function names in this table (and
retrieve the function names from debug info)?
That’s a bit surprising. We should check to make sure there isn’t anything
obviously wrong.
> 
>  
> 
>   - __llvm_prf_data is 32B per function, and has pointers into the
>     other two sections.  This section is necessary to avoid static
>     initialization (implemented on Darwin, not quite on ELF).
> 
>   - __llvm_prf_names is the mangled name of every function.  It should
>     be on the same order of magnitude as __llvm_prf_data.  This
>     section is convenient for writing out the profile, since the names
>     are effectively placed in one big char array whose bounds are known
>     at link time.
> 
>   - __llvm_prf_cnts is 8B per region counter.  Each function has one
>     at entry and roughly one per CFG-relevant AST node (for, if, etc.).
>     This section is convenient for writing out the profile, since the
>     counters are effectively placed in one big array whose bounds are
>     known at link time.  However, I don't think the data in this
>     section needs to be explicitly stored in the executable if we can
>     somehow make it act like .bss (or the like).
> 
> Why can't we simply create this buffer at startup? 
> We solve similar task in asan, so it's definitely possible. 
We need this to work for environments where minimal runtime support is available
(e.g., in kernel code). We used to call malloc() to create some data structures
at runtime but that prevented us from using it from kernel code. I’m hoping the
additional size will not be such a problem that we have to implement separate
solutions for different environments.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140331/85927eab/attachment.html>

Duncan P. N. Exon Smith

2014-Mar-31 22:13 UTC

head link

[LLVMdev] asan coverage

On Mar 31, 2014, at 10:45, Bob Wilson <bob.wilson at apple.com> wrote:
> 
> On Mar 31, 2014, at 1:53 AM, Kostya Serebryany <kcc at google.com>
wrote:
>> 
>> This is what I see on 483.xalancbmk:
>>  15 __llvm_prf_names 0036fcb0  00000000010abf40  00000000010abf40 
00cabf40  2**5
>>  16 __llvm_prf_data 001b77e0  000000000141bc00  000000000141bc00 
0101bc00  2**5
>>  32 __llvm_prf_cnts 00123468  0000000001af2fe0  0000000001af2fe0 
014f2fe0  2**5
>> 
>> __llvm_prf_names is larger than the other two combined. 
>> 483.xalancbmk is C++ and the function names are long. The same is true
for most of the code we care about.
>> Can't we use function PCs instead of function names in this table
(and retrieve the function names from debug info)?
> 
> That’s a bit surprising. We should check to make sure there isn’t anything
obviously wrong.
I'm surprised too, but it makes some sense.

  - Most functions in the implementation details of the STL have long enough
mangled
    names (and little enough control flow) that the names would dominate the
size
    overhead.

  - Many class member functions will have this property, too, like getters,
setters,
    and typical constructors, copy/move operators, and destructors.

There's likely a way to emit these names more compactly using a tree (with
links
to parent nodes).

Consider _ZN9Namespace5Class3fooEv, the mangled name for
Namespace::Class::foo().
We could do something like this:

    %__llvm_profile_name = type {i8*, %__llvm_profile_name*}

    @__llvm_profile_str__ZN = linkonce constant [4 x i8] c"_ZN\00"
    @__llvm_profile_name__ZN = linkonce constant %__llvm_profile_name {
      i8* getelementptr ([4 x i8]* @__llvm_profile_str__ZN, i32 0, i32 0),
      %__llvm_profile_name* null }

    @__llvm_profile_str_9Namespace = linkonce constant [11 x i8]
c"9Namespace\00"
    @__llvm_profile_name__ZN9Namespace = linkonce constant %__llvm_profile_name
{
      i8* getelementptr ([11 x i8]* @__llvm_profile_str_9Namespace, i32 0, i32
0),
      %__llvm_profile_name* @__llvm_profile_name__ZN }

    @__llvm_profile_str_5Class = linkonce constant [7 x i8]
c"5Class\00"
    @__llvm_profile_name__ZN9Namespace5Class = linkonce constant
      %__llvm_profile_name {
      i8* getelementptr ([7 x i8]* @__llvm_profile_str_5Class, i32 0, i32 0),
      %__llvm_profile_name* @__llvm_profile_name__ZN9Namespace }

    @__llvm_profile_str_3fooEv = linkonce constant [7 x i8]
c"3fooEv\00"
    @__llvm_profile_name__ZN9Namespace5Class3fooEv = linkonce constant
      %__llvm_profile_name {
      i8* getelementptr ([7 x i8]* @__llvm_profile_str_3fooEv, i32 0, i32 0),
      %__llvm_profile_name* @__llvm_profile_name__ZN9Namespace5Class }

Then _ZN9Namespace5Class3barEv (Namespace::Class::bar()) would be cheap.

    @__llvm_profile_str_3barEv = linkonce constant [7 x i8]
c"3barEv\00"
    @__llvm_profile_name__ZN9Namespace5Class3barEv = linkonce constant
      %__llvm_profile_name {
      i8* getelementptr ([7 x i8]* @__llvm_profile_str_3barEv, i32 0, i32 0),
      %__llvm_profile_name* @__llvm_profile_name__ZN9Namespace5Class }

These would merge cleanly even between object files generated from different
translation units.

Doing it like that requires understanding the mangling and wouldn't compact
common
prefixes in C.  We could do something similar with a heuristic (split every 8
characters), or split on common prefixes per translation unit (and hope they
line up
between translation units).

Kostya Serebryany

2014-Apr-11 07:22 UTC

head link

[LLVMdev] asan coverage

My slides from EuroLLVM regarding coverage, in case you are interested...
http://llvm.org/devmtg/2014-04/PDFs/LightningTalks/EuroLLVM'14%20--%20ASan%20%2B%20Coverage.pdf

--kcc


On Tue, Apr 1, 2014 at 2:13 AM, Duncan P. N. Exon Smith <
dexonsmith at apple.com> wrote:
>
> On Mar 31, 2014, at 10:45, Bob Wilson <bob.wilson at apple.com>
wrote:
>
> >
> > On Mar 31, 2014, at 1:53 AM, Kostya Serebryany <kcc at
google.com> wrote:
> >>
> >> This is what I see on 483.xalancbmk:
> >>  15 __llvm_prf_names 0036fcb0  00000000010abf40  00000000010abf40
>  00cabf40  2**5
> >>  16 __llvm_prf_data 001b77e0  000000000141bc00  000000000141bc00
>  0101bc00  2**5
> >>  32 __llvm_prf_cnts 00123468  0000000001af2fe0  0000000001af2fe0
>  014f2fe0  2**5
> >>
> >> __llvm_prf_names is larger than the other two combined.
> >> 483.xalancbmk is C++ and the function names are long. The same is
true
> for most of the code we care about.
> >> Can't we use function PCs instead of function names in this
table (and
> retrieve the function names from debug info)?
> >
> > That’s a bit surprising. We should check to make sure there isn’t
> anything obviously wrong.
>
> I'm surprised too, but it makes some sense.
>
>   - Most functions in the implementation details of the STL have long
> enough mangled
>     names (and little enough control flow) that the names would dominate
> the size
>     overhead.
>
>   - Many class member functions will have this property, too, like
> getters, setters,
>     and typical constructors, copy/move operators, and destructors.
>
> There's likely a way to emit these names more compactly using a tree
(with
> links
> to parent nodes).
>
> Consider _ZN9Namespace5Class3fooEv, the mangled name for
> Namespace::Class::foo().
> We could do something like this:
>
>     %__llvm_profile_name = type {i8*, %__llvm_profile_name*}
>
>     @__llvm_profile_str__ZN = linkonce constant [4 x i8]
c"_ZN\00"
>     @__llvm_profile_name__ZN = linkonce constant %__llvm_profile_name {
>       i8* getelementptr ([4 x i8]* @__llvm_profile_str__ZN, i32 0, i32 0),
>       %__llvm_profile_name* null }
>
>     @__llvm_profile_str_9Namespace = linkonce constant [11 x i8]
> c"9Namespace\00"
>     @__llvm_profile_name__ZN9Namespace = linkonce constant
> %__llvm_profile_name {
>       i8* getelementptr ([11 x i8]* @__llvm_profile_str_9Namespace, i32 0,
> i32 0),
>       %__llvm_profile_name* @__llvm_profile_name__ZN }
>
>     @__llvm_profile_str_5Class = linkonce constant [7 x i8]
c"5Class\00"
>     @__llvm_profile_name__ZN9Namespace5Class = linkonce constant
>       %__llvm_profile_name {
>       i8* getelementptr ([7 x i8]* @__llvm_profile_str_5Class, i32 0, i32
> 0),
>       %__llvm_profile_name* @__llvm_profile_name__ZN9Namespace }
>
>     @__llvm_profile_str_3fooEv = linkonce constant [7 x i8]
c"3fooEv\00"
>     @__llvm_profile_name__ZN9Namespace5Class3fooEv = linkonce constant
>       %__llvm_profile_name {
>       i8* getelementptr ([7 x i8]* @__llvm_profile_str_3fooEv, i32 0, i32
> 0),
>       %__llvm_profile_name* @__llvm_profile_name__ZN9Namespace5Class }
>
> Then _ZN9Namespace5Class3barEv (Namespace::Class::bar()) would be cheap.
>
>     @__llvm_profile_str_3barEv = linkonce constant [7 x i8]
c"3barEv\00"
>     @__llvm_profile_name__ZN9Namespace5Class3barEv = linkonce constant
>       %__llvm_profile_name {
>       i8* getelementptr ([7 x i8]* @__llvm_profile_str_3barEv, i32 0, i32
> 0),
>       %__llvm_profile_name* @__llvm_profile_name__ZN9Namespace5Class }
>
> These would merge cleanly even between object files generated from
> different
> translation units.
>
> Doing it like that requires understanding the mangling and wouldn't
> compact common
> prefixes in C.  We could do something similar with a heuristic (split
> every 8
> characters), or split on common prefixes per translation unit (and hope
> they line up
> between translation units).
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140411/44f24198/attachment.html>

llvm dev - Feb 2014 - [LLVMdev] asan coverage

[LLVMdev] asan coverage

[LLVMdev] asan coverage

[LLVMdev] asan coverage

[LLVMdev] asan coverage

[LLVMdev] asan coverage

[LLVMdev] asan coverage

[LLVMdev] asan coverage

[LLVMdev] asan coverage

[LLVMdev] asan coverage

[LLVMdev] asan coverage

[LLVMdev] asan coverage

[LLVMdev] asan coverage

[LLVMdev] asan coverage