Silky Arora
2013-Jan-14 06:47 UTC
[LLVMdev] Dynamic Profiling - Instrumentation basic query
Hi,

@Alastair: Thanks a bunch for explaining this so well. I was able to write a simple profiler and run it.

I need to profile the code for branches (branch mispredict simulation), load/store instructions (for cache hit/miss rates), and a couple of other things, and would therefore need to instrument the code. However, I would like to know whether writing the output to a file would increase the execution time, or whether it is the profiling itself. I can probably use a data structure to store the output instead.

Also, I have heard of Intel's Pin tool, which can provide memory trace information. Could you please explain what you meant by hardware counters for dcache miss/hit rates?

@Criswell: Thank you so much for helping me with this. I am starting to write my own code, but having a look at the existing code would definitely help me.

Thanks and Regards,
Silky

On Mon, Jan 14, 2013 at 12:06 AM, Criswell, John T <criswell at illinois.edu> wrote:
> There is code that does this for older versions of LLVM. I believe it is
> in the giri project in the LLVM SVN repository. I can look into more
> details when I get back from vacation. Swarup may also be able to provide
> information on the giri code.
>
> -- John T.
>
> ________________________________________
> From: llvmdev-bounces at cs.uiuc.edu [llvmdev-bounces at cs.uiuc.edu] on behalf
> of Silky Arora [silkyar at umich.edu]
> Sent: Saturday, January 12, 2013 10:28 PM
> To: llvmdev at cs.uiuc.edu
> Subject: [LLVMdev] Dynamic Profiling - Instrumentation basic query
>
> Hi,
>
> I am new to LLVM, and would like to write a dynamic profiler, say which
> prints out the load address of all the load instructions encountered in a
> program.
> From what I could pick up, edge-profiler.cpp increments a counter
> dynamically which is somehow dumped onto llvmprof.out by profile.pl
>
> Could anyone explain me how this works? Can I instrument the code to dump
> out the load addresses or other such information to a file?
>
> Thanks!
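On the question of whether the file output or the instrumentation itself costs the time: one common way to separate the two is to have the hook append to an in-memory buffer and touch the file only when the buffer fills up or the program exits. The following is a minimal sketch of that idea, not code from this thread. The names `recordLoad`, `flushTrace`, `bufferedCount`, `flushCount`, the buffer size and the output file name `loads.trace` are all hypothetical; an instrumentation pass would have to be written to call `recordLoad` with each load address.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

// Hypothetical runtime library for an instrumented program.
// None of these names are part of LLVM.
namespace {
constexpr std::size_t kBufferEntries = 4096;  // assumed buffer size
std::uintptr_t gBuffer[kBufferEntries];       // buffered load addresses
std::size_t gUsed = 0;                        // entries currently buffered
std::size_t gFlushes = 0;                     // how often the buffer was drained
std::FILE *gOut = nullptr;                    // opened lazily on first flush
}  // namespace

// Write the buffered addresses to the trace file and reset the buffer.
extern "C" void flushTrace() {
  if (gUsed == 0) return;
  if (!gOut) gOut = std::fopen("loads.trace", "wb");  // assumed file name
  if (gOut) std::fwrite(gBuffer, sizeof(gBuffer[0]), gUsed, gOut);
  gUsed = 0;
  ++gFlushes;
}

// Registered with atexit so the tail of the trace is not lost.
static void closeTrace() {
  flushTrace();
  if (gOut) {
    std::fclose(gOut);
    gOut = nullptr;
  }
}

// Hook that the instrumentation would call before each load.
extern "C" void recordLoad(const void *addr) {
  static const bool registered = (std::atexit(closeTrace) == 0);
  (void)registered;
  assert(gUsed < kBufferEntries);
  gBuffer[gUsed++] = reinterpret_cast<std::uintptr_t>(addr);
  if (gUsed == kBufferEntries) flushTrace();  // one write per 4096 loads
}

// Small accessors, handy for checking the buffering behaviour.
extern "C" std::size_t bufferedCount() { return gUsed; }
extern "C" std::size_t flushCount() { return gFlushes; }
```

Timing the program once with the `fwrite` call in place and once with it commented out gives a rough split between the cost of the hook calls and the cost of the I/O.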
Alastair Murray
2013-Jan-15 06:11 UTC
[LLVMdev] Dynamic Profiling - Instrumentation basic query
Hi Silky,

On 14/01/13 01:47, Silky Arora wrote:
> I need to profile the code for branches (branch mis predicts
> simulation), load/store instructions (for cache hits/miss rate), and a
> couple of other things and therefore, would need to instrument the code.
> However, I would like to know if writing the output to a file would
> increase the execution time, or is it the profiling itself? I can
> probably use a data structure to store the output instead.
>
> Also, I have heard of Intel's Pin tool which can provide memory trace
> information. Could you please explain to me what you meant by hardware
> counters for dcache miss/hit rates.

I've also heard of Pin, but never actually used it.

Regarding the hardware counters: x86 processors count various hardware events via internal counters. I think both Intel and AMD processors can do this, but I've only tried out Intel. The easiest way to access these on Linux is probably via the 'perf' tool [1]. (There are other options on other platforms. I think 'Intel VTune' can use these counters as well.)

[1] https://perf.wiki.kernel.org/

The result of running 'perf' on a random command (xz -9e dictionary) is in the attached file (because my mail client was destroying the formatting). I just chose some counters which seemed to match what you mention, there were many more though. 'perf list' will show them.

The only issue I can think of is that the hardware counters aren't available inside (most?) virtual machines.

If you need to individually determine the hit/miss-rate, mispredict ratios etc per load/store/branch then I'm not sure if these counters are very useful.

Regards,
Alastair.
-------------- next part --------------
/usr/sbin/perf stat -e cycles -e instructions -e cache-references -e cache-misses -e branch-instructions -e branch-misses -e L1-dcache-loads -e L1-dcache-load-misses -e L1-dcache-stores -e L1-dcache-store-misses -e dTLB-loads -e dTLB-load-misses xz -9e dictionary

 Performance counter stats for 'xz -9e dictionary':

     2,838,843,997 cycles              #    0.000 GHz                      [24.96%]
     3,017,892,661 instructions        #    1.06  insns per cycle          [33.31%]
        28,281,385 cache-references                                        [33.29%]
         6,820,873 cache-misses        #   24.118 % of all cache refs      [33.31%]
       403,480,157 branches                                                [16.70%]
        34,978,751 branch-misses       #    8.67% of all branches          [16.71%]
     1,028,322,850 L1-dcache-loads                                         [16.73%]
        30,888,348 L1-dcache-misses    #    3.00% of all L1-dcache hits    [16.70%]
       278,389,483 L1-dcache-stores                                        [16.66%]
        17,185,362 L1-dcache-misses                                        [16.68%]
     1,023,191,908 dTLB-loads                                              [16.71%]
         1,585,411 dTLB-misses         #    0.15% of all dTLB cache hits   [16.67%]

       2.892917184 seconds time elapsed
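The percentages perf prints next to each miss counter can be recomputed from the raw counts, which is also how a rate perf leaves out (the L1 store miss rate here) can be obtained. The sketch below is only illustrative arithmetic on the numbers in the output above; the function name `missRate` and the constant names are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Fraction of events that were misses, from two raw 'perf stat' counts.
double missRate(std::uint64_t misses, std::uint64_t total) {
  assert(total != 0 && "total event count must be non-zero");
  return static_cast<double>(misses) / static_cast<double>(total);
}

// Counter values copied from the perf output above.
constexpr std::uint64_t kCacheRefs = 28281385, kCacheMisses = 6820873;
constexpr std::uint64_t kBranches = 403480157, kBranchMisses = 34978751;
constexpr std::uint64_t kL1Loads = 1028322850, kL1LoadMisses = 30888348;
constexpr std::uint64_t kL1Stores = 278389483, kL1StoreMisses = 17185362;
constexpr std::uint64_t kTlbLoads = 1023191908, kTlbLoadMisses = 1585411;

// Each helper returns a fraction; multiply by 100 for a percentage.
double cacheMissRate() { return missRate(kCacheMisses, kCacheRefs); }
double branchMissRate() { return missRate(kBranchMisses, kBranches); }
double l1LoadMissRate() { return missRate(kL1LoadMisses, kL1Loads); }
double l1StoreMissRate() { return missRate(kL1StoreMisses, kL1Stores); }
double dtlbLoadMissRate() { return missRate(kTlbLoadMisses, kTlbLoads); }
```

These are whole-program ratios. As noted in the message above, they do not say which individual load, store or branch is responsible.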
Hi Alastair,

Thank you so much for the information on the tools. Actually, I need to analyze which sections of code are prone to misses and mispredicts, and would eventually have to instrument the code.

I was able to instrument the code and call an external function, but I ran into a problem when passing an argument to that function. I am following EdgeProfiling.cpp but couldn't figure out the problem. Could you please see where I am going wrong here?

    virtual bool runOnModule(Module &M)
    {
        Constant *hookFunc;
        LLVMContext& context = M.getContext();
        hookFunc = M.getOrInsertFunction("cacheCounter",
                                         Type::getVoidTy(M.getContext()),
                                         llvm::Type::getInt32Ty(M.getContext()),
                                         (Type*)0);
        cacheCounter= cast<Function>(hookFunc);

        for(Module::iterator F = M.begin(), E = M.end(); F != E; ++F)
        {
            for(Function::iterator BB = F->begin(), E = F->end(); BB != E; ++BB)
            {
                cacheProf::runOnBasicBlock(BB, hookFunc, context);
            }
        }
        return false;
    }

    virtual bool runOnBasicBlock(Function::iterator &BB, Constant* hookFunc, LLVMContext& context)
    {
        for(BasicBlock::iterator BI = BB->begin(), BE = BB->end(); BI != BE; ++BI)
        {
            std::vector<Value*> Args(1);
            unsigned a =100;
            Args[0] = ConstantInt::get(Type::getInt32Ty(context), a);
            if(isa<LoadInst>(&(*BI)) )
            {
                CallInst *newInst = CallInst::Create(hookFunc, Args, "",BI);
            }
        }
        return true;
    }

The C code is as follows -

    extern "C" void cacheCounter(unsigned a){
        std::cout<<a<<" Load instruction\n";
    }

Error:

    line 8: 18499 Segmentation fault (core dumped) lli out.bc

Also, the code works fine when I don't try to print out 'a'.

Thanks for your help.

Regards,
Silky
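For the stated goal of finding which sections of code are prone to misses, the hook needs to know where a load came from as well as that one happened. One possible shape for this, given purely as a sketch and not as a diagnosis of the segmentation fault above, is a hook that receives a per-load site number together with the load address and feeds a very small cache model. Everything here is an assumption made for illustration: the name `cacheCounterSite`, the idea that the pass assigns each instrumented load a unique integer, and the direct-mapped geometry (64-byte lines, 512 sets), which does not model any particular processor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace {
constexpr std::uintptr_t kLineBytes = 64;  // assumed cache line size
constexpr std::size_t kNumSets = 512;      // assumed: 32 KiB, direct-mapped
constexpr unsigned kMaxSites = 1024;       // assumed upper bound on load sites

struct SiteStats {
  std::uint64_t accesses;
  std::uint64_t misses;
};

// Globals are zero-initialized, so no static constructors are needed.
std::uintptr_t gTags[kNumSets];  // line number currently held by each set
bool gValid[kNumSets];           // whether the set holds a line at all
SiteStats gSites[kMaxSites];     // per-site counters, indexed by site id
}  // namespace

// Hypothetical hook: 'site' identifies the instrumented load instruction,
// 'addr' is the address it reads. Out-of-range sites are ignored.
extern "C" void cacheCounterSite(unsigned site, const void *addr) {
  if (site >= kMaxSites) return;
  const std::uintptr_t line =
      reinterpret_cast<std::uintptr_t>(addr) / kLineBytes;
  const std::size_t set = static_cast<std::size_t>(line % kNumSets);
  SiteStats &stats = gSites[site];
  ++stats.accesses;
  if (!gValid[set] || gTags[set] != line) {
    // Miss: charge it to this site and install the line in the model.
    ++stats.misses;
    gValid[set] = true;
    gTags[set] = line;
  }
}

// Read back the counters for one site, e.g. from a report printed at exit.
SiteStats statsForSite(unsigned site) {
  assert(site < kMaxSites);
  return gSites[site];
}
```

Sorting the sites by `misses` at exit would give a ranking of the loads that miss most often under this model. Mapping a site number back to a source location is left to the pass and is not shown.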