Hello everybody,

I am looking for a tool (on Linux or Windows) that allows me to get performance measures such as execution cycles, cache accesses, etc. for an x86 architecture. I want to estimate the performance overhead caused by the modifications that I make using LLVM.

Any suggestion is welcome.

Thanks in advance,

-- 
Juan Carlos
You mean 'cachegrind'?
http://valgrind.org/info/tools.html#cachegrind

I don't know any public tool better than this (but someone please tell me if I am misinformed).

- Daniel

On Tue, Sep 1, 2009 at 2:42 PM, Juan Carlos Martinez Santos <juanc.martinez.santos at gmail.com> wrote:
> I am looking for a tool (in Linux or Windows) that allow me to get
> performance measures like cycle execution, cache accesses, etc. for an x86
> architecture.
Oprofile for Linux is a pretty good alternative (http://oprofile.sourceforge.net/about/).

It uses hardware performance counters to collect profiling information and therefore has very low overhead, whereas Valgrind performs dynamic binary instrumentation and can be significantly slower (20-50x). In addition, Cachegrind 'simulates' cache behavior through its own cache model, whereas Oprofile (or any other counter-based profiler) reports real cache events.

Depending on your needs (ease of use, runtime overhead, etc.), you could pick either.

On Tue, Sep 1, 2009 at 6:13 PM, Daniel Dunbar <daniel at zuster.org> wrote:
> You mean 'cachegrind'?
> http://valgrind.org/info/tools.html#cachegrind

-- 
Giri
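The difference between "simulating" cache behavior and counting real cache events can be made concrete with a toy model: a simulator feeds every memory address through a software cache and tallies hits and misses itself, instead of reading the CPU's counters. The sketch below is a hypothetical, much-simplified model (a single set-associative cache with LRU replacement; the 32 KiB size, 64-byte lines and 8 ways are assumed defaults), not Cachegrind's actual implementation, which models separate first-level instruction and data caches plus a unified lower-level cache.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Toy set-associative cache with LRU replacement (hypothetical model).

    It only counts hits and misses for a stream of byte addresses; it does
    not model timing, prefetching or writes.
    """

    def __init__(self, size_bytes=32 * 1024, line_bytes=64, ways=8):
        self.line_bytes = line_bytes
        self.ways = ways
        self.num_sets = size_bytes // (line_bytes * ways)
        # One OrderedDict per set; keys are tags ordered from least to most
        # recently used.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, address):
        line = address // self.line_bytes   # which cache line the byte is in
        index = line % self.num_sets        # which set that line maps to
        tag = line // self.num_sets         # identifies the line within the set
        cache_set = self.sets[index]
        if tag in cache_set:
            self.hits += 1
            cache_set.move_to_end(tag)      # mark as most recently used
        else:
            self.misses += 1
            if len(cache_set) >= self.ways:
                cache_set.popitem(last=False)  # evict the least recently used line
            cache_set[tag] = True
```

For example, walking a 4 KiB array of 4-byte integers in order (`for addr in range(0, 4096, 4): cache.access(addr)`) touches 64 distinct lines, so the model reports 64 misses and 960 hits: one cold miss per line, then 15 hits on the remaining integers in it. A counter-based profiler such as Oprofile would instead report whatever the real hardware did, including effects like prefetching that this toy ignores.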
Helps if I send it to the list...

On Tue, Sep 1, 2009 at 5:33 PM, Giridhar S <thisisgiri at gmail.com> wrote:
> It uses hardware performance counters to collect profiling information
> and therefore has very low overhead, whereas Valgrind performs dynamic
> binary instrumentation and can be significantly slow (20-50x slower).

I am curious: how do you think AMD CodeAnalyst works with regard to performance counting? It seems quite fast to me, causing a slow-down of only 2x to 6x depending on the program.
I have never used CodeAnalyst first-hand, but the slow-down figures that you quote lead me to believe that it must use hardware performance counters. Instrumentation-based profilers rarely, if ever, show such low overhead. Also, instrumentation-based profilers cannot profile kernel routines unless there is explicit support from within the kernel (as in Sun Solaris 10 with DTrace).

A quick look at AMD's website seems to confirm this theory:
http://developer.amd.com/Assets/Basic_Performance_Measurements.pdf

If this topic is getting out of scope for the LLVM Dev list, we can take it offline.

On Tue, Sep 1, 2009 at 10:32 PM, OvermindDL1 <overminddl1 at gmail.com> wrote:
> I am curious, how do you think AMD CodeAnalyst works in regards to
> performance counting? It seems to be quite fast to me, only causing a
> slow-down of between 2x to 6x depending on the program.

-- 
Giri
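The slow-down figures traded in this thread (2x to 6x for CodeAnalyst, 20-50x for Valgrind) are ratios of a program's run time under the profiler to its native run time. A minimal sketch of measuring such a ratio, assuming only that both runs can be launched as commands; the helper names `median_runtime` and `slowdown` are hypothetical, and taking the median of a few runs is just one simple way to damp timing noise:

```python
import statistics
import subprocess
import time


def median_runtime(cmd, runs=5):
    """Run `cmd` (a list of arguments) `runs` times; return the median wall-clock seconds."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises CalledProcessError if the command fails.
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        times.append(time.perf_counter() - start)
    return statistics.median(times)


def slowdown(native_cmd, profiled_cmd, runs=5):
    """Hypothetical helper: median profiled time / median native time (2.0 means 2x slower)."""
    return median_runtime(profiled_cmd, runs) / median_runtime(native_cmd, runs)
```

For example, `slowdown(["./a.out"], ["valgrind", "--tool=cachegrind", "./a.out"])` would compare a hypothetical benchmark binary `./a.out` with the same binary run under Cachegrind. The same function can compare a baseline build against an LLVM-modified build, although wall-clock time alone says nothing about the cycle counts or cache events the original question asked about.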