Hi John,
> The perfctr Linux kernel patch virtualizes the CPU's performance
> counters (so that each thread saves/restores the performance counter
> registers when it is taken off/put on the CPU). This, combined with the
> perfex command line tool, allows you to find the number of cache misses,
> cycles executed, etc. for a program at native execution speed.
>
> Perfctr's advantage is speed; however, I don't believe it has been
> incorporated into a tool that gives the detailed report information that
> cachegrind seems to provide; perfex will only report the number of
> events between program start and program termination. One could write
> an LLVM pass that instruments a program with calls to a profiling
> runtime; that runtime could use the perfctr driver to collect the
> performance counter information on a
> per-function/per-basic-block/per-whatever basis.
>
> So, perfctr is faster. Cachegrind is probably much easier to
> install/use and looks like it will provide more detailed information.
> Both are open-source and publicly available.
oprofile will annotate your source code with information obtained from
the CPU performance counters, for example the percentage of time spent
or the share of cache misses incurred in a particular function.
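
As for the per-function instrumentation idea quoted above, here is a rough
sketch of what such a profiling runtime might look like. It is in Python
only for brevity. The names RegionProfiler, enter, leave and read_counter
are made up for illustration, and read_counter is a timer-based stand-in,
not the perfctr driver's interface. An instrumentation pass would insert
the enter/leave calls at function or basic-block boundaries.

```python
import time
from collections import defaultdict


def read_counter():
    """Stand-in for a hardware counter read.

    Assumption: a real runtime would read a virtualized performance
    counter here; this sketch uses a monotonic timer instead.
    """
    return time.perf_counter_ns()


class RegionProfiler:
    """Accumulates inclusive counter deltas per named region."""

    def __init__(self, read=read_counter):
        self._read = read
        self._stack = []                 # (region name, counter at entry)
        self.totals = defaultdict(int)   # region name -> summed delta
        self.calls = defaultdict(int)    # region name -> number of visits

    def enter(self, name):
        # Called by instrumented code on entry to a region.
        self._stack.append((name, self._read()))

    def leave(self):
        # Called on exit; charges the delta to the innermost open region.
        name, start = self._stack.pop()
        self.totals[name] += self._read() - start
        self.calls[name] += 1
```

The totals are inclusive, so a caller's count also contains whatever its
callees consumed. Subtracting the callees' deltas would give exclusive
counts.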
Ciao,
Duncan.