Because I've been doing a bit of performance work recently, and because using gprof with the C backend has some limitations, I wrote a little "llvm-prof" utility. Here's a synopsis of how to use it if you're interested.

Basic usage:

  llvm/utils/profile.pl <program.bc> <program arguments>

This instruments the bytecode file, executes it with the JIT (_appending_ information into an llvmprof.out file), then runs the llvm-prof utility to format it into a human-readable report (llvm-prof is documented here: http://llvm.cs.uiuc.edu/docs/CommandGuide/llvm-prof.html).

Running this on the em3d Olden benchmark produces this output:

<all of the program output>
===-------------------------------------------------------------------------===
LLVM profiling output for execution: Output/em3d.llvm.bc
===-------------------------------------------------------------------------===
Function execution frequencies:

 ##   Frequency
  1. 390/516   check_percent
  2. 102/516   gen_signed_number
  3.   2/516   compute_nodes
  4.   2/516   make_table
  5.   2/516   fill_table
  6.   2/516   make_neighbors
  7.   2/516   update_from_coeffs
  8.   2/516   fill_from_fields
  9.   2/516   localize_local
 10.   1/516   initialize_graph
 11.   1/516   clear_nummiss
 12.   1/516   localize
 13.   1/516   fill_all_from_fields
 14.   1/516   update_all_from_coeffs
 15.   1/516   make_all_neighbors
 16.   1/516   make_tables
 17.   1/516   __main
 18.   1/516   main
 19.   1/516   dealwithargs

  NOTE: 1 function was never executed!

I've implemented function and basic block profiling, because they were simple. We should be able to add the path profiling component with little trouble. The number of blocks instrumented could be reduced significantly by making use of control-equivalent blocks, but this optimization is not done yet.
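For readers who haven't looked at the instrumentation pass, here is a minimal hand-written sketch in C of what function and basic block counting amounts to, modeled on check_percent() from em3d. It is an illustration under stated assumptions, not the output of the real pass: the counter arrays, block IDs, count_block(), prof_reset(), and prof_report() are hypothetical names, rand() stands in for drand48(), and nothing is written to llvmprof.out (its on-disk format is not described here).

```c
/* Hypothetical hand-instrumented version of em3d's check_percent(),
 * mirroring what function and basic block profiling add: one counter
 * bump at function entry and one at the top of each basic block.
 * All names below are illustrative, not the real pass or runtime. */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

enum { FN_CHECK_PERCENT, NUM_FUNCS };
enum { BB_ENTRY, BB_THEN, BB_ENDIF, NUM_BLOCKS };

static const char *func_names[NUM_FUNCS] = { "check_percent" };
static const char *block_names[NUM_BLOCKS] = { "entry", "then", "endif" };

/* Counters a (hypothetical) runtime would dump when the program exits. */
unsigned func_count[NUM_FUNCS];
unsigned block_count[NUM_BLOCKS];

/* Stand-ins for em3d's %.percentcheck_1 and %.numlocal_2 globals. */
static int percentcheck, numlocal;

static void count_block(int id) {
  assert(id >= 0 && id < NUM_BLOCKS);
  block_count[id]++;
}

int check_percent(int percent) {
  func_count[FN_CHECK_PERCENT]++;  /* function profiling */
  count_block(BB_ENTRY);           /* basic block profiling: entry */
  /* Uniform value in [0, 1); rand() stands in for drand48(). */
  double r = rand() / ((double)RAND_MAX + 1.0);
  int hit = r < percent / 100.0;
  percentcheck++;
  if (hit) {
    count_block(BB_THEN);
    numlocal++;
    return hit;
  }
  count_block(BB_ENDIF);
  return hit;
}

void prof_reset(void) {
  for (int i = 0; i < NUM_FUNCS; i++) func_count[i] = 0;
  for (int i = 0; i < NUM_BLOCKS; i++) block_count[i] = 0;
}

/* qsort comparator: block indices ordered by descending count. */
static int by_count_desc(const void *a, const void *b) {
  unsigned ca = block_count[*(const int *)a];
  unsigned cb = block_count[*(const int *)b];
  return (ca < cb) - (ca > cb);
}

/* Print "rank. percent count/total function() - block" lines. */
void prof_report(FILE *out) {
  int order[NUM_BLOCKS];
  unsigned total = 0;
  for (int i = 0; i < NUM_BLOCKS; i++) {
    order[i] = i;
    total += block_count[i];
  }
  fprintf(out, "%s called %u times\n", func_names[FN_CHECK_PERCENT],
          func_count[FN_CHECK_PERCENT]);
  if (total == 0)
    return;
  qsort(order, NUM_BLOCKS, sizeof order[0], by_count_desc);
  for (int i = 0; i < NUM_BLOCKS; i++) {
    int b = order[i];
    fprintf(out, "%2d. %6.2f%% %u/%u %s() - %s\n", i + 1,
            100.0 * block_count[b] / total, block_count[b], total,
            func_names[FN_CHECK_PERCENT], block_names[b]);
  }
}
```

Calling check_percent() a number of times and then prof_report(stdout) lists the blocks in descending order of frequency with their share of all block executions. In this function entry, then, and endif never execute in lockstep, so each needs its own counter; blocks that always execute the same number of times (control-equivalent blocks) could share a single counter, which is the reduction mentioned above.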
To get basic block counts, run the same as before, but with the -block option:

$ ~/llvm/utils/profile.pl -block Output/em3d.llvm.bc
<all of the stuff from before>
===-------------------------------------------------------------------------===
Top 20 most frequently executed basic blocks:

 ##      %%   Frequency
  1.  4.60%   393/8545  make_neighbors() - no_exit.2
  2.  4.56%   390/8545  make_neighbors() - loopentry.2
  3.  4.56%   390/8545  make_neighbors() - endif.1
  4.  4.56%   390/8545  make_neighbors() - endif.2
  5.  4.56%   390/8545  make_neighbors() - loopexit.3
  6.  4.56%   390/8545  check_percent() - entry
  7.  4.53%   387/8545  make_neighbors() - endif.3
  8.  4.49%   384/8545  fill_from_fields() - no_exit.1
  9.  4.49%   384/8545  fill_from_fields() - endif.0
 10.  4.49%   384/8545  make_neighbors() - loopexit.2
 11.  4.49%   384/8545  make_neighbors() - shortcirc_next
 12.  4.49%   384/8545  make_neighbors() - endif.4
 13.  3.37%   288/8545  check_percent() - then
 14.  3.07%   262/8545  make_neighbors() - no_exit.2.preheader
 15.  1.84%   157/8545  compute_nodes() - endif.1
 16.  1.84%   157/8545  compute_nodes() - no_exit.1
 17.  1.84%   157/8545  compute_nodes() - then.0
 18.  1.84%   157/8545  compute_nodes() - then.1
 19.  1.84%   157/8545  compute_nodes() - endif.0
 20.  1.50%   128/8545  fill_from_fields() - no_exit.0

Finally, if you pass -A to the script, llvm-prof will print out the LLVM source code for the program, annotated with frequency counts, like this:

<snip>
;;; %check_percent called 390 times.
;;;
internal int %check_percent(int %percent.1) {
entry:          ; No predecessors!
;;; Executed 390 times.
  %tmp.0 = call double %drand48( )               ; <double> [#uses=1]
  %tmp.2 = cast int %percent.1 to double         ; <double> [#uses=1]
  %tmp.3 = div double %tmp.2, 0x4059000000000000 ; <double> [#uses=1]
  %tmp.4 = setlt double %tmp.0, %tmp.3           ; <bool> [#uses=1]
  %tmp.5 = cast bool %tmp.4 to int               ; <int> [#uses=3]
  %tmp.6 = load int* %.percentcheck_1            ; <int> [#uses=1]
  %inc.0 = add int %tmp.6, 1                     ; <int> [#uses=1]
  store int %inc.0, int* %.percentcheck_1
  %tmp.8 = setne int %tmp.5, 0                   ; <bool> [#uses=1]
  br bool %tmp.8, label %then, label %endif

then:           ; preds = %entry
;;; Executed 288 times.
  %tmp.10 = load int* %.numlocal_2               ; <int> [#uses=1]
  %inc.1 = add int %tmp.10, 1                    ; <int> [#uses=1]
  store int %inc.1, int* %.numlocal_2
  ret int %tmp.5

endif:          ; preds = %entry
;;; Executed 102 times.
  ret int %tmp.5
}
</snip>

If you're interested, this is implemented by the following code:

  lib/Transforms/Instrumentation/BlockProfiling.cpp
  runtime/libprofile/
  tools/llvm-prof/
  utils/profile.pl

I have given only a little thought to how to integrate this with the JIT and runtime system, but perhaps this is a step towards FDO (feedback-directed optimization). The code should be pretty simple to extend with new profiling implementations and lots of cool features. If you think of any neat extensions, please let me know.

-Chris

--
http://llvm.cs.uiuc.edu/
http://www.nondot.org/~sabre/Projects/