Andres Freund via llvm-dev
2016-Dec-10 21:27 UTC
[llvm-dev] Interest in integrating a linux perf JITEventListener?
Hi,

Under linux a large portion of the profiling these days happens with perf, but there's no support for it from LLVM's JITs.

For a while now perf has been able to associate address+size with symbols if the JIT writes a /tmp/perf-$pid.map file:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/tools/perf/Documentation/jit-interface.txt

A year or so ago perf also gained the ability to actually see code & debug info. It's even being documented now:
http://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?id=b3151ea500655f232255ddcdf2bbcf691cb39646

I have a very preliminary JITEventListener using both of these. Is there interest in integrating that into LLVM? If so, I'll try to follow LLVM coding standards and submit it, otherwise I'll keep it somewhere in postgres...

If in LLVM, does anybody have a preference about how to integrate the two methods above? Because the second method currently (linux limitation) only works if the program is profiled from the start, I think it makes sense to allow either or both of the methods to be used, which seems to suggest an argument to the listener constructor, rather than different wrappers for either.

Comments?

Greetings,

Andres Freund
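For readers unfamiliar with the first method: the perf map file is plain text with one line per JIT-emitted symbol, giving start address, size and name, with both numbers in hexadecimal and without a 0x prefix. The following is only a minimal sketch of emitting such lines, not the code from the listener discussed here; the helper names perf_map_format_entry and perf_map_append are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Format one perf map entry as "<start> <size> <name>\n", with start and
 * size in hex and no 0x prefix. Returns the snprintf() result.
 * (Hypothetical helper, for illustration only.)
 */
int perf_map_format_entry(char *buf, size_t buflen,
                          uint64_t start, uint64_t size, const char *name)
{
    assert(buf != NULL && name != NULL);
    return snprintf(buf, buflen, "%" PRIx64 " %" PRIx64 " %s\n",
                    start, size, name);
}

/*
 * Append one entry to /tmp/perf-<pid>.map, the location perf looks at for
 * the process being profiled. Returns 0 on success, -1 on failure.
 * (Hypothetical helper, for illustration only.)
 */
int perf_map_append(pid_t pid, uint64_t start, uint64_t size, const char *name)
{
    char path[64];
    char line[512];
    int len;
    FILE *f;

    snprintf(path, sizeof(path), "/tmp/perf-%ld.map", (long) pid);

    len = perf_map_format_entry(line, sizeof(line), start, size, name);
    if (len < 0 || (size_t) len >= sizeof(line))
        return -1;              /* formatting failed or name too long */

    f = fopen(path, "a");
    if (f == NULL)
        return -1;
    if (fputs(line, f) == EOF) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

A JIT would call something like perf_map_append(getpid(), code_address, code_size, "stupid_isprime") once per emitted function. Because this carries only address, size and name, perf can resolve symbols but cannot show the machine code or source lines, which is what the second method adds.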
Philip Reames via llvm-dev
2016-Dec-29 21:17 UTC
[llvm-dev] Interest in integrating a linux perf JITEventListener?
Having something like this available in tree would definitely be useful.

For simplicity, why don't we start with support for the second style? This is the long term useful one and would be a good starting point for getting the code in tree.

Can you give a pointer to the patch so that I can assess the rough complexity? If it's simple enough, I'd be happy to help get it reviewed and in. If it's more complicated, I probably won't have the time to assist.

Philip
Andres Freund via llvm-dev
2017-Feb-02 07:20 UTC
[llvm-dev] Interest in integrating a linux perf JITEventListener?
Hi,

On 2016-12-29 13:17:50 -0800, Philip Reames wrote:
> Having something like this available in tree would definitely be
> useful.

Cool.

> For simplicity, why don't we start with support for the second style? This
> is the long term useful one and would be a good starting point for getting
> the code in tree.

Works for me.

> Can you give a pointer to the patch so that I can assess the rough
> complexity? If it's simple enough, I'd be happy to help get it
> reviewed and in. If it's more complicated, I probably won't have the
> time to assist.

Patch (and a prerequisite) attached. Took me a while to get it cleaned up to some degree - I'm a C programmer these days, and a lot of my C++ knowledge has been replaced by other things... It's still not super clean, but I think it's in a good enough state for you to estimate complexity.

What do you think? I've included some example output below to show what's going on.

Regards,

Andres

A random example (source C file also attached):

# generate some IR, with debug info
clang -ggdb -S -c -emit-llvm expensive_loop.c -o /tmp/expensive_loop.ll
# record profile ('-k 1' is the clocksource, '-g' hierarchical)
perf record -g -k 1 lli -jit-kind=mcjit /tmp/expensive_loop.ll 1
# enrich profile with JIT information emitted due to patch
perf inject --jit -i perf.data -o perf.jit.data
# and show information
perf report -i perf.jit.data

Example output:

Samples: 3K of event 'cycles:ppp', Event count (approx.): 3127026392
  Overhead  Command  Shared Object      Symbol
-   93.41%  lli      jitted-27248-2.so  [.] stupid_isprime
     stupid_isprime
     main
     llvm::MCJIT::runFunction
     llvm::ExecutionEngine::runFunctionAsMain
     main
     __libc_start_main
     0xec26258d4c544155
+    0.55%  lli      ld-2.24.so         [.] do_lookup_x
+    0.22%  lli      ld-2.24.so         [.] _dl_lookup_symbol_x
+    0.17%  lli      [kernel.vmlinux]   [k] unmap_page_range
+    0.16%  lli      ld-2.24.so         [.] _dl_fixup

Instruction level view:

       │    Disassembly of section .text:
       │
       │    0000000000000040 <stupid_isprime>:
       │    stupid_isprime():
       │    #include <stdint.h>
       │    #include <stdbool.h>
       │
       │    bool stupid_isprime(uint64_t num)
       │    {
       │      push   %rbp
       │      mov    %rsp,%rbp
       │      mov    %rdi,-0x10(%rbp)
       │            if (num == 2)
       │      cmp    $0x2,%rdi
       │    ↓ jne    14
       │1  e:┌─→movb   $0x1,-0x1(%rbp)
       │     │↓ jmp    55
       │     │                    return true;
       │     │            if (num < 1 || num % 2 == 0)
       │1 14:│  cmpq   $0x0,-0x10(%rbp)
       │     │↓ je     51
       │     │  testb  $0x1,-0x10(%rbp)
       │     │↓ je     51
       │     │                    return false;
       │     │            for(uint64_t i = 3; i < num / 2; i+= 2) {
       │     │  movq   $0x3,-0x18(%rbp)
       │     │↓ jmp    35
       │     │  nop
       │1 30:│  addq   $0x2,-0x18(%rbp)
  4.03 │1 35:│  mov    -0x10(%rbp),%rax
  0.06 │     │  shr    %rax
       │     │  cmp    %rax,-0x18(%rbp)
       │     └──jae    e
       │                    if (num % i == 0)
  3.74 │      mov    -0x10(%rbp),%rax
  0.09 │      xor    %edx,%edx
 91.82 │      divq   -0x18(%rbp)
       │      test   %rdx,%rdx
  0.23 │    ↑ jne    30
  0.03 │2 51:  movb   $0x0,-0x1(%rbp)
       │1 55:  mov    -0x1(%rbp),%al
       │      pop    %rbp
       │    ← retq

(the missing colors make it harder to see what's going on)

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-MCJIT-Call-JIT-notifiers-only-after-code-sections-ar.patch
Type: text/x-patch
Size: 2617 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170201/19627903/attachment-0002.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-Add-PerfJITEventListener-for-perf-profiling-support.patch
Type: text/x-patch
Size: 22529 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170201/19627903/attachment-0003.bin>
-------------- next part --------------
#include <stdint.h>
#include <stdbool.h>

bool stupid_isprime(uint64_t num)
{
    if (num == 2)
        return true;
    if (num < 1 || num % 2 == 0)
        return false;
    for(uint64_t i = 3; i < num / 2; i+= 2) {
        if (num % i == 0)
            return false;
    }
    return true;
}

int main(int argc, char **argv)
{
    int numprimes = 0;

    for (uint64_t num = argc; num < 100000; num++)
    {
        if (stupid_isprime(num))
            numprimes++;
    }
    return numprimes;
}
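
A note on the '-k 1' clocksource option passed to perf record above: clock id 1 is CLOCK_MONOTONIC on Linux, and as far as I understand the point of selecting it is that a JIT can then timestamp the records it emits with the same clock, which lets perf inject --jit line them up with the samples. The following is a minimal sketch of how a JIT-side helper might obtain such a timestamp; it is an assumption for illustration, not taken from the attached patch, and the names timespec_to_ns and perf_clock_now_ns are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L /* for clock_gettime() / CLOCK_MONOTONIC */

#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Convert a struct timespec to a single nanosecond count. */
uint64_t timespec_to_ns(const struct timespec *ts)
{
    assert(ts != NULL);
    return (uint64_t) ts->tv_sec * UINT64_C(1000000000)
        + (uint64_t) ts->tv_nsec;
}

/*
 * Current CLOCK_MONOTONIC time in nanoseconds, i.e. the clock that
 * 'perf record -k 1' selects. Returns 0 if the clock cannot be read.
 * (Hypothetical helper, for illustration only.)
 */
uint64_t perf_clock_now_ns(void)
{
    struct timespec ts;

    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return 0;
    return timespec_to_ns(&ts);
}
```

Under these assumptions a listener would call perf_clock_now_ns() each time it records a newly emitted function.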