Rafael Avila de Espindola via llvm-dev
2017-Dec-01 21:26 UTC
[llvm-dev] gnu X sysv hash performance
I got curious how the lld-produced gnu hash tables compared to gold's. To test that I timed "perf record ninja check-llvm" (just the lit run) in a BUILD_SHARED_LIBS build.

The performance was almost identical, so I decided to try sysv versus gnu (both produced by lld). The results are interesting:

% grep -v '^#' perf-gnu/perf.report-by-dso-sym | head
  38.77%  ld-2.24.so                  [.] do_lookup_x
   8.08%  ld-2.24.so                  [.] strcmp
   2.66%  ld-2.24.so                  [.] _dl_relocate_object
   2.58%  ld-2.24.so                  [.] _dl_lookup_symbol_x
   1.85%  ld-2.24.so                  [.] _dl_name_match_p
   1.46%  [kernel.kallsyms]           [k] copy_page
   1.38%  ld-2.24.so                  [.] _dl_map_object
   1.30%  [kernel.kallsyms]           [k] unmap_page_range
   1.28%  [kernel.kallsyms]           [k] filemap_map_pages
   1.26%  libLLVMSupport.so.6.0.0svn  [.] sstep

% grep -v '^#' perf-sysv/perf.report-by-dso-sym | head
  42.18%  ld-2.24.so                  [.] do_lookup_x
  17.73%  ld-2.24.so                  [.] check_match
  14.41%  ld-2.24.so                  [.] strcmp
   1.22%  ld-2.24.so                  [.] _dl_relocate_object
   1.13%  ld-2.24.so                  [.] _dl_lookup_symbol_x
   0.91%  ld-2.24.so                  [.] _dl_name_match_p
   0.67%  ld-2.24.so                  [.] _dl_map_object
   0.65%  [kernel.kallsyms]           [k] unmap_page_range
   0.63%  [kernel.kallsyms]           [k] copy_page
   0.59%  libLLVMSupport.so.6.0.0svn  [.] sstep

So the gnu hash table helps a lot, but BUILD_SHARED_LIBS is still crazy inefficient.

Cheers,
Rafael
On Fri, Dec 1, 2017 at 1:26 PM, Rafael Avila de Espindola <rafael.espindola at gmail.com> wrote:

> I got curious how the lld produced gnu hash tables compared to gold. To
> test that I timed "perf record ninja check-llvm" (just the lit run) in a
> BUILD_SHARED_LIBS build.
>
> The performance was almost identical, so I decided to try sysv versus
> gnu (both produced by lld). The results are interesting:
>
> [perf output quoted above snipped]
>
> So the gnu hash table helps a lot, but BUILD_SHARED_LIBS is still crazy
> inefficient.

What is "100%" in these numbers? If 100% means all execution time, ld-2.24.so takes more than 70% of execution time. Is this real?
On Fri, Dec 1, 2017 at 3:55 PM, Rui Ueyama via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> On Fri, Dec 1, 2017 at 1:26 PM, Rafael Avila de Espindola <rafael.espindola at gmail.com> wrote:
>
>> [perf output quoted above snipped]
>>
>> So the gnu hash table helps a lot, but BUILD_SHARED_LIBS is still crazy
>> inefficient.
>
> What is "100%" in these numbers? If 100% means all execution time,
> ld-2.24.so takes more than 70% of execution time. Is this real?

perf usually measures cycles ("CPU_CLK_UNHALTED" on Core/Xeon, for example). So it's not time but cycles. This is a critical distinction when the thing being measured has delays, synchronization, or disk/network I/O.

Also, it looks like this report might be decomposed by some other attribute (DSO-at-a-time?) that would affect what "100%" means. Running perf on "ninja check-llvm" seems like it would measure cycles contributed by lots of non-lld things; in fact, it's worth ruling out whether the profile is dominated by non-lld things. Doesn't the testing itself perhaps spend more cycles than the linking being done here?
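[Editor's note: the original post only shows the report filename, so the exact perf invocation below is an assumption; the "by-dso-sym" name suggests a `--sort dso,symbol` report, which would make each percentage a share of all sampled cycles across the whole test run, as questioned above. The grep pipeline is the one from the post.]

```shell
# Assumed recording/reporting workflow (filenames from the original post,
# flags are a guess at how the report was produced):
#
#   perf record -o perf.data -- ninja check-llvm
#   perf report --stdio --sort dso,symbol -i perf.data \
#       > perf-gnu/perf.report-by-dso-sym
#
# The pipeline from the post then strips perf's '#' comment header and
# keeps the ten hottest dso/symbol rows.  Demonstrated here on a tiny
# stand-in report file:
printf '# Overhead  Shared Object  Symbol\n 38.77%%  ld-2.24.so  [.] do_lookup_x\n' > report.txt
grep -v '^#' report.txt | head
```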
Rafael Avila de Espindola via llvm-dev
2017-Dec-01 22:50 UTC
[llvm-dev] gnu X sysv hash performance
Rui Ueyama <ruiu at google.com> writes:

> On Fri, Dec 1, 2017 at 1:26 PM, Rafael Avila de Espindola <rafael.espindola at gmail.com> wrote:
>
>> [perf output quoted above snipped]
>>
>> So the gnu hash table helps a lot, but BUILD_SHARED_LIBS is still crazy
>> inefficient.
>
> What is "100%" in these numbers? If 100% means all execution time,
> ld-2.24.so takes more than 70% of execution time. Is this real?

I think so; BUILD_SHARED_LIBS is very slow. On another machine this time (amazon c5.9x) I just checked the time that lit reports in "ninja check-llvm":

regular build:
  Testing Time: 23.69s

BUILD_SHARED_LIBS:
  Testing Time: 57.60s

It is a lot of libraries where almost all the symbols have default visibility.

Cheers,
Rafael