Ingo Molnar via llvm-dev
2018-Feb-14 23:07 UTC
[llvm-dev] clang asm-goto support (Was Re: [PATCH v2] x86/retpoline: Add clang support)
* Ingo Molnar <mingo at kernel.org> wrote:

> To quantify it: I just performed a test build of a Linux distro kernel config
> (Fedora x86-64), and counted the number of callsites that use 'asm goto'
> functionality with the v4.15 kernel (including drivers).
>
> The results:
>
>                                                Linux distro | !CONFIG_TRACING
> -----------------------------------------------------------------------------
> total # of functions                         :      191,567 |         184,443
> total # of instructions                      :   14,251,355 |      13,526,112
> -----------------------------------------------------------------------------
> total # of spin_lock*() calls                :       25,246 |          25,177
> total # of mutex_lock*() calls               :       13,062 |          12,861
> total # of kmalloc*() calls                  :        5,148 |           5,118
> -----------------------------------------------------------------------------
> total # of 'asm goto' usage sites            :       34,851 |          31,059
> total # of 'asm goto' using functions        :       18,209 |          16,089
> -----------------------------------------------------------------------------
> percent of kernel functions using 'asm goto' :         9.5% |            8.7%
> -----------------------------------------------------------------------------

Here are the size stats of kernel/sched/built-in.o for the same distro config:

                          optimized | no asm goto
-------------------------------------------------
total # of functions    :       765 |         764
total # of instructions :    46,830 |      47,051

I.e. asm goto support reduces the scheduler's instruction count by ~0.5% (221 fewer instructions), which is a major reduction in generated code size.

This doesn't count the performance advantages of live branch patching: many of those asm goto usage sites are in hot paths, so the performance impact is much larger than that, easily a couple of percentage points in scheduler-intensive benchmarks, as Peter mentioned.
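The live branch patching mentioned above is what the kernel's static keys (jump labels) build on top of asm goto. Below is a minimal sketch of the idea, assuming GCC or a clang release with asm goto support. In the kernel (e.g. `arch_static_branch()` in `arch/x86/include/asm/jump_label.h`) the asm body is a 5-byte NOP plus a `__jump_table` entry that the kernel can rewrite into a `jmp` at runtime. Here the asm body is simply left empty, so control always falls through and the label is never taken. The names `static_branch_sketch()`, `tracing_flag`, `hot_path()` and `hot_path_flag()` are hypothetical, for illustration only.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical static-branch sketch built on asm goto.
 * The empty asm body stands in for the kernel's patchable NOP: control
 * always falls through, so this fast path has no flag load and no compare.
 * The l_yes label is only reachable if the instruction were rewritten into
 * a jump at runtime, which this sketch never does.
 */
static inline __attribute__((always_inline)) bool static_branch_sketch(void)
{
    __asm__ goto("" : /* no outputs */ : /* no inputs */ : /* no clobbers */ : l_yes);
    return false;
l_yes:
    return true;
}

/* Conventional alternative: a global flag that must be tested on every call. */
bool tracing_flag = false;

/* Hot path guarded by the asm goto based branch; the printf is the cold path. */
static int hot_path(int x)
{
    if (static_branch_sketch())
        printf("trace: %d\n", x); /* never reached in this sketch */
    return x + 1;
}

/* Same hot path guarded by the flag: in general a load + test + branch. */
static int hot_path_flag(int x)
{
    if (tracing_flag)
        printf("trace: %d\n", x);
    return x + 1;
}
```

With the real mechanism, enabling the rarely used feature patches each such site in place instead of flipping a flag, which is why hot paths pay (almost) nothing while it stays disabled.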
For example, here's a thread context switch benchmark comparison on a modern x86 system running a v4.15 kernel:

  $ perf stat --repeat 20 --sync --null perf bench sched messaging -t -g 25

  no asm goto:         0.136778505 seconds time elapsed  ( stddev: +- 0.55% )
  asm goto optimized:  0.133773904 seconds time elapsed  ( stddev: +- 0.51% )

The asm goto enabled kernel is ~2.25% faster in this benchmark (0.136778505 / 0.133773904 ~ 1.0225), and the performance penalty of not having asm goto support will only increase in the future.

I.e. it very much makes sense to implement asm goto support in clang not just for compatibility reasons, but for performance reasons as well.

Thanks,

	Ingo