Zhang, Annita via llvm-dev
2019-Dec-16 08:41 UTC
[llvm-dev] Discuss about the LLVM SW mitigation to Jump Conditional Code Erratum
Below are the performance and code size ratios for SPEC CPU2017.

Table 3 shows the observed performance impact of the Microcode Update (MCU) on the SPECrate2017_int_base and SPECrate2017_fp_base benchmark suites when compiled with the LLVM compiler. All data are ratios relative to the baseline. The hw column shows a 2.6% and 1.3% performance loss in the INTRATE geomean and FPRATE geomean respectively. The effect on individual components was observed to be up to 5.1%.

Software-based tools to mitigate these effects are outlined below. In our tests, recompiling the benchmarks recovered the geomean performance to at least 99% of the originally observed performance, and the maximum performance loss in individual SPEC benchmarks was reduced to 2.2%.

Comparing hw_sw_prefix (prefix padding) with hw_sw_nop (nop padding), hw_sw_prefix provides better performance (0.3%~0.5% in geomean). In individual cases, we observed a 1.4% performance improvement from prefix padding over nop padding. Comparing sw_prefix with sw_nop on a system w/o MCU, we observed 0.7% better performance with sw_prefix.

In our experiments, we observed that nop padding introduced extra nop instructions into frequently executed code. The additional nop instructions caused capacity pressure in the DSB (Decoded Stream Buffer) and reduced performance. We introduced prefix padding to resolve this performance issue. Since the performance delta between prefix padding and nop padding is incremental, starting with nop padding may be easier to implement as a first step, with prefix padding options to explore later for additional performance optimization.

Comparing hw_sw_prefix (prefix padding applied to a set of branches) with hw_sw_prefix_align_all (prefix padding applied to all types of branches), the performance is almost the same in this test.

Table 3 - SPEC CPU2017 SW/Microcode Update vs.
baseline performance ratio:

SPEC performance  sw_prefix  sw_nop  sw_prefix_align_all  hw     hw_sw_prefix  hw_sw_nop  hw_sw_prefix_align_all
500.perlbench_r   1.005      0.992   0.999                0.963  0.994         0.980      0.989
502.gcc_r         0.998      0.982   0.988                0.985  0.998         0.992      0.985
505.mcf_r         0.995      0.985   0.992                0.965  0.993         0.997      0.999
520.omnetpp_r     1.001      0.995   0.996                0.995  0.994         0.995      0.996
523.xalancbmk_r   0.994      0.991   0.993                0.984  0.988         0.984      0.990
525.x264_r        0.995      0.989   0.993                0.965  0.986         0.982      0.993
531.deepsjeng_r   0.978      0.971   0.986                0.981  0.978         0.979      0.986
541.leela_r       0.983      0.982   0.980                0.985  0.997         0.996      0.993
557.xz_r          1.004      1.007   1.002                0.949  1.009         1.005      1.006
SIR geomean       0.995      0.988   0.992                0.974  0.993         0.990      0.993
508.namd_r        0.996      0.996   0.998                0.999  0.999         0.995      1.002
510.parest_r      0.997      0.997   0.996                0.992  0.997         0.998      0.996
511.povray_r      1.006      1.006   0.998                0.976  0.992         0.984      0.994
519.lbm_r         0.999      0.999   0.995                0.992  0.999         0.999      0.992
526.blender_r     0.998      0.998   1.000                0.974  1.002         0.995      1.005
538.imagick_r     1.032      1.032   1.025                0.997  1.015         1.015      1.025
544.nab_r         0.997      0.997   1.005                0.977  0.995         0.981      0.987
SFR geomean       1.003      1.003   1.002                0.987  1.000         0.995      1.000

We also measured the increase in code size caused by padding instructions to align branches correctly (Table 4). The geomean code size increase is 2-3% for both prefix padding and nop padding, with individual outliers up to 4%. For sw_prefix_align_all, the geomean code size increase is 3-4%, with individual outliers up to 6%. This data indicates that aligning all types of branches incurs more code size overhead for less performance gain. However, results may vary case by case.

Table 4 - SPEC CPU2017 SW mitigation vs.
baseline Code Size ratio:

SPEC code size    sw_prefix  sw_nop  sw_prefix_align_all
500.perlbench_r   1.037      1.037   1.043
502.gcc_r         1.036      1.036   1.045
505.mcf_r         1.022      1.022   1.026
520.omnetpp_r     1.035      1.035   1.060
523.xalancbmk_r   1.031      1.031   1.050
525.x264_r        1.020      1.020   1.025
531.deepsjeng_r   1.016      1.016   1.018
541.leela_r       1.027      1.027   1.032
557.xz_r          1.029      1.029   1.034
SIR geomean       1.028      1.028   1.037
508.namd_r        1.014      1.014   1.015
510.parest_r      1.025      1.025   1.032
511.povray_r      1.024      1.023   1.031
519.lbm_r         1.009      1.009   1.013
526.blender_r     1.032      1.032   1.047
538.imagick_r     1.026      1.026   1.031
544.nab_r         1.029      1.029   1.033
SFR geomean       1.023      1.023   1.029

Test date: 2019/12/9

System Configuration:
  Platform: Intel Internal Reference Validation Platform
  OS: Red Hat* 8.0 x86_64
  Memory: 192 GB
  CPUCount: 2
  CoreCount: 40
  Intel HyperThreading: yes
  CPU Model: Intel(r) Xeon(r) Gold 6148 CPU @ 2.40GHz
  Microcode w/o microcode update: 0x200005e
  Microcode with microcode update: 0x2000065

Compiler options:
  Baseline & hw: -march=skylake-avx512 -mfpmath=sse -Ofast -funroll-loops -flto
  sw_prefix: -march=skylake-avx512 -mfpmath=sse -Ofast -funroll-loops -flto -x86-branches-within-32B-boundaries
  sw_nop: -march=skylake-avx512 -mfpmath=sse -Ofast -funroll-loops -flto -x86-align-branch-boundary=32 -x86-align-branch-prefix-size=0 -x86-align-branch=fused+jcc+jmp
  sw_prefix_align_all: -march=skylake-avx512 -mfpmath=sse -Ofast -funroll-loops -flto -x86-align-branch-boundary=32 -x86-align-branch-prefix-size=5 -x86-align-branch=fused+jcc+jmp+indirect+call+ret

Notes:
1. Source: Intel Corporation; SPEC CPU2017 results should be considered estimates as they are measured on non-production platforms and are being provided for research purposes.
2. Baseline means the system w/o microcode update and w/o SW mitigation.
3. sw_prefix means SW mitigation of prefix padding is applied to a system w/o microcode update.
4. sw_nop means SW mitigation of nop padding is applied to a system w/o microcode update.
5.
sw_prefix_align_all means SW mitigation of prefix padding is applied to all impacted branches, including call, ret and indirect jump, on a system w/o microcode update.
6. hw means the microcode update is applied w/o SW mitigation.
7. hw_sw_prefix means both microcode update and SW mitigation of prefix padding are applied.
8. hw_sw_nop means both microcode update and SW mitigation of nop padding are applied.
9. hw_sw_prefix_align_all means microcode update is applied, and SW mitigation of prefix padding is applied to all impacted branches, including call, ret and indirect jump.
10. LLVM measurements are limited to C/C++ benchmarks. All Fortran benchmarks are excluded.
11. The test was built with an engineering LLVM compiler plus the SW mitigation patch. The performance data may vary from build to build.

For more complete information about performance and benchmark results, visit http://www.intel.com/benchmarks.

For specific information and notices/disclaimers regarding the Jump Conditional Code Erratum, visit https://www.intel.com/content/dam/support/us/en/documents/processors/mitigations-jump-conditional-code-erratum.pdf.
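
As a cross-check on the figures quoted from Table 3, the geomean rows can be recomputed from the per-benchmark ratios. The sketch below does this for the hw column of the nine C/C++ integer benchmarks. The names geomean and HW_INT_RATIOS are ours and purely illustrative; since the table entries are rounded to three decimals, the recomputed value may differ from the reported 0.974 in the last digit.

```python
import math

# hw column of Table 3 (microcode update applied, no SW mitigation),
# integer benchmarks only, copied from the table above.
HW_INT_RATIOS = {
    "500.perlbench_r": 0.963,
    "502.gcc_r": 0.985,
    "505.mcf_r": 0.965,
    "520.omnetpp_r": 0.995,
    "523.xalancbmk_r": 0.984,
    "525.x264_r": 0.965,
    "531.deepsjeng_r": 0.981,
    "541.leela_r": 0.985,
    "557.xz_r": 0.949,
}


def geomean(values):
    """Geometric mean, computed as exp of the mean of the logarithms."""
    values = list(values)
    if not values:
        raise ValueError("geomean of an empty sequence is undefined")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Geomean of the hw column and the benchmark with the largest slowdown.
hw_int_geomean = geomean(HW_INT_RATIOS.values())
worst_name, worst_ratio = min(HW_INT_RATIOS.items(), key=lambda kv: kv[1])

if __name__ == "__main__":
    print(f"hw INTRATE geomean: {hw_int_geomean:.3f}")
    print(f"largest slowdown: {worst_name} ({(1 - worst_ratio) * 100:.1f}%)")
```

The same function can be pointed at any other column of Table 3 or Table 4 by swapping in that column's ratios.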