Zhang, Annita via llvm-dev
2019-Dec-16  08:41 UTC
[llvm-dev] Discuss about the LLVM SW mitigation to Jump Conditional Code Erratum
Below are the performance and code size ratios for SPEC CPU2017.
Table 3 shows the observed performance impact of the Microcode Update on the
SPECrate2017_int_base and SPECrate2017_fp_base benchmark suites when compiled
with the LLVM compiler. All data are ratios relative to the baseline. The
column labeled hw shows a 2.6% and 1.3% performance effect in the INTRATE (SIR)
geomean and FPRATE (SFR) geomean, respectively. Effects on individual
benchmarks of up to 5.1% were observed (557.xz_r).
Software-based tools to mitigate these effects are outlined below. In our
tests, recompiling the benchmarks recovered the geomean performance to at least
99% of the baseline performance, and the maximum performance loss in an
individual SPEC benchmark was reduced to 2.2% relative to the baseline.
Comparing hw_sw_prefix (prefix padding) with hw_sw_nop (nop padding), prefix
padding performs better (0.3% to 0.5% in geomean). In individual cases, we
observed a 1.4% performance improvement for prefix padding over nop padding.
Comparing sw_prefix with sw_nop on a system without the microcode update (MCU),
we observed a 0.7% better integer geomean with sw_prefix; the floating-point
geomean is the same for both.
In our experiments, we observed that nop padding introduced extra nop
instructions into frequently executed code. The additional nop instructions
caused capacity pressure in the DSB (Decoded Stream Buffer) and reduced
performance. We introduced prefix padding to resolve this performance issue.
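As a rough illustration of the padding decision, the sketch below checks whether a branch needs padding and how many bytes would move it into the next 32-byte window. It assumes the condition from Intel's public description of the erratum (a jump that crosses or ends on a 32-byte boundary). The function names are hypothetical and this is not the code from the LLVM patch. With nop padding these bytes are emitted as nop instructions; prefix padding is understood here to spread them as redundant prefixes over the instructions before the branch, so no extra instructions are decoded.

```python
def touches_boundary(offset: int, size: int, boundary: int = 32) -> bool:
    """True if an instruction occupying [offset, offset + size) crosses a
    boundary or ends exactly on one (assumed erratum condition)."""
    # Instructions (and fused pairs) are assumed shorter than one window.
    assert 0 < size < boundary
    return offset // boundary != (offset + size) // boundary


def padding_needed(offset: int, size: int, boundary: int = 32) -> int:
    """Bytes of padding that push the instruction to the start of the next
    window; 0 if it is already safely inside one window."""
    if not touches_boundary(offset, size, boundary):
        return 0
    return boundary - offset % boundary


if __name__ == "__main__":
    # Hypothetical (offset, size) pairs for a 4-byte branch.
    for offset in (0, 28, 30):
        print(offset, touches_boundary(offset, 4), padding_needed(offset, 4))
```

For example, a 4-byte branch at offset 30 needs 2 bytes of padding, and one at offset 28 (ending exactly on the boundary) needs 4. For a macro-fused pair, the same check would presumably apply to the pair as a whole.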
Since the performance difference between prefix padding and nop padding is
small, nop padding may be the easier first step to implement, with prefix
padding options explored afterwards for additional performance optimization.
Comparing hw_sw_prefix (prefix padding applied to a set of branches) with
hw_sw_prefix_align_all (prefix padding applied to all types of branches), the
performance is almost the same in this test.
Table 3 - SPEC CPU2017 SW/Microcode Update vs. baseline performance ratio:
SPEC performance  sw_prefix  sw_nop  sw_prefix_align_all  hw     hw_sw_prefix  hw_sw_nop  hw_sw_prefix_align_all
500.perlbench_r   1.005      0.992   0.999                0.963  0.994         0.980      0.989
502.gcc_r         0.998      0.982   0.988                0.985  0.998         0.992      0.985
505.mcf_r         0.995      0.985   0.992                0.965  0.993         0.997      0.999
520.omnetpp_r     1.001      0.995   0.996                0.995  0.994         0.995      0.996
523.xalancbmk_r   0.994      0.991   0.993                0.984  0.988         0.984      0.990
525.x264_r        0.995      0.989   0.993                0.965  0.986         0.982      0.993
531.deepsjeng_r   0.978      0.971   0.986                0.981  0.978         0.979      0.986
541.leela_r       0.983      0.982   0.980                0.985  0.997         0.996      0.993
557.xz_r          1.004      1.007   1.002                0.949  1.009         1.005      1.006
SIR geomean       0.995      0.988   0.992                0.974  0.993         0.990      0.993
508.namd_r        0.996      0.996   0.998                0.999  0.999         0.995      1.002
510.parest_r      0.997      0.997   0.996                0.992  0.997         0.998      0.996
511.povray_r      1.006      1.006   0.998                0.976  0.992         0.984      0.994
519.lbm_r         0.999      0.999   0.995                0.992  0.999         0.999      0.992
526.blender_r     0.998      0.998   1.000                0.974  1.002         0.995      1.005
538.imagick_r     1.032      1.032   1.025                0.997  1.015         1.015      1.025
544.nab_r         0.997      0.997   1.005                0.977  0.995         0.981      0.987
SFR geomean       1.003      1.003   1.002                0.987  1.000         0.995      1.000
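The geomean and percentage figures quoted in the text can be recomputed from the per-benchmark ratios. The following is a minimal sketch using the integer rows of the hw and hw_sw_prefix columns of Table 3; the helper names are illustrative. Because the table values are rounded to three decimals, a recomputed geomean may differ from the published row in the last digit.

```python
import math

# Integer benchmark ratios copied from Table 3 (perlbench_r ... xz_r).
hw = [0.963, 0.985, 0.965, 0.995, 0.984, 0.965, 0.981, 0.985, 0.949]
hw_sw_prefix = [0.994, 0.998, 0.993, 0.994, 0.988, 0.986, 0.978, 0.997, 1.009]


def geomean(ratios):
    """Geometric mean, computed via the mean of the logarithms."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


def loss_percent(ratio):
    """Performance loss versus baseline, in percent (0.949 is about 5.1)."""
    return (1.0 - ratio) * 100.0


if __name__ == "__main__":
    for name, column in (("hw", hw), ("hw_sw_prefix", hw_sw_prefix)):
        g = geomean(column)
        print(f"{name}: geomean {g:.3f}, geomean loss {loss_percent(g):.1f}%, "
              f"worst case {loss_percent(min(column)):.1f}%")
```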
We also measured the increase in code size due to the padding inserted to
align branches (Table 4). The geomean code size increase is 2-3% for both
prefix padding and nop padding, with individual outliers up to 4%.
In sw_prefix_align_all, the geomean code size increase is 3-4%, with
individual outliers up to 6%. This data indicates that aligning all types of
branches incurs more code size overhead, but with less performance gain.
However, this may vary from case to case.
Table 4 - SPEC CPU2017 SW mitigation vs. baseline Code Size ratio:
SPEC code size  sw_prefix       sw_nop          sw_prefix_align_all
500.perlbench_r 1.037           1.037           1.043
502.gcc_r       1.036           1.036           1.045
505.mcf_r       1.022           1.022           1.026
520.omnetpp_r   1.035           1.035           1.060
523.xalancbmk_r 1.031           1.031           1.050
525.x264_r      1.020           1.020           1.025
531.deepsjeng_r 1.016           1.016           1.018
541.leela_r     1.027           1.027           1.032
557.xz_r        1.029           1.029           1.034
SIR geomean     1.028           1.028           1.037
508.namd_r      1.014           1.014           1.015
510.parest_r    1.025           1.025           1.032
511.povray_r    1.024           1.023           1.031
519.lbm_r       1.009           1.009           1.013
526.blender_r   1.032           1.032           1.047
538.imagick_r   1.026           1.026           1.031
544.nab_r       1.029           1.029           1.033
SFR geomean     1.023           1.023           1.029
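To make the trade-off between the two prefix-padding configurations easier to see, the sketch below puts the geomean rows of Table 3 and Table 4 side by side as signed percentage changes versus the baseline. The dictionary layout and key names are illustrative only.

```python
# Geomean rows copied from Table 3 (performance) and Table 4 (code size)
# for the two prefix-padding configurations without the microcode update.
geomeans = {
    "sw_prefix": {
        "int_perf": 0.995, "fp_perf": 1.003,
        "int_size": 1.028, "fp_size": 1.023,
    },
    "sw_prefix_align_all": {
        "int_perf": 0.992, "fp_perf": 1.002,
        "int_size": 1.037, "fp_size": 1.029,
    },
}


def percent_delta(ratio: float) -> float:
    """Signed change versus baseline, in percent (1.028 is about +2.8)."""
    return (ratio - 1.0) * 100.0


if __name__ == "__main__":
    for config, row in geomeans.items():
        summary = ", ".join(
            f"{key} {percent_delta(value):+.1f}%" for key, value in row.items()
        )
        print(f"{config}: {summary}")
```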
Test date:
              2019/12/9
System Configuration:
              Platform: Intel Internal Reference Validation Platform
              OS: Red Hat* 8.0 x86_64
              Memory: 192 GB
              CPUCount: 2
              CoreCount: 40
              Intel HyperThreading: yes
              CPU Model: Intel(r) Xeon(r) Gold 6148 CPU @ 2.40GHz
              Microcode w/o microcode update: 0x200005e
              Microcode with microcode update: 0x2000065
Compiler options:
              Baseline & hw: -march=skylake-avx512 -mfpmath=sse -Ofast
                  -funroll-loops -flto
              sw_prefix: -march=skylake-avx512 -mfpmath=sse -Ofast
                  -funroll-loops -flto -x86-branches-within-32B-boundaries
              sw_nop: -march=skylake-avx512 -mfpmath=sse -Ofast
                  -funroll-loops -flto -x86-align-branch-boundary=32
                  -x86-align-branch-prefix-size=0
                  -x86-align-branch=fused+jcc+jmp
              sw_prefix_align_all: -march=skylake-avx512 -mfpmath=sse -Ofast
                  -funroll-loops -flto -x86-align-branch-boundary=32
                  -x86-align-branch-prefix-size=5
                  -x86-align-branch=fused+jcc+jmp+indirect+call+ret
Notes:
1.     Source: Intel Corporation; SPEC CPU2017 results should be considered
estimates as they are measured on non-production platforms and are being
provided for research purposes.
2.     Baseline means the system w/o microcode update and w/o SW mitigation.
3.     sw_prefix means SW mitigation of prefix padding is applied to a system
w/o microcode update.
4.     sw_nop means SW mitigation of nop padding is applied to a system w/o
microcode update.
5.     sw_prefix_align_all means SW mitigation of prefix padding is applied to
all impacted branches including call, ret and indirect jump, to a system w/o
microcode update.
6.     hw means the microcode update is applied w/o SW mitigation.
7.     hw_sw_prefix means both microcode update and SW mitigation of prefix
padding are applied.
8.     hw_sw_nop means both microcode update and SW mitigation of nop padding
are applied.
9.     hw_sw_prefix_align_all means microcode update is applied, and SW
mitigation of prefix padding is applied to all impacted branches including call,
ret and indirect jump.
10.  LLVM measurements are limited to C/C++ benchmarks. All Fortran
benchmarks are excluded.
11.  The test was built with an engineering LLVM compiler plus the SW mitigation
patch. The performance data may vary from build to build.
For more complete information about performance and benchmark results, visit
http://www.intel.com/benchmarks. For specific information and
notices/disclaimers regarding the Jump Conditional Code Erratum, visit
https://www.intel.com/content/dam/support/us/en/documents/processors/mitigations-jump-conditional-code-erratum.pdf.