thr3ads.net - llvm dev - [llvm-dev] Discuss about the LLVM SW mitigation to Jump Conditional Code Erratum [Dec 2019]

Zhang, Annita via llvm-dev
2019-Dec-16 08:41 UTC
[llvm-dev] Discuss about the LLVM SW mitigation to Jump Conditional Code Erratum

Below are the performance and code-size ratios for SPEC CPU2017.

Table 3 shows the observed performance impact of the Microcode Update on the
SPECrate2017_int_base and SPECrate2017_fp_base benchmark suites when compiled
with the LLVM compiler. All data are ratios relative to the baseline. The
column labeled hw shows a 2.6% performance loss in the INTRATE (SIR) geomean
and a 1.3% loss in the FPRATE (SFR) geomean. Losses of up to 5.1% were observed
on individual components.
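As a quick check of the arithmetic, here is a minimal Python sketch that converts the ratios into percentage effects, assuming a ratio r corresponds to a change of (r - 1) x 100% relative to the baseline. The values are copied from the hw column of Table 3; the names `pct_effect`, `HW_GEOMEANS` and `HW_COMPONENTS` are only illustrative.

```python
# Performance ratios vs. baseline (1.000 = baseline; below 1.000 = slower).
# Values copied from the "hw" column of Table 3 (microcode update, no SW mitigation).
HW_GEOMEANS = {"INTRATE (SIR)": 0.974, "FPRATE (SFR)": 0.987}

HW_COMPONENTS = {
    "500.perlbench_r": 0.963, "502.gcc_r": 0.985, "505.mcf_r": 0.965,
    "520.omnetpp_r": 0.995, "523.xalancbmk_r": 0.984, "525.x264_r": 0.965,
    "531.deepsjeng_r": 0.981, "541.leela_r": 0.985, "557.xz_r": 0.949,
    "508.namd_r": 0.999, "510.parest_r": 0.992, "511.povray_r": 0.976,
    "519.lbm_r": 0.992, "526.blender_r": 0.974, "538.imagick_r": 0.997,
    "544.nab_r": 0.977,
}


def pct_effect(ratio: float) -> float:
    """Percentage change vs. baseline; negative means a slowdown."""
    return (ratio - 1.0) * 100.0


for suite, ratio in HW_GEOMEANS.items():
    print(f"{suite} geomean: {pct_effect(ratio):+.1f}%")

# The component with the lowest ratio is the worst individual slowdown.
worst = min(HW_COMPONENTS, key=HW_COMPONENTS.get)
print(f"worst component: {worst} {pct_effect(HW_COMPONENTS[worst]):+.1f}%")
```

Running it prints -2.6% and -1.3% for the two geomeans and -5.1% for 557.xz_r, matching the figures quoted above.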



Software-based tools to mitigate these effects are outlined below. In our
tests, recompiling the benchmarks with the SW mitigation on a system with the
microcode update recovered geomean performance to at least 99% of the baseline,
and the largest per-benchmark loss in SPEC was reduced to at most 2.2% relative
to the baseline.
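The recovery claim can be checked against the hw_sw_prefix column of Table 3. The Python sketch below (helper names are illustrative) recomputes the suite geomeans from the rounded per-benchmark ratios and finds the largest remaining per-benchmark loss. Because the published ratios are rounded to three digits, a recomputed geomean can in general differ from the reported row in the last digit; for this column it lands on the reported values.

```python
from math import exp, log

# "hw_sw_prefix" column of Table 3: microcode update plus prefix-padding mitigation.
HW_SW_PREFIX_INT = {
    "500.perlbench_r": 0.994, "502.gcc_r": 0.998, "505.mcf_r": 0.993,
    "520.omnetpp_r": 0.994, "523.xalancbmk_r": 0.988, "525.x264_r": 0.986,
    "531.deepsjeng_r": 0.978, "541.leela_r": 0.997, "557.xz_r": 1.009,
}
HW_SW_PREFIX_FP = {
    "508.namd_r": 0.999, "510.parest_r": 0.997, "511.povray_r": 0.992,
    "519.lbm_r": 0.999, "526.blender_r": 1.002, "538.imagick_r": 1.015,
    "544.nab_r": 0.995,
}


def geomean(ratios) -> float:
    """Geometric mean, computed as exp(mean(log(r)))."""
    ratios = list(ratios)
    return exp(sum(log(r) for r in ratios) / len(ratios))


gm_int = geomean(HW_SW_PREFIX_INT.values())
gm_fp = geomean(HW_SW_PREFIX_FP.values())

# Largest remaining loss across both suites.
components = {**HW_SW_PREFIX_INT, **HW_SW_PREFIX_FP}
worst = min(components, key=components.get)
max_loss = 1.0 - components[worst]

print(f"INTRATE geomean {gm_int:.3f}, FPRATE geomean {gm_fp:.3f}")
print(f"largest remaining loss: {worst} {max_loss:.1%}")
```

On these inputs it reports geomeans of 0.993 and 1.000 and a worst case of 2.2% on 531.deepsjeng_r. Working in log space is just numerical hygiene; multiplying the ratios directly would give the same result here.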



Comparing the two SW mitigations on top of the microcode update, hw_sw_prefix
(prefix padding) performs 0.3% to 0.5% better in geomean than hw_sw_nop (nop
padding). On individual benchmarks, we observed up to a 1.4% performance
improvement with prefix padding over nop padding. Comparing sw_prefix with
sw_nop on a system without the microcode update, we observed 0.7% better INTRATE
geomean performance with sw_prefix.
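The prefix-versus-nop comparison is a ratio of two Table 3 columns. This Python sketch (illustrative names, data copied from Table 3) computes the per-benchmark gain of hw_sw_prefix over hw_sw_nop and the corresponding gains for the geomean rows.

```python
# Table 3 columns on a system with the microcode update:
# benchmark -> (hw_sw_prefix, hw_sw_nop)
HW_SW = {
    "500.perlbench_r": (0.994, 0.980), "502.gcc_r": (0.998, 0.992),
    "505.mcf_r": (0.993, 0.997), "520.omnetpp_r": (0.994, 0.995),
    "523.xalancbmk_r": (0.988, 0.984), "525.x264_r": (0.986, 0.982),
    "531.deepsjeng_r": (0.978, 0.979), "541.leela_r": (0.997, 0.996),
    "557.xz_r": (1.009, 1.005), "508.namd_r": (0.999, 0.995),
    "510.parest_r": (0.997, 0.998), "511.povray_r": (0.992, 0.984),
    "519.lbm_r": (0.999, 0.999), "526.blender_r": (1.002, 0.995),
    "538.imagick_r": (1.015, 1.015), "544.nab_r": (0.995, 0.981),
}


def relative_gain(prefix: float, nop: float) -> float:
    """Speedup of prefix padding over nop padding (0.01 == 1%)."""
    return prefix / nop - 1.0


gains = {name: relative_gain(p, n) for name, (p, n) in HW_SW.items()}
best = max(gains, key=gains.get)
print(f"largest per-benchmark gain: {best} {gains[best]:.1%}")

# Geomean rows of Table 3: (prefix-padding column, nop-padding column).
GEOMEANS = {
    "hw_sw INTRATE": (0.993, 0.990),
    "hw_sw FPRATE": (1.000, 0.995),
    "sw INTRATE": (0.995, 0.988),
}
for label, (p, n) in GEOMEANS.items():
    print(f"{label}: {relative_gain(p, n):.1%}")
```

On these numbers 544.nab_r comes out almost tied with 500.perlbench_r (both about 1.4%), and a few benchmarks such as 505.mcf_r slightly favor nop padding, so the advantage is not uniform across components.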


In our experiments, we observed that nop padding inserted extra nop
instructions into frequently executed code. These additional nops increased
capacity pressure on the DSB (Decoded Stream Buffer) and reduced performance.
We introduced prefix padding to resolve this performance issue.
Since the performance difference between prefix padding and nop padding is
incremental, starting with nop padding may be easier as a first implementation
step, with prefix padding options explored later for additional performance.
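To make the nop-versus-prefix distinction concrete, here is a simplified Python model. It assumes the rule from Intel's erratum guidance that a branch needs padding when it crosses a 32-byte boundary or ends exactly on one, and it assumes prefix padding works by adding redundant prefixes to the instructions just before the branch (up to a per-instruction limit, mirroring -x86-align-branch-prefix-size=5) instead of inserting new NOP instructions. The helpers `needs_padding`, `padding_bytes` and `prefix_plan` are hypothetical, not LLVM's code, and ignore real constraints such as which instructions may safely carry extra prefixes.

```python
# Hypothetical model of the alignment decision; not LLVM's actual implementation.
BOUNDARY = 32       # bytes, cf. -x86-align-branch-boundary=32
MAX_INSN_LEN = 15   # x86 architectural limit on instruction length


def needs_padding(offset: int, size: int, boundary: int = BOUNDARY) -> bool:
    """True if the branch occupying bytes [offset, offset + size) crosses a
    boundary or ends exactly on one (its last byte is the last byte of a block)."""
    return offset // boundary != (offset + size) // boundary


def padding_bytes(offset: int, size: int, boundary: int = BOUNDARY) -> int:
    """Bytes to insert before the branch so that it starts at the next boundary."""
    if not needs_padding(offset, size, boundary):
        return 0
    return boundary - offset % boundary


def prefix_plan(pad: int, preceding_sizes: list, max_prefix: int = 5):
    """Spread `pad` bytes of redundant prefixes over the instructions that
    precede the branch, nearest first. Each instruction takes at most
    `max_prefix` extra bytes and must stay within MAX_INSN_LEN.
    Returns (prefix bytes per instruction, nearest first; bytes left over)."""
    plan = []
    remaining = pad
    for size in reversed(preceding_sizes):
        if remaining == 0:
            break
        take = min(remaining, max_prefix, MAX_INSN_LEN - size)
        plan.append(take)
        remaining -= take
    return plan, remaining


# A 6-byte jcc starting at offset 28 would cross the 32-byte boundary.
pad = padding_bytes(28, 6)
print("padding needed:", pad)
# nop padding: `pad` bytes of NOPs are inserted as new instructions.
# prefix padding: the same bytes are absorbed by the preceding instructions.
print("prefix plan:", prefix_plan(pad, [3, 7, 4]))
```

In this model both schemes add the same number of bytes; the difference is that prefix padding adds no new instructions to be decoded and cached in the DSB, which is the effect described above.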

Comparing hw_sw_prefix (prefix padding applied to a subset of branch types) with
hw_sw_prefix_align_all (prefix padding applied to all branch types), performance
is almost identical in this test: both reach geomeans of 0.993 (INTRATE) and
1.000 (FPRATE).



Table 3 - SPEC CPU2017 SW/Microcode Update vs. baseline performance ratio:

Benchmark         sw_prefix  sw_nop  sw_prefix_align_all  hw     hw_sw_prefix  hw_sw_nop  hw_sw_prefix_align_all
500.perlbench_r   1.005      0.992   0.999                0.963  0.994         0.980      0.989
502.gcc_r         0.998      0.982   0.988                0.985  0.998         0.992      0.985
505.mcf_r         0.995      0.985   0.992                0.965  0.993         0.997      0.999
520.omnetpp_r     1.001      0.995   0.996                0.995  0.994         0.995      0.996
523.xalancbmk_r   0.994      0.991   0.993                0.984  0.988         0.984      0.990
525.x264_r        0.995      0.989   0.993                0.965  0.986         0.982      0.993
531.deepsjeng_r   0.978      0.971   0.986                0.981  0.978         0.979      0.986
541.leela_r       0.983      0.982   0.980                0.985  0.997         0.996      0.993
557.xz_r          1.004      1.007   1.002                0.949  1.009         1.005      1.006
SIR geomean       0.995      0.988   0.992                0.974  0.993         0.990      0.993

508.namd_r        0.996      0.996   0.998                0.999  0.999         0.995      1.002
510.parest_r      0.997      0.997   0.996                0.992  0.997         0.998      0.996
511.povray_r      1.006      1.006   0.998                0.976  0.992         0.984      0.994
519.lbm_r         0.999      0.999   0.995                0.992  0.999         0.999      0.992
526.blender_r     0.998      0.998   1.000                0.974  1.002         0.995      1.005
538.imagick_r     1.032      1.032   1.025                0.997  1.015         1.015      1.025
544.nab_r         0.997      0.997   1.005                0.977  0.995         0.981      0.987
SFR geomean       1.003      1.003   1.002                0.987  1.000         0.995      1.000



We also measured the code-size increase caused by the padding added to align
branches (Table 4). With both prefix padding and nop padding, the geomean
code-size increase is 2-3%, with individual outliers of up to 4%.

With sw_prefix_align_all, the geomean code-size increase is 3-4%, with
individual outliers of up to 6%. This data indicates that aligning all branch
types incurs more code-size overhead without a corresponding performance gain
(compare sw_prefix_align_all with sw_prefix in Table 3). However, this may vary
from case to case.
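The geomean code-size figures can be reproduced from the rounded per-benchmark values in Table 4. This Python sketch (illustrative names) does so for the INTRATE (SIR) rows of sw_prefix and sw_prefix_align_all; the FPRATE rows would work the same way with their own data.

```python
from math import exp, log

# Table 4 INTRATE (SIR) code-size ratios: benchmark -> (sw_prefix, sw_prefix_align_all)
CODE_SIZE_INT = {
    "500.perlbench_r": (1.037, 1.043), "502.gcc_r": (1.036, 1.045),
    "505.mcf_r": (1.022, 1.026), "520.omnetpp_r": (1.035, 1.060),
    "523.xalancbmk_r": (1.031, 1.050), "525.x264_r": (1.020, 1.025),
    "531.deepsjeng_r": (1.016, 1.018), "541.leela_r": (1.027, 1.032),
    "557.xz_r": (1.029, 1.034),
}


def geomean(ratios) -> float:
    """Geometric mean, computed as exp(mean(log(r)))."""
    ratios = list(ratios)
    return exp(sum(log(r) for r in ratios) / len(ratios))


gm_prefix = geomean(p for p, _ in CODE_SIZE_INT.values())
gm_all = geomean(a for _, a in CODE_SIZE_INT.values())
largest = max(CODE_SIZE_INT, key=lambda name: CODE_SIZE_INT[name][1])

print(f"sw_prefix geomean growth: {gm_prefix - 1:.1%}")
print(f"sw_prefix_align_all geomean growth: {gm_all - 1:.1%}")
print(f"largest align_all outlier: {largest} {CODE_SIZE_INT[largest][1] - 1:.1%}")
```

On these inputs the sketch reports about 2.8% and 3.7% geomean growth and a 6.0% outlier for 520.omnetpp_r, in line with the SIR geomean row of Table 4.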



Table 4 - SPEC CPU2017 SW mitigation vs. baseline Code Size ratio:

Benchmark         sw_prefix  sw_nop  sw_prefix_align_all
500.perlbench_r   1.037      1.037   1.043
502.gcc_r         1.036      1.036   1.045
505.mcf_r         1.022      1.022   1.026
520.omnetpp_r     1.035      1.035   1.060
523.xalancbmk_r   1.031      1.031   1.050
525.x264_r        1.020      1.020   1.025
531.deepsjeng_r   1.016      1.016   1.018
541.leela_r       1.027      1.027   1.032
557.xz_r          1.029      1.029   1.034
SIR geomean       1.028      1.028   1.037

508.namd_r        1.014      1.014   1.015
510.parest_r      1.025      1.025   1.032
511.povray_r      1.024      1.023   1.031
519.lbm_r         1.009      1.009   1.013
526.blender_r     1.032      1.032   1.047
538.imagick_r     1.026      1.026   1.031
544.nab_r         1.029      1.029   1.033
SFR geomean       1.023      1.023   1.029


Test date: 2019/12/9

System Configuration:
  Platform: Intel Internal Reference Validation Platform
  OS: Red Hat* 8.0 x86_64
  Memory: 192 GB
  CPUCount: 2
  CoreCount: 40
  Intel HyperThreading: yes
  CPU Model: Intel(r) Xeon(r) Gold 6148 CPU @ 2.40GHz
  Microcode w/o microcode update: 0x200005e
  Microcode with microcode update: 0x2000065



Compiler options:
  Baseline & hw: -march=skylake-avx512 -mfpmath=sse -Ofast -funroll-loops -flto
  sw_prefix: -march=skylake-avx512 -mfpmath=sse -Ofast -funroll-loops -flto
    -x86-branches-within-32B-boundaries
  sw_nop: -march=skylake-avx512 -mfpmath=sse -Ofast -funroll-loops -flto
    -x86-align-branch-boundary=32 -x86-align-branch-prefix-size=0
    -x86-align-branch=fused+jcc+jmp
  sw_prefix_align_all: -march=skylake-avx512 -mfpmath=sse -Ofast -funroll-loops -flto
    -x86-align-branch-boundary=32 -x86-align-branch-prefix-size=5
    -x86-align-branch=fused+jcc+jmp+indirect+call+ret
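The sw_nop and sw_prefix_align_all configurations differ partly in which branch kinds -x86-align-branch selects. As a rough model, assuming the option value is simply a '+'-separated set of kind names (an assumption based only on the values shown above and on notes 5 and 9), the extra kinds covered by align_all can be computed as a set difference. `parse_kinds` and `is_aligned` are hypothetical helpers, not part of LLVM.

```python
# Branch-kind lists copied from the -x86-align-branch values above.
SW_NOP_SPEC = "fused+jcc+jmp"
ALIGN_ALL_SPEC = "fused+jcc+jmp+indirect+call+ret"


def parse_kinds(spec: str) -> frozenset:
    """Split a '+'-separated branch-kind list into a set of kind names."""
    return frozenset(kind for kind in spec.split("+") if kind)


def is_aligned(kind: str, spec: str) -> bool:
    """Whether a branch of this kind is subject to alignment under `spec`."""
    return kind in parse_kinds(spec)


extra = parse_kinds(ALIGN_ALL_SPEC) - parse_kinds(SW_NOP_SPEC)
print("additionally aligned by sw_prefix_align_all:", sorted(extra))
```

The result, call, indirect and ret, matches the branch types that notes 5 and 9 list for the align_all variants.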



Notes:

1.     Source: Intel Corporation; SPEC CPU2017 results should be considered
estimates as they are measured on non-production platforms and are being
provided for research purposes.

2.     Baseline means the system w/o microcode update and w/o SW mitigation.

3.     sw_prefix means SW mitigation of prefix padding is applied to a system
w/o microcode update.

4.     sw_nop means SW mitigation of nop padding is applied to a system w/o
microcode update.

5.     sw_prefix_align_all means SW mitigation of prefix padding is applied to
all impacted branches including call, ret and indirect jump, to a system w/o
microcode update.

6.     hw means the microcode update is applied w/o SW mitigation.

7.     hw_sw_prefix means both microcode update and SW mitigation of prefix
padding are applied.

8.     hw_sw_nop means both microcode update and SW mitigation of nop padding
are applied.

9.     hw_sw_prefix_align_all means microcode update is applied, and SW
mitigation of prefix padding is applied to all impacted branches including call,
ret and indirect jump.

10.  LLVM measurements are limited to the C/C++ benchmarks; all Fortran
benchmarks are excluded.

11.  The tests were built with an engineering LLVM compiler plus the SW
mitigation patch. Performance data may vary from build to build.


For more complete information about performance and benchmark results, visit
www.intel.com/benchmarks. For specific information and notices/disclaimers
regarding the Jump Conditional Code Erratum, visit
https://www.intel.com/content/dam/support/us/en/documents/processors/mitigations-jump-conditional-code-erratum.pdf.

