Zhang, Annita via llvm-dev
2019-Dec-16 03:41 UTC
[llvm-dev] Discuss about the LLVM SW mitigation to Jump Conditional Code Erratum
Hi all,

We have more data based on the LLVM test suite and SPEC2017, covering performance, code size and build time. We measured the SW mitigation on servers without and with the MCU (microcode update), with prefix padding and with nop padding, and with padding applied to jcc+fused+jmp versus all types of branches. I expect the data can help us move forward and make a decision.

We don't have code size data on hand for Chrome, Firefox, or Safari. We are willing to work with the community to get those data.

Below are the LLVM test suite measurements, including performance, code size and build time. The data shows a 1.7% exec_time slowdown from the microcode update, which is reduced to 0.5% with the SW mitigation of prefix padding. The code size increase in the test suite is ~0.5%, and the compile time increase is ~2%. Comparing hw_sw_prefix and hw_sw_nop, both stay within -0.5%~0.5% of baseline in exec_time, which may be within the margin of error. Comparing hw_sw_prefix and hw_sw_prefix_align_all, the exec_time difference is even smaller, at 0.1%. Given that the LLVM test-suite is a relatively small benchmark, we do not conclude which padding is preferable: hw_sw_prefix, hw_sw_nop or hw_sw_prefix_align_all.
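The percentages quoted above can be recomputed from the raw test-suite figures in Table 1. The following is a minimal sketch of that arithmetic, assuming the normalization is simply each value divided by the Baseline value; the names `CONFIGS`, `RAW`, `normalize` and `percent_change` are illustrative and not part of any LLVM tooling.

```python
# Raw LLVM test-suite figures copied from Table 1 of this message.
CONFIGS = [
    "baseline", "sw_prefix", "sw_nop", "sw_prefix_align_all",
    "hw", "hw_sw_prefix", "hw_sw_nop", "hw_sw_prefix_align_all",
]

RAW = {
    "compile_time": [0.276, 0.282, 0.277, 0.282, 0.276, 0.282, 0.277, 0.282],
    "exec_time": [286.465, 285.017, 287.125, 285.696,
                  291.294, 287.766, 285.027, 288.2],
    "code_size": [3.868, 3.889, 3.888, 3.895, 3.868, 3.889, 3.888, 3.895],
}


def normalize(values):
    """Return each value as a ratio to the first (baseline) entry."""
    baseline = values[0]
    return [v / baseline for v in values]


def percent_change(ratio):
    """Convert a ratio vs. baseline into a signed percentage."""
    return (ratio - 1.0) * 100.0


# metric -> {configuration -> ratio vs. baseline}
NORMALIZED = {
    metric: dict(zip(CONFIGS, normalize(values)))
    for metric, values in RAW.items()
}

if __name__ == "__main__":
    for metric, ratios in NORMALIZED.items():
        for config, ratio in ratios.items():
            print(f"{metric:12s} {config:24s} "
                  f"{ratio:.3f} ({percent_change(ratio):+.1f}%)")
```

Because the Table 1 values are themselves rounded, ratios recomputed this way can differ from the posted normalized values in the last digit, most visibly for compile_time.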
Table 1 - test suite result data

<table>
<tr> <th>LLVM test-suite</th> <th>Baseline</th> <th>sw_prefix</th> <th>sw_nop</th> <th>sw_prefix_align_all</th> <th>hw</th> <th>hw_sw_prefix</th> <th>hw_sw_nop</th> <th>hw_sw_prefix_align_all</th> </tr>
<tr> <td>compile_time</td> <td>0.276</td> <td>0.282</td> <td>0.277</td> <td>0.282</td> <td>0.276</td> <td>0.282</td> <td>0.277</td> <td>0.282</td> </tr>
<tr> <td>exec_time</td> <td>286.465</td> <td>285.017</td> <td>287.125</td> <td>285.696</td> <td>291.294</td> <td>287.766</td> <td>285.027</td> <td>288.2</td> </tr>
<tr> <td>code_size</td> <td>3.868</td> <td>3.889</td> <td>3.888</td> <td>3.895</td> <td>3.868</td> <td>3.889</td> <td>3.888</td> <td>3.895</td> </tr>
</table>

Table 2 - normalized test suite result data

<table>
<tr> <th>LLVM test-suite</th> <th>Baseline</th> <th>sw_prefix</th> <th>sw_nop</th> <th>sw_prefix_align_all</th> <th>hw</th> <th>hw_sw_prefix</th> <th>hw_sw_nop</th> <th>hw_sw_prefix_align_all</th> </tr>
<tr> <td>compile_time</td> <td>1</td> <td>1.021</td> <td>1.005</td> <td>1.022</td> <td>1</td> <td>1.021</td> <td>1.005</td> <td>1.022</td> </tr>
<tr> <td>exec_time</td> <td>1</td> <td>0.995</td> <td>1.002</td> <td>0.997</td> <td>1.017</td> <td>1.005</td> <td>0.995</td> <td>1.006</td> </tr>
<tr> <td>code_size</td> <td>1</td> <td>1.005</td> <td>1.005</td> <td>1.007</td> <td>1</td> <td>1.005</td> <td>1.005</td> <td>1.007</td> </tr>
</table>

Test date: 2019/11/25

System Configuration:
Platform: Intel Internal Reference Validation Platform
OS: Red Hat* 8.0 x86_64
Memory: 192 GB
CPUCount: 2
CoreCount: 40
Intel HyperThreading: yes
CPU Model: Intel(r) Xeon(r) Gold 6148 CPU @ 2.40GHz
Microcode w/o microcode update: 0x200005e
Microcode with microcode update: 0x2000063

Compiler options:
***sw_prefix: -x86-branches-within-32B-boundaries
***sw_nop: -x86-align-branch-boundary=32 -x86-align-branch-prefix-size=0 -x86-align-branch=fused+jcc+jmp
***sw_prefix_align_all: -x86-align-branch-boundary=32 -x86-align-branch-prefix-size=5 -x86-align-branch=fused+jcc+jmp+indirect+call+ret

Notes:
1. Baseline means the system w/o microcode update and w/o SW mitigation.
2. sw_prefix means the SW mitigation of prefix padding is applied to a system w/o microcode update.
3. sw_nop means the SW mitigation of nop padding is applied to a system w/o microcode update.
4. sw_prefix_align_all means the SW mitigation of prefix padding is applied to all impacted branches, including call, ret and indirect jump, on a system w/o microcode update.
5. hw means the microcode update is applied w/o SW mitigation.
6. hw_sw_prefix means both the microcode update and the SW mitigation of prefix padding are applied.
7. hw_sw_nop means both the microcode update and the SW mitigation of nop padding are applied.
8. hw_sw_prefix_align_all means the microcode update is applied, and the SW mitigation of prefix padding is applied to all impacted branches, including call, ret and indirect jump.
9. The data in Table 2 is normalized as the ratio vs. baseline (i.e. baseline = 1, so the smaller the better).
10. The tests were built with an engineering LLVM compiler plus the SW mitigation patch. The performance data may vary from build to build.
11. The tested Microcode 0x2000063 is an engineering version with microcode update.
The production version 0x2000064 and above contain the microcode update.

For more complete information about performance and benchmark results, visit http://www.intel.com/benchmarks. For specific information and notices/disclaimers regarding the Jump Conditional Code Erratum, visit https://www.intel.com/content/dam/support/us/en/documents/processors/mitigations-jump-conditional-code-erratum.pdf.
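As a rough illustration of what the alignment options listed under "Compiler options" aim for, the sketch below checks whether a branch crosses or ends on a 32-byte boundary (the condition described in the Intel erratum document linked above) and how many padding bytes would push it to the next boundary. The function names and example addresses are hypothetical, the code assumes the branch is shorter than the boundary, and it is not the LLVM implementation.

```python
BOUNDARY = 32  # same value as -x86-align-branch-boundary=32 in the options above


def needs_padding(start: int, size: int, boundary: int = BOUNDARY) -> bool:
    """True if the branch at [start, start + size) crosses or ends on a boundary."""
    assert 0 < size < boundary, "sketch assumes the branch is shorter than the boundary"
    end = start + size  # address of the first byte after the branch
    # If start and end fall in different boundary-sized blocks, the branch
    # either spans the boundary or its last byte sits right against it.
    return start // boundary != end // boundary


def padding_to_next_boundary(start: int, size: int, boundary: int = BOUNDARY) -> int:
    """Bytes of padding needed before the branch so it starts on the next boundary."""
    if not needs_padding(start, size, boundary):
        return 0
    return -start % boundary


if __name__ == "__main__":
    # Hypothetical (start offset, branch size) pairs.
    for start, size in [(0, 6), (28, 6), (30, 2), (32, 2)]:
        print(start, size, needs_padding(start, size),
              padding_to_next_boundary(start, size))
```

Whether those padding bytes are emitted as nops or as prefixes is what distinguishes the sw_nop and sw_prefix configurations in the notes above.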